CN112862715A - Real-time and controllable scale space filtering method - Google Patents

Real-time and controllable scale space filtering method

Info

Publication number
CN112862715A
CN112862715A (application CN202110172012.2A)
Authority
CN
China
Prior art keywords
image
network
filtering
edge image
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110172012.2A
Other languages
Chinese (zh)
Other versions
CN112862715B (en)
Inventor
郭晓杰
付园斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202110172012.2A priority Critical patent/CN112862715B/en
Publication of CN112862715A publication Critical patent/CN112862715A/en
Application granted granted Critical
Publication of CN112862715B publication Critical patent/CN112862715B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image

Abstract

The invention discloses a real-time and controllable scale-space filtering method that cyclically outputs a plurality of filtering results at different scales by means of a recurrent neural network model. In the recurrent neural network, the output of the network is fed back as its input to obtain a new output, and this new output is again fed back as input, repeating until a cycle-termination condition is met. The recurrent neural network model comprises a guide network G and a stripping network P.

Description

Real-time and controllable scale space filtering method
Technical Field
The invention relates to the field of image processing based on deep learning, in particular to a real-time and controllable scale space filtering method.
Background
Existing traditional methods such as L0 [1], RGF [2], RTV [3], and muGIF [4] require many iterative optimization operations to obtain a reasonable filtering result, and are therefore very slow and time-consuming to execute. Deep learning methods [13][14][15][16] overcome this slowness because a deep neural network needs only a single forward propagation at test time, but current deep learning methods can only produce a filtering result at one scale: they cannot obtain several images at different scales at once, and smoothing results at different scales require setting different hyper-parameters and retraining. PIO [5] can adjust the hyper-parameters of the network to obtain filtering results at different scales without retraining the neural network. However, for all current traditional and deep learning methods, filtering results at different scales are obtained by adjusting the hyper-parameters of a model (such as the weights of different loss functions or the number of network layers), which is not intuitive. A hyper-parameter is only a number, a scalar, and cannot intuitively reflect the spatial and semantic information of the image; in other words, hyper-parameters cannot intuitively control which parts of the image should be smoothed and which should be preserved.
The importance of multi-scale representations of images has long been well validated, which leads to the idea of scale-space filtering. Similar to the human eye observing an object, the perceived features differ when the distance to the object differs; that is, for the same object in the visual range, the features differ when the imaged size, i.e., the scale, differs. An intuitive example of scale space is the different scales of a map: a map with a large scale conveys the location of a province or even a country, while a map with a small scale conveys detailed location information of only a town. According to the human visual mechanism and scale-space theory, people usually obtain different information at different scales; however, when a computer vision system analyzes an unknown scene, the computer has no way of knowing in advance the scale of the objects in the image, so descriptions of the image at multiple scales must be considered simultaneously to determine the optimal scale of the objects of interest. Therefore, images are often organized into a series of image sets at different scales, in which features of interest are detected. In short, a larger scale provides more structural information about a scene, while a smaller scale reflects more texture information.
In the existing literature, various image filtering methods attempt to decompose an image into a single-scale structural part and texture part. However, for different images and tasks, it is difficult to determine in a principled way which scales are correct or best. Rather than eliminating the scale ambiguity and finding one optimal image separation, the image can be separated in an organized, natural and efficient manner to obtain results of varying scales from which the user can pick. Establishing a multi-scale representation of an image is therefore a very useful capability, as users wish to adjust and select the most satisfactory result in various multimedia, computer vision and graphics applications. Applications of scale-space filtering in these fields include image restoration, image stylization, stereo matching, optical flow, and semantic flow, among others.
Over the past few decades, the computer vision and multimedia fields have paid increasing attention to organizing images hierarchically or at multiple scales, inspired by research on how the human eye perceives the external world. Taking image segmentation as an example, an image may be spatially segmented into a set of object instances or superpixels, which can serve as the basis for subsequent processing. In contrast to image segmentation, another hierarchical image organization scheme is explored here from the perspective of information extraction. This task is called image separation to distinguish it from image segmentation.
The work Image smoothing via unsupervised learning explores how to learn directly from data with a deep neural network (without supervision data) to produce an ideal filtering effect. The advantage of deep learning is that it not only produces a good filtering effect but is also fast. However, deep learning usually requires ground-truth filtered images (GT) as supervision during training, and such supervision data are often difficult to obtain; manually annotating supervision data for training images is time-consuming and laborious. To solve this problem, that work designed the training objective as a loss function, similar to optimization-based methods, so that the deep neural network can be trained in an unsupervised, label-free setting. Its major contributions can be summarized as:
(1) An unsupervised image smoothing framework is proposed. Unlike previous approaches, the proposed framework does not require ground-truth labels for supervised training and can achieve the desired results by learning from any sufficiently diverse set of image data.
(2) Filtering results comparable to or even better than those of prior methods can be obtained, and a new image smoothing loss function is designed, based on a spatially adaptive Lp flattening criterion and an edge-preserving regularizer.
(3) The proposed method is based on a convolutional neural network, which is far less computationally intensive than most previous methods. For example, processing a 1280 × 720 image on a modern GPU requires only 5 ms.
The specific method of that work is to take the original image as the input of a deep convolutional neural network, which outputs the smoothing result of the original image. The work designed a fully convolutional neural network with dilated (hole) convolutions and skip connections. The network contains 26 convolutional layers in total, all of which use 3×3 convolution kernels and output 64 feature maps (except the last layer, which outputs a 3-channel image); each convolutional layer is followed by a batch normalization layer and a ReLU activation function. The third convolutional layer down-samples the feature maps to half size using a convolution with stride 2, and the third-to-last convolutional layer restores the feature maps to the same size as the input image using a deconvolution.
Image smoothing generally requires contextual information about the image, and this work gradually enlarges the receptive field by introducing dilated (hole) convolutions with exponentially increasing dilation rates. More specifically, every two adjacent residual blocks share the same dilation rate, and the dilation rate doubles from one pair of residual blocks to the next.
The network structure also adopts residual learning: the last layer of the network outputs a residual image, and the sum of the residual image and the input image is the final filtering result.
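For illustration, the following is a minimal PyTorch-style sketch of a dilated, residual smoothing network of the kind described above; the layer count, channel widths and dilation schedule here are illustrative assumptions and do not reproduce the cited work's exact architecture.

```python
# Hypothetical sketch of a dilated residual smoothing CNN (PyTorch assumed);
# depth, widths and dilation rates are illustrative, not the cited work's exact design.
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, stride=1, dilation=1):
    # 3x3 convolution followed by batch normalization and ReLU
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=dilation, dilation=dilation),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class SmoothingNet(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.head = nn.Sequential(
            conv_bn_relu(3, channels),
            conv_bn_relu(channels, channels),
            conv_bn_relu(channels, channels, stride=2),  # downsample feature maps to half size
        )
        # dilation rate doubles every two layers to enlarge the receptive field exponentially
        self.body = nn.Sequential(*[conv_bn_relu(channels, channels, dilation=d)
                                    for d in (1, 1, 2, 2, 4, 4, 8, 8)])
        self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)  # restore size
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)  # last layer outputs a 3-channel residual

    def forward(self, x):                 # x: (N, 3, H, W) with even H and W
        f = self.body(self.head(x))
        residual = self.tail(torch.relu(self.up(f)))
        return x + residual               # residual learning: input + residual = smoothed image
```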
The purpose of image smoothing is to reduce unimportant image details while maintaining the original image structure. The overall loss function of that work for image smoothing is as follows:

ε = ε_d + λ_f·ε_f + λ_e·ε_e

where ε_d is the data retention term, ε_f is the flattening (regularization) term, ε_e is the edge-preservation term, and λ_f and λ_e are weights that balance the different loss terms.
The data retention term minimizes the difference between the input image and the output filtered image to ensure structural similarity. Let I denote the input image and T the output image (note that this use of T differs from T in the present embodiment and is valid only in this discussion of the prior work). In the RGB color space, a simple data retention term can be defined as:

ε_d = (1/N) Σ_i ||T_i − I_i||₂²
where i is the pixel index and N the total number of pixels. During filtering, some important edges may be lost or weakened, because the objective of smoothing pixel values conflicts to some extent with the objective of edge preservation. To address this problem, that work proposed an explicit edge-preservation constraint that keeps the important edge pixels. Before introducing this constraint, the concept of a guidance image is introduced, which refers to the edge response of the image in appearance. One simple form of edge response is the local gradient magnitude:

E_i(I) = Σ_{j∈N(i)} Σ_c |I_{i,c} − I_{j,c}|
where N(i) denotes the neighborhood of the i-th pixel and c the c-th color channel. The edge response of the output filtering result T is denoted E(T). The edge-preservation term is defined by minimizing the difference between the edge responses E(I) and E(T). Let B be a binary map in which B_i = 1 denotes an important edge point; the edge-preservation term is then defined as:

ε_e = (1/N_B) Σ_i B_i · (E_i(I) − E_i(T))²

where N_B = Σ_i B_i is the total number of important edge points. The definition of an important edge is subjective and diverse, and depends on the application scenario. An ideal way to obtain the binary map B would be manual annotation reflecting user preference; however, manual annotation at the pixel level is quite time-consuming and laborious, so that work uses existing edge detection techniques to obtain B. Since this is not a major contribution of the work, the procedure is not described further. Given enough training images and edge maps B, the deep network explicitly learns the information of the important edges by minimizing the edge-preservation term and reflects these edges in the filtering results.
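As a concrete reading of the terms above, the following hedged PyTorch sketch computes a simple edge response, the data retention term and the edge-preservation term; the exact formulas of the cited work are only reproduced here from the textual description, so the forms below are assumptions.

```python
# Hedged sketch of the data term, edge response and edge-preservation term described above
# (PyTorch assumed; the exact forms in the cited work may differ).
import torch

def edge_response(img):
    # simple edge response: local gradient magnitude summed over color channels
    dx = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().sum(1)   # (N, H, W-1)
    dy = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().sum(1)   # (N, H-1, W)
    ex = torch.nn.functional.pad(dx, (0, 1, 0, 0))             # pad back to (N, H, W)
    ey = torch.nn.functional.pad(dy, (0, 0, 0, 1))
    return ex + ey

def data_term(T, I):
    # mean squared color difference between output T and input I
    return ((T - I) ** 2).sum(1).mean()

def edge_preserve_term(T, I, B):
    # B: binary map, 1 marks an important edge pixel; keep edge responses of T close to those of I
    diff = (edge_response(T) - edge_response(I)) ** 2
    return (B * diff).sum() / (B.sum() + 1e-8)
```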
To achieve better quality and greater flexibility, that work proposed a new smoothing/flattening term with a spatially variant Lp norm. To remove unwanted image details, the flattening term ensures the smoothness of the filtering result by penalizing the gradients between adjacent pixels:

ε_f = (1/N) Σ_i Σ_{j∈N_h(i)} w_{i,j} |T_i − T_j|^{p_i}
where N_h(i) denotes the adjacent pixels in an h × h window around the i-th pixel and w_{i,j} is the weight of each pixel pair. The weight w_{i,j} can be computed either from the spatial positions or from the pixel values, respectively as:

w_{i,j} = exp(−((x_i − x_j)² + (y_i − y_j)²) / (2σ_s²))

w_{i,j} = exp(−Σ_c (I_{i,c} − I_{j,c})² / (2σ_r²))
where σ_s and σ_r are the standard deviations of the Gaussian functions of the spatial-position and pixel-value differences respectively, c denotes the color channel, and x and y are pixel coordinates. Determining the p value of the Lp norm is not easy. To let the algorithm decide which p value to use in which regions of the image, that work uses the edge-guided image to define the p value of each pixel i as:

p_i = p_large, if E_i(I) < c_1 and E_i(T) − E_i(I) > c_2;  p_i = p_small, otherwise

where p_large = 2 and p_small = 0.8 are the two possible values of p, and c_1 and c_2 are two non-negative thresholds. It can be seen that the value of p is not determined by the input image alone but is conditioned on the output filtering result.

The reason for determining p in this way is that, when minimizing the loss function, the L0.8 norm is applied first; the p_small = 0.8 regularization term produces some over-sharpened artifacts in the output image, and the L2 norm is then applied to suppress these artifacts. The pseudo-structures are identified as structures whose edge response on the original image I is low (e.g., E_i(I) < c_1) but which are significantly enhanced in the output image T (E_i(T) − E_i(I) > c_2). Without the L2 norm, a strong smoothing effect can still be achieved, but staircase artifacts appear. On the other hand, without the L0.8 norm, the L2 norm alone makes the optimized image very blurry, and many important structures are not well preserved. In contrast, the complete regularization term yields a better visual result.
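The per-pixel choice of p can be made concrete as in the hedged sketch below, which takes precomputed edge responses of I and T and applies the rule described above; the thresholds and the omission of the pairwise weights w_{i,j} are simplifying assumptions.

```python
# Hedged sketch of the spatially variant Lp flattening term; EI and ET are edge responses
# of the input I and current output T (e.g., from the previous sketch). Thresholds are
# illustrative, and the pairwise weights w_ij are omitted for brevity.
import torch

def flattening_term(T, EI, ET, c1=0.05, c2=0.05, p_small=0.8, p_large=2.0, eps=1e-6):
    # p = p_large where a structure is weak in I but strongly enhanced in T (pseudo-edge),
    # p = p_small everywhere else
    p = torch.where((EI < c1) & (ET - EI > c2),
                    torch.full_like(EI, p_large),
                    torch.full_like(EI, p_small))
    # penalize differences between 4-neighbours of T with the per-pixel exponent p
    dx = (T[:, :, :, 1:] - T[:, :, :, :-1]).abs().sum(1) + eps   # (N, H, W-1)
    dy = (T[:, :, 1:, :] - T[:, :, :-1, :]).abs().sum(1) + eps   # (N, H-1, W)
    return (dx ** p[:, :, :-1]).mean() + (dy ** p[:, :-1, :]).mean()
```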
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a real-time and controllable scale-space filtering method. It is a deep learning method that needs only one forward propagation per step at execution time, is fast, and can obtain multiple filtering results of one image at different scales. The multi-scale filtering results are controlled by introducing edge images of the image, which is more intuitive than controlling the degree of smoothing with hyper-parameters alone.
The purpose of the invention is realized by the following technical scheme:
a real-time and controllable scale space filtering method is based on a recurrent neural network model consisting of a guide network G and a stripping network P, and comprises the following steps:
(1) using original image I as input of guide network G, utilizing guide network to output several edge images G with different scales of image in circulating modetThen, the edge image G of the image is processedtAnd the filtering result It-1Input together into the stripping network P; wherein t represents the t-th step of the cycle; t is 1,2,3 … … T, wherein I0I denotes the original drawing;
(2) stripping the network P at the edge image GtUnder the guidance of (2), outputting the next filtering result It
(3) Filtering result ItRecycled ground and edge image Gt+1Taken together as input to the stripping network to again obtain a new filtering result It+1Repeating the operation until the cycle number reaches the set total cycle number T; in which the filtering result I output by the stripping network P istAnd the input edge image GtMaintaining the same picture structure, i.e. at GtUnder the guide of (3), image peeling is performed.
Further, the peeling network P can peel each component from the image hierarchically: the structures/edges included in the filtering result of each step are a subset of those included in the filtering result of the previous step.
Further, for the edge pixels of the edge image G_t, the gradients of the co-located pixels in the filtering result I_t should remain unchanged; for the non-edge pixels of G_t, the more thoroughly the co-located pixels in the filtering result I_t are smoothed, the better.
Further, the edge image G_t can be obtained by any existing deep or non-deep edge detection method.
Further, the loop of each step in the filtering method is implemented by a single forward propagation operation or by iterating two or three steps of operations.
Further, the stripping network P can learn any existing filtering method with supervision, and can also be trained in an unsupervised manner. At each step of the cycle, the core task of the stripping network P is to accept the output I_{t−1} of the previous step as input and to perform image filtering under the guidance of the edge image G_t, thereby peeling I_t out of I_{t−1}.
Further, the filtering result of the input image is obtained by setting a hyper-parameter, and the guide network G is used to establish the relation between the hyper-parameter and the edge image G_t; different hyper-parameters are set for different scales, each hyper-parameter corresponding to an edge image G_t.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
1. The invention formally defines a general image separation problem and, by introducing the concept of scale-space filtering, focuses on a specific member of the image separation family of tasks: hierarchical image peeling. The initial image separation problem is simplified through theoretical analysis, so that the original complex problem is converted into a series of small sub-problems, greatly reducing its complexity.
2. The method is a deep learning method: it needs only one forward propagation per step at execution time, is fast, and can obtain multiple filtering results of one image at different scales. The invention introduces edge images of the image to control the multi-scale filtering results, which is more intuitive than controlling the degree of smoothing with hyper-parameters alone.
3. Compared with adjusting the numerical hyper-parameters of a model, the invention adopts a more intuitive and flexible way to generate filtering results at different scales, namely replacing the hyper-parameters with perceptually meaningful edge images.
4. Many current tasks expect a model to run in real time; besides useful functionality, the efficiency of the model is also crucial. The invention designs a lightweight recurrent neural network model, the hierarchical image peeling network, so that hierarchical image peeling can be completed efficiently and effectively and both supervised and unsupervised settings can be handled flexibly. The model size is about 3.5 MB, and on a GTX 2080 Ti GPU each pass over a 1080p image runs at more than 60 fps, so the method has very strong practical value.
5. The method can organize an image hierarchically and finally obtain a multi-scale representation of it, from which information of interest can be acquired at different scales.
Drawings
FIGS. 1a and 1b are schematic diagrams illustrating general image filtering; FIGS. 1c to 1g are schematic diagrams illustrating scale-space filtering.
Fig. 2 is a schematic diagram of a framework structure and a working process of the recurrent neural network model in this embodiment. The numbers below each network block in the figure represent the number of channels output by the corresponding convolution module in the neural network, and the letters K, S and D represent the size of the convolution kernel, the step size of the convolution and the expansion rate of the hole convolution, respectively.
Fig. 3a to 3c show an input image, a guide map and a final result map during application of the method of the present invention, respectively.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Over the past few decades, the computer vision and multimedia fields have paid increasing attention to organizing images hierarchically or at multiple scales, inspired by research on how the human eye perceives the external world. Taking image segmentation as an example, an image may be spatially segmented into a set of object instances or superpixels, which can serve as the basis for subsequent processing. Unlike image segmentation, this embodiment introduces another hierarchical image organization from the perspective of information extraction, referred to as image separation to distinguish it from image segmentation.
The object of the invention is to organize images hierarchically so as to obtain a multi-scale representation of the image. Specifically, given an image I, a plurality of components C_i satisfying a hierarchical relationship are gradually peeled off from the image, and adding these components at the pixel level recovers the original image I. Unlike image segmentation (which is considered in the spatial domain), this embodiment decomposes an image into a series of components from the perspective of scale-space filtering. Image filtering, also called image smoothing, refers to removing image texture while keeping the main structure of the image unchanged. The structure of an image is generally its edge part and reflects the overall contour and shape of the objects in the image; the texture of an image refers to visual patterns distributed within objects, repeating in a regular or irregular manner. Fig. 1a shows an example of image filtering, in which the edges of the dog, the dog's facial features, the rope pulling the dog and the square frame can be regarded as the structure of the image, while the dense small squares distributed over the dog's body can be regarded as texture details. Determining whether a pixel belongs to structure or texture is subjective: in Figs. 1a and 1b, the eyes and mouth of the dog are judged to be structure and are therefore maintained rather than smoothed, but they could equally well be judged to be texture and smoothed away; neither choice is better or worse, as it depends on how the user wants to smooth the image. In other words, the result of image filtering is not unique and can be chosen in many different ways. As shown in Figs. 1c to 1g, scale-space filtering means that a single input image is processed to obtain multiple filtering results at different scales and to different degrees, describing the image at multiple scales so that information of interest can be obtained at different scales.
The core of the invention is to design a real-time and controllable scale-space filtering method. To simplify the description, let P_t = Σ_{i=1}^{t} C_i denote the components peeled off up to step t and let I_t = I − P_t denote the remaining image, so that I_t and P_t satisfy I = I_t + P_t, where n denotes that a total of n components are separated from the original image I. In order to perform scale-space filtering efficiently, the present embodiment provides a recurrent neural network model that cyclically outputs a plurality of filtering results at different scales. In the recurrent neural network, the output of the network is fed back as its input to obtain a new output, and this new output is again fed back as input, repeating until the cycle-termination condition is met. The recurrent neural network model can be trained both in a supervised and in an unsupervised manner, and includes a guide network G and a stripping network P. The specific procedure is as follows: the original image I is used as the input of the guide network, which cyclically outputs edge images G_t of the image at different scales; the edge image G_t and the filtering result I_{t−1} of the previous step (t = 1, 2, 3, …, T; the filtering result of the zeroth step is the original image, i.e., I_0 = I) are then input together into the stripping network, which, under the guidance of the edge image G_t, outputs the next filtering result I_t; the filtering result I_t is cyclically taken, together with the edge image G_{t+1}, as the input of the stripping network to obtain a new filtering result I_{t+1}; this operation is repeated until the number of cycles reaches the set total number of cycles T. It is emphasized that the filtering result I_t output by the stripping network maintains the same image structure as the input edge image G_t, i.e., image peeling is performed under the guidance of G_t.
Specifically, Definition 1 (image separation): given an image I, separating I yields its components C = {C_1, C_2, …, C_n} such that

I = Σ_{i=1}^{n} C_i

This is called image decomposition, and n denotes that n components are separated from the image.
The object of the present embodiment can be represented by Definition 1; ∇C_i denotes the first-order gradient of a decomposed component. The gradient can represent the structure and detail information of the image and is obtained by subtracting adjacent pixels, that is, each value of the image gradient is the difference between a pixel in the original image and its adjacent pixel. It is computed as

∇I(x, y) = ( I(x+1, y) − I(x, y), I(x, y+1) − I(x, y) )

where x and y are the position indices of the image pixels. In general, the gradient corresponding to the edge part of the image is large. Owing to the complex relationships among the multiple components C_i, it is very difficult to separate multiple components directly from the image; to make the problem more tractable, sequential image peeling, as follows, reduces it to a series of sub-problems that iteratively peel two components from the image.
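As a small illustration of the gradient just defined, a forward-difference implementation might look as follows (PyTorch assumed).

```python
# Minimal forward-difference image gradient matching the formula above (PyTorch assumed):
# each gradient value is the difference between a pixel and its right/bottom neighbour.
import torch

def grad(img):                     # img: (N, C, H, W)
    gx = torch.zeros_like(img)
    gy = torch.zeros_like(img)
    gx[:, :, :, :-1] = img[:, :, :, 1:] - img[:, :, :, :-1]   # horizontal differences
    gy[:, :, :-1, :] = img[:, :, 1:, :] - img[:, :, :-1, :]   # vertical differences
    return gx, gy
```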
Theorem 1 (sequential image peeling): Suppose that for any t, [C_t, I_t] are both obtained from I_{t−1}, i.e., the two components are iteratively separated so that C_t is peeled from I_{t−1}. Then the result obtained is the same as separating all components directly from the image.

Proof: According to the hierarchical property, the non-zero elements of ∇I_t should be a subset of the non-zero elements of ∇I_{t−1}. From the structure-preserving property, ∇C_t and ∇I_t should have no correlation between them, which can be expressed as ∇C_t ∘ ∇I_t = 0. Now, given I_t = I_{t+1} + C_{t+1} and ∇C_{t+1} ∘ ∇I_{t+1} = 0, one obtains ∇C_t ∘ ∇I_{t+1} = 0 and ∇C_t ∘ ∇C_{t+1} = 0. By analogy, ∇C_i ∘ ∇I_t = 0 and ∇C_i ∘ ∇C_j = 0 hold for all i ≠ j with i, j ≤ t, which is exactly the condition required when all components are separated directly from the image. This completes the proof of Theorem 1.
Hierarchical image peeling is a special member of the image separation family of tasks: it separates/peels the components of an image progressively rather than all at once from the original. The above analysis transforms the initial problem into a sequential one, which naturally suggests a solution with recurrent neural networks. Each cycle step can be viewed as performing a structure-preserving, controllable image peeling operation. The goal of each cycle step can generally be written in the form:

[C_t, I_t] ← argmin Φ(C_t) + Ψ(I_t),  s.t.  I_{t−1} = C_t + I_t

where Φ(·) and Ψ(·) are penalty functions on C_t and I_t, and σ denotes a hyper-parameter that controls the filtering/peeling strength. The formula expresses that the C_t and I_t output at each cycle step should minimize the objective Φ(C_t) + Ψ(I_t). Many traditional methods, such as L0 [1], RGF [2], RTV [3] and muGIF [4], involve very expensive operations such as matrix inversion and are therefore slow, which limits their real-time application in practical scenarios. With deep learning, a deep neural network can be trained from an input-output perspective to mimic the effect of a traditional method (using the results of the traditional method as supervision). Once training is completed, the neural network only needs one forward propagation at execution time, greatly reducing the computational overhead. However, using the numerical hyper-parameter σ to control the degree of filtering/peeling is not intuitive, and it is difficult to adjust σ to obtain the desired result; after all, σ is only a number, a scalar. In contrast, guidance information that is meaningful and intuitive to visual perception is more practical. Among the many visually meaningful cues that could be chosen, edge images are a good choice, because they are very simple and can intuitively reflect the semantic features and overall outline of an image. Unfortunately, in general, the edge image used as guidance at each stage of the cycle is unknown. To solve this problem, the invention uses the guide network G to predict in advance a reasonable edge image for each stage of the cycle, i.e., G_t ← G(I_{t−1}), and then uses the predicted edge image to guide the stripping network P to peel I_t out of I_{t−1}. With the above considerations, the invention therefore proposes a cyclic strategy that repeatedly uses G_t as guidance to peel I_t out of I_{t−1}, i.e., [C_t, I_t] ← P(I_{t−1}, G_t), where t denotes the t-th step of the cycle. Notably, G_t can not only be output by the guide network but may also be an edge image created by the user in a customized manner. All image pixel values used in the invention lie in the range [0, 1]; images with values in [0, 255] can be brought into [0, 1] by a simple normalization.
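The cyclic strategy G_t ← G(I_{t−1}), [C_t, I_t] ← P(I_{t−1}, G_t) can be summarized by the hedged sketch below; guide_net, peel_net, the channel-wise concatenation of I_{t−1} and G_t, and the option of user-supplied edge maps are illustrative assumptions rather than the patent's actual implementation.

```python
# Hedged sketch of the peeling cycle: at each step the guide network (or the user) supplies
# an edge image G_t, and the stripping network peels C_t off the previous result.
import torch

def scale_space_filter(I0, guide_net, peel_net, T=4, user_edges=None):
    """Return T filtering results at progressively coarser scales for an image I0 in [0, 1]."""
    results, I_prev = [], I0
    for t in range(1, T + 1):
        # G_t may come from the guide network or be provided/edited by the user
        G_t = user_edges[t - 1] if user_edges is not None else guide_net(I_prev)
        C_t = peel_net(torch.cat([I_prev, G_t], dim=1))   # peel one component off
        I_t = I_prev - C_t                                # hard constraint: I_{t-1} = I_t + C_t
        results.append(I_t)
        I_prev = I_t
    return results
```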
As shown in Fig. 2, the recurrent neural network model of this embodiment logically comprises two modules: a guide network G and a stripping network P. The stripping network P operates conditioned on an edge image G_t, which is either output by the guide network G or provided by the user. Through this logical division, the guide network and the stripping network can be largely decoupled, further simplifying the problem. In addition, this division strongly constrains the solution space of the original problem, making it smaller, which benefits the simplification and training of the model.
First, regarding the stripping network. The stripping network can not only learn, with supervision, the effect of any existing traditional filtering method (using the results of the traditional method as supervision during training), but can also be trained in an unsupervised manner. At each step of the cycle, the core task of the stripping network is to accept the output I_{t−1} of the previous step as input and to perform image filtering under the guidance of the edge image G_t, thereby peeling I_t out of I_{t−1}. Whatever reasonable edge image G_t is used as guidance, the peeling result I_t should strictly follow the guidance of G_t. Since this part mainly introduces the stripping network, the edge image G_t is assumed to be known; the next part details how the guide network is designed to obtain G_t. Then, based on the hard constraint I_{t−1} = I_t + C_t, the stripping network can take I_{t−1} as input and output either I_t or C_t: as long as one of the two is output, the other can simply be obtained by subtracting the output from I_{t−1}. Furthermore, to better consider the context information of the image, the stripping network needs a larger receptive field. Thus, as shown in Fig. 2, dilated (hole) convolutions are introduced to gradually enlarge the receptive field, with the dilation rate increasing exponentially. Although a larger receptive field could also be achieved with a deeper network, a very deep network has a very large number of parameters; to save model storage, the receptive field is enlarged not by deepening the network but by introducing dilated convolutions, without increasing the number of network parameters.
Second, regarding the guide network. For an input image I_{t−1}, the filtering result is generally obtained by setting a hyper-parameter σ_t (different for different scales), and can be written as Î_t = F(I_{t−1}, σ_t), where F denotes a specific filtering method and Î_t is the target result of image filtering. What the guide network needs to do is to establish the relation between the numerical σ_t and a corresponding guide map (an edge image in the present invention) that has visual-perception meaning. To this end, the gradient ∇Î_t of Î_t can be used as supervision on G_t to train the guide network. However, there is some difference between an image gradient and an edge image in the true sense. A reasonable edge image should be semantically meaningful and at the same time binary (a binary image takes values 0 or 1): a semantically aware edge image can reflect the targets in the image that the human eye can perceive, and a binary edge image avoids ambiguity when judging whether a pixel belongs to an edge. To reduce the difference between the gradient and a true edge image, on the one hand, similarly to the stripping network, the guide network also employs dilated convolutions to enlarge the receptive field and better learn the contextual characteristics of the image; the context information obtained through dilated convolutions markedly improves the network's perception of semantically meaningful objects. On the other hand, a Sigmoid activation function is added to the last layer of the network to force the output to be as close to binary as possible. More details can be found in the guide-network part of Fig. 2. However, the real supervision data Î_t are not always available during the training phase. Therefore, when Î_t is absent, the gradient ∇Î_t should be approximated by the gradient ∇I_0 of I_0. The approximation need not be particularly precise, as long as it reflects the overall structure of the image. By repeatedly using the gradient ∇I_0, a series of approximations at different scales can be constructed to train a reasonable guide network.
One may ask why the edge image G_t is not obtained directly by an off-the-shelf edge detection method, such as the traditional edge detection operators Roberts, Sobel and Canny, or other deep-learning-based edge detection methods. Theoretically this is feasible, but existing methods suffer from sensitivity to noise, inaccurate edge localization, overly thick detected edges, and failure to satisfy the hierarchical property. In addition, the basic idea of the deep learning methods is to learn multi-scale features of a single input image and then fuse them to obtain the final predicted edge map. Existing deep learning methods need to acquire the multi-scale features from a pre-trained image classification network such as AlexNet or VGG16: they fuse the features output by different layers of the pre-trained classification network with 1 × 1 convolutions and then predict the edge map from the fused features, where the features of different layers can be regarded as features at multiple scales. As shown in Fig. 2, the framework of this embodiment overcomes these problems, because the invention reuses the network parameters and cyclically feeds the feature F_{t−1} output at the previous step back as input to obtain the next feature F_t; F_t thus has a larger receptive field than F_{t−1}.
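A hedged sketch of such a guide network is given below: it uses dilated convolutions, a Sigmoid output pushed towards a binary edge map, and feeds the previous step's features back in. The channel width, dilation schedule and interface are illustrative assumptions, and the earlier loop sketch omitted this feature recurrence for brevity.

```python
# Hedged sketch of a guide network with dilated convolutions, a Sigmoid output layer and
# feature recurrence across cycle steps (PyTorch assumed; layout and sizes illustrative).
import torch
import torch.nn as nn

class GuideNet(nn.Module):
    def __init__(self, channels=24):
        super().__init__()
        self.channels = channels
        self.inp = nn.Conv2d(3 + channels, channels, 3, padding=1)
        # dilation rate grows exponentially to enlarge the receptive field
        self.body = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                          nn.ReLU(inplace=True))
            for d in (1, 2, 4, 8)
        ])
        self.out = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, I_prev, F_prev=None):
        if F_prev is None:   # first cycle step: no previous features yet
            F_prev = I_prev.new_zeros(I_prev.size(0), self.channels, *I_prev.shape[2:])
        F_t = self.body(torch.relu(self.inp(torch.cat([I_prev, F_prev], dim=1))))
        G_t = torch.sigmoid(self.out(F_t))   # Sigmoid pushes the edge map towards binary values
        return G_t, F_t                      # F_t is fed back at the next step (larger receptive field)
```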
The neural network model of this embodiment is trained as follows. For different scales, different σ_t need to be set; I_t can be regarded as the filtering result obtained with σ_t, and each σ_t has a corresponding edge image G_t. T denotes the total number of cycles, and in this embodiment T is provisionally set to 4. It should be noted that the recurrent neural network can be cycled any number of times, not limited to T. The interval of the filtering degree at each step is controlled by σ_t, and σ_t can be adjusted according to particular needs.
The loss functions used during network training apply to both the supervised and the unsupervised training modes, and include the guide consistency loss, the peel reconstruction loss, the peel retention loss and the peel consistency loss. For supervised image peeling, Î_t and Ĝ_t exist and are obtainable, where Î_t can be generated by any existing image filtering method and used as supervision during neural network training; for unsupervised image peeling, Î_t is not available.
the specific form of the loss function is described below. The guiding consistency loss is to ensure the edge image G output by the guiding networktAnd
Figure BDA0002939014330000115
and the consistency is maintained. Let 1 denote that one pixel value is all 1 and the size and ItThe same image, ° represents the Hadamard product, or pixel-by-pixel multiplication, i.e. the corresponding multiplication of pixels at the same position in both images. If it is not
Figure BDA0002939014330000116
Is not available and can be obtained
Figure BDA0002939014330000117
Wherein G isgrTo further enhance the important edge images in the image. GgrCan be manually marked edge images or pairs
Figure BDA0002939014330000118
And (5) binarization result. For simplicity of description, the following description is used uniformly
Figure BDA0002939014330000119
To represent
Figure BDA00029390143300001110
The boot consistency loss can be expressed as:
Figure BDA00029390143300001111
wherein the content of the first and second substances,
Figure BDA00029390143300001112
||·||1denotes a 1 norm, βgIs a constant that balances the two terms in the loss function. The cycle consistency loss is compared
Figure BDA00029390143300001113
And
Figure BDA00029390143300001114
the sizes of the two are determined as GtWhether the pixel at a certain position in the image belongs to the edge image or not.
The peel reconstruction loss requires the output result I_t and Î_t to be as close as possible in color space. Letting ||·||_2 denote the 2-norm, its form is:

ℓ_pr = ||I_t − Î_t||_2²

The purpose of the peel retention loss is to keep the gradients of the structural part of I_t unchanged. The pixels of I_t identified as belonging to structure are co-located with the pixels of G_t whose values are close to 1; in other words, the structural pixels of I_t correspond to pixels of G_t with values close to 1 (considered to belong to an edge). Since the gradient of an image naturally reflects its structural information, the peel retention loss is defined as the distance between the gradients of I_t and of the result it was peeled from, restricted to these structural positions, i.e., the distance between G_t ∘ ∇I_t and G_t ∘ ∇I_{t−1}:

ℓ_pm = ||G_t ∘ (∇I_t − ∇I_{t−1})||_1
the loss of peel consistency severely constrains the image peeling process such that the result of peeling is in accordance with GtAnd the consistency is maintained. Peeling consistency loss pair ItEach pixel of (a) is smoothed to a different degree. For ItThe punishment of the pixels belonging to the structure is small, and the punishment of the pixels belonging to the texture is large. The specific form of peel consistency loss is:
Figure BDA0002939014330000122
where e is used to avoid a denominator of 0. In order to stabilize the training and increase the convergence rate during the training, the bootstrap network and the stripper network are trained independently of each other. Since the stripping network requires GtThe lead network is trained with the lead consistency loss, the parameters of the trained lead network are fixed, and the peel network is trained with the peel reconstruction loss, the peel retention loss, and the peel consistency loss.
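Since the exact loss formulas appear only as images in the original text, the hedged PyTorch sketch below implements the peel retention and peel consistency losses purely from the textual description above; the use of I_{t−1} as the gradient reference and all names are assumptions.

```python
# Hedged sketch of two peeling losses, reconstructed from the description only
# (the original formulas are images); forms and reference images are assumptions.
import torch

def _dx(img):  # horizontal forward differences
    return img[:, :, :, 1:] - img[:, :, :, :-1]

def _dy(img):  # vertical forward differences
    return img[:, :, 1:, :] - img[:, :, :-1, :]

def peel_retention_loss(I_t, I_prev, G_t):
    # keep gradients unchanged at positions G_t marks as structure (values near 1)
    lx = (G_t[:, :, :, :-1] * (_dx(I_t) - _dx(I_prev)).abs()).mean()
    ly = (G_t[:, :, :-1, :] * (_dy(I_t) - _dy(I_prev)).abs()).mean()
    return lx + ly

def peel_consistency_loss(I_t, G_t, eps=1e-6):
    # smooth each pixel to a different degree: small penalty where G_t ~ 1 (structure),
    # large penalty where G_t ~ 0 (texture); eps avoids a zero denominator
    lx = (_dx(I_t).abs() / (G_t[:, :, :, :-1] + eps)).mean()
    ly = (_dy(I_t).abs() / (G_t[:, :, :-1, :] + eps)).mean()
    return lx + ly
```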
Further, to demonstrate the significant progress of the method of the present invention, the following is illustrated with reference to some experimental results:
First, to make full use of the multi-scale characteristics of the method, a new strategy is proposed for applying it to the saliency detection task. Image saliency detection uses computer algorithms to simulate human visual characteristics in order to extract the salient regions of an image. The salient regions of an image can be regarded as the parts that most attract the human eye when looking at the image; in general, salient, high-contrast and color-changing regions draw attention. Saliency detection is closely related to the selective processing performed by the human visual system, and its goal is to locate important and salient regions or objects in an image; it is an important and popular research direction in computer vision. This example first uses the existing saliency detection models CSF [6] and EGNet [7] to perform saliency detection on the original image I and on the four filtering results generated by the method of the invention (five images in total), and then trains a lightweight network (only 91 KB) on the DUTS-TR [8] dataset to predict a better saliency map from these five saliency detections. Following the evaluation protocol for saliency detection in the related literature, the quality of saliency detection is measured by the mean absolute error, computed as MAE(S_o, S_gt) := mean(|S_o − S_gt|), where S_o is the saliency map output by the model and S_gt is the ground-truth saliency map (annotated by humans). The evaluation datasets of this example are the public saliency detection datasets ECSSD [9], PASCAL-S [10], HKU-IS [11], SOD [12] and DUTS-TE [8]. The results show that the method can effectively improve the performance of existing saliency detection models, because some features useful for saliency detection may be more prominent at different scales, and removing unwanted textures in the image helps to enhance the contrast of salient regions. Beyond saliency detection, the invention can flexibly improve the performance of many other vision and graphics models.
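For reference, the MAE metric used above is simply the per-pixel mean absolute difference between the predicted and ground-truth saliency maps (a NumPy one-liner; names are illustrative).

```python
# MAE(S_o, S_gt) := mean(|S_o - S_gt|) for saliency evaluation (NumPy assumed).
import numpy as np

def mae(S_o, S_gt):
    return float(np.mean(np.abs(S_o - S_gt)))
```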
In addition, this embodiment is compared with other filtering methods. The traditional methods used for comparison include L0 [1], RTV [3], RGF [2], SD [17], muGIF [4], realLS [18] and enBF [19]; the deep learning methods include DEAF [15], FIP [16] and PIO [5]. In the visual results, Ours-S (with Î_t produced by muGIF) and Ours denote the models obtained by supervised and unsupervised training, respectively. To evaluate image quality, the gradient correlation coefficient (GCC) is used to evaluate the degree of irrelevance between the peeled-out image texture and structure. For fairness, the hyper-parameters of the compared methods are also carefully adjusted so that all methods achieve a similar degree of filtering/smoothing.
TABLE 1 quantitative comparison of GCC and execution speed for each method model
Note 1: quantitative comparison in terms of GCC. For a fair comparison, the smoothing/filtering degree of all compared methods is controlled to 0.146 ± 0.01. The best results are shown in bold. Smaller GCC values indicate better results.
Note 2: run-time comparison when processing a 1080p (1627 × 1080) image. CPU times are unmarked in the table, and GPU times are marked with a symbol.
As can be seen from the quantitative results in Table 1, the method of the invention ranks first on the GCC index compared with the other methods, which shows that the gradients of the texture C_t and of the structure I_t obtained with the recurrent neural network framework of the invention satisfy the mutual-orthogonality property well. In addition, whether running on a CPU or a GPU, the recurrent neural network model executes much faster than the traditional methods. Thanks to deep learning, the recurrent neural network model and PIO reach real-time speed when processing 1080p images. In terms of visual effect, L0, RGF and PIO perform very poorly, and PIO also exhibits a severe color-shift problem when the degree of filtering/smoothing is increased. RTV and muGIF perform relatively well, but neither completely smooths nor preserves certain areas of the image; in contrast, the present method achieves visually pleasing results both in smoothing the texture details of the image and in preserving its main structures/edges. It is worth mentioning that, apart from the method of the invention, neither the traditional methods nor the deep learning methods can generate a filtering result that follows a guide image, whether the guide is an edge image whose scale changes step by step or an edge image provided/edited by the user. As shown in Figs. 3a to 3c, the flexibility of the model is verified with a manually edited guide map, which is formed by combining four edge images of different scales. Compared with the other methods, only the method of the invention successfully outputs a filtering result structurally consistent with the guide image.
The invention also evaluates the edge images output by the guide network. Because the proposed framework is multi-scale, the edge images output by the guide network at each cycle step can be used to construct an edge confidence map. The edge confidence map is also an edge image, except that the value of each pixel is not necessarily close to 0 or 1; many pixel values may lie around 0.5, because each value can be regarded as the probability that the pixel belongs to an edge, and the larger the value, the more likely the pixel is an edge. Specifically, the manually annotated ground-truth edges of the BSDS500 dataset are used as G_gr to train the guide network in the unsupervised setting; the guide network is run for 24 iterations at execution time, and the edge images from the 24 iterations are averaged to obtain the edge confidence map. The constructed edge confidence map is evaluated quantitatively with a precision-recall curve, and non-maximum suppression is applied to the confidence map before evaluation.
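The averaging of guide outputs over 24 cycle steps can be sketched as below, reusing the simplified single-output guide/peel interfaces assumed in the earlier loop sketch; this is an illustration, not the patent's exact procedure, and non-maximum suppression is left out.

```python
# Hedged sketch: build an edge confidence map by averaging the guide network's edge images
# over 24 cycle steps (interfaces follow the earlier loop sketch and are assumptions).
import torch

def edge_confidence_map(I0, guide_net, peel_net, steps=24):
    I_prev, acc = I0, torch.zeros_like(I0[:, :1])   # single-channel accumulator
    for _ in range(steps):
        G_t = guide_net(I_prev)
        acc = acc + G_t
        I_prev = I_prev - peel_net(torch.cat([I_prev, G_t], dim=1))
    return acc / steps    # per-pixel probability of belonging to an edge
```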
Finally, an ablation experiment on the recurrent neural network model is presented. Since the guide network has only the guide consistency loss, no ablation analysis of its loss function is necessary. With I_{t−1} as input, the stripping network has two execution modes: one outputs I_t and the other outputs C_t. The invention prefers outputting C_t, i.e., C_t ← P(I_{t−1}, G_t), because C_t contains less information than I_t and has a simpler distribution.
In addition, the technical scheme of the invention can be implemented with the following alternatives:
Alternative 1: the edge image G_t is not obtained from the guide network output but from any existing deep or non-deep edge detection method.
Alternative 2: each cycle step is not a single forward propagation operation but instead iterates two or three further steps, which is equivalent to embedding a small loop inside the large loop. By contrast, the recurrent neural network model of the invention performs only one forward operation per cycle step and does not embed a small loop in the large loop.
Alternative 3: the guide network and the stripping network interact not by taking the output of the guide network as the input of the stripping network, but by directly using the guide network to output the model parameters of the stripping network.
The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.
References:
[1] Li Xu, Cewu Lu, Yi Xu, and Jiaya Jia. Image smoothing via L0 gradient minimization. TOG, 30(6):112, 2011.
[2] Qi Zhang, Xiaoyong Shen, Li Xu, and Jiaya Jia. Rolling guidance filter. In ECCV, 2014.
[3] L. Xu, Q. Yan, Y. Xia, and J. Jia. Structure extraction from texture via relative total variation. TOG, 31(6):139, 2012.
[4] X. Guo, Y. Li, J. Ma, and H. Ling. Mutually guided image filtering. TPAMI, 42(3):694–707, 2020.
[5] Qingnan Fan, Dongdong Chen, Lu Yuan, Gang Hua, Nenghai Yu, and Baoquan Chen. A general decoupled learning framework for parameterized image operators. TPAMI, 2019.
[6] Shang-Hua Gao, Yong-Qiang Tan, Ming-Ming Cheng, Chengze Lu, Yunpeng Chen, and Shuicheng Yan. Highly efficient salient object detection with 100k parameters. In ECCV, 2020.
[7] Jia-Xing Zhao, Jiang-Jiang Liu, Deng-Ping Fan, Yang Cao, Jufeng Yang, and Ming-Ming Cheng. EGNet: edge guidance network for salient object detection. In ICCV, Oct 2019.
[8] Chuan Yang, Lihe Zhang, Huchuan Lu, Xiang Ruan, and Ming-Hsuan Yang. Saliency detection via graph-based manifold ranking. In CVPR, pages 3166–3173, 2013.
[9] Q. Yan, L. Xu, J. Shi, and J. Jia. Hierarchical saliency detection. In CVPR, pages 1155–1162, 2013.
[10] Y. Li, X. Hou, C. Koch, J. M. Rehg, and A. L. Yuille. The secrets of salient object segmentation. In CVPR, pages 280–287, 2014.
[11] Guanbin Li and Y. Yu. Visual saliency based on multiscale deep features. In CVPR, pages 5455–5463, 2015.
[12] David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
[13] Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand. Deep joint demosaicking and denoising. TOG, 35(6):1–12, 2016.
[14] Sifei Liu, Jinshan Pan, and Ming-Hsuan Yang. Learning recursive filters for low-level vision via a hybrid neural network. In ECCV, 2016.
[15] Li Xu, Jimmy S. J. Ren, Qiong Yan, Renjie Liao, and Jiaya Jia. Deep edge-aware filters. In ICML, 2015.
[16] Qifeng Chen, Jia Xu, and Vladlen Koltun. Fast image processing with fully-convolutional networks. In ICCV, pages 2516–2525, 2017.
[17] Bumsub Ham, Minsu Cho, and Jean Ponce. Robust guided image filtering using nonconvex potentials. TPAMI, 40(1):192–207, 2017.
[18] Wei Liu, Pingping Zhang, Xiaolin Huang, Jie Yang, Chunhua Shen, and Ian Reid. Real-time image smoothing via iterative least squares. TOG, 39(3):28, 2020.
[19] Wei Liu, Pingping Zhang, Xiaogang Chen, Chunhua Shen, Xiaolin Huang, and Jie Yang. Embedding bilateral filter in least squares for efficient edge-preserving image smoothing. TCSVT, 30(1):23–35, 2020.
[20] John Canny. A computational approach to edge detection. TPAMI, 8(6):679–698, 1986.
[21] David R. Martin, Charless C. Fowlkes, and Jitendra Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. TPAMI, 26(5):530–549, 2004.
[22] Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik. Contour detection and hierarchical image segmentation. TPAMI, 33(5):898–916, 2011.
[23] Zhile Ren and Gregory Shakhnarovich. Image segmentation by cascaded region agglomeration. In CVPR, pages 2011–2018, 2013.
[24] Piotr Dollár and C. Lawrence Zitnick. Structured forests for fast edge detection. In CVPR, pages 1841–1848, 2013.
[25] Wei Shen, Xinggang Wang, Yan Wang, Xiang Bai, and Zhijiang Zhang. DeepContour: a deep convolutional feature learned by positive-sharing loss for contour detection. In CVPR, pages 3982–3991, 2015.
[26] Saining Xie and Zhuowen Tu. Holistically-nested edge detection. In CVPR, 2015.
[27] Yun Liu, Ming-Ming Cheng, Xiaowei Hu, Kai Wang, and Xiang Bai. Richer convolutional features for edge detection. In CVPR, 2017.

Claims (7)

1. A real-time and controllable scale space filtering method, characterized in that, based on a recurrent neural network model composed of a guide network G and a stripping network P, the method comprises the following steps:
(1) using the original image I as the input of the guide network G, the guide network outputs, in a cyclic manner, several edge images Gt of the image at different scales; the edge image Gt and the filtering result It-1 are then input together into the stripping network P; wherein t denotes the t-th step of the loop, t = 1, 2, 3, …, T, and I0 = I denotes the original image;
(2) under the guidance of the edge image Gt, the stripping network P outputs the next filtering result It;
(3) the filtering result It is cyclically taken, together with the edge image Gt+1, as input to the stripping network to obtain a new filtering result It+1; this operation is repeated until the number of loop steps reaches the set total number T; wherein the filtering result It output by the stripping network P maintains the same picture structure as the input edge image Gt, i.e. image peeling is performed under the guidance of Gt.
2. A real-time and controllable scale space filtering method according to claim 1, wherein the stripping network P peels each component from the image hierarchically: the structure/edge image contained in the filtering result of each step is a subset of the structure/edge image contained in the filtering result of the previous step.
3. A real-time and controllable scale space filtering method according to claim 1, wherein, for pixels belonging to the edge image in Gt, the gradients of the pixels at the corresponding positions in the filtering result It should remain unchanged; for pixels not belonging to the edge image in Gt, the pixels at the corresponding positions in the filtering result It should be smoothed as thoroughly as possible.
4. A real-time and controllable scale space filtering method according to claim 1, wherein the edge image Gt can be obtained by any existing deep or non-deep edge image detection method.
5. A real-time and controllable scale space filtering method according to claim 1, wherein each step of the loop in the filtering method is implemented either by a single forward-propagation operation or by iterating two or three further steps.
6. A real-time and controllable scale space filtering method according to claim 1, wherein the stripping network P can learn any existing filtering method in a supervised manner and can also be trained in an unsupervised manner; at each step of the loop, the core of the stripping network P is to take the output It-1 of the previous step as input and to perform image filtering under the guidance of the edge image Gt, thereby peeling from It-1 to obtain It.
7. A real-time and controllable scale space filtering method according to claim 1, wherein the filtering result of the input image is controlled by setting a hyper-parameter; the guide network G is used to establish the relation between the hyper-parameter and the edge image Gt; different hyper-parameters are set for different scales, and each hyper-parameter corresponds to one edge image Gt.
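Purely as an illustrative, non-claim sketch of the loop recited in claims 1 and 7 (the guide_net and peel_net callables are hypothetical and not defined in this document), each loop step performs one forward pass of each network, and each scale-controlling hyper-parameter selects one edge image Gt:

```python
def scale_space_filtering(guide_net, peel_net, image, hyper_params):
    """Run the filtering loop: one forward pass of each network per step.

    hyper_params holds one scale-controlling value per loop step;
    each value corresponds to one edge image G_t (cf. claim 7).
    """
    I_t = image  # I_0 = I, the original image
    results = []
    for lam in hyper_params:
        G_t = guide_net(image, lam)  # edge image at the scale selected by lam
        I_t = peel_net(I_t, G_t)     # peel under the guidance of G_t
        results.append(I_t)
    return results
```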
CN202110172012.2A 2021-02-08 2021-02-08 Real-time and controllable scale space filtering method Active CN112862715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110172012.2A CN112862715B (en) 2021-02-08 2021-02-08 Real-time and controllable scale space filtering method

Publications (2)

Publication Number Publication Date
CN112862715A (en) 2021-05-28
CN112862715B CN112862715B (en) 2023-06-30

Family

ID=75989229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110172012.2A Active CN112862715B (en) 2021-02-08 2021-02-08 Real-time and controllable scale space filtering method

Country Status (1)

Country Link
CN (1) CN112862715B (en)

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100074552A1 (en) * 2008-09-24 2010-03-25 Microsoft Corporation Removing blur from an image
CN102521798A (en) * 2011-11-11 2012-06-27 浙江捷尚视觉科技有限公司 Image automatic recovering method for cutting and selecting mask structure based on effective characteristic
CN105931192A (en) * 2016-03-21 2016-09-07 温州大学 Image texture filtering method based on weighted median filtering
CN107633490A (en) * 2017-09-19 2018-01-26 北京小米移动软件有限公司 Image processing method, device and storage medium
CN107844751A (en) * 2017-10-19 2018-03-27 陕西师范大学 The sorting technique of guiding filtering length Memory Neural Networks high-spectrum remote sensing
CN107622481A (en) * 2017-10-25 2018-01-23 沈阳东软医疗系统有限公司 Reduce the method, apparatus and computer equipment of CT picture noises
CN108280831A (en) * 2018-02-02 2018-07-13 南昌航空大学 A kind of acquisition methods and system of image sequence light stream
CN108492308A (en) * 2018-04-18 2018-09-04 南昌航空大学 A kind of determination method and system of variation light stream based on mutual structure guiding filtering
CN109118451A (en) * 2018-08-21 2019-01-01 李青山 A kind of aviation orthography defogging algorithm returned based on convolution
CN109272539A (en) * 2018-09-13 2019-01-25 云南大学 The decomposition method of image texture and structure based on guidance figure Total Variation
CN109450406A (en) * 2018-11-13 2019-03-08 中国人民解放军海军航空大学 A kind of filter construction based on Recognition with Recurrent Neural Network
CN109978764A (en) * 2019-03-11 2019-07-05 厦门美图之家科技有限公司 A kind of image processing method and calculate equipment
CN110009580A (en) * 2019-03-18 2019-07-12 华东师范大学 The two-way rain removing method of single picture based on picture block raindrop closeness
CN110276721A (en) * 2019-04-28 2019-09-24 天津大学 Image super-resolution rebuilding method based on cascade residual error convolutional neural networks
CN110246099A (en) * 2019-06-10 2019-09-17 浙江传媒学院 It is a kind of keep structural edge image remove texture method
CN110910317A (en) * 2019-08-19 2020-03-24 北京理工大学 Tongue image enhancement method
CN110689021A (en) * 2019-10-17 2020-01-14 哈尔滨理工大学 Real-time target detection method in low-visibility environment based on deep learning
CN110991463A (en) * 2019-11-04 2020-04-10 同济大学 Multi-scale guided filtering feature extraction method under guide of super-pixel map
CN111275642A (en) * 2020-01-16 2020-06-12 西安交通大学 Low-illumination image enhancement method based on significant foreground content
CN111462012A (en) * 2020-04-02 2020-07-28 武汉大学 SAR image simulation method for generating countermeasure network based on conditions
CN111626330A (en) * 2020-04-23 2020-09-04 南京邮电大学 Target detection method and system based on multi-scale characteristic diagram reconstruction and knowledge distillation
CN111639471A (en) * 2020-06-01 2020-09-08 浙江大学 Electromagnetic interference filter design method based on recurrent neural network
CN112132753A (en) * 2020-11-06 2020-12-25 湖南大学 Infrared image super-resolution method and system for multi-scale structure guide image

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIXIANG ZHEN; LU YONG GE; XUZHI MING; YAOJING PING: "RADAR Cross Section Measurement And Imaging Related To Ship Target In The Sea Environment", PROCEDIA COMPUTER SCIENCE, 6 February 2019 (2019-02-06) *
Zhang Yongxin, Xinhua Publishing House *
Zhang Yanyong, Zhang Sha, Zhang Yu, et al.: "Perception and Computing for Autonomous Driving Based on Multimodal Fusion", Journal of Computer Research and Development *
Zhang Yanyong, Zhang Sha, Zhang Yu, et al.: "Perception and Computing for Autonomous Driving Based on Multimodal Fusion", Journal of Computer Research and Development, 31 December 2020 (2020-12-31) *

Also Published As

Publication number Publication date
CN112862715B (en) 2023-06-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant