CN115861343A - Method and system for representing arbitrary scale image based on dynamic implicit image function - Google Patents
- Publication number
- CN115861343A CN115861343A CN202211590183.8A CN202211590183A CN115861343A CN 115861343 A CN115861343 A CN 115861343A CN 202211590183 A CN202211590183 A CN 202211590183A CN 115861343 A CN115861343 A CN 115861343A
- Authority
- CN
- China
- Prior art keywords
- coordinate
- image
- slice
- processing
- dynamic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method and system for representing images at arbitrary scale based on a dynamic implicit image function. The method comprises: obtaining an image to be processed; performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map; and inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the feature map, and performing pixel value prediction through a two-stage multilayer perceptron to obtain image pixel values. Embodiments of the invention reduce the computational cost of continuous image representation, improve processing performance, and can be widely applied in the technical field of artificial intelligence.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method and system for representing images at arbitrary scale based on a dynamic implicit image function.
Background
Digital images are two-dimensional representations of the real world in the digital world: the continuous physical world is quantized by a sensor and stored in a computer as a matrix of discrete pixels. If images could instead be expressed in a continuous form, an image of arbitrary resolution could be obtained from the continuous space, preserving the fidelity of the depicted scene. Although continuous image representation methods in the related art perform well, their computational cost grows quadratically with the image magnification, making arbitrary-scale super-resolution reconstruction prohibitively time-consuming. In view of the above, the technical problems in the related art need to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and system for representing images at arbitrary scale based on a dynamic implicit image function, so as to reduce computational cost and improve processing performance.
In one aspect, the present invention provides a method for representing images at arbitrary scale based on a dynamic implicit image function, including:
acquiring an image to be processed;
performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the feature map, and performing pixel value prediction through a two-stage multilayer perceptron to obtain image pixel values.
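As a rough illustration of these three steps, the flow can be sketched as follows (the function name and the callables passed in are hypothetical, not from the patent):

```python
def represent_arbitrary_scale(image, encoder, diif_net, magnification):
    """End-to-end flow described above: implicit encoding to a 2-D
    feature map, then dynamic coordinate slicing and two-stage MLP
    pixel prediction inside the dynamic implicit image network."""
    feature_map = encoder(image)                   # pre-trained encoder
    pixels = diif_net(feature_map, magnification)  # slicing + C2F-MLP
    return pixels
```

Here `encoder` and `diif_net` stand in for the trained networks; any callables with these signatures can be plugged in.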
Optionally, the dynamic coordinate slicing of the two-dimensional feature map includes:
inputting an image magnification;
obtaining a feature vector from the two-dimensional feature map, taking the feature vector as a hidden code, and grouping the coordinates in the feature map according to the hidden code to obtain a feature coordinate group;
and slicing the feature coordinate group according to the image magnification to obtain coordinate slices.
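A minimal sketch of the slicing step, assuming a coordinate group is simply a list of coordinates and the slice interval has already been derived from the magnification (the function name is illustrative, not from the patent):

```python
def slice_coordinates(coord_group, slice_interval):
    """Split one feature coordinate group into coordinate slices of at
    most `slice_interval` coordinates each; every coordinate in a slice
    will later share the same hidden code."""
    if slice_interval < 1:
        raise ValueError("slice interval must be positive")
    return [coord_group[i:i + slice_interval]
            for i in range(0, len(coord_group), slice_interval)]
```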
Optionally, the slicing of the feature coordinate group according to the image magnification to obtain coordinate slices includes:
determining a slice interval according to the image magnification;
and dividing the feature coordinate group according to the slice interval to obtain coordinate slices, where all coordinates within a slice share the same hidden code.
Optionally, the pixel value prediction by the two-stage multilayer perceptron includes:
inputting a coordinate slice and a slice hidden code;
performing first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
obtaining a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;
and performing second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain the pixel value at that coordinate.
Optionally, the two-stage multilayer perceptron comprises hidden layers, each consisting of a linear layer and an activation function.
Optionally, before the pre-trained encoder performs implicit encoding on the image to be processed to obtain the two-dimensional feature map, the method further includes pre-training the encoder and the dynamic implicit image network, specifically including:
acquiring a training image;
performing pixel prediction processing on the training image through the encoder and the dynamic implicit image network to obtain a predicted pixel value;
determining a pixel loss value according to the pixel value of the training image and the predicted pixel value;
and updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain the trained encoder and the trained dynamic implicit image network.
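The pre-training steps above reduce to computing a pixel-level loss between predicted and ground-truth pixel values and updating the weights from it. A hedged sketch of the loss step (the patent does not fix the loss function; L1 is used here as a common choice, and the function name is illustrative):

```python
import numpy as np

def pixel_loss(predicted, ground_truth):
    """Mean absolute (L1) pixel loss between the predicted pixel values
    and the pixel values of the training image."""
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return float(np.mean(np.abs(predicted - ground_truth)))
```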
In another aspect, an embodiment of the present invention further provides a system, including:
the first module is used for acquiring an image to be processed;
the second module is used for performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and the third module is used for inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the feature map, and performing pixel value prediction through a two-stage multilayer perceptron to obtain image pixel values.
Optionally, the third module comprises:
the first submodule is used for performing dynamic coordinate slicing on the two-dimensional feature map;
and the second submodule is used for performing pixel value prediction through the two-stage multilayer perceptron.
Optionally, the first sub-module comprises:
a first unit for inputting an image magnification;
the second unit is used for obtaining a feature vector from the two-dimensional feature map, taking the feature vector as a hidden code, and grouping the coordinates in the feature map according to the hidden code to obtain a feature coordinate group;
and the third unit is used for slicing the feature coordinate group according to the image magnification to obtain coordinate slices.
Optionally, the second sub-module comprises:
a fourth unit for inputting a coordinate slice and a slice hidden code;
a fifth unit, configured to perform first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
a sixth unit, configured to acquire a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;
and the seventh unit is used for carrying out second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
Compared with the prior art, the technical scheme of the invention has the following technical effects. In the embodiments of the invention, the two-dimensional feature map is input into a dynamic implicit image network and subjected to dynamic coordinate slicing, so that the neural network can perform many-to-many mapping from coordinate slices to pixel value slices; the decoder can predict all pixel values corresponding to a coordinate slice using a hidden code only once, which reduces computational cost. Pixel value prediction through the two-stage multilayer perceptron yields the image pixel values while allowing the decoder to take a non-fixed number of coordinates as input, reducing the number of hidden layers and improving processing performance.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an arbitrary scale image representation method based on a dynamic implicit image function according to an embodiment of the present application;
FIG. 2 is an overall frame diagram of a dynamic implicit image function provided in an embodiment of the present application;
FIG. 3 is an exemplary diagram of a coordinate slice provided by an embodiment of the present application;
FIG. 4 is a structural diagram of a two-stage multilayer perceptron according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method and system for representing images at arbitrary scale based on a dynamic implicit image function provided by the embodiments of the present application mainly relate to artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. Artificial intelligence is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and electromechanical integration; AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Specifically, the method and system for representing an image at any scale based on a dynamic implicit image function provided in the embodiment of the present application may employ a computer vision technique and a machine learning/deep learning technique in the field of artificial intelligence to analyze and process the image, so as to obtain a continuous image representation of the image. It can be understood that, for different tasks, the methods provided in the embodiments of the present application may all be executed in application scenarios of corresponding artificial intelligence systems; in addition, the specific time for executing the methods can be in any link in the operation flow of the artificial intelligence system.
Compared with explicit representations, implicit neural representations can capture the details of an object with a small number of parameters, and their differentiable nature allows back-propagation through a neural rendering model. However, when an implicit neural representation is applied to a two-dimensional vision task, each pixel must be predicted independently, which incurs a large computational cost and a long running time.
The Local Implicit Image Function (LIIF), a novel implicit representation of an image, uses a multilayer perceptron to infer the pixel value at each coordinate.
In the related art, although LIIF provides stable performance in arbitrary-scale super-resolution tasks of up to 30x, its computational cost rises rapidly as the magnification increases.
In view of this, referring to fig. 1, an embodiment of the present invention provides an arbitrary scale image representation method based on a dynamic implicit image function, including:
s101, acquiring an image to be processed;
s102, carrying out implicit coding processing on the image to be processed through a pre-trained coder to obtain a two-dimensional feature map;
s103, inputting the two-dimensional characteristic diagram into a dynamic implicit image network, carrying out dynamic coordinate slicing processing on the two-dimensional characteristic diagram, and carrying out pixel value prediction processing through a double-stage multilayer perceptron to obtain an image pixel value.
In the embodiment of the present invention, a Dynamic Implicit Image Function (DIIF) is proposed as a fast and effective method for representing an image at arbitrary scale. Referring to FIG. 2, I_in denotes the input image; the encoder maps the input image to a two-dimensional feature map as its DIIF representation. Given the resolution of the target image, a hidden code z* and the coordinate slice around it can be obtained from the feature map, where x_1st denotes the head coordinate of the coordinate slice and x_last denotes its tail coordinate. The decoding function then uses this information to predict all pixel values of the coordinate slice; that is, pixel value prediction is performed by a two-stage multilayer perceptron (also called a coarse-to-fine multilayer perceptron). The first (coarse) stage predicts the slice hidden vector H*, which together with the coordinate to be predicted x_i serves as the input of the second (fine) stage, which outputs the pixel value I_out-i at that coordinate. In the training stage, the predicted pixel value I_out-i and the pixel value I_gt-i of the real image are used to compute the loss function; the encoder and the decoding function are jointly trained on the self-supervised super-resolution task, and the learned network parameters are shared by all images. By using the image coordinate grouping and slicing strategy, embodiments of the invention enable the neural network to perform many-to-many mapping from coordinate slices to pixel value slices, rather than predicting the pixel value at one given coordinate at a time.
The embodiment of the invention further provides a two-stage (coarse-to-fine) multilayer perceptron (C2F-MLP) that performs image decoding under the dynamic coordinate slicing strategy, so that the number of coordinates in each slice can vary with the magnification. DIIF with the dynamic coordinate slicing strategy can significantly reduce the computational cost required for large-scale super-resolution. Experimental results show that, compared with existing arbitrary-scale super-resolution methods, DIIF achieves the best computational efficiency and super-resolution performance.
Further, as a preferred embodiment, the dynamic coordinate slicing of the two-dimensional feature map includes:
inputting an image magnification;
obtaining a feature vector from the two-dimensional feature map, taking the feature vector as a hidden code, and grouping the coordinates in the feature map according to the hidden code to obtain a feature coordinate group;
and slicing the feature coordinate group according to the image magnification to obtain coordinate slices.
In the embodiment of the invention, a vector is selected from the two-dimensional feature map as the hidden code, and the coordinates in the feature map that are closer to this hidden code than to any other are grouped to form a feature coordinate group. The feature coordinate group shares one hidden code across the whole group, so the decoder can predict all pixel values corresponding to the group using the hidden code only once. The number of coordinates in one group grows with the magnification, so the larger the magnification, the more computation is saved. Coordinate grouping, however, requires the decoder to predict all pixel values of a group at the same time, which places a heavy burden on the decoder at large super-resolution scales. The embodiment of the invention provides a reasonable solution: the feature coordinate group is sliced according to the image magnification to obtain coordinate slices, i.e., one coordinate group is divided into several coordinate slices, and the hidden code input is shared only within a coordinate slice rather than across the whole group.
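The grouping rule — each feature-map coordinate joins the group of its nearest hidden code — can be sketched as follows (a simplified illustration using Euclidean distance; the function and variable names are not from the patent):

```python
import numpy as np

def group_coordinates(query_coords, code_coords):
    """Assign each coordinate to the index of its nearest hidden code,
    producing one feature coordinate group per code."""
    query_coords = np.asarray(query_coords, dtype=float)
    code_coords = np.asarray(code_coords, dtype=float)
    groups = {i: [] for i in range(len(code_coords))}
    for q in query_coords:
        dists = np.sum((code_coords - q) ** 2, axis=1)  # squared distances
        groups[int(np.argmin(dists))].append(tuple(q))
    return groups
```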
Further, as a preferred embodiment, the slicing of the feature coordinate group according to the image magnification to obtain coordinate slices includes:
determining a slice interval according to the image magnification;
and dividing the feature coordinate group according to the slice interval to obtain coordinate slices, where all coordinates within a slice share the same hidden code.
An appropriate slice interval must be set to achieve the best balance of performance and efficiency. The simplest approach is fixed coordinate slicing, which uses a fixed slice interval in all cases. However, this strategy retains the quadratic growth of computational cost with magnification, and also suffers from two problems inside a coordinate slice: spatial discontinuity and redundant coordinates. To address these problems, embodiments of the invention propose dynamic coordinate slicing, which adjusts the slice interval as the magnification changes. A first strategy that may be employed is linear-order coordinate slicing, which sets the slice interval to the magnification; with linear-order slices, the computational cost of DIIF increases linearly with the magnification. Another strategy is to set the slice interval to the square of the magnification, called constant-order coordinate slicing; with constant-order slices, the computational cost of DIIF is determined only by the resolution of the input image and remains constant as the magnification increases. In the embodiment of the invention, the feature coordinate groups are divided according to the slice interval to obtain coordinate slices, and all coordinates within a slice share the same hidden code. Referring to FIG. 3, FIG. 3 shows coordinate slices grouped at a magnification of 4 with a slice interval of 4, where z* denotes a hidden code, x_1st denotes the head coordinate of a coordinate slice, and x_last denotes its tail coordinate.
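Reading the figures as mapping s² high-resolution coordinates to each hidden code at magnification s, the two strategies can be compared by counting how many slices (and hence coarse-stage decoder calls) each produces per group; this toy calculation is our illustration, not code from the patent:

```python
import math

def slices_per_group(magnification, strategy):
    """Number of coordinate slices per feature coordinate group,
    assuming magnification**2 coordinates per group."""
    coords_per_group = magnification ** 2
    if strategy == "linear":        # slice interval = magnification
        interval = magnification
    elif strategy == "constant":    # slice interval = magnification**2
        interval = magnification ** 2
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return math.ceil(coords_per_group / interval)
```

With linear-order slicing the per-group slice count (and thus the coarse-stage cost) grows linearly with magnification; with constant-order slicing it stays at one slice per group regardless of magnification.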
Further, as a preferred embodiment, the pixel value prediction by the two-stage multilayer perceptron includes:
inputting a coordinate slice and a slice hidden code;
carrying out first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
acquiring a coordinate to be predicted, wherein the coordinate to be predicted is any coordinate in the coordinate slice;
and carrying out second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
To implement the dynamic coordinate slicing strategy, the decoder needs the scalability to take a non-fixed number of coordinates as input and output the corresponding pixel values. However, an ordinary MLP only accepts fixed-length vectors as input. To solve this problem, the embodiment of the invention proposes the two-stage multilayer perceptron (C2F-MLP) as the decoder, divided into a first (coarse) stage for predicting slice hidden vectors and a second (fine) stage for predicting pixel values. In the embodiment of the invention, the hidden layers of the coarse stage take the boundary coordinates of a coordinate slice and the corresponding hidden code as input and generate the slice hidden vector. The slice hidden vector contains the information of all pixel values in the slice and serves as the input of the fine stage. The computational cost of the coarse stage is determined by the number of coordinate slices, which, thanks to the dynamic coordinate slicing strategy, is much smaller than the number of output coordinates. The coarse stage also allows the decoding function to exploit spatial relationships within a slice, making its pixel value prediction more accurate. The hidden layers of the fine stage take the slice hidden vector output by the coarse stage and any coordinate in the given coordinate slice as input to predict the pixel value at that coordinate; the fine stage is designed to independently predict the pixel value at each coordinate to be predicted. The decoding function employed by the fine stage can be expressed as:
I(X*) = f_θ(z*, [x_tl − v*, …, x_rb − v*]);
where I is the pixel value, X* = [x_tl, …, x_rb] is a given coordinate slice, f_θ is the decoder, z* is the hidden code corresponding to the coordinate slice, v* is the coordinate of the hidden code, and x_tl and x_rb are the head and tail coordinates of the coordinate slice, respectively.
Since the slice hidden vector is shorter than the hidden code and the fine stage has fewer hidden layers, the computational cost of the DIIF fine stage is significantly lower than that of the LIIF decoder.
Further, as a preferred embodiment, the two-stage multilayer perceptron comprises hidden layers, each consisting of a linear layer and an activation function.
Referring to FIG. 4, the C2F-MLP divides the decoder into a coarse stage for predicting slice hidden vectors and a fine stage for predicting pixel values. Each hidden layer of the C2F-MLP consists of a linear layer of dimension 256 followed by a ReLU activation function. In the coarse stage, the hidden code z*, the head coordinate x_1st of the coordinate slice, the tail coordinate x_last of the coordinate slice, and the pixel area a at the current magnification are taken as input, and the slice hidden vector H_tl~rb is output. In the fine stage, the slice hidden vector and the coordinate to be predicted x_i are input, and the pixel value I_i is output. To predict RGB values, the fine stage ends with an output linear layer of dimension 3.
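A numerical sketch of one coarse-stage and one fine-stage forward pass with random weights, matching the shapes described above (the hidden dimension 256 and the 3-channel output come from the text; the latent size, the single hidden layer per stage, and all weight and input values are dummy placeholders of ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_layer(x, w, b):
    """One C2F-MLP hidden layer: a linear map followed by ReLU."""
    return np.maximum(x @ w + b, 0.0)

dim_z, dim_h = 64, 256  # assumed latent size; hidden size 256 from the text
# Coarse-stage input: hidden code z* (64) + head coord (2) + tail coord (2) + area a (1)
w_coarse = rng.standard_normal((dim_z + 5, dim_h)) * 0.01
# Fine-stage input: slice hidden vector (256) + coordinate to predict (2)
w_fine = rng.standard_normal((dim_h + 2, dim_h)) * 0.01
w_out = rng.standard_normal((dim_h, 3)) * 0.01  # RGB output layer

z = rng.standard_normal(dim_z)                  # hidden code z*
head, tail = np.array([0.1, 0.1]), np.array([0.4, 0.4])
area = np.array([0.05])                         # pixel area at this magnification

# Coarse stage: one call per coordinate slice.
h_slice = hidden_layer(np.concatenate([z, head, tail, area]),
                       w_coarse, np.zeros(dim_h))

# Fine stage: one cheap call per coordinate inside the slice.
x_i = np.array([0.2, 0.3])
rgb = hidden_layer(np.concatenate([h_slice, x_i]),
                   w_fine, np.zeros(dim_h)) @ w_out
```

A real C2F-MLP would stack several such hidden layers per stage; this single-layer version only demonstrates the input/output shapes of the two stages.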
Further as a preferred embodiment, before the image to be processed is implicitly encoded by the pre-trained encoder to obtain the two-dimensional feature map, the method further includes pre-training the encoder and the dynamic implicit image network, specifically including:
acquiring a training image;
performing pixel prediction processing on the training image through the encoder and the dynamic implicit image network to obtain a predicted pixel value;
determining a pixel loss value according to the pixel value of the training image and the predicted pixel value;
and updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain the trained encoder and dynamic implicit image network.
In an embodiment of the invention, the training phase uses the predicted pixel values and the pixel values of the real image to calculate the pixel level loss. The encoder and decoding functions are jointly trained in the self-supervised super resolution task, while the learned network parameters are shared by all images.
In another aspect, an embodiment of the present invention further provides a system, including:
the first module is used for acquiring an image to be processed;
the second module is used for performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and the third module is used for inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the feature map, and performing pixel value prediction through a two-stage multilayer perceptron to obtain image pixel values.
Optionally, the third module comprises:
the first submodule is used for performing dynamic coordinate slicing on the two-dimensional feature map;
and the second submodule is used for performing pixel value prediction through the two-stage multilayer perceptron.
Optionally, the first sub-module comprises:
a first unit for inputting an image magnification;
the second unit is used for acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group;
and the third unit is used for slicing the feature coordinate group according to the image magnification to obtain coordinate slices.
Optionally, the second sub-module comprises:
a fourth unit for inputting a coordinate slice and a slice hidden code;
a fifth unit, configured to perform first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
a sixth unit, configured to acquire a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;
and the seventh unit is used for carrying out second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
The invention provides a method and system for representing images at arbitrary scale based on a dynamic implicit image function, which represent an arbitrary-scale image quickly and effectively. In DIIF, a pixel-based image is represented as a two-dimensional feature map, and the decoding function takes a coordinate slice and a local feature vector as input to predict the corresponding set of pixel values. By sharing local feature vectors within coordinate slices, DIIF can perform large-scale super-resolution reconstruction at very low computational cost. Experimental results show that the super-resolution performance and computational efficiency of DIIF are superior to those of existing arbitrary-scale super-resolution methods at all scaling factors. Compared with LIIF, DIIF saves up to 87% of the computational cost while consistently achieving better PSNR. DIIF can be efficiently applied to scenes that need to render images in real time at arbitrary resolution. Embodiments of the invention can implement arbitrary zooming in image viewing/editing software, enlarge and restore low-resolution images, and compress high-resolution images for storage.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those of ordinary skill in the art will be able to practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. An arbitrary scale image representation method based on a dynamic implicit image function is characterized by comprising the following steps:
acquiring an image to be processed;
performing implicit coding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction through a two-stage multi-layer perceptron to obtain image pixel values.
2. The method of claim 1, wherein the dynamic coordinate slicing of the two-dimensional feature map comprises:
inputting an image magnification;
acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group;
and slicing the feature coordinate group according to the image magnification to obtain coordinate slices.
3. The method of claim 2, wherein said slicing the set of feature coordinates according to the image magnification to obtain coordinate slices comprises:
determining a slice interval according to the image magnification;
and dividing the feature coordinate group according to the slice interval to obtain coordinate slices, wherein all coordinates within a coordinate slice share the same hidden code.
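The grouping and slicing of claims 2 and 3 can be sketched as follows; the function name and the choice of slice interval equal to the magnification factor are illustrative assumptions about the claimed procedure:

```python
import numpy as np

def coordinate_slices(h, w, scale):
    """Assign each high-resolution coordinate to a coordinate slice.

    Each cell of the h x w feature map owns one hidden code; with integer
    magnification `scale`, the slice interval equals `scale`, so every
    slice holds the scale x scale block of coordinates sharing that hidden
    code (assumed interpretation of claims 2-3)."""
    ys, xs = np.meshgrid(np.arange(h * scale), np.arange(w * scale), indexing="ij")
    return (ys // scale) * w + (xs // scale)   # slice index per coordinate

sid = coordinate_slices(2, 2, 3)               # 2x2 feature map, x3 magnification
assert sid.shape == (6, 6)
assert all((sid == k).sum() == 9 for k in range(4))  # 3x3 coordinates per slice
```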
4. The method of claim 1, wherein the pixel value prediction performed by the two-stage multi-layer perceptron comprises:
inputting a coordinate slice and a slice hidden code;
carrying out first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
acquiring a coordinate to be predicted, wherein the coordinate to be predicted is any coordinate in the coordinate slice;
and carrying out second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
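The two-stage prediction of claim 4 can be sketched as below. The layer sizes, random weights, ReLU activation, and concatenation of inputs are all assumptions for illustration; the claim specifies only that stage 1 maps the coordinate slice and slice hidden code to a slice hidden vector, and stage 2 maps that vector plus a query coordinate to a pixel value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and random weights; the patent does not specify
# layer sizes, the activation, or how inputs are combined.
D_LAT, D_HID = 16, 32
W1 = rng.standard_normal((D_LAT + 2, D_HID)) * 0.1  # stage 1: hidden code + slice coordinate
W2 = rng.standard_normal((D_HID + 2, 3)) * 0.1      # stage 2: hidden vector + query coordinate

def decode_slice(hidden_code, slice_coord, coords):
    """Stage 1 runs once per coordinate slice to produce the slice hidden
    vector; stage 2 runs once per coordinate to produce an RGB value."""
    hidden_vec = np.maximum(np.concatenate([hidden_code, slice_coord]) @ W1, 0.0)
    return np.stack([np.concatenate([hidden_vec, c]) @ W2 for c in coords])

rgb = decode_slice(rng.standard_normal(D_LAT), np.zeros(2), rng.uniform(size=(9, 2)))
assert rgb.shape == (9, 3)  # nine coordinates in the slice, one RGB value each
```

The cost saving comes from the first line of `decode_slice`: the stage-1 pass is amortized over every coordinate in the slice.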
5. The method of claim 1, wherein the two-stage multi-layer perceptron comprises a hidden layer, the hidden layer consisting of a linear layer and an activation function.
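A minimal sketch of the hidden layer in claim 5, with ReLU chosen as the activation (the claim does not name which activation is used):

```python
import numpy as np

def hidden_layer(x, weight, bias):
    """One hidden layer as in claim 5: a linear layer followed by an
    activation function (ReLU assumed here)."""
    return np.maximum(x @ weight + bias, 0.0)

out = hidden_layer(np.array([1.0, -2.0]), np.eye(2), np.zeros(2))
assert out.tolist() == [1.0, 0.0]  # negative pre-activation is zeroed
```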
6. The method according to any one of claims 1 to 5, wherein before performing the implicit coding on the image to be processed by the pre-trained encoder to obtain the two-dimensional feature map, the method further includes pre-training the encoder and the dynamic implicit image network, specifically including:
acquiring a training image;
performing pixel prediction processing on the training image through the encoder and the dynamic implicit image network to obtain a predicted pixel value;
determining a pixel loss value according to the pixel value of the training image and the predicted pixel value;
and updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain the trained encoder and the trained dynamic implicit image network.
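The pre-training of claim 6 computes a pixel loss between predicted and ground-truth pixel values and updates the weights accordingly. The patent does not name the loss function; mean absolute error (L1), standard for implicit-function super-resolution training such as LIIF, is assumed in this sketch:

```python
import numpy as np

def l1_pixel_loss(pred, target):
    """Pixel loss between predicted and ground-truth pixel values
    (L1 / mean absolute error, an assumed choice)."""
    return float(np.abs(pred - target).mean())

pred = np.array([0.2, 0.4, 0.6])
target = np.array([0.0, 0.5, 0.5])
assert abs(l1_pixel_loss(pred, target) - 0.4 / 3) < 1e-9
```

In training, this scalar would be backpropagated through both the dynamic implicit image network and the encoder to update their weight parameters jointly.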
7. An arbitrary scale image representation system based on a dynamic implicit image function, the system comprising:
the first module is used for acquiring an image to be processed;
the second module is used for performing implicit coding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and the third module is used for inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction through a two-stage multi-layer perceptron to obtain image pixel values.
8. The system of claim 7, wherein the third module comprises:
the first submodule is used for performing dynamic coordinate slicing on the two-dimensional feature map;
and the second sub-module is used for carrying out pixel value prediction processing through the two-stage multi-layer perceptron.
9. The system of claim 8, wherein the first sub-module comprises:
a first unit for inputting an image magnification;
the second unit is used for acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and performing grouping processing on coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group;
and the third unit is used for slicing the feature coordinate group according to the image magnification to obtain coordinate slices.
10. The system of claim 8, wherein the second sub-module comprises:
a fourth unit for inputting a coordinate slice and a slice hidden code;
a fifth unit, configured to perform first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
a sixth unit, configured to acquire a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;
and the seventh unit is used for carrying out second-stage processing on the slice implicit vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211590183.8A CN115861343A (en) | 2022-12-12 | 2022-12-12 | Method and system for representing arbitrary scale image based on dynamic implicit image function |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115861343A true CN115861343A (en) | 2023-03-28 |
Family
ID=85672081
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211590183.8A Pending CN115861343A (en) | 2022-12-12 | 2022-12-12 | Method and system for representing arbitrary scale image based on dynamic implicit image function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115861343A (en) |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014197994A1 (en) * | 2013-06-12 | 2014-12-18 | University Health Network | Method and system for automated quality assurance and automated treatment planning in radiation therapy |
US20180260981A1 (en) * | 2017-03-07 | 2018-09-13 | Children's Medical Center Corporation | Registration-based motion tracking for motion-robust imaging |
US20190171476A1 (en) * | 2013-08-20 | 2019-06-06 | Teleputers, Llc | System and Method for Self-Protecting Data |
CN111784570A (en) * | 2019-04-04 | 2020-10-16 | Tcl集团股份有限公司 | Video image super-resolution reconstruction method and device |
KR102193108B1 (en) * | 2019-10-10 | 2020-12-18 | 서울대학교산학협력단 | Observation method for two-dimensional river mixing using RGB image acquired by the unmanned aerial vehicle |
CN112163655A (en) * | 2020-09-30 | 2021-01-01 | 上海麦广互娱文化传媒股份有限公司 | Dynamic implicit two-dimensional code and generation and detection method and device thereof |
CN112419150A (en) * | 2020-11-06 | 2021-02-26 | 中国科学技术大学 | Random multiple image super-resolution reconstruction method based on bilateral up-sampling network |
CN112446489A (en) * | 2020-11-25 | 2021-03-05 | 天津大学 | Dynamic network embedded link prediction method based on variational self-encoder |
WO2021122850A1 (en) * | 2019-12-17 | 2021-06-24 | Canon Kabushiki Kaisha | Method, device, and computer program for improving encapsulation of media content |
WO2021183336A1 (en) * | 2020-03-09 | 2021-09-16 | Schlumberger Technology Corporation | Fast front tracking in eor flooding simulation on coarse grids |
WO2021216747A1 (en) * | 2020-04-21 | 2021-10-28 | Massachusetts Institute Of Technology | Real-Time Photorealistic 3D Holography with Deep Neural Networks |
EP3907695A1 (en) * | 2019-08-14 | 2021-11-10 | Samsung Electronics Co., Ltd. | Electronic apparatus and control method thereof |
CN113689539A (en) * | 2021-07-06 | 2021-11-23 | 清华大学 | Dynamic scene real-time three-dimensional reconstruction method and device based on implicit optical flow field |
CN113947521A (en) * | 2021-10-14 | 2022-01-18 | 展讯通信(上海)有限公司 | Image resolution conversion method and device based on deep neural network and terminal equipment |
US20220027723A1 (en) * | 2020-07-27 | 2022-01-27 | Robert Bosch Gmbh | Hardware compute fabrics for deep equilibrium models |
US11308657B1 (en) * | 2021-08-11 | 2022-04-19 | Neon Evolution Inc. | Methods and systems for image processing using a learning engine |
CN114897912A (en) * | 2022-04-24 | 2022-08-12 | 广东工业大学 | Three-dimensional point cloud segmentation method and system based on enhanced cyclic slicing network |
CN115049556A (en) * | 2022-06-27 | 2022-09-13 | 安徽大学 | StyleGAN-based face image restoration method |
Non-Patent Citations (11)
Title |
---|
HUANRONG ZHANG; JIE XIAO; ZHI JIN: "Multi-scale Image Super-Resolution via A Single Extendable Deep Network", IEEE Journal of Selected Topics in Signal Processing, 16 December 2020 (2020-12-16) * |
LUKE LOZENSKI; MARK A. ANASTASIO; UMBERTO VILLA: "A Memory-Efficient Self-Supervised Dynamic Image Reconstruction Method Using Neural Fields", IEEE Transactions on Computational Imaging, 21 September 2022 (2022-09-21) * |
NING NI; HANLIN WU; LIBAO ZHANG: "A Memory-Efficient Self-Supervised Dynamic Image Reconstruction Method Using Neural Fields", 2022 IEEE International Conference on Image Processing (ICIP), 16 October 2022 (2022-10-16) * |
XIN HUANG; QI ZHANG; YING FENG; HONGDONG LI; XUAN WANG; QING WANG: "HDR-NeRF: High Dynamic Range Neural Radiance Fields", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 27 September 2022 (2022-09-27) * |
XUECAI HU; HAOYUAN MU; XIANGYU ZHANG; ZILEI WANG; TIENIU TAN; JIAN SUN: "Meta-SR: A Magnification-Arbitrary Network for Super-Resolution", HTTPS://DOI.ORG/10.48550/ARXIV.1903.00875, 3 March 2019 (2019-03-03) * |
YINBO CHEN; SIFEI LIU; XIAOLONG WANG: "Learning Continuous Image Representation with Local Implicit Image Function", 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2 November 2021 (2021-11-02) * |
ZHU FANG: "3D Scene Representation—A Survey of Recent Advances in Neural Radiance Fields (NeRF)", Journal of Communication University of China (Natural Science Edition), 20 October 2022 (2022-10-20) * |
LI ZHEYUAN; CHEN XIANGYU; QIAO YU; DONG CHAO; JING KUN: "Analysis of Attention Mechanisms in Single Image Super-Resolution", Journal of Integration Technology, 15 September 2022 (2022-09-15) * |
LI ZHENG; JIN DI; HUANG XUEYUAN; YUAN KE: "A Survey of Recommendation Research Based on Implicit Feedback", Journal of Henan University (Natural Science Edition), 16 May 2022 (2022-05-16) * |
WANG YAGANG; WANG MENG: "Color Image Super-Resolution Algorithm Based on Softplus+HKELM", Computer and Digital Engineering, no. 01, 20 January 2020 (2020-01-20) * |
CHENG DEQIANG; CAI YINGCHUN; CHEN LIANGLIANG; SONG YULONG: "Multi-scale Convolutional Neural Network Reconstruction Algorithm with Edge Correction", Laser & Optoelectronics Progress, no. 09, 28 March 2018 (2018-03-28) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110782490B (en) | Video depth map estimation method and device with space-time consistency | |
CN113034380B (en) | Video space-time super-resolution method and device based on improved deformable convolution correction | |
CN109756690B (en) | Light-weight video interpolation method based on feature-level optical flow | |
CN110533712A (en) | A kind of binocular solid matching process based on convolutional neural networks | |
CN108491763B (en) | Unsupervised training method and device for three-dimensional scene recognition network and storage medium | |
CN111062395B (en) | Real-time video semantic segmentation method | |
CN112396645B (en) | Monocular image depth estimation method and system based on convolution residual learning | |
CN114731408A (en) | System, device and method for video frame interpolation using structured neural network | |
CN108776971A (en) | A kind of variation light stream based on layering nearest-neighbor determines method and system | |
CN109903315B (en) | Method, apparatus, device and readable storage medium for optical flow prediction | |
CN115294282A (en) | Monocular depth estimation system and method for enhancing feature fusion in three-dimensional scene reconstruction | |
CN115187638B (en) | Unsupervised monocular depth estimation method based on optical flow mask | |
CN113095254A (en) | Method and system for positioning key points of human body part | |
CN115272437A (en) | Image depth estimation method and device based on global and local features | |
CN115205150A (en) | Image deblurring method, device, equipment, medium and computer program product | |
CN117115786B (en) | Depth estimation model training method for joint segmentation tracking and application method | |
CN114359554A (en) | Image semantic segmentation method based on multi-receptive-field context semantic information | |
CN115861343A (en) | Method and system for representing arbitrary scale image based on dynamic implicit image function | |
CN116452599A (en) | Contour-based image instance segmentation method and system | |
CN114037731A (en) | Neural network optical flow estimation method, device and medium realized by FPGA | |
CN110490235B (en) | Vehicle object viewpoint prediction and three-dimensional model recovery method and device facing 2D image | |
Chen et al. | Adaptive hybrid composition based super-resolution network via fine-grained channel pruning | |
KR102057395B1 (en) | Video generation method using video extrapolation based on machine learning | |
Lee et al. | Qff: Quantized fourier features for neural field representations | |
Liu et al. | Building effective large-scale traffic state prediction system: Traffic4cast challenge solution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||