CN115861343A - Method and system for representing arbitrary scale image based on dynamic implicit image function - Google Patents

Method and system for representing arbitrary scale image based on dynamic implicit image function Download PDF

Info

Publication number
CN115861343A
Authority
CN
China
Prior art keywords
coordinate
image
slice
processing
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211590183.8A
Other languages
Chinese (zh)
Inventor
金枝
何宗耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Sun Yat Sen University Shenzhen Campus
Original Assignee
Sun Yat Sen University
Sun Yat Sen University Shenzhen Campus
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University, Sun Yat Sen University Shenzhen Campus filed Critical Sun Yat Sen University
Priority to CN202211590183.8A priority Critical patent/CN115861343A/en
Publication of CN115861343A publication Critical patent/CN115861343A/en
Pending legal-status Critical Current


Abstract

The invention discloses an arbitrary-scale image representation method and system based on a dynamic implicit image function. The method comprises: acquiring an image to be processed; performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map; and inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the feature map, and performing pixel value prediction through a two-stage multi-layer perceptron to obtain image pixel values. Embodiments of the invention reduce the computational cost of continuous image representation, improve processing performance, and can be widely applied in the technical field of artificial intelligence.

Description

Method and system for representing arbitrary scale image based on dynamic implicit image function
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method and system for representing arbitrary-scale images based on a dynamic implicit image function.
Background
Digital images are two-dimensional representations of the real world in the digital domain: the continuous physical world is quantized by sensors and stored in a computer as a matrix of discrete pixels. If images could be expressed in a continuous form, images of arbitrary resolution could be obtained in continuous space, ensuring the fidelity of the scene the image describes. Although continuous image representation methods in the related art perform well at continuous image representation, their computational cost grows quadratically with the image magnification, making arbitrary-scale super-resolution reconstruction extremely time-consuming. In view of this, the technical problems in the related art need to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for representing an image with an arbitrary scale based on a dynamic implicit image function, so as to reduce the computation cost and improve the processing performance.
In one aspect, the present invention provides a method for representing an image with an arbitrary scale based on a dynamic implicit image function, including:
acquiring an image to be processed;
performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction through a two-stage multi-layer perceptron to obtain image pixel values.
Optionally, the performing dynamic coordinate slicing processing on the two-dimensional feature map includes:
inputting an image magnification;
acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group;
and slicing the characteristic coordinate set according to the image magnification factor to obtain a coordinate slice.
Optionally, the slicing the feature coordinate set according to the image magnification to obtain a coordinate slice includes:
determining slice intervals according to the image magnification;
and dividing the characteristic coordinate group according to the slice interval to obtain a coordinate slice, wherein the coordinate slice is used for sharing the same hidden code for all coordinates in the slice.
Optionally, the pixel value prediction processing performed by the two-stage multi-layer perceptron includes:
inputting a coordinate slice and a slice hidden code;
carrying out first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
acquiring a coordinate to be predicted, wherein the coordinate to be predicted is any coordinate in the coordinate slice;
and carrying out second-stage processing on the slice implicit vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
Optionally, the two-stage multi-layer perceptron comprises hidden layers, each hidden layer consisting of a linear layer and an activation function.
Optionally, before the pre-trained encoder performs implicit encoding processing on the image to be processed to obtain the two-dimensional feature map, the method further includes pre-training the encoder and the dynamic implicit image network, specifically including:
acquiring a training image;
performing pixel prediction processing on the training image through the encoder and the dynamic implicit image network to obtain a predicted pixel value;
determining a pixel loss value according to the pixel value of the training image and the predicted pixel value;
and updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain the trained encoder and the trained dynamic implicit image network.
In another aspect, an embodiment of the present invention further provides a system, including:
the first module is used for acquiring an image to be processed;
the second module is used for performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and the third module is used for inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction through a two-stage multi-layer perceptron to obtain an image pixel value.
Optionally, the third module comprises:
the first submodule is used for performing dynamic coordinate slicing on the two-dimensional feature map;
and the second sub-module is used for carrying out pixel value prediction processing through the two-stage multi-layer perceptron.
Optionally, the first sub-module comprises:
a first unit for inputting an image magnification;
the second unit is used for acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and performing grouping processing on coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group;
and the third unit is used for carrying out slicing processing on the characteristic coordinate set according to the image magnification factor to obtain a coordinate slice.
Optionally, the second sub-module comprises:
a fourth unit for inputting a coordinate slice and a slice hidden code;
a fifth unit, configured to perform first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
a sixth unit, configured to acquire a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;
and the seventh unit is used for carrying out second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
Compared with the prior art, the technical scheme of the invention has the following technical effects: in the embodiment of the invention, the two-dimensional feature map is input into a dynamic implicit image network and subjected to dynamic coordinate slicing, so that the neural network can perform a many-to-many mapping from coordinate slices to pixel-value slices; the decoder can therefore predict all pixel values corresponding to a coordinate slice using a hidden code only once, which reduces the computational cost. Pixel value prediction is then performed by the two-stage multi-layer perceptron to obtain the image pixel values, allowing the decoder to take a non-fixed number of coordinates as input, which reduces the number of hidden layers and improves processing performance.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of an arbitrary scale image representation method based on a dynamic implicit image function according to an embodiment of the present application;
FIG. 2 is an overall frame diagram of a dynamic implicit image function provided in an embodiment of the present application;
FIG. 3 is an exemplary diagram of a coordinate slice provided by an embodiment of the present application;
FIG. 4 is a structural diagram of a two-stage multi-layer perceptron according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method and system for representing arbitrary-scale images based on a dynamic implicit image function provided by the embodiments of the present application mainly relate to artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. Artificial intelligence is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics; artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Specifically, the method and system for representing an image at any scale based on a dynamic implicit image function provided in the embodiment of the present application may employ a computer vision technique and a machine learning/deep learning technique in the field of artificial intelligence to analyze and process the image, so as to obtain a continuous image representation of the image. It can be understood that, for different tasks, the methods provided in the embodiments of the present application may all be executed in application scenarios of corresponding artificial intelligence systems; in addition, the specific time for executing the methods can be in any link in the operation flow of the artificial intelligence system.
Implicit neural representation: compared with explicit representations, implicit neural representations can capture the details of an object with a small number of parameters, and their differentiable nature allows back-propagation through a neural rendering model. However, when implicit neural representation is applied to two-dimensional visual tasks, each pixel must be predicted independently, incurring a large computational cost and a long running time.
The Local Implicit Image Function (LIIF), a novel implicit image representation, uses a multi-layer perceptron to infer the pixel value at each coordinate.
In the related art, although LIIF provides stable performance on arbitrary-scale super-resolution tasks of up to 30 times magnification, its computational cost rises rapidly as the magnification increases.
In view of this, referring to fig. 1, an embodiment of the present invention provides an arbitrary scale image representation method based on a dynamic implicit image function, including:
S101, acquiring an image to be processed;
S102, performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
S103, inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction through a two-stage multi-layer perceptron to obtain image pixel values.
In the embodiment of the present invention, a Dynamic Implicit Image Function (DIIF) is proposed, which is a fast and effective method for representing an image at any scale. Referring to FIG. 2, I_in denotes the input image; the encoder maps the input image to a two-dimensional feature map as its DIIF representation. Given the resolution of the target image, a hidden code z_* can be obtained from the two-dimensional feature map, together with the coordinate slice around the hidden code
X_* = [X_1st, ..., X_last],
where X_1st denotes the first coordinate of the coordinate slice and X_last denotes its last coordinate. The decoding function then uses this information to predict all pixel values of the coordinate slice; that is, pixel value prediction is performed by a two-stage multi-layer perceptron (also called a coarse-to-fine multi-layer perceptron). The first (coarse) stage predicts the slice hidden vector H_*, which, together with the coordinate to be predicted X_i, serves as input to the second (fine) stage, which outputs the pixel value I_out-i of the coordinate to be predicted. During the training stage, a loss function is computed from the predicted pixel value I_out-i and the pixel value I_gt-i of the ground-truth image; the encoder and the decoding function are jointly trained on a self-supervised super-resolution task, and the learned network parameters are shared by all images. By using image coordinate grouping and slicing strategies, embodiments of the invention enable the neural network to perform a many-to-many mapping from coordinate slices to pixel-value slices, rather than independently predicting the pixel value at one given coordinate at a time. The embodiments further provide a coarse-to-fine multi-layer perceptron (C2F-MLP) to perform image decoding based on the dynamic coordinate slicing strategy, so that the number of coordinates in each slice varies with the magnification; DIIF with the dynamic coordinate slicing strategy can significantly reduce the computational cost required for large-scale super-resolution. Experimental results show that, compared with existing arbitrary-scale super-resolution methods, DIIF achieves the best computational efficiency and super-resolution performance.
Further, as a preferred embodiment, the performing dynamic coordinate slicing processing on the two-dimensional feature map includes:
inputting an image magnification;
acquiring a feature vector from the two-dimensional feature map, determining the feature vector as an implicit code, and performing grouping processing on coordinates in the two-dimensional feature map according to the implicit code to obtain a feature coordinate group;
and slicing the characteristic coordinate set according to the image magnification factor to obtain a coordinate slice.
In the embodiment of the invention, a vector is selected from the two-dimensional feature map as the hidden code, and the coordinates in the feature map that are closer to this hidden code than to any other hidden code are grouped to obtain the feature coordinate group. Coordinate grouping allows one hidden code to be shared within a coordinate group, so the decoder can predict all pixel values of the group using the hidden code only once. The number of coordinates in one group grows quadratically with the magnification, so the larger the magnification, the more computational cost is saved. However, coordinate grouping requires the decoder to predict all pixel values of the group simultaneously, which places a heavy burden on the decoder during large-scale super-resolution. The embodiment of the invention provides a reasonable solution: the feature coordinate group is sliced according to the image magnification to obtain coordinate slices, so that one coordinate group is divided into several coordinate slices, and the hidden code input is shared only within a coordinate slice rather than across the whole coordinate group.
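The grouping step above can be sketched as follows. This is an illustrative assumption about the grouping rule (nearest latent-code cell on the low-resolution grid), with a hypothetical helper name; the patent text itself only states that coordinates are grouped around their nearest hidden code.

```python
# Hypothetical sketch of coordinate grouping: each high-resolution pixel
# coordinate is assigned to the low-resolution feature cell (hidden code)
# whose grid position it falls into.
def group_coordinates(h_lr, w_lr, scale):
    groups = {}
    for i in range(h_lr * scale):
        for j in range(w_lr * scale):
            cell = (i // scale, j // scale)  # index of the nearest hidden code
            groups.setdefault(cell, []).append((i, j))
    return groups

groups = group_coordinates(2, 2, 4)
# at magnification 4, each hidden code owns a 4x4 = 16-coordinate group
```

Each group can then be decoded with its shared hidden code, which is where the cost saving described above comes from.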
Further as a preferred embodiment, the slicing the feature coordinate set according to the image magnification to obtain a coordinate slice includes:
determining slice intervals according to the image magnification;
and dividing the characteristic coordinate group according to the slice interval to obtain a coordinate slice, wherein the coordinate slice is used for sharing the same hidden code for all coordinates in the slice.
An appropriate slice interval must be set to achieve the best balance between performance and efficiency. The simplest approach is fixed coordinate slicing, which uses a fixed slice interval in all cases. However, this strategy retains the quadratic growth of computational cost as the magnification increases, and it additionally suffers from two problems: spatial discontinuity and redundant coordinates inside a coordinate slice. To address these problems, embodiments of the present invention propose dynamic coordinate slicing, which adjusts the slice interval as the magnification changes. A first strategy that may be employed is linear-order coordinate slicing, which sets the slice interval equal to the magnification; with linear-order coordinate slices, the computational cost of DIIF increases linearly with the magnification. Another strategy sets the slice interval to the square of the magnification, called constant-order coordinate slicing; with constant-order coordinate slices, the computational cost of DIIF is determined only by the resolution of the input image and remains constant as the magnification increases. In the embodiment of the invention, the feature coordinate groups are divided according to the slice interval to obtain coordinate slices, and each coordinate slice shares the same hidden code among all coordinates in the slice. Referring to FIG. 3, FIG. 3 shows coordinate slices grouped at a magnification of 4 with a slice interval of 4, where Z_* denotes the hidden code, X_1st denotes the first coordinate of a coordinate slice, and X_last denotes its last coordinate.
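The cost behavior of the three strategies above can be made concrete with a small call-count model. This is a hedged back-of-the-envelope sketch: the fixed interval of 4 and the function name are illustrative assumptions; the model only counts decoder invocations, ignoring per-call cost.

```python
import math

def decoder_calls_per_image(h_lr, w_lr, scale, strategy):
    """Hypothetical call-count model for the slicing strategies described
    above: 'fixed', 'linear' (interval = scale), 'constant' (interval = scale^2)."""
    group_size = scale * scale            # coordinates owned by one hidden code
    interval = {"fixed": 4,               # assumed fixed interval, for contrast
                "linear": scale,          # linear-order slicing
                "constant": scale * scale # constant-order slicing
                }[strategy]
    slices_per_group = math.ceil(group_size / interval)
    return h_lr * w_lr * slices_per_group

# constant-order slicing: one decoder call per hidden code, independent of scale
calls_x4 = decoder_calls_per_image(8, 8, 4, "constant")    # 64
calls_x16 = decoder_calls_per_image(8, 8, 16, "constant")  # still 64
```

Under this model, fixed slicing at scale 16 would need 4096 calls, linear-order 1024, and constant-order only 64, matching the orders of growth stated in the text.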
Further, as a preferred embodiment, the pixel value prediction processing by the two-stage multi-layer perceptron includes:
inputting a coordinate slice and a slice hidden code;
carrying out first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
acquiring a coordinate to be predicted, wherein the coordinate to be predicted is any coordinate in the coordinate slice;
and carrying out second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
To implement the dynamic coordinate slicing strategy, the decoder must be scalable: it must take a non-fixed number of coordinates as input and output the corresponding pixel values. A standard MLP, however, only accepts fixed-length vectors as input. To solve this problem, the embodiment of the present invention proposes the coarse-to-fine multi-layer perceptron (C2F-MLP) as the decoder, divided into a first (coarse) stage for predicting slice hidden vectors and a second (fine) stage for predicting pixel values. In the embodiment of the invention, the hidden layers of the coarse stage take the boundary coordinates of a coordinate slice and the corresponding hidden code as input and generate the slice hidden vector. The slice hidden vector contains the information of all pixel values in the slice and serves as input to the fine stage. The computational cost of the coarse stage is determined by the number of coordinate slices, which, thanks to the dynamic coordinate slicing strategy, is much smaller than the number of output coordinates. The coarse stage also allows the decoding function to exploit spatial relationships within a slice, making its pixel value predictions more accurate. The hidden layers of the fine stage take the slice hidden vector output by the coarse stage and any coordinate in the given coordinate slice as input, and predict the pixel value at that coordinate; the fine stage is designed to independently predict the pixel value at each coordinate to be predicted. The decoding function employed can be expressed as:
I(X_*) = f_θ(z_*, [x_tl - v_*, ..., x_rb - v_*]);
where I denotes the predicted pixel values, X_* = [x_tl, ..., x_rb] is a given coordinate slice, f_θ is the decoder, z_* is the hidden code corresponding to the coordinate slice, v_* is the coordinate of the hidden code, and x_tl and x_rb are respectively the first and last coordinates of the coordinate slice.
Since the slice hidden vector is shorter than the hidden code and the fine stage has fewer hidden layers, the computational cost required by the fine stage of DIIF is significantly lower than that of the LIIF decoder.
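The cost argument above can be illustrated with simple FLOP bookkeeping. All dimensions below are assumptions for illustration (the patent text does not give the hidden-code or slice-vector lengths), and each stage is modeled with a single hidden layer, so the numbers indicate relative orders only.

```python
# Hypothetical FLOP bookkeeping for decoding one coordinate slice, showing why
# the coarse+fine split is cheaper than per-coordinate decoding with the full
# hidden code. All dimensions are illustrative assumptions.
LATENT = 64      # assumed hidden code length
SLICE_VEC = 16   # assumed slice hidden vector length (shorter than LATENT)
HIDDEN = 256     # hidden layer width stated in the text

def linear_flops(d_in, d_out):
    return 2 * d_in * d_out  # one multiply-accumulate per weight

coords_per_slice = 16
# coarse stage: runs once per slice on (hidden code, 2 boundary coords, area)
coarse = linear_flops(LATENT + 5, HIDDEN) + linear_flops(HIDDEN, SLICE_VEC)
# fine stage: runs once per coordinate, but only on the short slice vector
fine = coords_per_slice * (linear_flops(SLICE_VEC + 2, HIDDEN)
                           + linear_flops(HIDDEN, 3))
# LIIF-style contrast: every coordinate decoded with the full hidden code
liif_like = coords_per_slice * (linear_flops(LATENT + 2, HIDDEN)
                                + linear_flops(HIDDEN, 3))
assert coarse + fine < liif_like
```

Under these assumed sizes the two-stage decoding costs roughly 2.6x less than the LIIF-style baseline for this slice; the gap widens as the slice grows.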
Further as a preferred embodiment, the two-stage multi-layer perceptron comprises hidden layers, each consisting of a linear layer and an activation function.
Referring to FIG. 4, the C2F-MLP divides the decoder into a coarse stage for predicting the slice hidden vector and a fine stage for predicting pixel values. Each hidden layer of the C2F-MLP consists of a linear layer with dimension 256 followed by the ReLU activation function. In the coarse stage, the hidden code z_*, the first coordinate X_1st and the last coordinate X_last of the coordinate slice, and the area a of a pixel at the current magnification are taken as input, and the slice hidden vector H_tl~rb is output. In the fine stage, the slice hidden vector and the coordinate to be predicted X_i are taken as input, and the pixel value I_i is output. To predict RGB values, the fine stage finally uses an output linear layer of dimension 3.
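The two-stage structure just described can be sketched as a toy, pure-Python forward pass. The weights are random and untrained, the widths are shrunk from 256 to 8 so the sketch runs instantly, and the hidden-code length and layer counts are illustrative assumptions; only the wiring (coarse stage -> slice vector -> fine stage -> 3-channel output) follows the text.

```python
import random

# Toy sketch of the C2F-MLP wiring: linear layers + ReLU, coarse stage
# producing a slice hidden vector, fine stage producing an RGB value.
random.seed(0)

def make_linear(d_in, d_out):
    w = [[random.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    b = [0.0] * d_out
    return w, b

def linear(x, layer):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

HIDDEN, SLICE_VEC = 8, 4  # shrunken stand-ins for the 256-wide layers

# coarse stage: (hidden code z*, X_1st, X_last, pixel area a) -> slice vector
coarse1 = make_linear(6 + 4 + 1, HIDDEN)  # assumed 6-dim hidden code
coarse2 = make_linear(HIDDEN, SLICE_VEC)
# fine stage: (slice vector, coordinate X_i) -> RGB
fine1 = make_linear(SLICE_VEC + 2, HIDDEN)
fine_out = make_linear(HIDDEN, 3)         # output linear layer of dimension 3

z = [0.5] * 6
x_first, x_last, area = (0.0, 0.0), (0.75, 0.75), 0.0625
h = linear(relu(linear(z + list(x_first) + list(x_last) + [area], coarse1)),
           coarse2)                        # slice hidden vector H
rgb = linear(relu(linear(h + [0.25, 0.25], fine1)), fine_out)
# the fine stage yields one 3-channel pixel value per queried coordinate
```

Note that `h` is computed once per slice, while the last two lines would repeat for every coordinate X_i in the slice.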
Further as a preferred embodiment, before the image to be processed is implicitly encoded by the pre-trained encoder to obtain the two-dimensional feature map, the method further includes pre-training the encoder and the dynamic implicit image network, specifically including:
acquiring a training image;
performing pixel prediction processing on the training image through the encoder and the dynamic implicit image network to obtain a predicted pixel value;
determining a pixel loss value according to the pixel value of the training image and the predicted pixel value;
and updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain the trained encoder and dynamic implicit image network.
In an embodiment of the invention, the training phase uses the predicted pixel values and the pixel values of the real image to calculate the pixel level loss. The encoder and decoding functions are jointly trained in the self-supervised super resolution task, while the learned network parameters are shared by all images.
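The pixel-level loss above can be sketched in one function. The text does not name the specific loss; L1 is a common choice for this kind of super-resolution training and is an assumption here.

```python
# Hedged sketch of the pixel-level training loss: mean absolute (L1)
# difference between predicted and ground-truth pixel values.
def l1_pixel_loss(predicted, ground_truth):
    assert len(predicted) == len(ground_truth)
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)

loss = l1_pixel_loss([0.2, 0.5, 0.9], [0.0, 0.5, 1.0])  # -> approximately 0.1
```

During training this loss would be back-propagated through both the decoding function and the encoder, since the text states they are trained jointly.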
In another aspect, an embodiment of the present invention further provides a system, including:
the first module is used for acquiring an image to be processed;
the second module is used for performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and the third module is used for inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction through a two-stage multi-layer perceptron to obtain an image pixel value.
Optionally, the third module comprises:
the first submodule is used for performing dynamic coordinate slicing on the two-dimensional feature map;
and the second sub-module is used for carrying out pixel value prediction processing through the two-stage multi-layer perceptron.
Optionally, the first sub-module comprises:
a first unit for inputting an image magnification;
the second unit is used for acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group;
and the third unit is used for carrying out slicing processing on the characteristic coordinate set according to the image magnification factor to obtain a coordinate slice.
Optionally, the second sub-module comprises:
a fourth unit for inputting a coordinate slice and a slice hidden code;
a fifth unit, configured to perform a first-stage processing on the coordinate slice and the slice implicit code to obtain a slice implicit vector;
a sixth unit, configured to acquire a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;
and the seventh unit is used for carrying out second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
The invention provides a method and system for representing arbitrary-scale images based on a dynamic implicit image function, for fast and effective arbitrary-scale image representation. In DIIF, a pixel-based image is represented as a two-dimensional feature map, and the decoding function takes a coordinate slice and a local feature vector as input to predict the corresponding set of pixel values. By sharing local feature vectors inside coordinate slices, DIIF can perform large-scale super-resolution reconstruction at very low computational cost. Experimental results show that the super-resolution performance and computational efficiency of DIIF surpass those of existing arbitrary-scale super-resolution methods at all scaling factors. Compared with LIIF, DIIF saves up to 87% of the computational cost while consistently achieving better PSNR performance. DIIF can be efficiently applied to scenarios that need to render images in real time at arbitrary resolution. By applying the embodiments of the invention, arbitrary zooming in image viewing/editing software can be realized, low-resolution images can be magnified and restored, and high-resolution images can be compressed for storage.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those of ordinary skill in the art will be able to practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one of the following techniques known in the art, or a combination thereof, may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An arbitrary scale image representation method based on a dynamic implicit image function, characterized by comprising:
acquiring an image to be processed;
performing implicit encoding processing on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing processing on the two-dimensional feature map, and performing pixel value prediction processing through a two-stage multi-layer perceptron to obtain image pixel values.
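The three-step pipeline of claim 1 can be sketched end-to-end with stand-in components. Everything below (the function names, the toy one-dimensional "encoder", the stand-in two-stage perceptron) is illustrative only, not the patented implementation:

```python
# Minimal end-to-end sketch of the claim-1 pipeline with stand-in
# components (all names, shapes, and formulas are illustrative assumptions).

def encoder(image):
    # Pretend implicit encoding: one feature vector per image row.
    return [[sum(row) / len(row)] for row in image]

def slice_coordinates(feature_map, magnification):
    # One coordinate slice per feature vector; slice length grows with
    # the magnification (dynamic coordinate slicing).
    return [(feat, [i * magnification + j for j in range(magnification)])
            for i, feat in enumerate(feature_map)]

def predict_pixels(feat, coords):
    # Stand-in for the two-stage perceptron: stage 1 digests the hidden
    # code once, stage 2 evaluates each coordinate against the result.
    hidden = feat[0] * 2.0                     # "first stage", once per slice
    return [hidden + 0.0 * c for c in coords]  # "second stage", per coordinate

image = [[0.1, 0.3], [0.5, 0.7]]               # toy 2x2 "image"
out = [p for feat, coords in slice_coordinates(encoder(image), 2)
       for p in predict_pixels(feat, coords)]  # 4 predicted pixel values
```

The point of the structure is that the expensive per-slice work is done once, while the per-coordinate work stays cheap, which is the claimed reduction in computation cost.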
2. The method of claim 1, wherein the dynamic coordinate slicing processing on the two-dimensional feature map comprises:
inputting an image magnification;
acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate set;
and slicing the feature coordinate set according to the image magnification to obtain coordinate slices.
3. The method of claim 2, wherein the slicing of the feature coordinate set according to the image magnification to obtain coordinate slices comprises:
determining a slice interval according to the image magnification;
and dividing the feature coordinate set according to the slice interval to obtain coordinate slices, wherein all coordinates within a coordinate slice share the same hidden code.
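A minimal sketch of the slicing rule in claims 2 and 3; the proportional slice-interval formula and the `base_slice` parameter are assumptions, since the claims do not fix an exact rule:

```python
def dynamic_coordinate_slices(coords, magnification, base_slice=4):
    """Partition query coordinates into coordinate slices.

    All coordinates inside one slice share the same hidden code, so a
    larger slice means fewer feature look-ups.  The interval rule below
    (slice length proportional to magnification) is an assumption."""
    interval = max(1, int(base_slice * magnification))
    return [coords[i:i + interval] for i in range(0, len(coords), interval)]

# 16 query coordinates along one row at 2x magnification -> 2 slices of 8
coords = [(x / 16, 0.0) for x in range(16)]
slices = dynamic_coordinate_slices(coords, magnification=2)
```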
4. The method of claim 1, wherein the pixel value prediction processing by the two-stage multi-layer perceptron comprises:
inputting a coordinate slice and a slice hidden code;
performing first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
acquiring a coordinate to be predicted, the coordinate to be predicted being any coordinate in the coordinate slice;
and performing second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value at the coordinate to be predicted.
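The two-stage perceptron of claims 4 and 5 can be sketched as follows: the first stage digests the slice hidden code once per slice, and the cheaper second stage runs once per coordinate, which is where the amortized saving comes from. The layer widths, the ReLU activation, and the random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """Hidden layers built from a linear layer plus an activation (claim 5)."""
    return [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    for k, (w, b) in enumerate(layers):
        x = x @ w + b
        if k < len(layers) - 1:          # no activation on the output layer
            x = np.maximum(x, 0.0)       # ReLU (assumed activation)
    return x

# Illustrative sizes; the patent does not specify layer widths.
stage1 = mlp([64, 128, 128])    # slice hidden code -> slice hidden vector
stage2 = mlp([128 + 2, 64, 3])  # hidden vector + 2-D coordinate -> RGB value

hidden_code = rng.standard_normal(64)
slice_vec = forward(stage1, hidden_code)       # run ONCE per coordinate slice

slice_coords = [np.array([0.1, 0.2]), np.array([0.1, 0.3])]
pixels = [forward(stage2, np.concatenate([slice_vec, c]))
          for c in slice_coords]               # cheap per-coordinate pass
```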
5. The method of claim 1, wherein the two-stage multi-layer perceptron comprises hidden layers, each hidden layer consisting of a linear layer and an activation function.
6. The method according to any one of claims 1 to 5, wherein before the implicit encoding processing is performed on the image to be processed by the pre-trained encoder to obtain the two-dimensional feature map, the method further comprises pre-training the encoder and the dynamic implicit image network, specifically comprising:
acquiring a training image;
performing pixel prediction processing on the training image through the encoder and the dynamic implicit image network to obtain predicted pixel values;
determining a pixel loss value from the pixel values of the training image and the predicted pixel values;
and updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain a trained encoder and a trained dynamic implicit image network.
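The pre-training step of claim 6 reduces to computing a pixel loss between predicted and ground-truth pixel values and using it to update the weights of both the encoder and the dynamic implicit image network. The claim does not specify the loss function, so the L1 (mean absolute error) form below is an assumption, chosen because it is the common choice in implicit super-resolution training:

```python
import numpy as np

def l1_pixel_loss(pred, target):
    """Mean absolute error between predicted and ground-truth pixels.

    The patent only requires 'a pixel loss value'; L1 is an assumed
    instantiation.  The resulting scalar would drive gradient updates
    of both the encoder and the dynamic implicit image network."""
    return float(np.abs(pred - target).mean())

pred = np.array([[0.2, 0.5, 0.9]])    # predicted RGB pixel
target = np.array([[0.0, 0.5, 1.0]])  # ground-truth RGB pixel
loss = l1_pixel_loss(pred, target)
```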
7. An arbitrary scale image representation system based on a dynamic implicit image function, the system comprising:
a first module for acquiring an image to be processed;
a second module for performing implicit encoding processing on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and a third module for inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing processing on the two-dimensional feature map, and performing pixel value prediction processing through a two-stage multi-layer perceptron to obtain image pixel values.
8. The system of claim 7, wherein the third module comprises:
a first sub-module for performing the dynamic coordinate slicing processing on the two-dimensional feature map;
and a second sub-module for performing the pixel value prediction processing through the two-stage multi-layer perceptron.
9. The system of claim 8, wherein the first sub-module comprises:
a first unit for inputting an image magnification;
a second unit for acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate set;
and a third unit for slicing the feature coordinate set according to the image magnification to obtain coordinate slices.
10. The system of claim 8, wherein the second sub-module comprises:
a fourth unit for inputting a coordinate slice and a slice hidden code;
a fifth unit for performing first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
a sixth unit for acquiring a coordinate to be predicted, the coordinate to be predicted being any coordinate in the coordinate slice;
and a seventh unit for performing second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value at the coordinate to be predicted.
CN202211590183.8A 2022-12-12 2022-12-12 Method and system for representing arbitrary scale image based on dynamic implicit image function Pending CN115861343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211590183.8A CN115861343A (en) 2022-12-12 2022-12-12 Method and system for representing arbitrary scale image based on dynamic implicit image function


Publications (1)

Publication Number Publication Date
CN115861343A (en) 2023-03-28

Family

ID=85672081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211590183.8A Pending CN115861343A (en) 2022-12-12 2022-12-12 Method and system for representing arbitrary scale image based on dynamic implicit image function

Country Status (1)

Country Link
CN (1) CN115861343A (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014197994A1 (en) * 2013-06-12 2014-12-18 University Health Network Method and system for automated quality assurance and automated treatment planning in radiation therapy
US20180260981A1 (en) * 2017-03-07 2018-09-13 Children's Medical Center Corporation Registration-based motion tracking for motion-robust imaging
US20190171476A1 (en) * 2013-08-20 2019-06-06 Teleputers, Llc System and Method for Self-Protecting Data
CN111784570A (en) * 2019-04-04 2020-10-16 Tcl集团股份有限公司 Video image super-resolution reconstruction method and device
KR102193108B1 (en) * 2019-10-10 2020-12-18 서울대학교산학협력단 Observation method for two-dimensional river mixing using RGB image acquired by the unmanned aerial vehicle
CN112163655A (en) * 2020-09-30 2021-01-01 上海麦广互娱文化传媒股份有限公司 Dynamic implicit two-dimensional code and generation and detection method and device thereof
CN112419150A (en) * 2020-11-06 2021-02-26 中国科学技术大学 Random multiple image super-resolution reconstruction method based on bilateral up-sampling network
CN112446489A (en) * 2020-11-25 2021-03-05 天津大学 Dynamic network embedded link prediction method based on variational self-encoder
WO2021122850A1 (en) * 2019-12-17 2021-06-24 Canon Kabushiki Kaisha Method, device, and computer program for improving encapsulation of media content
WO2021183336A1 (en) * 2020-03-09 2021-09-16 Schlumberger Technology Corporation Fast front tracking in eor flooding simulation on coarse grids
WO2021216747A1 (en) * 2020-04-21 2021-10-28 Massachusetts Institute Of Technology Real-Time Photorealistic 3D Holography with Deep Neural Networks
EP3907695A1 (en) * 2019-08-14 2021-11-10 Samsung Electronics Co., Ltd. Electronic apparatus and control method thereof
CN113689539A (en) * 2021-07-06 2021-11-23 清华大学 Dynamic scene real-time three-dimensional reconstruction method and device based on implicit optical flow field
CN113947521A (en) * 2021-10-14 2022-01-18 展讯通信(上海)有限公司 Image resolution conversion method and device based on deep neural network and terminal equipment
US20220027723A1 (en) * 2020-07-27 2022-01-27 Robert Bosch Gmbh Hardware compute fabrics for deep equilibrium models
US11308657B1 (en) * 2021-08-11 2022-04-19 Neon Evolution Inc. Methods and systems for image processing using a learning engine
CN114897912A (en) * 2022-04-24 2022-08-12 广东工业大学 Three-dimensional point cloud segmentation method and system based on enhanced cyclic slicing network
CN115049556A (en) * 2022-06-27 2022-09-13 安徽大学 StyleGAN-based face image restoration method


Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
HUANRONG ZHANG; JIE XIAO; ZHI JIN: "Multi-scale Image Super-Resolution via A Single Extendable Deep Network", IEEE Journal of Selected Topics in Signal Processing, 16 December 2020 *
LUKE LOZENSKI; MARK A. ANASTASIO; UMBERTO VILLA: "A Memory-Efficient Self-Supervised Dynamic Image Reconstruction Method Using Neural Fields", IEEE Transactions on Computational Imaging, 21 September 2022 *
NING NI; HANLIN WU; LIBAO ZHANG: "A Memory-Efficient Self-Supervised Dynamic Image Reconstruction Method Using Neural Fields", 2022 IEEE International Conference on Image Processing (ICIP), 16 October 2022 *
XIN HUANG; QI ZHANG; YING FENG; HONGDONG LI; XUAN WANG; QING WANG: "HDR-NeRF: High Dynamic Range Neural Radiance Fields", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 27 September 2022 *
XUECAI HU; HAOYUAN MU; XIANGYU ZHANG; ZILEI WANG; TIENIU TAN; JIAN SUN: "Meta-SR: A Magnification-Arbitrary Network for Super-Resolution", https://doi.org/10.48550/arXiv.1903.00875, 3 March 2019 *
YINBO CHEN; SIFEI LIU; XIAOLONG WANG: "Learning Continuous Image Representation with Local Implicit Image Function", 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2 November 2021 *
朱方: "3D Scene Representation: A Survey of Recent Advances in Neural Radiance Fields (NeRF)", Journal of Communication University of China (Natural Science Edition), 20 October 2022 *
李哲远; 陈翔宇; 乔宇; 董超; 井焜: "Analysis of Attention Mechanisms in Single-Image Super-Resolution", Journal of Integration Technology, 15 September 2022 *
李征; 金迪; 黄雪原; 袁科: "A Survey of Recommendation Research Based on Implicit Feedback", Journal of Henan University (Natural Science Edition), 16 May 2022 *
王亚刚; 王萌: "Color Image Super-Resolution Algorithm Based on Softplus+HKELM", Computer and Digital Engineering, no. 01, 20 January 2020 *
程德强; 蔡迎春; 陈亮亮; 宋玉龙: "Edge-Corrected Multi-Scale Convolutional Neural Network Reconstruction Algorithm", Laser & Optoelectronics Progress, no. 09, 28 March 2018 *

Similar Documents

Publication Publication Date Title
CN110782490B (en) Video depth map estimation method and device with space-time consistency
CN113034380B (en) Video space-time super-resolution method and device based on improved deformable convolution correction
CN109756690B (en) Light-weight video interpolation method based on feature-level optical flow
CN110533712A (en) A kind of binocular solid matching process based on convolutional neural networks
CN108491763B (en) Unsupervised training method and device for three-dimensional scene recognition network and storage medium
CN111062395B (en) Real-time video semantic segmentation method
CN112396645B (en) Monocular image depth estimation method and system based on convolution residual learning
CN114731408A (en) System, device and method for video frame interpolation using structured neural network
CN108776971A (en) A kind of variation light stream based on layering nearest-neighbor determines method and system
CN109903315B (en) Method, apparatus, device and readable storage medium for optical flow prediction
CN115294282A (en) Monocular depth estimation system and method for enhancing feature fusion in three-dimensional scene reconstruction
CN115187638B (en) Unsupervised monocular depth estimation method based on optical flow mask
CN113095254A (en) Method and system for positioning key points of human body part
CN115272437A (en) Image depth estimation method and device based on global and local features
CN115205150A (en) Image deblurring method, device, equipment, medium and computer program product
CN117115786B (en) Depth estimation model training method for joint segmentation tracking and application method
CN114359554A (en) Image semantic segmentation method based on multi-receptive-field context semantic information
CN115861343A (en) Method and system for representing arbitrary scale image based on dynamic implicit image function
CN116452599A (en) Contour-based image instance segmentation method and system
CN114037731A (en) Neural network optical flow estimation method, device and medium realized by FPGA
CN110490235B (en) Vehicle object viewpoint prediction and three-dimensional model recovery method and device facing 2D image
Chen et al. Adaptive hybrid composition based super-resolution network via fine-grained channel pruning
KR102057395B1 (en) Video generation method using video extrapolation based on machine learning
Lee et al. Qff: Quantized fourier features for neural field representations
Liu et al. Building effective large-scale traffic state prediction system: Traffic4cast challenge solution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination