CN115861343A

CN115861343A - Method and system for representing arbitrary scale image based on dynamic implicit image function

Info

Publication number: CN115861343A
Application number: CN202211590183.8A
Authority: CN
Inventors: 金枝; 何宗耀
Original assignee: Sun Yat Sen University; Sun Yat Sen University Shenzhen Campus
Current assignee: Sun Yat Sen University; Sun Yat Sen University Shenzhen Campus
Priority date: 2022-12-12
Filing date: 2022-12-12
Publication date: 2023-03-28

Abstract

The invention discloses an arbitrary scale image representation method and system based on a dynamic implicit image function, wherein the method comprises the steps of obtaining an image to be processed; carrying out implicit coding processing on the image to be processed through a pre-trained coder to obtain a two-dimensional characteristic diagram; and inputting the two-dimensional characteristic diagram into a dynamic implicit image network, carrying out dynamic coordinate slicing processing on the two-dimensional characteristic diagram, and carrying out pixel value prediction processing through a double-stage multilayer perceptron to obtain an image pixel value. The embodiment of the invention can reduce the calculation cost of continuous representation of the image, improve the processing performance and can be widely applied to the technical field of artificial intelligence.

Description

Method and system for representing arbitrary scale image based on dynamic implicit image function

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to an arbitrary scale image representation method and system based on a dynamic implicit image function.

Background

Digital images are two-dimensional representations of the real world in the digital world, but the continuous physical world is often quantified in a sensor while stored in a computer as a matrix of discrete pixels. If the images can be expressed in a continuous form, images of arbitrary resolution can be acquired in a continuous space, thereby ensuring the accuracy of the described scene of the image. Although the continuous representation method for the images in the related art has excellent performance in the aspect of continuous image representation, the calculation cost increases in a square order along with the increase of the image magnification, so that super-resolution reconstruction at any scale is time-consuming and huge. In view of the above, there is a need to solve the technical problems in the related art.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and a system for representing an image with an arbitrary scale based on a dynamic implicit image function, so as to reduce the computation cost and improve the processing performance.

In one aspect, the present invention provides a method for representing an image with an arbitrary scale based on a dynamic implicit image function, including:

acquiring an image to be processed;

carrying out implicit coding processing on the image to be processed through a pre-trained coder to obtain a two-dimensional characteristic diagram;

and inputting the two-dimensional characteristic diagram into a dynamic implicit image network, carrying out dynamic coordinate slicing processing on the two-dimensional characteristic diagram, and carrying out pixel value prediction processing through a two-stage multilayer perceptron to obtain an image pixel value.

Optionally, the performing dynamic coordinate slicing processing on the two-dimensional feature map includes:

inputting an image magnification;

acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group;

and slicing the characteristic coordinate set according to the image magnification factor to obtain a coordinate slice.

Optionally, the slicing the feature coordinate set according to the image magnification to obtain a coordinate slice includes:

determining slice intervals according to the image magnification;

and dividing the characteristic coordinate group according to the slice interval to obtain a coordinate slice, wherein the coordinate slice is used for sharing the same hidden code for all coordinates in the slice.

Optionally, the performing, by the dual-stage multi-layer perceptron, pixel value prediction processing includes:

inputting a coordinate slice and a slice hidden code;

carrying out first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;

acquiring a coordinate to be predicted, wherein the coordinate to be predicted is any coordinate in the coordinate slice;

and carrying out second-stage processing on the slice implicit vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.

Optionally, the dual-stage multi-layer perceptron comprises a hidden layer, the hidden layer consisting of a linear layer and an activation function.

Optionally, before the pre-trained encoder performs implicit encoding processing on the image to be processed to obtain the two-dimensional feature map, the method further includes pre-training the encoder and the dynamic implicit image network, specifically including:

acquiring a training image;

performing pixel prediction processing on the training image through the encoder and the dynamic implicit image network to obtain a predicted pixel value;

determining a pixel loss value according to the pixel value of the training image and the predicted pixel value;

and updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain the trained encoder and the trained dynamic implicit image network.

In another aspect, an embodiment of the present invention further provides a system, including:

the first module is used for acquiring an image to be processed;

the second module is used for carrying out implicit coding processing on the image to be processed through a pre-trained coder to obtain a two-dimensional characteristic diagram;

and the third module is used for inputting the two-dimensional characteristic diagram into a dynamic implicit image network, performing dynamic coordinate slicing processing on the two-dimensional characteristic diagram, and performing pixel value prediction processing through a double-stage multilayer perceptron to obtain an image pixel value.

Optionally, the third module comprises:

the first submodule is used for carrying out dynamic coordinate slicing processing on the two-dimensional characteristic diagram;

and the second sub-module is used for carrying out pixel value prediction processing through the two-stage multi-layer perceptron.

Optionally, the first sub-module comprises:

a first unit for inputting an image magnification;

the second unit is used for acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and performing grouping processing on coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group;

and the third unit is used for carrying out slicing processing on the characteristic coordinate set according to the image magnification factor to obtain a coordinate slice.

Optionally, the second sub-module comprises:

a fourth unit for inputting a coordinate slice and a slice hidden code;

a fifth unit, configured to perform first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;

a sixth unit, configured to acquire a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;

and the seventh unit is used for carrying out second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.

Compared with the prior art, the invention adopting the technical scheme has the following technical effects: in the embodiment of the invention, the two-dimensional characteristic diagram is input into a dynamic implicit image network, and dynamic coordinate slicing processing is carried out on the two-dimensional characteristic diagram, so that the neural network can execute many-to-many mapping from coordinate slices to pixel value slices, a decoder can predict all pixel values corresponding to the coordinate slices by using an implicit code only once, and the calculation cost is reduced; and the pixel value prediction processing is carried out through the two-stage multilayer perceptron to obtain the pixel value of the image, so that the decoder can use the coordinates with non-fixed number as input, thereby reducing the number of hidden layers and improving the processing performance.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a flowchart of an arbitrary scale image representation method based on a dynamic implicit image function according to an embodiment of the present application;

FIG. 2 is an overall frame diagram of a dynamic implicit image function provided in an embodiment of the present application;

FIG. 3 is an exemplary diagram of a coordinate slice provided by an embodiment of the present application;

fig. 4 is a structural diagram of a two-stage multilayer sensor according to an embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The method and the system for representing the image with any scale based on the dynamic implicit image function provided by the embodiment of the application mainly relate to the artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence basic technology generally comprises technologies such as a sensor, a special artificial intelligence chip, cloud computing, distributed storage, a big data processing technology, an operation/interaction system, electromechanical integration and the like; the artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

Specifically, the method and system for representing an image at any scale based on a dynamic implicit image function provided in the embodiment of the present application may employ a computer vision technique and a machine learning/deep learning technique in the field of artificial intelligence to analyze and process the image, so as to obtain a continuous image representation of the image. It can be understood that, for different tasks, the methods provided in the embodiments of the present application may all be executed in application scenarios of corresponding artificial intelligence systems; in addition, the specific time for executing the methods can be in any link in the operation flow of the artificial intelligence system.

Implicit neural representation techniques implicit neural representation can capture details of an object with a small number of parameters, and its differentiable nature allows back propagation through a neural rendering model, as compared to explicit representation. However, when the implicit neural representation is applied to a two-dimensional visual task, each pixel point is required to be predicted independently, and a large amount of calculation cost and a long running time are required.

Local Implicit Image Function (LIIF), a novel Implicit representation of an Image, uses a multi-layered perceptron to infer pixel values at each coordinate.

In the related art, although the LIIF can provide stable performance in an arbitrary scale super-resolution task of 30 times at maximum, its calculation cost rapidly increases as the magnification increases.

In view of this, referring to fig. 1, an embodiment of the present invention provides an arbitrary scale image representation method based on a dynamic implicit image function, including:

s101, acquiring an image to be processed;

s102, carrying out implicit coding processing on the image to be processed through a pre-trained coder to obtain a two-dimensional feature map;

s103, inputting the two-dimensional characteristic diagram into a dynamic implicit image network, carrying out dynamic coordinate slicing processing on the two-dimensional characteristic diagram, and carrying out pixel value prediction processing through a double-stage multilayer perceptron to obtain an image pixel value.

In the embodiment of the present invention, a Dynamic Implicit Image Function (DIIF) is proposed, which is a fast and effective method for representing an Image at any scale. Referring to FIG. 2, I _in Representing the input image, the encoder maps the input image to a two-dimensional feature map as its DIIF representation. Given the resolution of the real image, the hidden code z can be obtained from the two-dimensional characteristic diagram ^* And coordinate slices around the covert code

Wherein X _1st Representing the first coordinate, X, of a coordinate slice _last Representing the tail coordinates of the coordinate slice. The decoding function then uses the above information to predict all pixel values of the coordinate slice, i.e. the prediction of pixel values of the coordinates is performed by a two-stage multi-layer perceptron (or called coarse-to-fine multi-layer perceptron), and the slice hidden vector H is predicted by the first stage (coarse stage) ^* And with the coordinate X to be predicted _i The two phases are used as the input of the second phase (fine phase), and the pixel value I of the coordinate to be predicted is output _out-i . In the embodiment of the invention, in the training stage, the predicted pixel value I is used _out-i And pixel value I of real image _gt-i The loss function is calculated, the encoder and the decoding function are jointly trained in the self-supervised super-resolution task, and the learned network parameters are shared by all images. Embodiments of the present invention enable neural networks to perform many-to-many mapping from coordinate slices to pixel value slices by using image coordinate grouping and slicing strategies, rather than individually predicting pixel values at a given coordinate at a time. The embodiment of the invention further provides a two-stage multi-layer Perceptron (C2F-MLP) which is used for executing the dynamic coordinate-based multi-layer PerceptronImage decoding of the slice strategy enables the number of coordinates in each slice to vary with magnification, and DIIF using the dynamic coordinate slice strategy can significantly reduce the computational cost required for large scale super-resolution. Experimental results show that compared with the existing super-resolution method with any scale, the DIIF achieves the best calculation efficiency and super-resolution performance.

Further, as a preferred embodiment, the performing dynamic coordinate slicing processing on the two-dimensional feature map includes:

inputting an image magnification;

acquiring a feature vector from the two-dimensional feature map, determining the feature vector as an implicit code, and performing grouping processing on coordinates in the two-dimensional feature map according to the implicit code to obtain a feature coordinate group;

In the embodiment of the invention, a vector is selected from the two-dimensional feature map as the hidden code, and the coordinates closer to the hidden code than other hidden codes in the two-dimensional feature map are grouped according to the hidden code to obtain the feature coordinate group. The hidden code can be shared in one coordinate set by the characteristic coordinate set, so that the decoder can predict all pixel values corresponding to the coordinate set by using the hidden code only once. The number of coordinates in one coordinate set is proportional to the magnification, so the larger the magnification, the more computation cost can be saved. The coordinate grouping requires the decoder to predict all pixel values of the coordinate set at the same time, which places a heavy burden on the decoder when performing super-resolution on a large scale. The embodiment of the invention provides a reasonable solution that the characteristic coordinate set is sliced according to the image magnification to obtain the coordinate slice, one coordinate set is divided into a plurality of coordinate slices, and the hidden code input is only shared in the coordinate slice but not in the whole coordinate set.

Further as a preferred embodiment, the slicing the feature coordinate set according to the image magnification to obtain a coordinate slice includes:

determining slice intervals according to the image magnification;

Where appropriate slice spacing is set to achieve the best performance and efficiency balance, the simplest approach is to fix coordinate slices, which in any case uses fixed slice spacing. However, this strategy retains the square order increase characteristic of computational cost as magnification increases. In addition, there are two problems of spatial discontinuity and redundant coordinates inside the coordinate slice. To address these problems, embodiments of the present invention propose dynamic coordinate slicing to adjust the slice spacing as the magnification changes. A first strategy that may be employed by embodiments of the present invention is linear-order coordinate slicing, which sets the slice interval to a magnification factor. Using linear order coordinate slices, the computational cost of DIIF increases linearly with increasing magnification. Another strategy is to set the slice interval to the square of the magnification, called constant order coordinate slice. With constant order coordinate slices, the computational cost of DIIF is determined only by the resolution of the input image, which remains constant as the magnification increases. In the embodiment of the invention, the characteristic coordinate groups are divided according to the slicing intervals to obtain the coordinate slices, and the coordinate slices are used for sharing the same hidden code for all coordinates in the slices. Referring to FIG. 3, FIG. 3 is a drawing of coordinate slices grouped at a magnification of 4 and with a slice spacing of 4, Z ^* Representing a covert code, X _1st Representing the first coordinate, X, of a coordinate slice _last Representing the tail coordinates of the coordinate slice.

Further, as a preferred embodiment, the pixel value prediction processing by the two-stage multi-layer perceptron includes:

inputting a coordinate slice and a slice hidden code;

and carrying out second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.

Among them, in order to implement a dynamic coordinate slicing strategy, a decoder needs to have scalability that uses a non-fixed number of coordinates as input and outputs corresponding pixel values. However, normal MLP only allows the use of fixed length vectors as inputs. To solve this problem, the embodiment of the present invention proposes a dual-stage multi-layered perceptron (C2F-MLP) as a decoder, divided into a first stage (coarse stage) for predicting slice hidden vectors and a second stage (fine stage) for predicting pixel values. In the embodiment of the invention, the hidden layer in the rough stage takes the boundary coordinates of the coordinate slice and the corresponding hidden code as input to generate the hidden vector of the slice. The slice hidden vector contains information of all pixel values in the slice and is used as input for the fine phase. The computational cost of the coarse phase is determined by the number of coordinate slices, which is much smaller than the number of output coordinates due to the dynamic coordinate slicing strategy used. The coarse phase also allows the decoding function to exploit spatial relationships within the slice, which makes its prediction of pixel values more accurate. The concealment layer of the fine phase takes as input the slice concealment vector output by the coarse phase and any coordinates in a given coordinate slice to predict the pixel values at that coordinate. The fine phase is designed to independently predict pixel values at the coordinates to be predicted. The decoding function employed by the refinement stage can be expressed as:

I(X ^* )＝f _θ (z ^* ,[x _tl -v ^* ,…,x _rb -v ^* ])；

wherein I is a pixel value, X ^* ＝[x _tl ,…,x _rb ]Is a given coordinate slice, f _θ Is a decoder, z ^* Is a hidden code corresponding to a coordinate slice, v ^* Is the coordinate of the covert code, x _tl And x _rb Respectively the head and tail coordinates of the coordinate slice.

Since the length of the sliced implicit vector is shorter than the length of the implicit code and the number of concealment layers in the fine stage is smaller, the computational cost required for the fine stage of DIIF is significantly lower than for the LIIF decoder.

Further as a preferred embodiment, the dual-stage multi-layer perceptron comprises a hidden layer, which consists of a linear layer and an activation function.

Referring to fig. 4, C2F-MLP divides the decoder into a coarse phase for predicting the slice hidden vectors and a fine phase for predicting the pixel values. The hidden layer of the C2F-MLP consists of a linear layer with a dimension of 256, followed by the ReLU activation function. In the coarse phase, the implicit code z ^* First coordinate X of a coordinate slice _1st Tail coordinate of coordinate slice X _last Taking the area a of the pixel point under the current magnification factor as input, and outputting to obtain a coordinate hidden vector H _lt～rb . In the fine stage, a coordinate implicit vector and a coordinate X to be predicted are input _I The output is given as I _i . To predict the RGB values, the fine phase finally uses an output linear layer of dimension 3.

Further as a preferred embodiment, before the image to be processed is implicitly encoded by the pre-trained encoder to obtain the two-dimensional feature map, the method further includes pre-training the encoder and the dynamic implicit image network, specifically including:

acquiring a training image;

and updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain the trained encoder and dynamic implicit image network.

In an embodiment of the invention, the training phase uses the predicted pixel values and the pixel values of the real image to calculate the pixel level loss. The encoder and decoding functions are jointly trained in the self-supervised super resolution task, while the learned network parameters are shared by all images.

the first module is used for acquiring an image to be processed;

Optionally, the third module comprises:

Optionally, the first sub-module comprises:

a first unit for inputting an image magnification;

the second unit is used for acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group;

Optionally, the second sub-module comprises:

a fourth unit for inputting a coordinate slice and a slice hidden code;

a fifth unit, configured to perform a first-stage processing on the coordinate slice and the slice implicit code to obtain a slice implicit vector;

The invention provides an arbitrary scale image representation method and system based on a dynamic implicit image function, which are used for quickly and effectively representing an arbitrary scale image. In DIIF, a pixel-based image is represented as a two-dimensional feature map, and a decoding function takes a coordinate slice and a local feature vector as inputs to predict a corresponding set of pixel values. By sharing local feature vectors inside coordinate slices, DIIF can perform large-scale super-resolution reconstruction at very low computational cost. Experimental results show that the super-resolution performance and the calculation efficiency of the DIIF are superior to those of the existing super-resolution method with any scale in all scaling factors. DIIF can save up to 87% of the computational cost compared to LIIF and consistently has better PSNR performance. DIIF can be efficiently applied to scenes that need to present images in real time at an arbitrary resolution. By applying the embodiment of the invention, the arbitrary zooming function in the image viewing/editing software can be realized, the low-resolution image can be amplified and restored, and the high-resolution image can be compressed and stored.

In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those of ordinary skill in the art will be able to practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An arbitrary scale image representation method based on a dynamic implicit image function is characterized by comprising the following steps:

acquiring an image to be processed;

and inputting the two-dimensional characteristic diagram into a dynamic implicit image network, carrying out dynamic coordinate slicing processing on the two-dimensional characteristic diagram, and carrying out pixel value prediction processing through a double-stage multilayer perceptron to obtain an image pixel value.

2. The method of claim 1, wherein the dynamic coordinate slicing of the two-dimensional feature map comprises:

inputting an image magnification;

3. The method of claim 2, wherein said slicing the set of feature coordinates according to the image magnification to obtain coordinate slices comprises:

determining slice intervals according to the image magnification;

4. The method of claim 1, wherein the processing for pixel value prediction by the dual-stage multi-layer perceptron comprises:

inputting a coordinate slice and a slice hidden code;

5. The method of claim 1, wherein the dual-stage multi-layered perceptron comprises a hidden layer, the hidden layer consisting of a linear layer and an activation function.

6. The method according to any one of claims 1 to 5, wherein before the performing the implicit coding process on the image to be processed by the pre-trained encoder to obtain the two-dimensional feature map, the method further includes pre-training the encoder and the dynamic implicit image network, specifically including:

acquiring a training image;

7. An arbitrary scale image representation system based on a dynamic implicit image function, the system comprising:

the first module is used for acquiring an image to be processed;

8. The system of claim 7, wherein the third module comprises:

9. The system of claim 8, wherein the first sub-module comprises:

a first unit for inputting an image magnification;

10. The system of claim 8, wherein the second sub-module comprises:

a fourth unit for inputting a coordinate slice and a slice hidden code;

and the seventh unit is used for carrying out second-stage processing on the slice implicit vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.