CN116245736B - Orthogonal position coding representation method of image block and up-sampling module implementation method

Orthogonal position coding representation method of image block and up-sampling module implementation method

Info

Publication number
CN116245736B
CN116245736B (application number CN202310527057.6A)
Authority
CN
China
Prior art keywords
image
super
resolution
domain
image block
Prior art date
Legal status
Active
Application number
CN202310527057.6A
Other languages
Chinese (zh)
Other versions
CN116245736A (en)
Inventor
孙倩 (Sun Qian)
宋高超 (Song Gaochao)
潘成胜 (Pan Chengsheng)
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202310527057.6A priority Critical patent/CN116245736B/en
Publication of CN116245736A publication Critical patent/CN116245736A/en
Application granted granted Critical
Publication of CN116245736B publication Critical patent/CN116245736B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053: Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4046: Scaling using neural networks
    • G06T 3/4084: Scaling in the transform domain, e.g. fast Fourier transform [FFT] domain scaling
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G06T 9/00: Image coding
    • G06T 9/002: Image coding using neural networks
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an orthogonal position coding representation method for image blocks and a method for implementing an up-sampling module. The orthogonal position coding representation method comprises the following steps: S1, cutting a target super-resolution image to be rendered into a plurality of super-resolution image blocks; S2, determining, in a relative coordinate domain, the continuous binary function and the orthogonal position code corresponding to each super-resolution image block, where the orthogonal position code is obtained by position-coding the coordinate information of the transverse and longitudinal dimensions of the super-resolution image block with a Fourier basis and then multiplying the two position codes pairwise to obtain a comprehensive position coding vector. The invention solves the symmetry problem of the up-sampling module in arbitrary-scale image super-resolution: whereas an up-sampling module based on implicit neural representation must learn the symmetry in an image through data augmentation, the present method naturally introduces the symmetry prior of the image and thus reduces the difficulty of neural network training.

Description

Orthogonal position coding representation method of image block and up-sampling module implementation method
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an orthogonal position coding representation method of an image block and an up-sampling module implementation method based on orthogonal position coding.
Background
Image super-resolution is a fundamental branch of computer vision that takes a low-resolution image and a specified integer super-resolution scale (typically 2, 3 or 4) as input and outputs the corresponding sharp high-resolution image. The process is as follows: the low-resolution image is first passed through a convolutional-neural-network-based encoder to obtain a feature map; the feature map then passes through an up-sampling module designed for the corresponding super-resolution scale, and the high-resolution image is obtained through convolution operations. The up-sampling module has to be designed separately for different super-resolution scales, which is inconvenient in application, and the related technique of arbitrary-scale image super-resolution has therefore emerged.
Arbitrary-scale image super-resolution, as an emerging branch of image super-resolution, aims to design a brand-new up-sampling module that can accept any non-integer super-resolution scale as the up-sampling factor, which is convenient for practical application. The process is similar to ordinary image super-resolution: the low-resolution image is first passed through a convolutional-neural-network-based encoder to obtain a feature map, and the feature map then passes through the up-sampling module for the specified super-resolution scale. The difference is that the up-sampling module can support non-integer scales; up-sampling modules meeting this requirement are realized with an emerging technique: implicit neural representation.
Implicit neural representation: coordinate-based implicit neural representation originates from three-dimensional vision; its basic idea is to use a neural network to directly represent complex, continuous scenes or various signals. In the field of continuous image representation, the image pixel coordinates are taken as the input of a neural network (usually a multi-layer perceptron), which outputs the RGB channel values of the pixel at those coordinates. Since the input coordinates are continuous and dense, a "continuous" image can be represented, whose resolution can be of any size.
The up-sampling module for arbitrary-scale image super-resolution adopts implicit neural representation and therefore supports super-resolution at any scale. It takes the feature map and the specified super-resolution scale as input, first calculates the resolution of the output image from the super-resolution scale, then assigns a two-dimensional Cartesian coordinate value to each pixel of the output image, feeds these coordinate values and the feature map into the implicit neural representation, and obtains the corresponding number of RGB values as the final image. This technique fully solves the scale problem of arbitrary-scale image super-resolution, but when super-resolution at a large scale (e.g. 30) is performed it faces problems such as low interpretability, a large amount of computation and high GPU-memory usage.
Disclosure of Invention
The invention aims to: overcome the defects of the existing methods and provide an orthogonal position coding representation method of an image block and an up-sampling module implementation method based on the orthogonal position coding.
The technical scheme is as follows: in one aspect, the present invention provides a method for representing an image block by orthogonal position coding, comprising the steps of:
s1, cutting a target super-resolution image to be rendered into a plurality of super-resolution image blocks;
s2, determining a continuous binary function and an orthogonal position code corresponding to the super-resolution image block in a relative coordinate domain;
the orthogonal position codes are: the method comprises the steps of firstly, carrying out position coding on coordinate information of transverse and longitudinal dimensions of the super-resolution image block by utilizing a Fourier base, and then multiplying the position codes of the dimensions one by one to obtain a comprehensive position coding vector.
Further, the method comprises the steps of:
the super-resolution image block representation method comprises the following steps:
equidistant transverse and longitudinal cutting is carried out on the target super-resolution image to be rendered into the imageA size portion of the low resolution input image, and a size corresponding to the target super resolution image to be rendered is expressed as +.>Then obtainPersonal->Image area tiles of a size, each such image area tile being a super resolution image tile.
The relative coordinate domain is defined as:
if the size of the low-resolution input image is $h \times w$, the 2D domain is divided into $h \times w$ grids, and this 2D domain is called the absolute coordinate domain; each grid corresponds to one pixel of the low-resolution image and also to one image block of the target super-resolution image, whose size is $r_h \times r_w$, i.e. $r_h$ and $r_w$ are the super-resolution scales; a new 2D domain is further divided into $r_h \times r_w$ grids and is called the relative coordinate domain.
Further, the method comprises the steps of:
the orthogonal position coding representation method comprises the following steps:
for each channel of the RGB image, the super resolution image block is represented asIThe image block is divided into relative coordinate domainsx,yThe pixel value of the coordinates of the center point is recorded asIs regarded as a binary functionfResults of samplingfRepresented as a linear combination of a set of orthogonal basis and projection:
wherein,,representing the expansion of a 2D domain matrix into a 1D domain, Z is a hidden code, which is a projection of a set of orthogonal bases, which is a generalized term for mathematical operations, +.>A position code representing a certain coordinate is presented,nthe maximum frequency of the position code is indicated, and P is the orthogonal position code OPE.
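As a concrete illustration of the encoding just defined, the following minimal NumPy sketch builds the Fourier position code of each coordinate, forms their outer product and flattens it into the orthogonal position code $P$, which is then linearly combined with a hidden code to evaluate one channel of an image block. The sketch is illustrative only; the function names and the random hidden-code values are assumptions, not part of the patent.

```python
import numpy as np

def fourier_basis(t: float, n: int) -> np.ndarray:
    """1D position code E(t) = [1, cos(pi t), sin(pi t), ..., cos(n pi t), sin(n pi t)]."""
    feats = [1.0]
    for k in range(1, n + 1):
        feats.append(np.cos(k * np.pi * t))
        feats.append(np.sin(k * np.pi * t))
    return np.array(feats)                              # shape: (2n+1,)

def ope(x: float, y: float, n: int) -> np.ndarray:
    """Orthogonal position code: flatten the outer product of the two 1D codes."""
    return np.outer(fourier_basis(x, n), fourier_basis(y, n)).ravel()   # shape: ((2n+1)^2,)

# Evaluate one RGB channel of an image block at relative coordinates (x, y) = (0.25, -0.5).
n = 3                                                   # maximum position-code frequency
rng = np.random.default_rng(0)
z_channel = rng.standard_normal((2 * n + 1) ** 2)       # per-channel hidden code (illustrative values)
value = z_channel @ ope(0.25, -0.5, n)                  # I(x, y) = <Z, OPE(x, y)>
```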
Further, the method comprises the steps of:
the binary functionfA continuous binary function corresponding to the image block, the definition domain of which is [ -1,1]The binary function belongs to the inner product space M, so thatx,yMapping coordinates in a 2D domain for each pixel center point according to an image block, therebyf(x,y) Is a scalar quantity and represents the value of a pixel point on the image block.
The 2D domain matrixEach element->All are regarded as a binary function, satisfying the following relationship:
wherein,,a set of orthogonal bases constituting a continuous image space, and encoded according to the above definition of orthogonal positionWhich contains the set of orthogonal bases. Thus, a continuous image block is represented as a linear combination of a set of basic plane waves, and orthogonal position coding does not contain complex exponential terms.
On the other hand, the invention also provides an up-sampling module implementation method based on orthogonal position coding, which comprises the following steps:
S1, a low-resolution input image of size $3 \times h \times w$ is passed through an encoder to obtain a feature map of size $3(2n+1)^2 \times h \times w$; the length and width of the feature map are consistent with those of the input image, and the number of image channels becomes $3(2n+1)^2$, where the integer $n$ is the preselected maximum frequency of the position code.
S2, each pixel of the feature map is extracted along the channel dimension to obtain a one-dimensional hidden code $Z$ of shape $3(2n+1)^2 \times 1$, which is used to represent the corresponding image block of the target super-resolution image.
The target super-resolution image is an RGB image, so the corresponding image block consists of three channels, and each channel is implicitly represented by one third of the hidden code, i.e. a sub-code of shape $(2n+1)^2 \times 1$; in the subsequent calculation, the orthogonal position code $P$ of shape $(2n+1)^2 \times 1$ is linearly combined with the hidden code of shape $(2n+1)^2 \times 1$ to represent the pixel value of a pixel of the target image block for a particular RGB channel.
S3, since the RGB image comprises three channels, the same orthogonal position code is duplicated 3 times and combined with the corresponding hidden codes of shape $(2n+1)^2 \times 1$ by feature map rendering or image block aggregation, so as to obtain the pixel values of the three RGB channels of a given pixel of the super-resolution image.
S4, steps S2-S3 are repeated until all pixel values at the target resolution have been calculated, and all pixel values are then rearranged into an $H \times W$ rectangle to obtain the target super-resolution image.
Further, the method comprises the steps of:
the step S2 specifically includes:
s21 for rendering sizeDividing 2D domain into +.>The grid areas are so that each target pixel can be assigned absolute center point coordinates +.>
S22 the same 2D domain is divided into according to the size of the feature mapEach grid region, Z of each feature map can be allocated an absolute center point coordinate +.>Thus, for each target pixel, the coordinates of the center point can be usedCalculating a Z of shortest distance, and marking as hidden code ++>And the target pixel and the hidden code are obtained>Relative coordinates of +.>After that, the relative coordinates are coded by using orthogonal positions, and +.>The target pixel value of a certain RGB channel under the coordinate can be obtained by linear combination, and the target pixel value is expressed as follows:
wherein,,rendering function->Representing linear combination of the relative coordinates after orthogonal position encoding.
Further, the method comprises the steps of:
the step S2 specifically calculates the target pixel value by using image block aggregation, where the image block aggregation is expressed as:
for each center pointFind the nearest four adjacent hidden codes, i.e. +.>The method comprises the steps of carrying out a first treatment on the surface of the Corresponds to four center point coordinates +.>After which +.>And obtaining four undetermined pixel values by corresponding coordinates of the four hidden codes, and then interpolating the overlapped parts of the four undetermined pixel values to obtain a target pixel value of a certain RGB channel.
Wherein,,representing the area of a rectangular area surrounded by the current hidden code and the central point, and the S of the denominator represents all +.>And, four->Respectively expressed as->Subscripts are used for distinctiont,/>Corresponding to ∈>The four coordinates that are hidden-encoded within the 2D threshold each enclose a rectangular area with the absolute center point coordinates of the target pixel value, i.e., a rectangular area with the two coordinates as diagonal lines.
Further, the method comprises the steps of:
when the relative coordinates $(x_r, y_r)$ of $(x_q, y_q)$ with respect to the nearest hidden code are calculated, their range lies within the 2D domain, but when the relative coordinates with respect to the other three hidden codes are calculated they can exceed the 2D domain; therefore, after the relative coordinates are obtained they are divided by 2 to ensure that they lie within the 2D domain. Viewed as a whole, the image blocks of the target super-resolution image then overlap one another, so that the target super-resolution image does not produce a discontinuous effect due to the image blocks; that is, the relative coordinates with respect to hidden code $z_t$ are expressed as:

$$ \left(x_r^{t},\, y_r^{t}\right) = \tfrac{1}{2}\left(\hat{x}_r^{t},\, \hat{y}_r^{t}\right), $$

where $(\hat{x}_r^{t}, \hat{y}_r^{t})$ are the relative coordinates of the target pixel with respect to hidden code $z_t$ before halving.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above method when executing the program.
Finally, the invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method described above.
The beneficial effects are that: (1) Compared with an up-sampling module based on implicit neural representation, and taking MACs (multiply-accumulate operations) as the measure of computation, the method saves about two orders of magnitude of computation and one to two orders of magnitude of GPU memory, and the number of neural-network parameters of the up-sampling module is zero.
(2) In actual measurements, the rendering speed of the method is improved by about 40% on average across different super-resolution scales.
(3) The invention is interpretable: the feature map is interpreted as the projection onto a set of orthogonal bases, the position code corresponds to that set of orthogonal bases, the input and output of the up-sampling module have strict mathematical meaning, and the operating principle can be explained.
(4) The invention carries a symmetry prior. For a flipped feature map, the super-resolution image is composed of a linear combination of the feature-map hidden code and the orthogonal position code of the coordinates, which is symmetric because sine and cosine functions are used; a sharp flipped super-resolution image can therefore be obtained, whereas an up-sampling module based on implicit neural representation cannot obtain a sharp flipped image and requires data augmentation during training.
Drawings
FIG. 1 is a schematic diagram of a linear combination of image blocks represented as a set of basic plane waves according to an embodiment of the present invention;
FIG. 2 is a flow chart of an up-sampling module based on orthogonal position coding according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of calculating relative coordinates from target super-resolution pixels according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of two pixel value calculation methods used by the upsampling module according to an embodiment of the present invention;
FIG. 5 is a schematic representation of a target dimension using a sinusoidal trigonometric function according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a symmetry comparison of a method according to an embodiment of the present invention with an implicit neural representation method;
fig. 7 is a schematic diagram showing the calculation amount of each target pixel calculation method compared with other methods according to the embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and the detailed description.
In a first aspect, the present invention provides a method for orthogonal position coding representation of an image block, comprising:
firstly, cutting a target super-resolution image to be rendered into a plurality of super-resolution image blocks;
secondly, determining continuous binary functions and orthogonal position codes corresponding to the super-resolution image blocks in a relative coordinate domain;
the orthogonal position codes are: the method comprises the steps of firstly, carrying out position coding on coordinate information of transverse and longitudinal dimensions of the super-resolution image block by utilizing a Fourier base, and then multiplying the position codes of the dimensions one by one to obtain a comprehensive position coding vector.
To achieve arbitrary-scale super-resolution, a continuous representation of the image must first be designed; to obtain a finer continuous representation, image blocks are used as the representation unit rather than the whole image. This section introduces the orthogonal position coding representation of an image block, which is also the basis of the proposed arbitrary-scale up-sampling module.
In particular, the invention relates to related terms comprising:
OPE (orthogonal position encoding), orthogonal position coding as proposed by the present invention.
Image block: assume that an image can be cut equally in the transverse and longitudinal directions; for example, cutting an image equally into 4 parts along both its length and width yields 16 image area patches, each one quarter of the original length and width. Each of these small areas is hereinafter called an image block, and each image block can also be regarded as the discrete pixel values obtained by discretely sampling a continuous binary function.
2D domain: a continuous square planar area located in a rectangular plane coordinate system, which also serves as the domain of definition of the binary function.
Feature map: in the super-resolution pipeline, the low-resolution input image is first passed through an encoder to obtain a feature map of the same spatial size, i.e. with the same height and width $h \times w$; the super-resolution image is then obtained through the up-sampling module.
Abbreviations in the technical scheme: $I_{LR}$: the low-resolution input image in the image super-resolution pipeline; $h, w$: the height and width of the low-resolution input image; $H, W$: the height and width of the image after super-resolution; $r_h, r_w$: the super-resolution expansion factors along the height and width, obviously $r_h = H/h$, $r_w = W/w$; feature map: viewed along the channel dimension, the feature map can be seen as being composed of $h \times w$ one-dimensional vectors, each of which is a hidden code, written $Z$ or $z^{*}$; $Z$ is the convenient general notation for a hidden code in the mathematical operations herein, while $z^{*}$ denotes the particular hidden code selected at application time.
A continuous binary function $f$ corresponding to an image block is set up, with domain of definition $[-1,1] \times [-1,1]$; the function belongs to the inner product space $M$. Let $(x, y)$ be the coordinates in the 2D domain to which each pixel center point of the image block is mapped, so that $f(x, y)$ is a scalar representing the value of a pixel point on the image block, which can also be regarded as a sample of $f$. For example, assume the low-resolution input image is of size $h \times w$; the 2D domain is then divided into $h \times w$ grids, and this 2D domain is called the absolute coordinate domain.
Each divided grid corresponds to one pixel of the low-resolution image and to one image block of the target super-resolution image, whose size is $r_h \times r_w$, i.e. the super-resolution scales; a new 2D domain is further divided into $r_h \times r_w$ grids and is called the relative coordinate domain.
So far, each input image corresponds to one absolute coordinate domain, which can be further divided into $h \times w$ relative coordinate domains, and each relative coordinate domain can be regarded as the domain of definition of the continuous binary function of one super-resolution image block.
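The coordinate bookkeeping described above can be sketched as follows; the helper name and the toy sizes are assumptions chosen only for illustration.

```python
import numpy as np

def cell_centers(k: int) -> np.ndarray:
    """Center coordinates of k equal cells covering the interval [-1, 1]."""
    return (np.arange(k) + 0.5) / k * 2 - 1

h, w = 2, 3            # low-resolution input size (illustrative)
r_h, r_w = 4, 4        # super-resolution scales (illustrative)

# Absolute coordinate domain: one center per low-resolution pixel / hidden code.
abs_y, abs_x = cell_centers(h), cell_centers(w)
# Relative coordinate domain: one center per super-resolution pixel inside each image block.
rel_y, rel_x = cell_centers(r_h), cell_centers(r_w)
```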
For an RGB image, a super-resolution image block is denoted $I$; the pixel value of the block at the center-point coordinates $(x, y)$ (in the relative coordinate domain) is denoted $I(x, y)$ and regarded as the result of sampling $f$, and $f$ can be expressed as a linear combination of a set of orthogonal bases and projections:

$$ I(x,y) = f(x,y) \approx \langle Z, P \rangle = Z^{\mathsf T} P . $$

Notably, a pixel of the image block located at $(x, y)$ can be regarded as a small grid area in the relative coordinate domain, and the value at the center-point coordinates is used directly to represent the value of that grid area, so the pixel value $I(x, y)$ can be calculated from $f$. In addition, $f$ belongs to the inner product space $M$; for $f, g \in M$ the inner product operation is defined as:

$$ \langle f, g \rangle = \int_{-1}^{1}\!\int_{-1}^{1} f(x,y)\, g(x,y)\, \mathrm{d}x\, \mathrm{d}y . $$

Here

$$ P = \mathrm{OPE}(x,y) = \mathrm{Flatten}\big(E(x)\,E(y)^{\mathsf T}\big), \qquad E(t) = \big[\,1,\ \cos(\pi t),\ \sin(\pi t),\ \ldots,\ \cos(n\pi t),\ \sin(n\pi t)\,\big]^{\mathsf T}, $$

where $E(\cdot)$ denotes the position code of a single coordinate, $n$ denotes the maximum frequency of the position code, and $\mathrm{Flatten}(\cdot)$ denotes expanding the 2D matrix into a 1D vector; the result is called the orthogonal position code (OPE), which represents $f$ through a linear combination with $Z$.
Orthogonal position coding is a position coding method applied to the field of two-dimensional arbitrary-scale image super-resolution. The coordinate information of the transverse and longitudinal dimensions of the image is first position-coded with a Fourier basis, and the two position codes are then multiplied pairwise to obtain a comprehensive position coding vector. This coding is orthogonal and integrates the positional information of the different dimensions, so it better captures the spatial characteristics of the input data.
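For instance, with maximum frequency $n = 1$ the encoding maps a coordinate pair to $(2 \cdot 1 + 1)^2 = 9$ components, obtained by flattening the outer product of the two one-dimensional Fourier codes; the general case follows the same pattern with frequencies up to $n$ in each dimension:

$$ E(t) = \begin{bmatrix} 1 \\ \cos \pi t \\ \sin \pi t \end{bmatrix}, \qquad \mathrm{OPE}(x,y) = \mathrm{Flatten}\!\begin{pmatrix} 1 & \cos \pi y & \sin \pi y \\ \cos \pi x & \cos \pi x \cos \pi y & \cos \pi x \sin \pi y \\ \sin \pi x & \sin \pi x \cos \pi y & \sin \pi x \sin \pi y \end{pmatrix}. $$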
Theoretical basis:
when 2D matrix is to be formedEach element->All are considered as a binary function, such as:
they satisfy the following relationship:
thus, the first and second substrates are bonded together,a set of orthogonal bases that make up the continuous image space is included in the orthogonal position code, and Z can be considered as a set of projections above. At an overall level, a continuous image block may be represented as a linear combination of a set of basic plane waves, as shown in fig. 1. Unlike two-dimensional fourier transforms, orthogonal position coding does not contain complex exponential terms, in factIt can also be derived from a two-dimensional fourier transform based on conjugate symmetry cancellation complex exponential terms.
In another aspect, the invention discloses an up-sampling module based on orthogonal position coding. The up-sampling module takes the feature map as input and outputs the target super-resolution image. The one-dimensional vector $Z$ that the feature map represents along the channel dimension is regarded as the projection onto the orthogonal basis, so that when the color of a pixel of the target image is queried, its coordinates are orthogonally position-coded and then linearly combined with $Z$ to obtain the color.
The up-sampling module takes the target resolution $H, W$ and the feature map as inputs; after coordinates have been assigned to the super-resolution image pixels, the color of each target pixel is computed in parallel. The composition of the up-sampling module can be explained by the feature map rendering process described below; to make up for the deficiencies of feature map rendering, an image block aggregation technique is then proposed as an optimization of the rendering process.
As shown in Fig. 2, the method takes a three-channel low-resolution image (of size $3 \times h \times w$) and the specified target resolution $H, W$ as inputs and obtains a super-resolution image (of size $3 \times H \times W$, where $H$ can also be regarded as $r_h h$ and $W$ as $r_w w$, with $r_h, r_w$ the super-resolution scales; as described above, each low-resolution image pixel corresponds to an image block of size $r_h \times r_w$ in the super-resolution image).
In a concrete implementation, each pixel value of the target super-resolution image is calculated one by one, and the pixel values are finally rearranged to obtain the target super-resolution image. The calculation of each target pixel can be seen as the mapping of a two-dimensional coordinate (assigned one by one to each target super-resolution image pixel) to a specific value (i.e. the gray level of one RGB channel of the target pixel).
For the low-resolution input image, a feature map of size $3(2n+1)^2 \times h \times w$ is first obtained through the encoder; the length and width of the feature map remain consistent with those of the input image, and the number of channels becomes $3(2n+1)^2$, where the integer $n$ is the preselected maximum frequency of the position code and is set to 3 in actual operation. Each pixel of the feature map is extracted along the channel dimension, giving a one-dimensional hidden code $Z$ of shape $3(2n+1)^2 \times 1$, which is used to represent the corresponding image block of the target super-resolution image.
Since the target super-resolution image block is composed of three channels, each channel is implicitly represented separately by one third of the hidden code, i.e. a sub-code of length $(2n+1)^2$. In the subsequent calculation, the orthogonal position code $P$ of length $(2n+1)^2$ is linearly combined with the corresponding one third of the hidden code to represent the value of one channel of the target image block; the figure shows the case after the three channels are considered together. To match the hidden code of length $3(2n+1)^2$, the orthogonal position code of length $(2n+1)^2$ is duplicated 3 times and linearly combined with the respective thirds of the hidden code, giving the pixel values of the three channels of the super-resolution image. If the image block aggregation technique described later is not used, the pixel value calculated in this way is taken as the final super-resolution pixel value; if image block aggregation is used, the four neighboring candidate pixel values are combined by area interpolation to obtain the final pixel value (see the image block aggregation part for details). The above describes the calculation of the RGB channels of one pixel of the target-resolution image; this process is repeated until all target-resolution pixel values have been calculated (the target image has $H \times W$ pixels, so the calculation is performed $H \times W$ times in total), after which the pixels are rearranged into an $H \times W$ rectangle to obtain the target super-resolution image.
For rendering a target of size $H \times W$, the up-sampling module first divides the 2D domain into $H \times W$ grid areas, so that each pixel can be assigned absolute center-point coordinates $(x_q, y_q)$, as shown in Fig. 3(c); second, since the input image size is $h \times w$, the same 2D domain can be divided into $h \times w$ grid regions according to the feature map size, as shown in Fig. 3(a), and each one-dimensional hidden code (of shape $3(2n+1)^2 \times 1$, $h \times w$ codes in total) can be assigned absolute center-point coordinates $(x_z, y_z)$. Thus, for each target pixel, the hidden code at the shortest distance from its center-point coordinates can be found and denoted $z^{*}$, together with the relative coordinates $(x_r, y_r)$ between the target pixel and $z^{*}$, i.e., as shown in the 2D domain of Fig. 3(b), the offset between $(x_q, y_q)$ and $(x_z, y_z)$ expressed in the relative coordinate domain. The relative coordinates are then passed through OPE and linearly combined with $z^{*}$ to obtain the target pixel value at those coordinates, as follows:

$$ I_q = R(z^{*}, x_r, y_r) = z^{*\mathsf T}\,\mathrm{OPE}(x_r, y_r), $$

where the rendering function $R$ denotes that the coordinates are linearly combined with the hidden code after OPE; the formula is analogous to $I(x,y) = Z^{\mathsf T} P$ above, the difference being that the OPE is repeated three times to match the RGB channels.
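A minimal sketch of the feature map rendering loop is given below. It reuses the ope() helper from the first sketch, treats the encoder as given, and uses assumed names and a simple nearest-code lookup; it is an illustration of the rendering rule, not the patent's reference implementation.

```python
import numpy as np

def render(feature_map: np.ndarray, H: int, W: int, n: int) -> np.ndarray:
    """Feature map rendering: (3*(2n+1)^2, h, w) feature map -> (3, H, W) RGB image."""
    C, h, w = feature_map.shape
    d = (2 * n + 1) ** 2
    assert C == 3 * d
    out = np.zeros((3, H, W))
    # Absolute center-point coordinates of target pixels and of feature-map pixels in [-1, 1].
    yq = (np.arange(H) + 0.5) / H * 2 - 1
    xq = (np.arange(W) + 0.5) / W * 2 - 1
    yz = (np.arange(h) + 0.5) / h * 2 - 1
    xz = (np.arange(w) + 0.5) / w * 2 - 1
    for i in range(H):
        for j in range(W):
            iz = int(np.abs(yz - yq[i]).argmin())       # nearest hidden code z*
            jz = int(np.abs(xz - xq[j]).argmin())
            # Relative coordinates of the target pixel inside its block, mapped to [-1, 1].
            ry, rx = (yq[i] - yz[iz]) * h, (xq[j] - xz[jz]) * w
            p = ope(rx, ry, n)                          # ope() as defined in the earlier sketch
            z = feature_map[:, iz, jz].reshape(3, d)    # one third of the hidden code per channel
            out[:, i, j] = z @ p                        # linear combination per RGB channel
    return out
```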
The orthogonal position code OPE is obtained from the following formula:

$$ \mathrm{OPE}(x,y) = \mathrm{Flatten}\big(E(x)\,E(y)^{\mathsf T}\big), \qquad E(t) = \big[\,1,\ \cos(\pi t),\ \sin(\pi t),\ \ldots,\ \cos(n\pi t),\ \sin(n\pi t)\,\big]^{\mathsf T} . $$

In particular, the process of orthogonal position coding can be seen as a mapping from 2 dimensions to $(2n+1)^2$ dimensions, where each target dimension is given by a product of sinusoidal functions, as shown in Fig. 5. Each function in the figure is one output of the orthogonal position code, giving $(2n+1)^2$ outputs in total.
Specifically, as shown in Fig. 6, the first row shows the implicit neural representation method and the second row the method of the present invention; columns 1 and 3 are the original outputs, columns 2 and 4 are the outputs after the feature map is flipped, and the number below each image is the PSNR. It can be seen that, for the implicit neural representation method, the output corresponding to the flipped feature map is blurred: lacking a symmetry prior, it has to rely on data augmentation during training to alleviate this. For the method of the present invention, since the orthogonal position code uses symmetric sine functions, a perfectly symmetric output is obtained after the feature map is flipped.
The calculation of each target pixel involves only one coordinate position-coding operation and one matrix operation (i.e. the linear combination), so the amount of computation is extremely small, as shown in Fig. 7, where MACs are multiply-accumulate operations and FLOPs are floating-point operations; the method of the present invention saves two orders of magnitude. The other two methods in Fig. 7 are LIIF (Local Implicit Image Function) and LTE (Local Texture Estimator).
According to the defect of feature map rendering, the invention further provides image block aggregation:
the shortcomings of the feature map rendering process just described are: as shown in fig. 3, whenMoving from a to b, the nearest +.>The method comprises the following steps that A is suddenly changed to B from A, so that a target super-resolution image is not continuous at the edge of an adjacent image block, and in order to solve the problem, two formulas of a feature map rendering part are expanded as follows:
for each ofNot only find the closest hidden code +.>The nearest four neighboring steganography (including the nearest steganography) are also found, i.e. +.>Corresponds to four center point coordinates +.>After which +.>And respectively carrying out orthogonal position coding on the relative coordinates of the four hidden codes, and then carrying out linear combination with the hidden codes to finally obtain four undetermined pixel values, and then distributing the specific gravity of the four pixel values in the final pixel according to the area shown in (a) in fig. 4, and carrying out interpolation to obtain the final pixel value. As shown in (b) of fig. 4, < + >>And->The distance between them is the shortest, therefore +.>The calculated pixel should have a higher specific gravity and the corresponding area region is +.>The remaining hidden codes each calculate a corresponding area according to such a relationship.
Here $S_t$ denotes the area of the rectangular region enclosed by a hidden-code coordinate and the center point, the $S$ in the denominator denotes the sum of all $S_t$, the four hidden codes are denoted $z_t$ and distinguished by the subscript $t$, and $v_t$ are the coordinates of the hidden codes $z_t$ in the 2D domain; each of the four coordinates encloses a rectangular area with the absolute center-point coordinates of the target pixel value, i.e. a rectangle whose diagonal joins the two coordinates. Note that the $S_t$ and $z_t$ above are not paired directly but diagonally: the hidden code closer to the query point should receive the larger weight, whereas the rectangle computed directly from its own distance as the diagonal is necessarily the smaller one and would give the smaller weight; therefore the rectangular areas are assigned diagonally.
It should be noted that when the relative coordinates with respect to the nearest hidden code are calculated, their range lies within the 2D domain, but the relative coordinates with respect to the remaining three hidden codes can exceed the 2D domain; therefore, after the relative coordinates are calculated they are divided by 2 to ensure that they lie within the 2D domain, as in the formula

$$ \left(x_r^{t},\, y_r^{t}\right) = \tfrac{1}{2}\left(\hat{x}_r^{t},\, \hat{y}_r^{t}\right), $$

where $(\hat{x}_r^{t}, \hat{y}_r^{t})$ are the relative coordinates of the target pixel with respect to hidden code $z_t$ before halving.
From an overall point of view, Fig. 4(a) shows the case in which the image block aggregation technique is not used: for $(x_q, y_q)$ only the nearest hidden code needs to be found, and the four colors in the figure represent target super-resolution image blocks, each controlled by its own hidden code without overlapping the others; it is precisely this lack of overlap that leads to the discontinuity at the edges of adjacent image blocks mentioned above. Fig. 4(b) shows the change brought by the image block aggregation technique: the four colors again represent image blocks of the target super-resolution image, but the blocks now overlap one another, precisely because the four adjacent hidden codes around $(x_q, y_q)$ are all found. The image block aggregation technique finally interpolates the overlapping portions so that each final pixel contains contributions from the four surrounding image blocks, and the target super-resolution image therefore does not produce a discontinuous effect due to the image blocks.
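The image block aggregation rule can be sketched for a single query point as follows. The sketch again reuses the ope() helper from the first sketch; the neighborhood lookup, the halving of the relative coordinates and the diagonal area weighting follow the description above, while all names are illustrative assumptions rather than the patent's own code.

```python
import numpy as np

def aggregate_pixel(feature_map: np.ndarray, xq: float, yq: float, n: int) -> np.ndarray:
    """Area-weighted blend of the four nearest hidden codes for one query point (xq, yq) in [-1, 1]^2."""
    C, h, w = feature_map.shape
    d = (2 * n + 1) ** 2
    yz = (np.arange(h) + 0.5) / h * 2 - 1               # center coordinates of the hidden codes
    xz = (np.arange(w) + 0.5) / w * 2 - 1
    i0 = int(np.clip(np.searchsorted(yz, yq) - 1, 0, h - 2))   # top-left code of the 2x2 neighborhood
    j0 = int(np.clip(np.searchsorted(xz, xq) - 1, 0, w - 2))
    values, areas = [], []
    for di in (0, 1):
        for dj in (0, 1):
            i, j = i0 + di, j0 + dj
            # Relative coordinates, halved so that they stay inside the relative coordinate domain.
            ry, rx = (yq - yz[i]) * h / 2, (xq - xz[j]) * w / 2
            z = feature_map[:, i, j].reshape(3, d)
            values.append(z @ ope(rx, ry, n))           # candidate RGB value from this hidden code
            # Weight: area of the rectangle between the query point and the *diagonal* hidden code.
            areas.append(abs(xq - xz[j0 + 1 - dj]) * abs(yq - yz[i0 + 1 - di]))
    weights = np.array(areas) / np.sum(areas)
    return weights @ np.stack(values)                   # final RGB value of the query pixel
```

Calling aggregate_pixel for every target pixel and rearranging the results reproduces the overall flow shown in Fig. 2.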
In another aspect, the invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method described above when the program is executed.
Finally, the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method described above.
The invention solves the problems that, in arbitrary-scale image super-resolution, the operation of the up-sampling module requires a large amount of computation and occupies a large amount of GPU memory.
The symmetry problem of the up-sampling module in arbitrary-scale image super-resolution is also solved: an up-sampling module based on implicit neural representation has to learn the symmetry in an image through data augmentation, whereas the invention designs an up-sampling module with a symmetric structure; because the orthogonal position code used corresponds to a series of trigonometric functions that are symmetric over the domain of definition, the symmetry prior of the image is introduced naturally and the difficulty of neural network training is reduced.
The problem of poor interpretability in arbitrary-scale image super-resolution is also solved: an up-sampling module based on implicit neural representation has no interpretability because its main body is a neural-network black-box model.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (8)

1. An orthogonal position coding representation method of an image block, comprising the following steps:
S1, cutting a target super-resolution image to be rendered into a plurality of super-resolution image blocks;
S2, determining, in a relative coordinate domain, a continuous binary function and an orthogonal position code corresponding to the super-resolution image block;
the orthogonal position code being obtained as follows: the coordinate information of the transverse and longitudinal dimensions of the super-resolution image block is first position-coded with a Fourier basis, and the two position codes are then multiplied pairwise to obtain a comprehensive position coding vector;
the target super-resolution image to be rendered is cut equidistantly in the transverse and longitudinal directions into $h \times w$ portions, matching the size $h \times w$ of the low-resolution input image; the size of the target super-resolution image to be rendered is $H \times W = r_h h \times r_w w$, so that $h \cdot w$ image area tiles of size $r_h \times r_w$ are obtained, each such image area tile being a super-resolution image block;
the relative coordinate domain is defined as:
if the size of the low-resolution input image is $h \times w$, the 2D domain is divided into $h \times w$ grids, and this 2D domain is called the absolute coordinate domain; each grid corresponds to one pixel of the low-resolution image and also to one image block of the target super-resolution image, whose size is $r_h \times r_w$, i.e. $r_h$ and $r_w$ are the super-resolution scales; a new 2D domain is further divided into $r_h \times r_w$ grids and is called the relative coordinate domain;
the binary function $f$ is the continuous binary function corresponding to the image block, its domain of definition is $[-1,1] \times [-1,1]$ and it belongs to the inner product space $M$; let $(x, y)$ be the coordinates in the 2D domain to which each pixel center point of the image block is mapped, so that $f(x, y)$ is a scalar representing the value of a pixel point on the image block;
each element $e_{ij}$ of the 2D domain matrix $E(x)\,E(y)^{\mathsf T}$ is regarded as a binary function, and these elements satisfy the following relationship:

$$ \langle e_{ij},\, e_{kl} \rangle = \int_{-1}^{1}\!\int_{-1}^{1} e_{ij}(x,y)\, e_{kl}(x,y)\, \mathrm{d}x\, \mathrm{d}y = 0, \qquad (i,j) \neq (k,l), $$

that is, $\{e_{ij}\}$ constitutes a set of orthogonal bases of the continuous image space, and according to the above definition the orthogonal position code contains this set of orthogonal bases; thus a continuous image block is represented as a linear combination of a set of basic plane waves, and the orthogonal position code contains no complex exponential terms.
2. The method of claim 1, wherein the orthogonal position coding representation method is: for each channel of the RGB image, the super-resolution image block is denoted $I$; the pixel value of the image block at the center-point coordinates $(x, y)$ in the relative coordinate domain is denoted $I(x, y)$ and is regarded as the result of sampling a binary function $f$, and $f$ is represented as a linear combination of a set of orthogonal bases and projections:

$$ I(x,y) = f(x,y) \approx \langle Z, P \rangle = Z^{\mathsf T} P, \qquad P = \mathrm{OPE}(x,y) = \mathrm{Flatten}\big(E(x)\,E(y)^{\mathsf T}\big), $$

$$ E(t) = \big[\,1,\ \cos(\pi t),\ \sin(\pi t),\ \ldots,\ \cos(n\pi t),\ \sin(n\pi t)\,\big]^{\mathsf T}, $$

where $\mathrm{Flatten}(\cdot)$ denotes expanding the 2D domain matrix into a 1D vector, $Z$ is the hidden code, i.e. the projection onto the set of orthogonal bases (used here as a general term in the mathematical operations), $E(\cdot)$ denotes the position code of a single coordinate, $n$ denotes the maximum frequency of the position code, and $P$ is the orthogonal position code OPE.
3. An up-sampling module implementation method based on the orthogonal position coding as claimed in claim 1 or 2, characterized in that the up-sampling module implementation method comprises:
S1, passing a low-resolution input image of size $3 \times h \times w$ through an encoder to obtain a feature map of size $3(2n+1)^2 \times h \times w$, the length and width of the feature map being consistent with those of the input image and the number of image channels becoming $3(2n+1)^2$, where the integer $n$ is the preselected maximum frequency of the position code;
S2, extracting each pixel of the feature map along the channel dimension to obtain a one-dimensional hidden code $Z$ of shape $3(2n+1)^2 \times 1$, which is used to represent the corresponding image block of the target super-resolution image;
the target super-resolution image being an RGB image, the corresponding image block consists of three channels, and each channel is implicitly represented by one third of the hidden code, i.e. a sub-code of shape $(2n+1)^2 \times 1$; in the subsequent calculation, the orthogonal position code $P$ of shape $(2n+1)^2 \times 1$ is linearly combined with the hidden code of shape $(2n+1)^2 \times 1$ to represent the pixel value of a pixel of the target image block for a particular RGB channel;
S3, since the RGB image comprises three channels, duplicating the same orthogonal position code 3 times and combining it with the corresponding hidden codes of shape $(2n+1)^2 \times 1$ by feature map rendering or image block aggregation, so as to obtain the pixel values of the three RGB channels of a given pixel of the super-resolution image;
S4, repeating steps S2-S3 until all pixel values at the target resolution have been calculated, and then rearranging all pixel values into an $H \times W$ rectangle to obtain the target super-resolution image.
4. The method for implementing the up-sampling module based on orthogonal position coding according to claim 3, wherein said step S2 specifically comprises:
S21, for the target rendering size $H \times W$, dividing the 2D domain into $H \times W$ grid areas, so that each target pixel can be assigned absolute center-point coordinates $(x_q, y_q)$;
S22, dividing the same 2D domain into $h \times w$ grid regions according to the size of the feature map, so that the hidden code $Z$ of each feature-map pixel can be assigned absolute center-point coordinates $(x_z, y_z)$; thus, for each target pixel, the $Z$ at the shortest distance from its center-point coordinates is calculated and denoted as the hidden code $z^{*}$, and the relative coordinates $(x_r, y_r)$ between the target pixel and the hidden code $z^{*}$ are obtained; the relative coordinates are then orthogonally position-coded and linearly combined with $z^{*}$ to obtain the target pixel value of a given RGB channel at those coordinates, expressed as:

$$ I_q = R(z^{*}, x_r, y_r) = z^{*\mathsf T}\,\mathrm{OPE}(x_r, y_r), $$

where the rendering function $R$ denotes the linear combination of the hidden code with the orthogonally position-coded relative coordinates.
5. A method according to claim 3, wherein the step S2 calculates the target pixel value using image block aggregation, the image block aggregation being expressed as:
for each center point $(x_q, y_q)$, finding the nearest four adjacent hidden codes, i.e. $z_t$, $t \in \{00, 01, 10, 11\}$, corresponding to four center-point coordinates $v_t$; then using the relative coordinates of $(x_q, y_q)$ with respect to the four hidden codes to obtain four candidate pixel values, and interpolating the overlapping parts of the four candidate pixel values to obtain the target pixel value of a given RGB channel:

$$ I_q = \sum_{t \in \{00,01,10,11\}} \frac{S_t}{S}\; z_t^{\mathsf T}\,\mathrm{OPE}\!\left(x_r^{t}, y_r^{t}\right), \qquad S = \sum_{t} S_t, $$

where $S_t$ denotes the area of the rectangular region enclosed by the corresponding hidden-code coordinate and the center point, the $S$ in the denominator denotes the sum of all $S_t$, the four hidden codes are denoted $z_t$ and are distinguished by the subscript $t$, and $v_t$ are the coordinates of the hidden codes $z_t$ in the 2D domain; each of the four coordinates encloses a rectangular area with the absolute center-point coordinates of the target pixel value, i.e. a rectangle whose diagonal joins the two coordinates.
6. The method of claim 5, wherein, when the relative coordinates $(x_r, y_r)$ of $(x_q, y_q)$ with respect to the nearest hidden code are calculated, their range lies within the 2D domain, whereas the relative coordinates with respect to the other three hidden codes can exceed the 2D domain; therefore, after the relative coordinates are obtained they are divided by 2 to ensure that they lie within the 2D domain; viewed as a whole, the image blocks of the target super-resolution image then overlap one another, so that the target super-resolution image does not produce a discontinuous effect due to the image blocks, i.e. the relative coordinates with respect to hidden code $z_t$ are expressed as:

$$ \left(x_r^{t},\, y_r^{t}\right) = \tfrac{1}{2}\left(\hat{x}_r^{t},\, \hat{y}_r^{t}\right), $$

where $(\hat{x}_r^{t}, \hat{y}_r^{t})$ are the relative coordinates of the target pixel with respect to hidden code $z_t$ before halving.
7. an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-6 when executing the program.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of the preceding claims 1-6.
CN202310527057.6A 2023-05-11 2023-05-11 Orthogonal position coding representation method of image block and up-sampling module implementation method Active CN116245736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310527057.6A CN116245736B (en) 2023-05-11 2023-05-11 Orthogonal position coding representation method of image block and up-sampling module implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310527057.6A CN116245736B (en) 2023-05-11 2023-05-11 Orthogonal position coding representation method of image block and up-sampling module implementation method

Publications (2)

Publication Number Publication Date
CN116245736A CN116245736A (en) 2023-06-09
CN116245736B (en) 2023-07-21

Family

ID=86636008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310527057.6A Active CN116245736B (en) 2023-05-11 2023-05-11 Orthogonal position coding representation method of image block and up-sampling module implementation method

Country Status (1)

Country Link
CN (1) CN116245736B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136060B (en) * 2019-04-24 2023-03-24 西安电子科技大学 Image super-resolution reconstruction method based on shallow dense connection network
CN112419150B (en) * 2020-11-06 2024-04-02 中国科学技术大学 Image super-resolution reconstruction method of arbitrary multiple based on bilateral upsampling network
CN115457043A (en) * 2022-03-23 2022-12-09 苏州迭代智能医疗科技有限公司 Image segmentation network based on overlapped self-attention deformer framework U-shaped network

Also Published As

Publication number Publication date
CN116245736A (en) 2023-06-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant