CN118042133A - Panoramic image coding method, decoding method and related device based on slice expression - Google Patents


Info

Publication number
CN118042133A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410436958.9A
Other languages
Chinese (zh)
Other versions
CN118042133B (en)
Inventor
李穆 (Li Mu)
程裕龙 (Cheng Yulong)
李锦兴 (Li Jinxing)
卢光明 (Lu Guangming)
Current Assignee
Harbin Institute of Technology (Shenzhen); Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology (Shenzhen); Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Priority date
Application filed by Harbin Institute of Technology (Shenzhen) and Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Priority to CN202410436958.9A priority Critical patent/CN118042133B/en
Publication of CN118042133A publication Critical patent/CN118042133A/en
Application granted granted Critical
Publication of CN118042133B publication Critical patent/CN118042133B/en
Legal status: Active

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a panoramic image coding method, a decoding method and a related device based on slice expression, relating to the field of panoramic image coding and decoding. The method comprises the following steps: performing super-slice image conversion on the acquired panoramic image to be encoded to obtain a super-slice image set serving as the slice expression form of the panoramic image; extracting features from the super slice image set with a slice encoder to obtain the super slice coding; generating a super slice coding quantization result and a priori coding quantization result from the super slice coding; determining a Gaussian distribution probability model according to the super slice coding quantization result; and using the model to generate a bit stream of the super slice coding quantization result and a bit stream of the priori coding quantization result. In the decoding stage, the bit stream of the super slice coding quantization result is decoded, and the decoding result is inverse-quantized and slice-decoded to obtain the panoramic reconstruction image. The method improves the stability of the panoramic image expression and realizes high-performance panoramic image coding and decoding.

Description

Panoramic image coding method, decoding method and related device based on slice expression
Technical Field
The invention relates to the field of panoramic image encoding and decoding, in particular to a panoramic image encoding method and decoding method based on slice expression and a related device.
Background
Panoramic images provide 360° × 180° views of natural scenes and allow the user to freely choose the viewing direction; in recent years, the amount of panoramic image data generated has increased dramatically. On the one hand, ordinary users can easily access panoramic imaging and display devices and consume virtual-reality content in this format every day. On the other hand, there is a trend toward capturing ultra-high-definition panoramas to provide an excellent immersive experience, pushing spatial resolution very high (e.g., 8K). With the growing need to store and transmit large amounts of panoramic data, new and efficient panoramic image coding methods are required.
Currently, popular panoramic image coding methods adopt a two-step approach. In the first step, a map projection with a sphere-to-plane mapping is applied. In the second step, a standard image codec, designed for central-perspective image compression, is used. Among sphere-to-plane map projections, panoramic image sampling based on equirectangular projection (also called equidistant cylindrical projection, ERP) is the most widely used; however, by Gauss's Theorema Egregium in differential geometry, every planar projection of a sphere must introduce distortion. Consequently, ERP-based panoramic image sampling causes large resolution differences between latitude regions: the image is severely oversampled at high latitudes and relatively undersampled at low latitudes. Directly encoding the ERP image therefore easily produces a code-rate imbalance in which the quality of high-latitude regions is far better than that of low-latitude regions, seriously degrading coding performance.
Disclosure of Invention
The invention aims to provide a panoramic image coding method, a decoding method and a related device based on slice expression, which can improve the stability of panoramic image expression and reduce the influence of latitude change in the panoramic image on coding performance.
In order to achieve the above object, the present invention provides the following.
In one aspect, the present invention provides a slice expression-based panoramic image encoding method, comprising the following steps.
And acquiring a panoramic image to be encoded.
Performing super-slice image conversion on the panoramic image to be encoded to obtain a super-slice image set; the super slice image set comprises a plurality of super slice images; the sum of the heights of the super slice images is the same as the unfolded height of the panoramic image to be encoded.
Inputting the super slice image set into a slice encoder, and extracting features of the super slice image set to obtain a super slice code; the slice encoder includes a downsampling encoding submodule, a residual encoding submodule, and an attention submodule.
And generating a super slice coding quantization result and a priori coding quantization result based on the super slice coding.
And determining a Gaussian distribution probability model of the super slice coding quantization result based on the prior coding quantization result and the super slice coding quantization result.
Based on the Gaussian distribution probability model and the prior probability density function, generating a bit stream of the super slice coding quantization result and a bit stream of the prior coding quantization result; the bit stream of the a priori encoded quantized results is used to assist in decoding the bit stream of the super-slice encoded quantized results in the decoding stage.
Optionally, performing super-slice image conversion on the panoramic image to be encoded to obtain a super-slice image set, which specifically comprises the following steps.
And carrying out slice image conversion on the panoramic image to be coded in the vertical direction to obtain a plurality of slice images.
And for any slice image, splicing a plurality of adjacent slice images with the same width as the slice image with the slice image to obtain a super slice image, wherein each super slice image forms a super slice image set.
Optionally, before the panoramic image to be encoded is subjected to super-slice image conversion to obtain a super-slice image set, the method further comprises the following steps.
Determining super slice image conversion parameters based on a greedy search method; the super slice image conversion parameters include the number of pixel columns of the t-th slice image when the number T of slice images is fixed.
Optionally, determining the super slice image conversion parameters based on a greedy search method specifically comprises the following steps.
When the number T of slice images is fixed, the number of pixel columns of each slice image is enumerated in turn from top to bottom; the number of pixel columns of each slice image satisfies the following constraint: W(t-1) ≤ Wt for t ≤ T/2, and Wt ≥ W(t+1) for t > T/2.
Wherein t is the index of the slice image, T is the number of slice images, W(t-1) is the number of pixel columns of the (t-1)-th slice image, and Wt is the number of pixel columns of the t-th slice image.
Optionally, the downsampling encoding submodule, the residual encoding submodule and the attention submodule each include at least one slice convolution block.
The slice convolution block calculates the convolution result according to the following formula:
y(p, q) = Σ_{(i,j)∈N} w(i, j) · x(p_i, q_j)
Where N is the neighborhood, w(i, j) is a convolution weight, and x(p_i, q_j) is the neighborhood pixel of the point (p, q) of the panoramic image to be encoded in the super-slice stitched image; the Manhattan distance on the sphere between the point (p, q) and the point (p_i, q_j) along the counterclockwise direction parallel to the equator is i·Δx, and the Manhattan distance in the perpendicular direction is j·Δy.
The neighborhood of the slice convolution is determined according to the following equation:
N = {(i, j) : |i| ≤ K, |j| ≤ K}
Where (i, j) is the pixel offset index and K is the neighborhood range.
In another aspect, the invention also provides a panoramic image decoding method based on slice expression, which comprises the following steps:
acquiring a bit stream set to be decoded; the bit stream set to be decoded comprises bit streams of super slice coding quantization results and bit streams of priori coding quantization results; the bit stream of the super slice coding quantization result and the bit stream of the priori coding quantization result are bit streams obtained according to the slice expression-based panoramic image coding method.
And decoding the bit stream of the prior coding quantization result to obtain the prior coding quantization result.
And decoding the bit stream of the super slice coding quantization result based on the priori coding quantization result to obtain the super slice coding quantization result.
And generating a super-slice coding dequantization result based on the super-slice coding quantization result.
Inputting the super slice coding inverse quantization result into a slice decoder to generate a panoramic reconstruction image; the slice decoder includes an attention sub-module, a residual coding sub-module, and an upsampling coding sub-module.
In still another aspect, the present invention provides a computer device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the computer program to implement the steps of a slice-expression-based panoramic image encoding method or the steps of a slice-expression-based panoramic image decoding method as described in any one of the above.
According to the specific embodiments provided by the invention, the following technical effects are disclosed.
The invention provides a panoramic image coding method, a decoding method and a related device based on slice expression. The method performs super-slice image conversion on the acquired panoramic image to be encoded to obtain a super-slice image set serving as the slice expression form of the panoramic image; extracts features from the super slice image set with a slice encoder to obtain the super slice coding; then generates a super slice coding quantization result and a priori coding quantization result; determines a Gaussian distribution probability model according to the super slice coding quantization result; and uses the model to generate a bit stream of the super slice coding quantization result and a bit stream of the priori coding quantization result. In the decoding stage, the bit stream of the priori coding quantization result assists the decoding of the bit stream of the super slice coding quantization result; after decoding, a slice decoder converts the super slice coding quantization result into a decoded super slice expression, which is finally restored to the panoramic reconstruction image by the inverse transform, realizing high-performance panoramic image coding and decoding.
According to the invention, the panoramic image to be encoded is converted into a slice-expression representation. This representation reduces the feature characterization differences caused by viewing-angle changes, helps the coding network extract more stable and generalizable features, improves the stability of the panoramic image expression, reduces the influence on feature extraction of object deformation caused by latitude changes in the panoramic image, and enhances the robustness of the model to geometric deformation of the image.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 2 is a specific flowchart of step A2 in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 3 is a specific flowchart of step A4 in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 4 is a specific flowchart of step A6 in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 5 is a schematic view of slice projection in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 6 is a schematic diagram of a super slice image in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 7 is a schematic structural diagram of a slice encoder in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 8 is a schematic structural diagram of a downsampling encoding submodule in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 9 is a schematic structural diagram of a residual coding submodule in a panoramic image coding method based on slice expression according to embodiment 1 of the present invention.
Fig. 10 is a schematic structural diagram of an attention sub-module in a slice expression-based panoramic image encoding method according to embodiment 1 of the present invention.
Fig. 11 is a schematic structural diagram of a context network module in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 12 is a schematic structural diagram of a super prior slice encoder in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 13 is a schematic structural diagram of a super prior slice decoder in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 14 is a schematic structural diagram of a parameter network module in a panoramic image encoding method based on slice expression according to embodiment 1 of the present invention.
Fig. 15 is a flowchart of a panoramic image decoding method based on slice expression according to embodiment 2 of the present invention.
Fig. 16 is a specific flowchart of step B3 in a panoramic image decoding method based on slice expression according to embodiment 2 of the present invention.
Fig. 17 is a schematic structural diagram of an upsampling coding submodule in the panorama image decoding method based on slice expression according to embodiment 2 of the present invention.
Fig. 18 is a schematic diagram of a slice decoder in a panoramic image decoding method based on slice expression according to embodiment 2 of the present invention.
Fig. 19 is an internal structure diagram of a computer device according to embodiment 3 of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a panoramic image coding method, a decoding method and a related device based on slice expression, which aim to reduce characteristic characterization differences caused by visual angle changes, improve the stability of panoramic image expression, reduce the influence of object deformation caused by latitude changes in a panoramic image on characteristic extraction and enhance the robustness of a model on image geometric deformation.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1.
As shown in the flowchart of fig. 1, a panoramic image encoding method based on slice expression in the present embodiment includes the following steps.
A1, acquiring a panoramic image to be encoded.
A2, performing super-slice image conversion on the panoramic image to be encoded to obtain a super-slice image set; the super slice image set comprises a plurality of super slice images; the sum of the heights of the super slice images is the same as the unfolded height of the panoramic image to be encoded. As shown in the flowchart of fig. 2, step A2 specifically includes the following steps.
A21, carrying out slice image conversion on the panoramic image to be coded in the vertical direction to obtain a plurality of slice images.
Specifically, when the panoramic image to be encoded is sliced in the vertical direction, the number of slice images and the number of pixel columns of each slice image are determined according to equation (1).
(T, {Wt}) = argmax F(x, codec, T, {Wt})  (1)
Wherein T is the number of slice images, t is the index of a slice image, Wt is the number of pixel columns of the t-th slice image, x is the panoramic image to be encoded, codec is the coding scheme, and F() is the metric function measuring the coding performance when the parameters of the coding scheme are T and {Wt}.
In addition, all the super slice images are spliced together, so that a super slice spliced image can be obtained; the index of any pixel in the panoramic image to be encoded in the super-slice stitched image is determined according to equations (2) and (3).
(2)。
(3)。
Where i is the ordinate of a pixel in the panoramic image to be encoded, j is the abscissa of a pixel in the panoramic image to be encoded, θi is the index of the ith row of pixels of the panoramic image to be encoded in the super-slice stitched image, Φj is the index of the jth column of pixels of the panoramic image to be encoded in the super-slice stitched image, H is the number of rows of pixels of the panoramic image to be encoded, and Wi is the number of columns of pixels of the ith row of the panoramic image to be encoded.
And A22, for any slice image, splicing a plurality of adjacent slice images with the same width as the slice image with the slice image to obtain a super slice image.
A23, forming a super slice image set according to each super slice image.
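As a rough illustration of steps A21 to A23, the following sketch (with illustrative names and simplifications of our own: nearest-neighbour row resampling stands in for the bilinear interpolation used later in this embodiment, and consecutive rows sharing a width are assumed to form one super slice) slices an ERP array in the vertical direction, resamples each row to its target width, and stitches runs of equal-width rows into super-slice blocks:

```python
import numpy as np

def to_super_slices(erp, widths):
    """Split an ERP image into row slices, resample each row to its
    target width, then stitch runs of equal-width rows into super-slice
    blocks. `widths` gives one target width per row (illustrative)."""
    H, W = erp.shape[:2]
    assert len(widths) == H
    # Resample each row to its own width (nearest neighbour for brevity).
    rows = [erp[i, (np.arange(widths[i]) * W // widths[i])] for i in range(H)]
    # Group consecutive rows that share a width into one super slice.
    super_slices, start = [], 0
    for i in range(1, H + 1):
        if i == H or widths[i] != widths[start]:
            super_slices.append(np.stack(rows[start:i]))
            start = i
    return super_slices
```

Note that the heights of the returned blocks sum to H, matching the requirement that the super-slice heights equal the unfolded height of the panoramic image.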
A3, inputting the super slice image set into a slice encoder, and extracting features of the super slice image set to obtain super slice codes; the slice encoder includes a downsampling encoding submodule, a residual encoding submodule, and an attention submodule.
A4, based on the super slice coding, generating a super slice coding quantization result and a priori coding quantization result. As shown in the flowchart of fig. 3, step A4 specifically includes the following steps.
A41, inputting the super slice code into a quantizer module, and generating a super slice code quantization result.
A42, inputting the super slice code into a super prior slice coder to generate prior code.
A43, inputting the priori code into the quantizer module to generate the quantized result of the priori code.
A5, based on the prior coding quantization result, super slice coding prior information is generated.
A6, determining a Gaussian distribution probability model of the super slice coding quantization result based on the super slice coding quantization result and the super slice coding priori information. As shown in the flowchart of fig. 4, step A6 specifically includes the following steps.
A61, inputting the super slice coding quantization result and super slice coding priori information into a parameter network, and determining Gaussian distribution probability model parameters; the gaussian distribution probability model parameters include mean and variance.
A62, determining a Gaussian distribution probability model based on the Gaussian distribution probability model parameters.
A7, generating bit streams of the super slice coding quantization result and bit streams of the priori coding quantization result based on the Gaussian distribution probability model and the priori probability density function; the bit stream of the a priori encoded quantized results is used to assist in decoding the bit stream of the super-slice encoded quantized results in the decoding stage. Step A7 specifically includes the following steps.
A71, determining a discrete probability distribution table corresponding to the super slice coding quantization result based on the Gaussian distribution probability model.
And A72, inputting a discrete probability distribution table corresponding to the super slice coding quantization result into an arithmetic coder to obtain a bit stream of the super slice coding quantization result.
A73, modeling a probability density function of the prior coding quantization result by using a learnable probability function, and generating a discrete probability distribution table corresponding to the prior coding quantization result based on the prior probability density function.
And A74, inputting a discrete probability distribution table corresponding to the prior coding quantization result into an arithmetic coder to obtain a bit stream of the prior coding quantization result.
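Step A71 can be illustrated with a small sketch. The construction below is the standard way a learned codec discretises a Gaussian model into the probability table an arithmetic coder consumes (integrating N(μ, σ²) over unit-width integer bins); the function name and symbol range are illustrative assumptions, not taken from this method:

```python
import math

def gaussian_pmf_table(mu, sigma, lo, hi):
    """Discretise N(mu, sigma^2) onto integer symbols lo..hi by
    integrating the density over each unit bin:
    p(k) = Phi(k + 0.5) - Phi(k - 0.5)."""
    def Phi(x):  # CDF of N(mu, sigma^2)
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return {k: Phi(k + 0.5) - Phi(k - 0.5) for k in range(lo, hi + 1)}
```

In practice such a table (suitably renormalised over the clipped symbol range) is exactly what is handed to the arithmetic coder in steps A71 and A72.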
Specifically, before the step A2 of performing super-slice image conversion on the panoramic image to be encoded to obtain a super-slice image set, the method further comprises the following steps.
Determining super slice image conversion parameters based on a greedy search method; the super slice image conversion parameters include the number of pixel columns of the t-th slice image when the number T of slice images is fixed. The method specifically comprises the following steps.
When the number T of slice images is fixed, the number of pixel columns of each slice image is enumerated in turn from top to bottom; the number of pixel columns of each slice image satisfies the constraint shown in expression (4).
W(t-1) ≤ Wt for t ≤ T/2, and Wt ≥ W(t+1) for t > T/2  (4)
Wherein t is the index of the slice image, T is the number of slice images, W(t-1) is the number of pixel columns of the (t-1)-th slice image, and Wt is the number of pixel columns of the t-th slice image.
In this embodiment, the downsampling encoding submodule, the residual encoding submodule and the attention submodule in the slice encoder each include at least one slice convolution block, and the slice convolution blocks can calculate and obtain convolution results according to the formula (5).
y(p, q) = Σ_{(i,j)∈N} w(i, j) · x(p_i, q_j)  (5)
Where N is the neighborhood of the slice convolution, w(i, j) is a convolution weight, and x(p_i, q_j) is the neighborhood pixel of the point (p, q) of the panoramic image to be encoded in the super-slice stitched image; the Manhattan distance on the sphere between the point (p, q) and the point (p_i, q_j) along the counterclockwise direction parallel to the equator is i·Δx, and the Manhattan distance in the perpendicular direction is j·Δy.
A neighborhood N of the slice convolution is determined according to equation (6).
N = {(i, j) : |i| ≤ K, |j| ≤ K}  (6)
Where (i, j) is the pixel offset index and K is the neighborhood range.
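A plain-Python reading of formulas (5) and (6) may help. Both the square-window neighbourhood and the `x_lookup` callback are assumptions standing in for the lost equation images and for the spherical pixel fetch that the embodiment performs:

```python
def slice_conv_neighbourhood(K):
    # One plausible reading of equation (6): all integer offsets within
    # range K in each direction, i.e. a (2K+1) x (2K+1) window.
    return [(i, j) for i in range(-K, K + 1) for j in range(-K, K + 1)]

def slice_conv_at(x_lookup, w, p, q, K):
    # Equation (5) as a plain sum: y(p,q) = sum_{(i,j) in N} w(i,j) * x(p_i, q_j).
    # x_lookup(p, q, i, j) is a caller-supplied function that fetches the
    # neighbourhood pixel of (p, q) at spherical offset (i*dx, j*dy).
    return sum(w(i, j) * x_lookup(p, q, i, j)
               for (i, j) in slice_conv_neighbourhood(K))
```

The point of the formulation is that offsets are measured along latitude and longitude on the sphere, not along raw image rows and columns, so the same weights see geometrically comparable neighbourhoods at every latitude.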
The slice-expression-based panoramic image encoding method provided in this embodiment is described below with a specific example. First, this embodiment provides the parameterized slice projection expression; as shown in fig. 5, the left side is an ERP panoramic image and the right side is its slice projection expression.
The ERP image is expressed as x ∈ R^{H×W}, where H and W are the maximum numbers of samples per column and per row, respectively. The mapping function from the sphere to the plane can be expressed as the following equations (7) and (8).
θ_i = (i + 0.5)·π/H − π/2  (7)
φ_j = (j + 0.5)·2π/W − π  (8)
Where θ and φ denote the indices in the latitude and longitude directions, respectively. When a calculated coordinate value is fractional, the target value is computed by bilinear interpolation so that integer pixel positions can be used.
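The bilinear interpolation step mentioned above can be sketched as follows (a generic textbook implementation, not code from this embodiment):

```python
import math

def bilinear(img, y, x):
    # Area-weighted mix of the four surrounding integer pixels; `img`
    # is a 2-D list of numbers. Edge coordinates are clamped.
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, len(img) - 1), min(x0 + 1, len(img[0]) - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * img[y0][x0] + (1 - dy) * dx * img[y0][x1]
            + dy * (1 - dx) * img[y1][x0] + dy * dx * img[y1][x1])
```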
The proposed parameterized slice projection structure is a generalized representation of ERP: the width W shared by every row in equation (8) becomes an independently adjustable parameter for each row. The structure is defined by a set of parameters {W_i}, where i ∈ {1, …, H}. The expression for θ_i is the same as in equation (7) and is unchanged; only the expression for φ_j changes. The sphere-to-plane mapping φ_j of equation (8) is converted into equation (9).
φ_j = (j + 0.5)·2π/W_i − π  (9)
Here W_i is the width of the i-th row. By adjusting W_i, the proposed expression structure can precisely control the sampling density of the latitude circle represented by each row. In particular, when W_i = W for every row, the slice projection structure degenerates to the ERP projection structure.
In this embodiment, to speed up computation with the slice expression structure, adjacent slices can be merged by sharing parameters between them; that is, several adjacent slice images of one row each are merged into a multi-row super-slice image. As shown in fig. 6, 3 original one-row slices can be merged into one super-slice image containing 3 original slice images.
Specifically, the parameters of all original slice images within any super slice are equal, so the super slice can be regarded as a block of size Ht × Wt, where Ht is the number of original slices in the super slice and Wt is the parameter they share, i.e. the width (number of pixel columns) of each original slice image.
In the extreme case where adjacent slice images have different parameters, each slice would need its neighborhood computed separately; by sharing parameters within a super-slice image, the whole block shares one neighborhood, which can be precomputed, further speeding up the slice convolution.
Finally, denoting a super-slice image by Zt, the entire panoramic image can be represented as a set of T super slices {Zt, t ∈ {1, …, T}}, where Zt is the t-th super slice with parameters (Ht, Wt). For this expression structure, different panoramic images may have different optimal parameters of the parameterized slice expression structure; that is, for a given panoramic image, choices such as how many super-slice images to divide it into and which height and width suit each super-slice image best may differ, which can be generalized as the optimization problem shown in equation (10).
(T, {(Ht, Wt)}) = argmax F(x, codec, {(Ht, Wt)})  (10)
Wherein T is the number of super slices, Ht is the number of original slices in the t-th super slice, Wt is the parameter shared by the slice images in the t-th super slice (e.g. the number of pixel columns, or width), and x is the panoramic image. It is required that Wt ≤ W and Ht ≤ H, where W and H are the width and height of the original ERP image x, and that the numbers of pixel rows of all super slices sum to H. codec denotes an image coding scheme, and F() is a metric function measuring the quality of a parameter set. The performance of image coding is typically represented by a rate-distortion curve; here, the rate-distortion curve of the ERP image under the codec coding scheme is taken as the reference. Given a parameter set, the ERP image is converted into the slice expression structure according to that set and encoded with codec as the coding scheme, yielding a new rate-distortion curve. The metric function F() then uses the Bjontegaard delta rate to compute the average bitrate saved by the curve corresponding to the parameter set relative to the curve of the ERP image: the more bitrate saved, the better the coding performance of the corresponding slice expression structure.
The above optimization problem has no direct solution scheme and requires finding the optimal parameter set by traversing all possible solution spaces, which makes finding the optimal solution in polynomial time impractical. To reduce the search space, this patent simplifies the optimization problem by assuming that all super slices have the same height, i.e. Ht = H/T. The optimization problem is then converted into expression (11).
{Wt} = argmax F(x, codec, T, {Wt}), with Ht = H/T fixed  (11)
Given T, the search space of this problem is reduced to T·W. In practical applications, T is assigned directly from H, which greatly simplifies the search space. Even after this simplification, the complexity of traversing the solution space is still too high, so this embodiment designs a greedy search algorithm for the problem that quickly finds a good solution through heuristic search. The greedy idea is as follows: the true spherical area of a high-latitude region is smaller, so the parameter Wt of the corresponding slice should be smaller than the parameter of a slice corresponding to a low-latitude region. That is, in the northern hemisphere (t ≤ T/2), W(t-1) ≤ Wt; in the southern hemisphere (t > T/2), Wt ≥ W(t+1). Under this constraint, the parameter Wt of the t-th slice is enumerated sequentially from north to south, keeping the widths of the other slices unchanged during enumeration. Once the optimal parameter of the t-th slice has been found, Wt is fixed to it, completing the parameter optimization of the slice expression structure. Given a panoramic image dataset X and a slice number T, the greedy optimization procedure can be described as follows.
1) All super slice parameters are initialized to the original ERP image width W.
2) Step 3) is performed sequentially from the first slice to the T-th slice.
3) Search the parameter of the t-th super slice: freeze the parameters of the other slices, enumerate the possible values of w_t, and select and fix the parameter w*_t that performs best on dataset X. To narrow the enumeration, if t ≤ T/2, enumeration starts from the optimal parameter w*_{t-1} of the previously searched super slice and increments up to W; otherwise, it starts from w*_{t-1} and decrements down to 0.
In the enumeration process, search efficiency can be further improved by sampling one candidate out of every k values.
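Under stated assumptions, the greedy width search above can be sketched as follows. This is an illustrative reimplementation, not the patent's code: `score` is a stand-in for the BD-rate metric F(·) (which in the real method would encode dataset X under each candidate configuration), while the monotonic width constraint and the every-k candidate sampling follow steps 1) to 3).

```python
def greedy_search(T, W, score, k=1):
    """Greedy search for the super slice widths w_1..w_T (a sketch).

    score(widths) -> float: higher is better; a stand-in for the
    BD-rate gain F() measured on the dataset X.
    """
    widths = [W] * T                      # step 1: initialise every width to W
    for t in range(T):                    # step 2: sweep from north to south
        if t == 0:
            lo, hi = 1, W                 # first slice: full range
        elif t < T // 2:                  # northern hemisphere: w_{t-1} <= w_t
            lo, hi = widths[t - 1], W
        else:                             # southern hemisphere: w_t <= w_{t-1}
            lo, hi = 1, widths[t - 1]     # (1 rather than 0: a zero-width slice is degenerate)
        best_w, best_s = widths[t], None
        for w in range(lo, hi + 1, k):    # step 3: enumerate, sampling every k values
            candidate = widths[:t] + [w] + widths[t + 1:]  # other slices frozen
            s = score(candidate)
            if best_s is None or s > best_s:
                best_w, best_s = w, s
        widths[t] = best_w                # fix w_t before moving south
    return widths
```

With a toy score whose optimum widens toward the equator, the greedy sweep recovers the profile in O(T·W/k) evaluations instead of the W^T of exhaustive search.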
The remaining question is how to evaluate the rate-distortion trade-off for a given parameter configuration {w_t}. One option is to apply a traditional codec such as JPEG to each slice and aggregate the results, but such an estimate is error-prone because the compression behavior of DNN-based codecs differs from that of traditional codecs. Likewise, simply applying a DNN-based codec to each slice in isolation, without considering neighboring slices, also leads to inaccurate parameter estimation. As a solution, the algorithm of this example evaluates each candidate parameter configuration of the parameterized slice representation by replacing the standard convolutions in a DNN-based encoder with the parameterized slice convolution while leaving the model parameters unchanged.
The custom parameterized slice convolution module in this example consists of a generic 2D convolution layer together with a custom slice padding layer and a custom trimming layer. When the output of the preceding module is fed into this module, the input feature map is first padded at the feature boundaries to obtain the shared neighborhood of each slice on the sphere; an ordinary 2D convolution is then applied to extract features; finally, the redundant boundary information produced by the 2D convolution is removed by cropping.
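A minimal single-channel sketch of this pad, convolve, crop pipeline. It is a hypothetical simplification: horizontal wrap-around padding stands in for the slice closing on itself around the sphere, and vertical edge replication stands in for the neighborhood shared with adjacent slices (which the real layer gathers from the other slices); the 'valid' convolution after padding plays the role of the final crop.

```python
import numpy as np

def slice_conv(x, w):
    """Pad the feature boundary, run an ordinary 2D convolution,
    and return a result cropped back to the input size."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (0, 0)), mode="edge")   # vertical: stand-in for inter-slice sharing
    xp = np.pad(xp, ((0, 0), (pw, pw)), mode="wrap")  # horizontal: the slice wraps around the sphere
    H, W = x.shape
    out = np.empty((H, W))
    for i in range(H):                                # 'valid' convolution == implicit crop
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out
```

An identity kernel reproduces the input exactly, and a left-shift kernel picks up the horizontally wrapped neighbor, which is the behavior the padding is meant to provide.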
After the form of the parameterized slice projection expression is determined and the panoramic image to be encoded is obtained, the panoramic image x is partitioned into grouped slices. The number of resulting super slice images is the parameter T determined by the parameter optimization algorithm, and all super slice images have the same height, equal to the height of the ERP image divided by T. That is, the original image is cut into T slices and then converted into the super slice representation {x_i}, i ∈ {1, …, T}.
Preprocessing is performed next: the generated super-slice-form images {x_i} are received; the neighborhood coordinates and interpolation parameters of each slice are first precomputed from the number and size of the slices of the image data x_i, and are used as shared information and slice-convolution context to accelerate subsequent computation. The procedure is as follows.
First, a neighborhood in which the slice convolution is defined is shown in equation (12), and the slice convolution result y can be expressed by equation (13).
N = {(i, j) : |i| ≤ K, |j| ≤ K} (12).
y(p, q) = Σ_{(i,j)∈N} w(i, j) · x(p_i, q_j) (13).
Where (p, q) and (p_i, q_j) denote the coordinates of two points on the sphere in the slice expression structure. The Manhattan distance between point (p, q) and point (p_i, q_j) along the counterclockwise direction parallel to the equator on the sphere is iΔx, and the Manhattan distance in the direction perpendicular to the equator is jΔy.
Because the Manhattan-distance-based computation (spherical convolution) is expensive and parallelizes poorly, this example approximates the computation of (p_i, q_j) so that the receptive field of the approximated convolution resembles that of the Manhattan-distance-based computation. The specific approximation is defined by equations (14) and (15).
(14)。
(15)。
In order to respect the true connectivity of the sphere, the polar parts (p_i < 0 and p_i ≥ H) are handled specially according to equations (16) and (17).
(16)。
(17)。
q_j varies according to the settings of the slice parameters W_{p_i} and W_p; when q_j is not an integer, x(p_i, q_j) is obtained in this example by bilinear interpolation from the adjacent coordinates within the slice.
Feature extraction can then be performed on the slice image data after the representation conversion. This step is carried out in the slice encoder ga; its inputs include the slice number T of the image and the custom-generated context parameters. Its structure comprises multiple convolution and downsampling modules that extract multi-scale spatial information, and an attention mechanism is introduced to focus the coding features on important features. Finally, a coding feature map in N×C×H×W format is output and taken as the latent feature representation y, where N denotes the neighborhood size, C the number of channels, and H and W the height and width after reduction by the multi-layer downsampling of the encoding network.
As shown in the structure diagram of fig. 7, the slice encoder ga consists of a number of downsampling encoding submodules and residual encoding submodules arranged in sequence; an attention submodule is attached after the last residual encoding submodule of every two groups, and the custom slice convolution module of this example is attached after the last attention submodule. Here x is the image after the slicing and representation conversion of step two, and y is the latent feature representation. The parameter |1×1|N×N on a slice convolution indicates that the slice convolution kernel is 1×1 and that both the input and output data have N channels, where N is a preset value that can generally be changed.
As shown in the structure diagram of fig. 8, the downsampling encoding submodule Ba consists of two branches Ba1 and Ba2. Ba1 receives the input of Ba and sequentially applies a slice convolution and 2× downsampling. Ba2 receives the same input and sequentially applies a slice convolution, 2× downsampling, and another slice convolution. The data processed by branches Ba1 and Ba2 are then added directly to generate the final output of the downsampling encoding module Ba.
In fig. 8, the parameter |1×1|N×3 on a slice convolution block indicates that the slice convolution kernel is 1×1, the input data has 3 channels, and the output data has N channels. 2× downsampling means scaling the width and height of the input data to 1/2 of the original in some chosen way. As shown in fig. 8, the results of the two branches applied to the input of Ba are added to obtain the output result.
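A schematic of the two-branch structure of Ba, with the 1×1 slice convolutions reduced to plain channel mixing and strided subsampling used as one possible 2× downsampling (both are assumptions of this sketch; the real module uses the custom slice convolution with slice-aware padding):

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in); a 1x1 convolution is channel mixing
    return np.einsum("oc,chw->ohw", w, x)

def down2(x):
    # 2x downsampling: keep every second row and column (one possible choice)
    return x[:, ::2, ::2]

def block_ba(x, w1, w2, w3):
    """Downsampling block Ba: output = Ba1(x) + Ba2(x), where
    Ba1 = slice conv -> 2x down, and Ba2 = slice conv -> 2x down -> slice conv."""
    ba1 = down2(conv1x1(x, w1))
    ba2 = conv1x1(down2(conv1x1(x, w2)), w3)
    return ba1 + ba2
```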
As shown in the block diagram of fig. 9, the residual encoding submodule Bb consists of two convolution layers and one residual connection. It has two branches: one passes the input data x_i through directly as the branch output t1, while the other applies two consecutive slice convolutions (kernel size 3×3, with N input channels and N output channels) to x_i to generate t2. t1 and t2 are then added by the residual connection to form the final output of Bb; while preserving the information flow, the residual structure lets the network add spatial information to the extracted feature map. This encoding block enhances the expressive power of the network and avoids information loss, so that training is stable.
In fig. 9, the parameter |3×3|N×N on a slice convolution indicates that the slice convolution kernel is 3×3 and that both the input and output data have N channels. The output of the module Bb is the sum of the results of the two branches: the upper branch is the input data itself, and the lower branch is the input data after two slice convolutions.
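A single-channel sketch of Bb (zero padding is assumed so the 3×3 convolutions preserve the spatial size; the real module uses the slice convolution instead):

```python
import numpy as np

def conv3x3_same(x, w):
    # x: (H, W), w: (3, 3); zero padding keeps the output the same size
    xp = np.pad(x, 1)
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def block_bb(x, w_a, w_b):
    """Residual block Bb: t1 = x (identity branch),
    t2 = two consecutive 3x3 convolutions, output = t1 + t2."""
    t2 = conv3x3_same(conv3x3_same(x, w_a), w_b)
    return x + t2
```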
As shown in the block diagram of fig. 10, the attention submodule Bc is divided into two branches. One branch preprocesses the feature x1 output by the previous module with three residual blocks (each consisting of three slice convolutions) to obtain h1. The other branch also passes x1 through three residual blocks, then through a 1×1 convolution kernel, and finally through a sigmoid activation to obtain the attention weight y2. h1 is then multiplied by the attention weight y2 to obtain the selectively focused feature y1, which is added to the input data of the attention module Bc to obtain the output feature y3; this increases model capacity and enlarges the receptive field.
The parameters of the slice convolution modules in fig. 10 have the same meaning as before; for example, |3×3|N/2×N/2 indicates a 3×3 slice convolution kernel with N/2 input channels and N/2 output channels. In each of the two branches, the output of every group of three slice convolutions is added to its input (as indicated by the arrows). The output of the attention module Bc is the result of multiplying the output of the upper sigmoid function with the output of the lowermost branch and adding the initial input (indicated by the × and + arrows).
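The gating arithmetic of Bc can be sketched as follows, with `trunk` and `gate` as placeholder callables standing in for the three-residual-block branches (and, for the gate, the extra 1×1 convolution):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def block_bc(x1, trunk, gate):
    """Attention block Bc: h1 = trunk(x1), y2 = sigmoid(gate(x1)),
    y1 = h1 * y2 (selective focusing), y3 = y1 + x1 (residual around the block)."""
    h1 = trunk(x1)
    y2 = sigmoid(gate(x1))
    y1 = h1 * y2
    return y1 + x1
```

A fully open gate (large positive logits) makes the block behave like a residual block, while a closed gate passes the input through unchanged.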
In the last step of the slice encoder ga, the coding feature map is mapped into the [0, 1] interval to facilitate subsequent quantization; the final temporary result y is the super slice coding.
Following the slice encoder ga is a quantizer module. It receives the super slice coding y finally produced by the sigmoid activation in ga, quantizes it, converts the features into discrete quantization indices, and discards some coding information to achieve compression. The quantizer gq(y) selected in this example is a uniform quantizer. During training, it quantizes by adding uniform noise ϵ to the coding feature y, i.e. ŷ = y + ϵ, where each entry of ϵ is small uniform noise in the range [-0.5, 0.5]. During testing, y is quantized directly to the nearest integer, as shown in equation (18).
ŷ = ⌊y + 0.5⌋ (18).
In equation (18), y is the super slice coding, and the bracket with only the lower half, ⌊·⌋, denotes rounding down; ⌊y + 0.5⌋ thus rounds y to the nearest integer, giving the super slice coding quantization result ŷ.
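A sketch of the uniform quantizer gq: additive uniform noise during training (so gradients can flow through the quantizer), hard rounding at test time. The seeded generator is only for reproducibility of the sketch.

```python
import numpy as np

def quantize(y, training, rng=None):
    """Uniform quantizer: y + U(-0.5, 0.5) in training, round-to-nearest at test."""
    if training:
        if rng is None:
            rng = np.random.default_rng(0)
        return y + rng.uniform(-0.5, 0.5, size=y.shape)   # noisy proxy for rounding
    return np.floor(y + 0.5)                              # round to the nearest integer
```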
Connected after the quantizer module is a context network module gc, whose purpose is to autoregressively predict the probability distribution of the entropy model, preparing for the subsequent accurate prediction of that distribution. The quantized data ŷ passes through the context network module to produce an intermediate context feature.
The architecture of the context network module is shown in fig. 11. Its input ŷ is the output y of the slice encoder ga after quantization by the quantizer module, and its output is the result of applying a masked slice convolution to ŷ. The parameters of the slice convolution blocks have the same meaning as before and are not repeated.
In addition to the processing of the super slice coding y by the quantizer module and the context network module described above, this example extracts further information from the result y of the slice encoder. A super prior slice encoder ha is connected after the slice encoder ga. Before y is input into ha, the data representation is first converted from the parameterized slice projection to the ERP projection, i.e., using equations (7) and (9); this prevents the slice height from becoming smaller than 1 during the downsampling inside ha. After a series of convolution and downsampling operations, a temporary intermediate result z is generated.
The structure of the super prior slice encoder ha is shown in fig. 12. Its input is the temporary output result y of the slice encoder ga. After the representation conversion, ordinary convolutions, downsampling, and other operations of the module ha, the intermediate calculation result z is obtained and used as the prior coding. The parameters of the convolution blocks in the structure have the same meaning as before and are not repeated.
After the intermediate calculation result z output by the super prior slice encoder ha is obtained, it is quantized by a quantizer module to obtain ẑ as the prior coding quantization result. ẑ is then input into the super prior slice decoder hs, whose structure is shown in fig. 13. After a series of convolutions and upsampling operations, and, in the last step of the module hs, a conversion of the data from the ERP projection back to the parameterized slice projection, a temporary result is finally generated. 2× upsampling means scaling the width and height of the input data to twice the original by a chosen sampling mode. The parameters of the convolution blocks in fig. 13 have the same meaning as before and are not repeated.
After the context network module gc and the super prior slice decoder hs, a parameter network module gp is connected; the temporary results obtained above serve as the input of gp. The structure of the parameter network module gp is shown in fig. 14. After three convolutions, it estimates the mean μ and variance σ of the latent Gaussian distribution probability model of the super slice coding quantization result ŷ. These two parameters help construct a more accurate probability distribution, making image reconstruction more accurate. The parameters of the convolution blocks in the network module have the same meaning as before and are not repeated.
Finally, the Gaussian distribution probability model of the super slice coding quantization result ŷ can be determined from the mean μ and variance σ estimated by the parameter network module. Using this Gaussian model, a discrete probability distribution table can be generated for each datum in the super slice coding quantization result. The discrete data and the corresponding discrete probability distribution tables are input into an arithmetic coder for lossless coding; this process is called entropy coding. When entropy coding is finished, a bit stream is output and the coding of the panoramic image is complete.
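Under stated assumptions, the discrete probability table for one latent position can be computed in closed form from μ and σ: each integer symbol k receives the Gaussian mass of the interval [k − 0.5, k + 0.5], with the tails folded into the boundary symbols so the table sums to one (the folding and the symbol range [lo, hi] are conventions of this sketch). Such a table would then drive an arithmetic coder.

```python
import math

def gaussian_cdf(v, mu, sigma):
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))

def discrete_probability_table(mu, sigma, lo, hi):
    """P(k) = CDF(k + 0.5) - CDF(k - 0.5) for integer symbols k in [lo, hi],
    with the left/right tails folded into the first/last symbols."""
    table = []
    for k in range(lo, hi + 1):
        upper = 1.0 if k == hi else gaussian_cdf(k + 0.5, mu, sigma)
        lower = 0.0 if k == lo else gaussian_cdf(k - 0.5, mu, sigma)
        table.append(upper - lower)
    return table
```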
While the super slice coding quantization result is being entropy coded, the prior coding quantization result ẑ is also entropy coded to obtain its bit stream, so that the decoding end can decode correctly.
This embodiment provides a panoramic image coding method based on slice expression. The acquired panoramic image to be encoded is converted into a super slice image set, which serves as the slicing expression form of the panoramic image. A slice encoder extracts features from the super slice image set to obtain the super slice coding, from which the super slice coding quantization result and the prior coding quantization result are generated; a Gaussian distribution probability model is determined from the super slice coding quantization result, and the bit stream of the super slice coding quantization result and the bit stream of the prior coding quantization result are generated with this model, thereby realizing high-performance panoramic image coding. In this embodiment, the panoramic image to be encoded is converted and represented through slice expression conversion. This representation reduces the feature characterization differences caused by viewing-angle changes, helps the coding network extract more stable and generalizable features, improves the stability of the panoramic image expression, reduces the influence on feature extraction of object deformation caused by latitude changes in the panoramic image, and enhances the robustness of the model to geometric deformation of the image.
Example 2.
As shown in fig. 15, a panoramic image decoding method based on slice expression in the present embodiment includes the following steps.
B1, acquiring a bit stream set to be decoded; the bit stream set to be decoded comprises bit streams of super slice coding quantization results and bit streams of priori coding quantization results; the bit stream of the super slice coding quantization result and the bit stream of the a priori coding quantization result are bit streams obtained by a slice expression-based panoramic image coding method provided in accordance with embodiment 1.
And B2, decoding the bit stream of the prior coding quantization result to obtain the prior coding quantization result.
And B3, decoding the bit stream of the super slice coding quantization result based on the priori coding quantization result to obtain the super slice coding quantization result. As shown in the flowchart of fig. 16, step B3 specifically includes the following steps.
B31, determining a discrete probability distribution table corresponding to the super slice coding quantization result based on the prior coding quantization result and the bit stream of the super slice coding quantization result.
And B32, inputting a discrete probability distribution table corresponding to the super slice coding quantization result into an arithmetic decoder to obtain the super slice coding quantization result.
And B4, generating a super slice coding inverse quantization result based on the super slice coding quantization result.
B5, inputting the super slice coding inverse quantization result into a slice decoder to generate a panoramic reconstruction image; the slice decoder includes an attention sub-module, a residual coding sub-module, and an upsampling coding sub-module.
In this embodiment, the upsampling encoding submodule, the residual encoding submodule, and the attention submodule of the slice decoder each include at least one slice convolution block. The slice convolution block calculates the convolution result according to the formula (19).
y(p, q) = Σ_{(i,j)∈N} w(i, j) · x(p_i, q_j) (19).
Where N is the neighborhood, w(i, j) is the convolution weight, and x(p_i, q_j) denotes a neighborhood pixel of the point (p, q) of the panoramic image to be encoded in the super slice stitched image. The Manhattan distance between point (p, q) and point (p_i, q_j) along the counterclockwise direction parallel to the equator on the sphere is iΔx, and the Manhattan distance in the direction perpendicular to the equator is jΔy.
A neighborhood of the slice convolution is determined according to equation (20).
N = {(i, j) : |i| ≤ K, |j| ≤ K} (20).
Where (i, j) is the pixel index and K is the neighborhood range.
Similarly to embodiment 1, the slice-expression-based panoramic image decoding method provided by this embodiment is described next with a specific example. First, the bit stream to be decoded is obtained with the slice-expression-based panoramic image encoding method provided in embodiment 1.
The decoding device is initialized, model parameters required by image decoding are loaded, an operation unit is configured, and input of a decoded bit stream is prepared, wherein the input comprises a bit stream of a super slice coding quantization result and a bit stream of an priori coding quantization result.
First, the context network module is initialized to start the probability modeling, and the decoder is started, opening the compressed bit stream file to be decoded. ẑ is decoded first. From ẑ, the mean μ and variance σ of the initial Gaussian distribution of the first element of ŷ are produced, giving the Gaussian model of the current position, from which the discrete probability table of that position can be obtained. Then, from ẑ and the already decoded part of ŷ, the mean μ and variance σ of the Gaussian distribution of the next datum to be decoded are generated; the Gaussian model of each current position thus yields the discrete probability table of every position of the super slice coding quantization result ŷ. According to the discrete probability table of the super slice coding quantization result, the arithmetic decoder decodes the data of the current position from the bit stream file. In this way, the complete super slice coding quantization result ŷ is finally decoded from the compressed bit stream.
After the complete super slice coding quantization result ŷ has been decoded, it is first inverse quantized by an inverse quantizer module to obtain the super slice coding inverse quantization result. This result is then input into the subsequent slice decoder gs and decoded into a result of the same size as the original panoramic image. The structure of the slice decoder is symmetric to that of the slice encoder ga, comprising an attention submodule Bc, a residual encoding submodule Bb, and an upsampling encoding submodule be.
The structure of the upsampling encoding submodule be is shown in fig. 17. Its input passes through two branches, m1 (upper) and m2 (lower). In branch m1, the input data sequentially undergoes a slice convolution, 2× upsampling, and another slice convolution to generate the temporary result i1; in branch m2, the input data sequentially undergoes a slice convolution and 2× upsampling to generate the temporary result i2. The calculation results of the two branches are added to form the output result of the upsampling encoding submodule be. The parameters of the slice convolution blocks have the same meaning as before and are not repeated.
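Mirroring the earlier encoder sketches, the two branches of be can be written schematically (1×1 channel mixing stands in for the slice convolutions, nearest-neighbor repetition for one possible 2× upsampling; both are assumptions of this sketch):

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in); a 1x1 convolution is channel mixing
    return np.einsum("oc,chw->ohw", w, x)

def up2(x):
    # 2x upsampling by nearest-neighbour repetition (one possible sampling mode)
    return np.repeat(np.repeat(x, 2, axis=-2), 2, axis=-1)

def block_be(x, w1a, w1b, w2):
    """Upsampling block be: i1 = conv -> 2x up -> conv (branch m1),
    i2 = conv -> 2x up (branch m2), output = i1 + i2."""
    i1 = conv1x1(up2(conv1x1(x, w1a)), w1b)
    i2 = up2(conv1x1(x, w2))
    return i1 + i2
```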
The structure of the slice decoder gs is shown in fig. 18. Its input is the result output by the inverse quantizer module; after a series of data operations, it generates a slice image expressed in the slice projection mode, which is used for the subsequent image reconstruction into a complete image.
The decoding result output by the slice decoder gs is spliced and recombined in channel order to obtain a result x1 with the same format as the original image data; that is, the T cut parts are stitched back into an image according to the initially preset layout. This step is completed by the recombination module.
For the overall model formed by the modules in the encoding and decoding process, an objective function is designed in this example as shown in equation (21); the parameters of each module can be optimized with the aim of minimizing this objective function.
L = E_{x∈D} [ R_ŷ + R_ẑ + λ · Dv(x, x̂) ] (21).
Where Dv(x, x̂) is the distortion calculation, λ is a trade-off parameter, D is the training set, and x is a panoramic image in the training set. For the calculation of the distortion Dv(x, x̂), this example uses viewport-based objective quality indices, which correctly reflect the human visual perception of a 360° panoramic image; viewport-based indices have so far provided the best quality prediction performance on 360° panoramic images.
MSE is used as the basic quality index in this example. First, the parameterized slice projection is mapped back to the unit sphere; then rectilinear projection is performed to sample m viewports (the number of sampled viewports can be adjusted within a suitable range). The longitudes and latitudes corresponding to the viewports on the sphere are obtained by uniform sampling, so as to simulate the real visual experience of a person viewing a panoramic image with a VR headset. The distortion calculation formula is shown in equation (22).
Dv(x, x̂) = (1/m) Σ_{i=1}^{m} D(v_i(x), v_i(x̂)) (22).
Where v_i(x) and v_i(x̂) denote the i-th rectilinear (viewport) projections taken from x and x̂, respectively; x is the provided panoramic image, and x̂ is the parameterized slice projection mapped back to the unit sphere. D(·, ·) is a planar image distortion measure. Each viewport is a rectangle of H_v × W_v with a given field of view (FOV), and together the viewports cover all of the spherical content. By varying the parameters optimized for different bit rates, one rate-distortion curve is generated for each image x in this example, and the average of all curves is taken as the rate-distortion performance of the current parameter configuration of the proposed parameterized slice representation.
R_ẑ is the result of the bit-rate loss calculation for the prior coding quantization result ẑ; the calculation formula of R_ẑ is shown in equation (23).
R_ẑ = E[ −log2 p(ẑ) ] (23).
Where p(ẑ) is a probability estimate, estimated with an entropy bottleneck model.
R_ŷ performs a bit-rate modeling estimate for the quantized super slice coding ŷ; the calculation formula of R_ŷ is shown in equation (24).
R_ŷ = E[ −log2 ( ∫_{ŷ−0.5}^{ŷ+0.5} N(t; μ, σ²) dt ) ] (24).
In R_ŷ, N(t; μ, σ²) is a Gaussian probability density function, where t is the integration variable. The probability density is calculated from the given mean μ and variance σ², and this density is then integrated over the interval [ŷ − 0.5, ŷ + 0.5]. In this example, the above objective function is used to train on the panoramic image dataset D, determining the various parameters of the model and realizing high-performance panoramic image encoding and decoding.
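The per-symbol rate term inside R_ŷ can be evaluated in closed form with the error function, since the integral of a Gaussian density over [ŷ − 0.5, ŷ + 0.5] is a difference of CDFs (the small probability clamp is an assumption of this sketch, added for numerical safety):

```python
import math

def rate_term(y_hat, mu, sigma):
    """-log2 of the Gaussian mass on [y_hat - 0.5, y_hat + 0.5]:
    the estimated number of bits for one quantized symbol."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    p = cdf(y_hat + 0.5) - cdf(y_hat - 0.5)
    return -math.log2(max(p, 1e-12))   # clamp to avoid log(0) in the far tails
```

Symbols near the predicted mean cost almost no bits, while improbable symbols are charged heavily, which is exactly the pressure that drives training toward predictable latents.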
This embodiment provides a panoramic image decoding method based on slice expression. A panoramic image is encoded with the scheme of embodiment 1 to obtain the bit stream of the super slice coding quantization result and the bit stream of the prior coding quantization result; the latter assists the decoding of the former, and after inverse quantization and slice decoding the panoramic reconstructed image is obtained, realizing high-performance panoramic image decoding. Because the panoramic image to be encoded is converted and represented through slice expression conversion in the encoding stage, the feature characterization differences caused by viewing-angle changes are reduced, the coding network can extract more stable and generalizable features, the stability of the panoramic image expression is improved, the influence on feature extraction of object deformation caused by latitude changes is reduced, the robustness of the model to geometric deformation of the image is enhanced, and the decoding effect is further improved.
Example 3.
A computer device, which may be a database, may have an internal structure as shown in fig. 19. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store the pending transactions. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a slice-expression-based panoramic image encoding method in embodiment 1 or to implement a slice-expression-based panoramic image decoding method in embodiment 2.
It should be noted that, the object information (including, but not limited to, object device information, object personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) related to the present invention are both information and data authorized by the object or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related countries and regions.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magneto-resistive random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (PHASE CHANGE Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present invention may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The principles and embodiments of the present invention have been described herein with reference to specific examples, which are intended only to assist in understanding the method of the present invention and its core ideas. Modifications made by those of ordinary skill in the art in light of these teachings remain within the scope of the present invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (10)

1. A panoramic image encoding method based on slice representation, comprising:
Acquiring a panoramic image to be encoded;
performing super-slice image conversion on the panoramic image to be encoded to obtain a super-slice image set; the super slice image set comprises a plurality of super slice images; the sum of the heights of the super slice images is the same as the unfolded height of the panoramic image to be encoded;
Inputting the super slice image set into a slice encoder, and extracting features of the super slice image set to obtain super slice codes; the slice encoder comprises a downsampling encoding submodule, a residual error encoding submodule and an attention submodule;
generating a super slice coding quantization result and a prior coding quantization result based on the super slice coding;
determining a Gaussian distribution probability model of the super slice coding quantization result based on the prior coding quantization result and the super slice coding quantization result;
generating a bit stream of the super slice coding quantization result and a bit stream of the prior coding quantization result based on the Gaussian distribution probability model and the prior probability density function; the bit stream of the prior encoded quantized result is used to assist in decoding the bit stream of the super-slice encoded quantized result in a decoding stage.
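The Gaussian distribution probability model in the last step of claim 1 assigns each quantized symbol the probability mass of a Gaussian on the unit interval around it, from which a code length follows. The sketch below illustrates this standard discretized-Gaussian entropy model; the function name, arguments, and probability clipping are illustrative assumptions, not the patent's actual coder.

```python
import math

def gaussian_bits(q, mu, sigma):
    """Estimated code length (in bits) of a quantized symbol q under a
    Gaussian probability model N(mu, sigma^2): the probability is the
    Gaussian mass on the unit interval [q - 0.5, q + 0.5]."""
    def cdf(x):
        # Gaussian CDF via the error function.
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    # Clip to avoid log(0) for symbols far in the tail.
    p = max(cdf(q + 0.5) - cdf(q - 0.5), 1e-12)
    return -math.log2(p)
```

Symbols near the predicted mean are cheap to code, while outliers cost more bits, which is why a prior that sharpens the predicted mean and scale reduces the bitstream size.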
2. The panoramic image encoding method based on slice expression according to claim 1, wherein performing super slice image conversion on the panoramic image to be encoded to obtain the super slice image set specifically comprises:
performing slice image conversion on the panoramic image to be coded in the vertical direction to obtain a plurality of slice images;
and for any slice image, splicing the slice image with a plurality of adjacent slice images having the same width to obtain a super slice image, wherein the super slice images together form the super slice image set.
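The conversion in claim 2 can be sketched as follows: cut the panorama into horizontal strips (slice images), then splice vertically adjacent strips of equal width into super slice images, so that the super slice heights sum to the panorama height as claim 1 requires. The function and parameter names are illustrative, and cropping stands in for a proper resample of each strip to its target width.

```python
def to_super_slices(image, slice_rows, slice_widths):
    """image: list of pixel rows. slice_rows[t]: row count of slice t;
    slice_widths[t]: its pixel-column count. Returns the super slice set."""
    # Build the slice images (crop is a placeholder for resampling).
    slices, r = [], 0
    for rows, w in zip(slice_rows, slice_widths):
        slices.append([row[:w] for row in image[r:r + rows]])
        r += rows
    # Splice runs of vertically adjacent slices sharing the same width.
    supers, run = [], [slices[0]]
    for s in slices[1:]:
        if len(s[0]) == len(run[0][0]):
            run.append(s)
        else:
            supers.append([row for sl in run for row in sl])
            run = [s]
    supers.append([row for sl in run for row in sl])
    return supers
```

Because every row of the panorama lands in exactly one super slice, the heights of the super slice images sum to the unfolded height of the input.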
3. The panoramic image encoding method based on slice expression as recited in claim 2, wherein all super slice images are spliced together to obtain a super slice spliced image, and the index of any pixel of the panoramic image to be encoded in the super slice spliced image is determined according to the following formula:
wherein i is the ordinate of a pixel in the panoramic image to be encoded, j is the abscissa of a pixel in the panoramic image to be encoded, θi is the index of the i-th row of pixels of the panoramic image to be encoded in the super slice spliced image, Φj is the index of the j-th column of pixels of the panoramic image to be encoded in the super slice spliced image, H is the number of rows of pixels of the panoramic image to be encoded, and Wi is the number of pixel columns of the i-th row of the panoramic image to be encoded.
4. The panoramic image encoding method based on slice expression according to claim 2, wherein, when the panoramic image to be encoded is sliced in the vertical direction, the number of slice images and the number of pixel columns of each slice image are determined according to the following formula:
wherein T is the number of slice images, t is the index of a slice image, Wt is the number of pixel columns of the t-th slice image, x is the panoramic image to be encoded, codec is the encoding scheme, and F(·) is a measurement function that measures the encoding performance of the encoding scheme when its parameters are T and Wt.
5. The panoramic image encoding method based on slice representation according to claim 4, further comprising, before performing super slice image conversion on the panoramic image to be encoded to obtain a super slice image set:
determining super slice image conversion parameters based on a greedy search method; the super slice image conversion parameters include the number of pixel columns of the t-th slice image when the number T of slice images is fixed.
6. The slice-expression-based panoramic image encoding method as claimed in claim 5, wherein determining the super slice image conversion parameters based on a greedy search method comprises:
when the number T of slice images is fixed, enumerating the number of pixel columns of each slice image in sequence from top to bottom; the number of pixel columns of each slice image satisfies the following constraint:
wherein t is the index of the t-th slice image, T is the number of slice images, Wt-1 is the number of pixel columns of the (t-1)-th slice image, and Wt is the number of pixel columns of the t-th slice image.
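The greedy search of claims 5 and 6 can be sketched as follows: with T fixed, choose each slice's column count top to bottom, keeping at each step the choice that minimizes an encoding-performance cost. The candidate set and the `cost` callable are assumptions standing in for the claim's measurement function F and its (elided) constraint on Wt.

```python
def greedy_slice_widths(T, candidates, cost):
    """Greedily pick the pixel-column count of each of T slices, top to
    bottom. cost(widths) scores a partial assignment; lower is better."""
    widths = []
    for t in range(T):
        # Keep the candidate that minimizes the cost of the partial choice.
        best_w = min(candidates, key=lambda w: cost(widths + [w]))
        widths.append(best_w)
    return widths
```

The greedy step makes each width decision locally optimal given the widths already fixed above it, which keeps the search linear in T instead of exponential over all width combinations.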
7. The panoramic image encoding method based on slice expression of claim 1, wherein the downsampling encoding submodule, the residual encoding submodule and the attention submodule each comprise at least one slice convolution block;
the slice convolution block calculates a convolution result according to the following formula:
wherein N is the neighborhood, w(i, j) is a convolution weight, x(p_i, q_j) denotes the coordinates, in the super slice spliced image, of a neighborhood pixel of the point (p, q) of the panoramic image to be encoded, the Manhattan distance between the point (p, q) and the point (p_i, q_j) in the counterclockwise direction parallel to the equator on the sphere is iΔx, and the Manhattan distance between the point (p, q) and the point (p_i, q_j) in the direction perpendicular to the equator is jΔy;
Determining a neighborhood of slice convolutions according to:
Where (i, j) is the pixel index and K is the neighborhood range.
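The slice-convolution neighborhood of claim 7 can be sketched for a latitude-adaptive grid where rows have different column counts: horizontal offsets wrap around the sphere within each row, and are rescaled by the row-width ratio to approximate equal spherical distance along the equator direction. The rescaling and pole-clamping rules here are assumptions for illustration; the claim's exact neighborhood is defined by its (i, j) and K formula.

```python
def slice_conv_neighborhood(p, q, K, row_widths):
    """Enumerate neighbor coordinates (i, j, row, col) for a slice
    convolution at pixel (p, q), with offsets i, j in [-K, K].
    row_widths[r] is the pixel-column count of row r."""
    neighbors = []
    for j in range(-K, K + 1):
        r = p + j
        if not 0 <= r < len(row_widths):
            continue  # clamp the neighborhood at the poles
        # Rescale the horizontal step to the neighbor row's width.
        scale = row_widths[r] / row_widths[p]
        for i in range(-K, K + 1):
            # Wrap horizontally: the panorama is periodic around the sphere.
            c = (q + round(i * scale)) % row_widths[r]
            neighbors.append((i, j, r, c))
    return neighbors
```

Wrapping in the column direction but clamping in the row direction mirrors the sphere's topology: travel parallel to the equator is periodic, while travel perpendicular to it terminates at the poles.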
8. A panoramic image decoding method based on slice representation, comprising:
acquiring a bit stream set to be decoded; the bit stream set to be decoded comprises a bit stream of a super slice coding quantization result and a bit stream of a prior coding quantization result; the bit stream of the super slice coding quantization result and the bit stream of the prior coding quantization result are bit streams obtained according to the slice expression-based panoramic image coding method as set forth in any one of claims 1-7;
Decoding the bit stream of the prior coding quantization result to obtain a prior coding quantization result;
decoding the bit stream of the super slice coding quantization result based on the prior coding quantization result to obtain a super slice coding quantization result;
Generating a super-slice coding inverse quantization result based on the super-slice coding quantization result;
inputting the super slice coding inverse quantization result into a slice decoder to generate a panoramic reconstruction image; the slice decoder includes an attention sub-module, a residual coding sub-module, and an upsampling coding sub-module.
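The decoding order implied by claim 8 can be sketched as a short pipeline: the prior bitstream is decoded first, its result parameterizes the conditional decoding of the super-slice bitstream, and the dequantized codes feed the slice decoder. All callables below are placeholders standing in for the actual entropy coders and network modules.

```python
def decode_panorama(bitstreams, prior_decoder, conditional_decoder,
                    slice_decoder, dequantize):
    """Mirror of the encode pipeline: prior stream first, then the
    super-slice stream conditioned on it, then reconstruction."""
    prior_q = prior_decoder(bitstreams["prior"])
    super_q = conditional_decoder(bitstreams["super_slice"], prior_q)
    return slice_decoder(dequantize(super_q))
```

The dependency is one-directional: the super-slice stream cannot be entropy-decoded until the prior result is available, which is why claim 1 notes that the prior bitstream "assists" decoding.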
9. The slice-expression-based panoramic image decoding method of claim 8, wherein the upsampling coding submodule, the residual coding submodule and the attention submodule each comprise at least one slice convolution block;
the slice convolution block calculates a convolution result according to the following formula:
wherein N is the neighborhood, w(i, j) is a convolution weight, x(p_i, q_j) denotes the coordinates, in the super slice spliced image, of a neighborhood pixel of the point (p, q) of the panoramic image to be encoded, the Manhattan distance between the point (p, q) and the point (p_i, q_j) in the counterclockwise direction parallel to the equator on the sphere is iΔx, and the Manhattan distance between the point (p, q) and the point (p_i, q_j) in the direction perpendicular to the equator is jΔy;
Determining a neighborhood of slice convolutions according to:
Where (i, j) is the pixel index and K is the neighborhood range.
10. A computer device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to perform the steps of a slice-expression-based panorama image encoding method according to any one of claims 1-7 or to perform the steps of a slice-expression-based panorama image decoding method according to any one of claims 8-9.
CN202410436958.9A 2024-04-12 2024-04-12 Panoramic image coding method, decoding method and related device based on slice expression Active CN118042133B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410436958.9A CN118042133B (en) 2024-04-12 2024-04-12 Panoramic image coding method, decoding method and related device based on slice expression


Publications (2)

Publication Number Publication Date
CN118042133A true CN118042133A (en) 2024-05-14
CN118042133B CN118042133B (en) 2024-06-28

Family

ID=90986231


Country Status (1)

Country Link
CN (1) CN118042133B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020256442A1 (en) * 2019-06-20 2020-12-24 주식회사 엑스리스 Method for encoding/decoding image signal and apparatus therefor
CN113411615A (en) * 2021-06-22 2021-09-17 深圳市大数据研究院 Virtual reality-oriented latitude self-adaptive panoramic image coding method




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant