CN117956165A

CN117956165A - 360-Degree video stream coding method based on slicing and optimization method and system thereof

Info

Publication number: CN117956165A
Application number: CN202211365606.6A
Authority: CN
Inventors: 李成林; 高文轩; 潘新龙; 吕浩然; 戴文睿; 邹君妮; 熊红凯; 王海鹏; 刘瑜
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2022-10-31
Filing date: 2022-10-31
Publication date: 2024-04-30

Abstract

The invention provides a slice-based 360-degree video stream coding method, together with an optimization method and system for it. The optimal coding parameters (slice height, slice width and quantization step size) are selected so as to minimize the video distortion displayed in the user viewport at a given transmission bandwidth, thereby achieving a trade-off between transmission efficiency and coding efficiency. Specifically, transmission efficiency is quantified by estimating the average number of pixels that must be transmitted to cover the user viewport, and coding efficiency, expressed as the number of bits required to encode one pixel, is predicted from video content features by a trained CNN-based predictor. A slice-based rate-distortion optimization problem is then formulated and solved efficiently by reducing the feasible set of coding parameters, which yields the optimal coding parameters. Simulation results show that the proposed optimal slice coding outperforms fixed slicing and state-of-the-art adaptive slicing methods in rate-distortion performance.

Description

360-Degree video stream coding method based on slicing and optimization method and system thereof

Technical Field

The invention relates to the technical field of video coding, in particular to a 360-degree video stream coding method based on slicing and an optimization method and system thereof.

Background

360-degree video is a research hotspot in the multimedia field and is widely used in telepresence and Virtual Reality (VR). The viewer can freely rotate a head-mounted display to obtain an immersive experience. Because of its ultra-high resolution, 360-degree video is typically delivered to users using slice-based coding and streaming. Specifically, spherical video frames are first projected onto a plane, the most common projection method being Equirectangular Projection (ERP). The 2D projection of the spherical video frame is then divided by slice encoding into a plurality of rectangles, each rectangular slice being encoded and transmitted independently. Then, upon request, the slices covering the user viewport are transmitted with high quality, while the remaining slices are either not transmitted or transmitted with low quality.

A smaller slice size reduces the non-overlapping (redundant) area between the user viewport and the transmitted slices, thereby reducing the number of pixels that must be transmitted and improving transmission efficiency. However, reducing the slice size inevitably increases the number of slices per frame: the total header overhead of the slices grows and the efficiency of spatial and temporal prediction decreases, which in turn increases the number of bits required to encode each pixel. In addition, the quantization step size directly affects the rate-distortion behavior of the encoding. A smaller quantization step size reduces visual distortion but increases the coding rate and therefore consumes more transmission bandwidth.

Through the search of the prior art, the current 360-degree video slice coding scheme can be mainly divided into two types: uniform slicing scheme and non-uniform slicing scheme.

For the uniform slicing scheme, Concolato et al., in the article "Adaptive Streaming of HEVC Tiled Videos Using MPEG-DASH", published in IEEE Transactions on Circuits and Systems for Video Technology in 2017, propose a slice-based adaptive transmission method using MPEG-DASH and HEVC-encoded video, in which the entire video frame is uniformly divided into slices of the same size.

For the non-uniform slicing scheme, Ozcinar et al., in the article "Visual attention-aware omnidirectional video streaming using optimal tiles for virtual reality", published in IEEE Journal on Emerging and Selected Topics in Circuits and Systems in 2019, propose a non-uniform slicing scheme based on visual attention for 360-degree video, which encodes each high-latitude region of the 360-degree video as a single whole tile and applies non-uniform slicing to the region near the equator, so as to obtain better rate-distortion performance than transmitting the whole video. Carreira et al., in "Attention-driven tile splitting method for improved efficiency of omnidirectional versatile video coding", published at the 2021 IEEE International Conference on Image Processing, propose a user-attention-based slicing method that sets vertical and horizontal CTU-aligned slice boundaries for VVC-based coding.

The uniform slicing schemes do not explore in depth the influence of slice size on user experience, and the non-uniform slicing schemes do not explicitly reveal the relationship between coding and transmission efficiency and the coding parameters (slice size and quantization step size). The existing 360-degree video slicing schemes therefore still have the above problems. No description or report of a technique similar to the present invention has been found, and no similar material has been identified at home or abroad.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a 360-degree video stream coding method based on slicing, an optimization method and a system thereof, and a corresponding terminal and medium.

According to one aspect of the present invention, there is provided a 360 degree video stream coding optimization method based on slicing, including:

extracting a viewpoint area of a user, and estimating the average transmission pixel number under different coding parameters;

Processing an intra-frame coding frame and an inter-frame coding frame of a video frame, counting the average value, the median and the standard deviation of processing results to obtain characteristics representing video contents, and splicing the characteristics of the video contents with different coding parameters respectively for predicting the number of bits per pixel;

Obtaining a bandwidth estimation value according to the obtained average transmission pixel number and bit number per pixel under different coding parameters, carrying out optimization problem solving according to the bandwidth estimation value and bandwidth limitation (determined according to the actual network condition of a user), and outputting the optimal coding parameters under the limited bandwidth;

wherein the encoding parameters include: slice height, slice width, and/or quantization step size.

Optionally, the extracting the viewpoint area of the user, estimating the average number of transmission pixels under different coding parameters includes:

Extracting a viewpoint area of a user by prior information of the viewpoint of the user;

calculating a specific view port of the user by adopting a geometric mapping and ERP projection method based on the obtained view point region of the user;

judging whether a specific view port of the user coincides with the slice according to different slice heights and slice widths, and further calculating actual transmission pixels of the user;

And averaging the actual transmission pixels of all users to obtain the average transmission pixel number under different slice heights and slice widths.

Optionally, the processing the intra-frame and the inter-frame of the video frame, and counting the average value, the median and the standard deviation of the processing result, includes:

dividing video frames of a 360-degree video stream into two groups of intra-frame coding and inter-frame coding;

Calculating the intra-frame coding frame and the inter-frame coding frame by using a Laplacian operator to obtain the sum of second-order differential values of the intra-frame coding frame and the inter-frame coding frame;

calculating the average value, the median and the standard deviation of the sum of all the second-order differential values, and carrying out statistics;

wherein, the intra-frame coding pixel value is the actual pixel value of the frame adopting the intra-frame coding mode, and the inter-frame coding pixel value is the pixel value difference value between the current frame and the previous frame;

The splicing the features of the video content with different coding parameters for predicting the number of bits per pixel, includes:

constructing a pretrained predictor based on a convolutional neural network;

splicing the characteristics of the video content with different slice heights, slice widths and quantization step sizes in the same dimension to obtain an input characteristic vector of the predictor;

predicting, by the predictor, a coding efficiency, the coding efficiency representing a number of bits required to code one pixel, i.e., the number of bits per pixel;

Wherein:

the size of the input feature vector of the predictor is B×N×6, where B is the actual batch size used in training, N is the number of frames contained in one group of pictures, and 6 represents the concatenation of the video content features and the different coding parameters.

Optionally, the calculating the intra-frame encoded frame and the inter-frame encoded frame by using the laplace operator to obtain a sum of second order differential values of pixel values of the intra-frame encoded frame and the inter-frame encoded frame includes:

convolving the intra-frame coded frame and the inter-frame coded frame with a Laplacian convolution kernel; wherein, the Laplacian convolution kernel is:

calculating the sum of second-order differential values of pixel values of the intra-frame coding frame and the inter-frame coding frame by using the Laplacian operator on a two-dimensional space; wherein the Laplacian operator on the two-dimensional space is:

$$\Delta f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$

Where Δf is the laplace operator operation on f, f is the pixel value of the corresponding frame, x is the image transverse coordinate, and y is the image longitudinal coordinate.

Optionally, the obtaining a bandwidth estimation value according to the obtained average transmission pixel number and bit number per pixel under different coding parameters, and performing optimization problem solving according to the bandwidth estimation value and bandwidth limitation, and outputting the optimal coding parameters under the limited bandwidth, including:

multiplying the average number of transmitted pixels by the number of bits per pixel under each set of coding parameters to obtain a bandwidth estimation value;

according to the bandwidth estimation value and the bandwidth limitation, solving a target optimization problem of minimizing the user viewport error under the limited bandwidth;

for the target optimization problem, searching for and outputting the optimal coding parameters within the set subset of coding parameters selected for actual encoding, the size of which is a hyperparameter.

Optionally, the user viewport error is defined as follows:

$$\text{V-MSE} = \mathbb{E}\left[\mathrm{MSE}_v(i,f)\right] = \frac{1}{MN}\sum_{f=1}^{M}\sum_{i=1}^{N}\mathrm{MSE}_v(i,f)$$

where V-MSE is the user viewport error, $\mathbb{E}[\cdot]$ denotes the expected value of the user viewport error, $\mathrm{MSE}_v(i,f)$ is the MSE between the slices contained in the viewport of user i at frame f and the 360-degree video stream to be encoded, M is the total number of transmitted frames, and N is the total number of users;

$\mathrm{MSE}_v(i,f)$ is calculated as follows:

$$\mathrm{MSE}_v(i,f) = \sum_{k} \tau(k,i,f)\,\mathrm{MSE}(k,f)$$

where τ(k, i, f) is the ratio of the overlap between the viewport and slice k to the viewport for user i at frame f, and MSE(k, f) is the MSE value of slice k at frame f.

Optionally, the solving the objective optimization problem that minimizes the user viewport error under the limited bandwidth includes:

the target optimization problem is constructed as follows:

$$\min_{w_t,\,h_t,\,qstep} \text{V-MSE}$$

the constraint condition is as follows:

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{k \in \mathcal{T}_i} w_t\,h_t\,\eta_k(w_t, h_t, qstep) \le B$$

where w_t is the slice width, h_t is the slice height, qstep is the quantization step size, B is the bandwidth limit, $\mathcal{T}_i$ is the set of slices that overlap the viewport of user i, k is the index of a slice, and η_k(w_t, h_t, qstep) is the number of bits per pixel of slice k under the corresponding coding parameters.

Optionally, for the target optimization problem, searching for and outputting the optimal coding parameters within the set subset of coding parameters selected for actual encoding includes:

reformulating the target optimization problem:

$$\min_{w_t,\,h_t,\,qstep} \text{V-MSE}$$

Constraint condition:

$$(w_t, h_t, qstep) \in \mathcal{S}_k(B)$$

where E_p(w_t, h_t) is the average number of transmitted pixels, η(w_t, h_t, qstep) is the predicted number of bits per pixel, and $\mathcal{S}_k(B)$ is a function defining the subset for actual encoding, whose size k is a hyperparameter: $\mathcal{S}_k(B)$ denotes the k sets of coding parameters for which the bandwidth estimation value E_p(w_t, h_t)·η(w_t, h_t, qstep) differs least from the bandwidth limit B in absolute value;

by solving this optimization problem, the search for and output of the optimal coding parameters within the set subset is completed.

According to another aspect of the present invention, there is provided a slice-based 360 degree video stream coding optimization system, comprising:

A transmission efficiency estimation module for extracting a viewpoint area of a user and estimating an average transmission pixel number under different coding parameters;

The coding efficiency estimation module is used for processing intra-frame coding frames and inter-frame coding frames of video frames, counting the average value, the median and the standard deviation of processing results, taking the average value, the median and the standard deviation as characteristics for representing video contents, splicing the characteristics of the video contents with different coding parameters respectively, and predicting the number of bits per pixel;

The optimization problem solving module is used for obtaining a bandwidth estimation value according to the obtained average transmission pixel number and bit number per pixel under different coding parameters, carrying out optimization problem solving according to the bandwidth estimation value and bandwidth limitation (determined according to the actual network condition of a user), and outputting the optimal coding parameters under the limited bandwidth;

According to a third aspect of the present invention, there is provided a 360-degree video stream coding method based on slicing, by adopting any one of the above-mentioned optimization methods, in a 360-degree video stream coding process, transmission efficiency is obtained by extracting average transmission pixel numbers under different coding parameters in a user viewpoint area, coding efficiency is obtained by predicting bit numbers per pixel, an optimal coding parameter under a bandwidth limitation is obtained by solving an optimization problem of a bandwidth estimation value and a bandwidth limitation, and finally coding of a 360-degree video stream is achieved by the obtained optimal coding parameter.

According to a fourth aspect of the present invention there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform the optimisation method of any one of the above, or to run the optimisation system of any one of the above, or to perform the encoding method of any one of the above embodiments of the present invention.

According to a fifth aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is operative to perform the optimization method of any one of the above, or to run the optimization system of any one of the above, or to perform the encoding method of any one of the above embodiments of the present invention.

Due to the adoption of the technical scheme, compared with the prior art, the invention has at least one of the following beneficial effects:

The 360-degree video stream coding method based on the slicing and the optimizing method and the optimizing system thereof adopt a more universal slicing-based scheme, and the optimal balance between the size of the slicing and the quantization step size is realized by maximizing the user watching experience under the constraint of the transmission bandwidth.

The 360-degree video stream coding method based on the slicing and the optimizing method and the optimizing system thereof can accurately estimate the number of transmission pixels and the number of bits per pixel based on the viewpoint information of the user, the size of the slicing and the quantization step length.

Compared with the existing uniform slicing method and adaptive slicing method, the 360-degree video stream coding method based on slicing and the optimization method and system thereof have better performance in the aspect of rate distortion behaviors.

The 360-degree video stream coding method based on the slicing and the optimizing method and the optimizing system thereof adopt a high-efficiency CNN-based rate distortion optimizing coding scheme, jointly optimize the slicing size and the quantization step length of the 360-degree video stream, and maximize the watching experience under a given transmission bandwidth.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:

Fig. 1 is a flowchart illustrating a 360-degree video stream coding optimization method based on slices in a preferred embodiment of the present invention.

Fig. 2 is a block diagram of a 360-degree video stream coding optimization system based on slices in a preferred embodiment of the invention.

Fig. 3 is a block diagram showing the structure of a CNN neural network for predicting the number of bits per pixel according to a preferred embodiment of the present invention.

Detailed Description

The following describes embodiments of the present invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given. It should be noted that variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, which falls within the scope of the invention.

As shown in fig. 1, an embodiment of the present invention provides a 360-degree video stream coding optimization method based on slicing, which may include:

S1, extracting a viewpoint area of a user, and estimating the average transmission pixel number under different coding parameters;

S2, processing intra-frame coding frames and inter-frame coding frames of the video frames, counting the average value, the median and the standard deviation of processing results to obtain characteristics representing video contents, and splicing the characteristics of the video contents with different coding parameters respectively for predicting the number of bits per pixel;

S3, obtaining a bandwidth estimation value according to the obtained average transmission pixel number and bit number per pixel under different coding parameters, solving an optimization problem according to the bandwidth estimation value and bandwidth limitation (determined according to the actual network condition of a user), and outputting the optimal coding parameters under the limited bandwidth;

Wherein the coding parameters include: slice height, slice width, and/or quantization step size.

The following further describes the steps of the optimization method provided in the embodiment of the present invention.

In a preferred embodiment of S1, the average number of transmitted pixels at different slice heights and slice widths is calculated based on the user viewpoint information of the extracted viewpoint area.

Further, an example analysis of the average number of transmitted pixels in the slice-based 360-degree video coding optimization method is given below. First, according to the viewpoint information of a user, the easternmost, westernmost, southernmost and northernmost points of the viewport on the sphere are projected onto the ERP plane by geometric projection and ERP projection. The four boundary points projected onto the ERP plane then uniquely determine the minimum rectangular bounding box of the projected user viewport. From this bounding box, the set of slices that overlap it can be obtained, and from the height and width of the slices in that set, the actual number of pixels transmitted to the user can be obtained. Finally, the expected number of transmitted pixels over all users, i.e., the average number of transmitted pixels, is calculated.
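The counting step described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the viewport bounding box is already available in ERP pixel coordinates, that the slice grid is uniform and starts at the frame origin, and it ignores wrap-around at the ERP seam. The function names `slices_covering_box` and `average_transmitted_pixels` are hypothetical.

```python
import math


def slices_covering_box(box, slice_w, slice_h):
    """Count the slices of a uniform grid that overlap a bounding box.

    box is (x0, y0, x1, y1) in ERP pixel coordinates, with x1 and y1
    exclusive. Wrap-around at the left/right ERP seam is ignored here.
    """
    x0, y0, x1, y1 = box
    cols = math.ceil(x1 / slice_w) - math.floor(x0 / slice_w)
    rows = math.ceil(y1 / slice_h) - math.floor(y0 / slice_h)
    return cols * rows


def average_transmitted_pixels(boxes, slice_w, slice_h):
    """Average, over all users, of the pixels in the slices each user needs."""
    total = sum(
        slices_covering_box(box, slice_w, slice_h) * slice_w * slice_h
        for box in boxes
    )
    return total / len(boxes)


# One user whose viewport bounding box is 200 x 150 pixels.
boxes = [(100, 50, 300, 200)]
coarse = average_transmitted_pixels(boxes, 128, 128)  # 6 slices of 128 x 128
fine = average_transmitted_pixels(boxes, 64, 64)      # 16 slices of 64 x 64
```

In this example the finer grid transmits fewer pixels (65536 against 98304), which is the transmission-efficiency side of the trade-off; the coding-efficiency cost of smaller slices is not modelled here.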

In a preferred embodiment of S2, the number of bits per pixel is estimated based on the 360 degree video stream content and different encoding parameters.

Further, by dividing the video frames into two groups, intra-coded and inter-coded, the temporal correlation of the video frames and the different coding modes are taken into account when estimating the number of bits per pixel. Then, the sum of the second-order differential values of the pixel values of the intra-coded frames and the inter-coded frames is calculated using the Laplacian operator, which estimates the spatio-temporal complexity of the video. The intra-coded frame pixel value is the actual pixel value of a frame that uses the intra-frame coding mode, and the inter-coded frame pixel value is the pixel value difference between the current frame and the previous frame. Further, a pretrained predictor based on a Convolutional Neural Network (CNN) is constructed; the average value, the median and the standard deviation of the sums of all second-order differential values are calculated to form a video feature vector, and the video feature vector is concatenated with the input video coding parameters (slice height, slice width and quantization step size) to form the input of the predictor. Concatenation here means joining the feature vectors fed to the network along the same dimension, which is a well-known technique in the neural network field and is not described further here. The structure of the CNN used as the predictor is shown in Fig. 3: it uses three one-dimensional convolutional layers, two pooling layers and three fully connected layers, with the ReLU function as the activation function between layers. The input feature vector of the CNN has size B×N×6, where B is the actual batch size used in training, N is the number of frames contained in one GOP (Group of Pictures), and 6 represents the concatenation of the video content features (average, median, standard deviation) and the coding parameters (slice height, slice width, quantization step size).
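The B×N×6 input layout described above can be illustrated with a short sketch. It is a sketch under stated assumptions: the per-frame statistics are taken as given, the column order (three content features followed by three coding parameters) is assumed because the text only says the two groups are concatenated, and the names `gop_input` and `batch_input` are hypothetical. The CNN itself is not reproduced.

```python
def gop_input(frame_stats, slice_h, slice_w, qstep):
    """Build the N x 6 input for one GOP.

    frame_stats: list of N (mean, median, std) tuples, one per frame.
    The column order is an assumption made for this illustration.
    """
    params = [float(slice_h), float(slice_w), float(qstep)]
    return [list(stats) + params for stats in frame_stats]


def batch_input(gops, param_sets):
    """Stack B GOPs, each paired with one coding-parameter set, into B x N x 6."""
    if len(gops) != len(param_sets):
        raise ValueError("need one coding-parameter set per GOP")
    return [gop_input(g, *p) for g, p in zip(gops, param_sets)]


# Example: B = 2 GOPs of N = 3 frames with made-up statistics.
stats = [(1.0, 0.5, 2.0), (1.2, 0.6, 2.1), (0.9, 0.4, 1.8)]
batch = batch_input([stats, stats], [(256, 512, 22), (128, 256, 32)])
print(len(batch), len(batch[0]), len(batch[0][0]))  # 2 3 6
```

Pairing the same content features with several coding-parameter sets, as in the example, is how one video can be evaluated under different candidate parameters.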

Further, the method for calculating the intra-frame coding frame and the inter-frame coding frame by using the Laplacian operator to obtain the sum of second-order differential values of pixel values of the intra-frame coding frame and the inter-frame coding frame comprises the following steps:

convolving the intra-coded frame and the inter-coded frame with the Laplacian convolution kernel, which is the discrete form of the Laplacian operator on two-dimensional space:

$$\Delta f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$

where Δf is the Laplacian operator applied to f, f is the corresponding function (here, the pixel value of the frame), x is the horizontal image coordinate, and y is the vertical image coordinate.
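A minimal sketch of the feature extraction described above is given here. Several points are assumptions and not taken from the source: the common 4-neighbour kernel is used because the kernel itself is not reproduced in this text, only interior pixels are processed, the population standard deviation is used, and the statistics are taken per frame over the per-pixel responses. All function names are hypothetical.

```python
import statistics

# 4-neighbour discrete Laplacian kernel (assumed; the source kernel
# is not reproduced in this text).
KERNEL = ((0, 1, 0), (1, -4, 1), (0, 1, 0))


def laplacian_response(frame):
    """Apply KERNEL at every interior pixel of a 2-D list of pixel values."""
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += KERNEL[dy + 1][dx + 1] * frame[y + dy][x + dx]
            out.append(acc)
    return out


def frame_difference(current, previous):
    """Inter-coded input: pixel-wise difference between consecutive frames."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(current, previous)]


def complexity_features(frame):
    """Mean, median and population standard deviation of the responses."""
    resp = laplacian_response(frame)
    return (statistics.mean(resp), statistics.median(resp), statistics.pstdev(resp))


# Intra-coded frames are passed directly; inter-coded frames are passed
# as frame_difference(current, previous).
ramp = [[x * x for x in range(4)] for _ in range(4)]
features = complexity_features(ramp)
```

For the test pattern f(x, y) = x², the discrete second derivative in x is 2 and in y is 0, so every interior response equals 2.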

In a preferred embodiment of S3, the optimal coding parameters under the bandwidth limitation are obtained based on the bandwidth estimation value.

In order to maximize the user viewing experience (i.e., minimize the user viewport error), the user viewport error needs to be further defined, as follows:

$$\text{V-MSE} = \mathbb{E}\left[\mathrm{MSE}_v(i,f)\right] = \frac{1}{MN}\sum_{f=1}^{M}\sum_{i=1}^{N}\mathrm{MSE}_v(i,f)$$

where V-MSE is the user viewport error, $\mathbb{E}[\cdot]$ denotes the expected value of the user viewport error, $\mathrm{MSE}_v(i,f)$ is the MSE between the slices contained in the viewport of user i at frame f and the 360-degree video stream to be encoded, M is the total number of transmitted frames, and N is the total number of users.

$\mathrm{MSE}_v(i,f)$ is calculated as follows:

$$\mathrm{MSE}_v(i,f) = \sum_{k} \tau(k,i,f)\,\mathrm{MSE}(k,f)$$

where τ(k, i, f) represents the ratio of the overlap between the viewport and slice k to the viewport for user i at frame f, and MSE(k, f) is the MSE value of slice k at frame f.
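The viewport error described above, an overlap-weighted sum of slice MSE values averaged over users and frames, can be sketched as follows. The data layout (dictionaries keyed by slice index) and the function names are assumptions made for illustration, and every user is assumed to have a viewport in every frame.

```python
def viewport_mse(overlap, slice_mse):
    """Viewport MSE of one user at one frame.

    overlap: dict mapping slice index k to tau(k, i, f), the fraction of
             the viewport covered by slice k (slices with no overlap omitted).
    slice_mse: dict mapping slice index k to MSE(k, f).
    """
    return sum(tau * slice_mse[k] for k, tau in overlap.items())


def v_mse(overlaps, slice_mses):
    """Average viewport MSE over all frames and users.

    overlaps[f][i] is the overlap dict of user i at frame f;
    slice_mses[f] is the slice-MSE dict at frame f.
    """
    total = 0.0
    count = 0
    for f, per_user in enumerate(overlaps):
        for overlap in per_user:
            total += viewport_mse(overlap, slice_mses[f])
            count += 1
    return total / count


# One frame, two users: user 0 straddles slices 0 and 1, user 1 sees only slice 1.
overlaps = [[{0: 0.5, 1: 0.5}, {1: 1.0}]]
slice_mses = [{0: 2.0, 1: 4.0}]
result = v_mse(overlaps, slice_mses)
```

In this example the two viewport errors are 3.0 and 4.0, so the average is 3.5.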

The aim of this embodiment is to select coding parameters that maximize the user viewing experience (i.e. minimize the user viewport error) under bandwidth constraints, which problem can be modeled as:

Objective optimization problem:

$$\min_{w_t,\,h_t,\,qstep} \text{V-MSE}$$

Constraint condition:

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{k \in \mathcal{T}_i} w_t\,h_t\,\eta_k(w_t, h_t, qstep) \le B$$

where w_t is the slice width, h_t is the slice height, qstep is the quantization step size, B is the bandwidth limit, $\mathcal{T}_i$ is the set of slices that overlap the viewport of user i, k is the index of a slice, and η_k(w_t, h_t, qstep) is the number of bits per pixel of slice k under the corresponding coding parameters. Since each slice is encoded independently, the number of bits per pixel of each slice is difficult to obtain; furthermore, $\mathcal{T}_i$ introduces nonlinear factors, which makes it even harder to obtain an optimal solution of the optimization problem. Therefore, the invention reformulates the original optimization problem:

Objective optimization problem:

$$\min_{w_t,\,h_t,\,qstep} \text{V-MSE}$$

Constraint condition:

$$(w_t, h_t, qstep) \in \mathcal{S}_k(B)$$

where E_p(w_t, h_t) is the average number of transmitted pixels and η(w_t, h_t, qstep) is the number of bits per pixel of the entire video, used to approximate the number of bits per pixel of each slice. $\mathcal{S}_k(B)$ is a function defining the subset for actual encoding, whose size k is a hyperparameter: it denotes the k sets of coding parameters for which the estimated bandwidth E_p(w_t, h_t)·η(w_t, h_t, qstep) differs least from the bandwidth limit B in absolute value. Using $\mathcal{S}_k(B)$ relaxes the constraint to account for the error introduced by the approximation. By selecting an appropriate value of k, a balance between computational complexity and the quality of the optimization result is achieved, and the optimal coding parameters under the bandwidth limitation are obtained.
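The reduced search described above might be sketched as follows: rank the candidate parameter sets by how close their estimated bandwidth is to the limit, keep the k closest, and measure the viewport error only for those. The tuple layout, the function names and the `measure_v_mse` callback (which stands in for actually encoding the video and measuring V-MSE) are assumptions made for illustration, as is the requirement that the bandwidth limit be expressed in the same unit as pixels times bits per pixel.

```python
def estimated_bandwidth(avg_pixels, bits_per_pixel):
    """Bandwidth estimate: average transmitted pixels times bits per pixel."""
    return avg_pixels * bits_per_pixel


def reduced_subset(candidates, bandwidth_limit, k):
    """Keep the k candidates whose estimate is closest to the limit.

    candidates: list of (params, avg_pixels, bits_per_pixel) tuples, where
    params is a hashable (slice_w, slice_h, qstep) tuple.
    """
    def gap(c):
        return abs(estimated_bandwidth(c[1], c[2]) - bandwidth_limit)

    return sorted(candidates, key=gap)[:k]


def select_parameters(candidates, bandwidth_limit, k, measure_v_mse):
    """Return the parameter set with the lowest measured V-MSE in the subset.

    measure_v_mse(params) is called once per member of the subset only,
    so at most k actual encodings are needed.
    """
    subset = reduced_subset(candidates, bandwidth_limit, k)
    return min((c[0] for c in subset), key=measure_v_mse)


# Made-up candidates and V-MSE values, for illustration only.
candidates = [
    ((256, 256, 32), 1000.0, 0.5),
    ((128, 128, 16), 800.0, 1.0),
    ((64, 64, 8), 600.0, 2.0),
]
measured = {(256, 256, 32): 40.0, (128, 128, 16): 25.0, (64, 64, 8): 10.0}
best = select_parameters(candidates, 700.0, 2, measured.get)
```

A larger k examines more candidates at the cost of more encodings, which is the complexity trade-off the text attributes to the choice of k.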

As shown in fig. 2, an embodiment of the present invention provides a 360-degree video stream coding optimization system based on slicing, and the whole system may be divided into three modules: a transmission efficiency estimation module, a coding efficiency estimation module and an optimization problem solving module; wherein:

The coding efficiency estimation module is used for processing intra-frame coding frames and inter-frame coding frames of the video frames, counting the average value, the median and the standard deviation of processing results, taking the average value, the median and the standard deviation as characteristics for representing video contents, splicing the characteristics of the video contents with different coding parameters respectively, and predicting the number of bits per pixel;

In a preferred embodiment, the transmission efficiency estimation module extracts the user's viewpoint area with a priori information of the user's viewpoint, and further estimates the average number of transmission pixels at different tile heights and widths.

In a preferred embodiment, the coding efficiency estimation module uses the laplace operator to process the intra-frame coding frame and the inter-frame coding frame respectively, counts the average value, the median and the standard deviation of the result as the characteristics representing the video content, and splices the video content characteristics with the coding parameter slice height, the slice width and the quantization step length to obtain the input characteristic vector of the CNN network, and predicts the number of bits per pixel through the CNN network.

In a preferred embodiment, the optimization problem solving module obtains a bandwidth estimation value according to the obtained average transmission pixel number and bit number per pixel under different coding parameters, and performs optimization problem solving under the bandwidth limiting condition according to the bandwidth estimation value, so as to output the optimal coding parameters.

In a preferred embodiment, the viewpoint information in the transmission efficiency estimation module may be obtained from the currently existing viewpoint prediction method or the statistics of the visual attention dataset of the existing 360-degree video.

In a preferred embodiment, the user view port in the transmission efficiency estimation module is obtained through geometric projection and ERP projection, and the number of fragments actually transmitted and the number of pixels actually transmitted by the user are further calculated.

In a preferred embodiment, the coding efficiency estimation module applies the Laplacian operator to the intra-coded and inter-coded video frames, uses the average value, the median and the standard deviation of the result as video features, concatenates them with the video coding parameter vector, and inputs the result into the CNN to predict the number of bits per pixel.

In a preferred embodiment, the optimization problem solving module models the solved problem as an optimization problem that maximizes the user viewing experience under bandwidth constraints by selecting encoding parameters.

In a preferred embodiment, because each slice is encoded independently and the bandwidth is therefore difficult to estimate, the transmission bandwidth is approximated by the product of the average number of transmitted pixels and the number of bits per pixel. To account for the resulting approximation error, the invention defines a function $\mathcal{S}_k(B)$ for the subset for actual encoding, whose size k is a hyperparameter: $\mathcal{S}_k(B)$ denotes the k sets of coding parameters for which the estimated bandwidth differs least from the bandwidth limit B in absolute value. A compromise between the quality of the solution and the computational complexity is obtained by setting different values of k.

In some embodiments of the invention:

A transmission efficiency estimation module, comprising a viewport calculator unit and a transmission pixel calculator unit. The viewport calculator unit first calculates the specific viewport of a user from the user's viewpoint by geometric mapping and ERP (Equirectangular) projection. The transmission pixel calculator unit then determines, for different slice widths and slice heights, whether the user viewport overlaps each slice, calculates the actual number of pixels transmitted to each user, and averages the transmitted pixels over all users to obtain the average number of transmitted pixels.
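The ERP step of the viewport calculator unit can be illustrated with the standard equirectangular mapping from longitude and latitude to pixel coordinates. The coordinate conventions (longitude in [-π, π) increasing to the right, latitude in [-π/2, π/2] with the north pole at the top row) and the function names are assumptions; the geometric step that finds the viewport boundary points on the sphere is not shown, and wrap-around across the ±180° seam is ignored.

```python
import math


def sphere_to_erp(lon, lat, width, height):
    """Map a point on the sphere (radians) to ERP pixel coordinates.

    Assumed conventions: lon in [-pi, pi), lat in [-pi/2, pi/2],
    x grows eastward, y = 0 is the north pole.
    """
    x = (lon + math.pi) / (2 * math.pi) * width
    y = (math.pi / 2 - lat) / math.pi * height
    return x, y


def viewport_bounding_box(boundary_points, width, height):
    """Minimum rectangle enclosing the projected boundary points.

    boundary_points: iterable of (lon, lat) pairs, e.g. the easternmost,
    westernmost, southernmost and northernmost points of the viewport.
    """
    projected = [sphere_to_erp(lon, lat, width, height) for lon, lat in boundary_points]
    xs = [p[0] for p in projected]
    ys = [p[1] for p in projected]
    return min(xs), min(ys), max(xs), max(ys)


# Viewport centred on the equator at longitude 0 in a 3840 x 1920 ERP frame.
points = [(0.5, 0.0), (-0.5, 0.0), (0.0, -0.4), (0.0, 0.4)]
box = viewport_bounding_box(points, 3840, 1920)
```

The resulting box can then be compared against the slice grid to decide which slices overlap the viewport.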

In the coding efficiency estimation module, the intra-coded pixel value is the actual pixel value of a frame that uses the intra-frame coding mode, and the inter-coded pixel value is represented by the pixel value difference between the current frame and the previous frame. The input feature vector of the CNN has size B×N×6, where B is the actual batch size used in training, N is the number of frames contained in one GOP (Group of Pictures), and 6 represents the concatenation of the video content features (average, median, standard deviation) and the coding parameters (slice height, slice width, quantization step size).

The optimization problem solving module obtains an estimated bandwidth from the number of transmitted pixels and the number of bits per pixel under different coding parameters and, according to the input bandwidth limit and the set size (a hyperparameter) of the subset for actual encoding, searches that subset to obtain the optimal coding parameters.

Maximizing the user viewing experience requires further definition, as follows:

Here, the per-viewport term is the MSE of the encoded video over the slices contained in the viewport of user i at frame f, M is the total number of transmitted frames, and N is the total number of users.

This per-viewport MSE is calculated as follows:
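One form consistent with the definitions above and with the overlap ratio τ(k, i, f) and per-slice MSE(k, f) defined in the claims is sketched below. The symbol names D and MSE_{i,f}, and the normalization by MN, are assumptions introduced here for illustration.

```latex
% Assumed names: D is the mean viewport error, MSE_{i,f} the per-viewport MSE.
D = \frac{1}{M N} \sum_{i=1}^{N} \sum_{f=1}^{M} \mathrm{MSE}_{i,f},
\qquad
\mathrm{MSE}_{i,f} = \sum_{k} \tau(k, i, f)\, \mathrm{MSE}(k, f)
```

Under this reading, a slice contributes to a user's distortion in proportion to the share of the viewport it covers, and slices outside the viewport (τ = 0) contribute nothing.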

For the optimization problem, the original problem is specifically as follows:

Objective optimization problem:

Constraint conditions:

According to the proposed original optimization problem, an optimization problem solving method is proposed, and the method specifically comprises the following steps:

Objective optimization problem:

Constraint conditions:

Here, E_p(w_t, h_t) is the average number of transmitted pixels, and the subset function returns the k groups of coding parameters whose estimated bandwidth has the smallest absolute difference from the bandwidth limit B. Solving this optimization problem yields a trade-off between the quality of the solution and the computational complexity.

Combining the transmission efficiency estimation module, the coding efficiency estimation module, and the optimization problem solving module finally yields the optimal coding parameters under the bandwidth limit.
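The final search over the actual-encoding subset might look like the sketch below. The callable `encode_and_measure` is hypothetical: it stands for actually encoding the video with one parameter group and returning the measured bitrate and viewport error. Checking the bandwidth limit against the measured bitrate, rather than the estimate, is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple

# (slice width, slice height, quantization step size)
Params = Tuple[int, int, float]


def best_parameters(
    candidates: Iterable[Params],
    encode_and_measure: Callable[[Params], Tuple[float, float]],
    bandwidth_limit: float,
) -> Optional[Params]:
    """Return the feasible candidate with the lowest measured viewport error."""
    best: Optional[Params] = None
    best_error = float("inf")
    for p in candidates:
        bitrate, error = encode_and_measure(p)
        # Keep only candidates that respect the bandwidth limit.
        if bitrate <= bandwidth_limit and error < best_error:
            best, best_error = p, error
    return best  # None when no candidate fits within the limit
```

Because each call to `encode_and_measure` is a full encode, the cost of this loop grows linearly with the subset size k.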

It should be noted that the steps of the method provided by the present invention may be implemented by the corresponding modules of the system. Those skilled in the art may refer to the technical scheme of the method to implement the composition of the system, or refer to the technical scheme of the system to implement the steps of the method; that is, the embodiments of the method and of the system may be understood as preferred examples of each other, and are not repeated herein.

An embodiment of the present invention further provides a slice-based 360-degree video stream coding method that adopts the optimization method of any one of the above embodiments. During 360-degree video stream coding, transmission efficiency is obtained by extracting the user viewpoint area and estimating the average number of transmitted pixels under different coding parameters; coding efficiency is obtained by predicting the number of bits per pixel; the optimal coding parameters under the bandwidth limit are obtained by solving the optimization problem defined by the bandwidth estimate and the bandwidth limit; and the 360-degree video stream is finally encoded with the obtained optimal coding parameters.

An embodiment of the present invention provides a terminal comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor can perform the optimization method of any one of the above embodiments, run the optimization system of any one of the above embodiments, or perform the encoding method of any one of the above embodiments.

Optionally, the memory is used for storing the program. The memory may include volatile memory, such as random-access memory (RAM), for example static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory stores computer programs (e.g., application programs or functional modules that implement the methods described above), computer instructions, and the like, which may be stored in one or more memories in a partitioned manner. The computer programs, computer instructions, data, and the like may be invoked by the processor.

The processor is used for executing the computer program stored in the memory to implement the steps of the method or the modules of the system in the above embodiments. For details, refer to the descriptions of the foregoing method and system embodiments.

The processor and the memory may be separate structures or may be integrated into a single structure. When the processor and the memory are separate structures, they may be coupled through a bus.

An embodiment of the present invention further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is operative to perform the optimization method of any of the above embodiments of the present invention, or to run the optimization system of any of the above embodiments of the present invention, or to perform the encoding method of any of the above embodiments of the present invention.

The slice-based 360-degree video stream coding method, and the optimization method and system thereof, provided by the embodiments of the present invention adopt a more general slice-based scheme and achieve the optimal balance between slice size and quantization step size by maximizing the user viewing experience under the transmission bandwidth constraint. The number of transmitted pixels and the number of bits per pixel can be accurately estimated from the user viewpoint information, the slice size, and the quantization step size. Compared with existing uniform slicing and adaptive slicing methods, the method achieves better rate-distortion performance. By adopting an efficient CNN-based rate-distortion-optimized coding scheme, the slice size and quantization step size of the 360-degree video stream are jointly optimized, so that the viewing experience is maximized under a given transmission bandwidth.

Matters not described in detail in the foregoing embodiments of the present invention are well known in the art.

The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.

Claims

1. A slice-based 360-degree video stream coding optimization method, characterized by comprising the following steps:

Obtaining a bandwidth estimation value according to the obtained average transmission pixel number and the bit number per pixel under different coding parameters, carrying out optimization problem solving according to the bandwidth estimation value and the bandwidth limitation, and outputting the optimal coding parameters under the limited bandwidth;

2. The method for optimizing 360-degree video stream coding based on slicing according to claim 1, wherein the extracting the viewpoint area of the user, estimating the average number of transmission pixels under different coding parameters, comprises:

3. The method for optimizing 360-degree video stream coding based on slicing according to claim 1, wherein the processing intra-frame and inter-frame of the video frame, and counting the average, median and standard deviation of the processing results, comprises:

Calculating the intra-frame coding frame and the inter-frame coding frame by using a Laplacian operator to obtain the sum of second-order differential values of pixel values of the intra-frame coding frame and the inter-frame coding frame;

Wherein, the pixel value of the frame of the intra-frame coding is the actual pixel value of the frame of the intra-frame coding mode, and the pixel value of the frame of the inter-frame coding is the pixel value difference value between the current frame and the previous frame;

constructing a pretrained predictor based on a convolutional neural network;

Wherein:

4. The method for optimizing 360-degree video stream coding based on slicing of claim 3, wherein said calculating the intra-frame coding frame and the inter-frame coding frame by using a Laplacian operator to obtain the sum of second-order differential values of pixel values of the intra-frame coding frame and the inter-frame coding frame comprises:

5. The method for optimizing 360-degree video stream coding based on slicing according to claim 1, wherein obtaining a bandwidth estimation value according to the obtained average transmission pixel number and bit number per pixel under different coding parameters, and solving an optimization problem according to the bandwidth estimation value and bandwidth limitation, and outputting an optimal coding parameter under the limited bandwidth, comprises:

6. The slice-based 360 degree video stream encoding optimization method of claim 5, further comprising any one or more of:

-the user viewport error is defined as follows:

The viewport MSE of user i at frame f is calculated as follows:

where τ(k, i, f) is the fraction of the viewport of user i at frame f that overlaps slice k, and MSE(k, f) is the MSE value of slice k at frame f;

-said solving a target optimization problem that minimizes user viewport errors under a limited bandwidth, comprising:

the construction objective optimization problem is as follows:

the constraint conditions are as follows:

where w_t is the slice width, h_t is the slice height, qstep is the quantization step size, B is the bandwidth limit, k is the index of the slice, and η_k(w_t, h_t, qstep) is the number of bits per pixel of slice k under the corresponding coding parameters;

-said searching and outputting optimal coding parameters in the set super parameter actual coding subset for said target optimization problem, comprising:

reconstructing the target optimization problem:

Constraint conditions:

where E_p(w_t, h_t) is the average number of transmitted pixels, and the function defining the actual-encoding subset (whose size k is a hyperparameter) returns the k groups of coding parameters whose bandwidth estimate has the smallest absolute difference from the bandwidth limit B;

7. A slice-based 360-degree video stream coding optimization system, comprising:

The optimization problem solving module is used for obtaining a bandwidth estimation value according to the obtained average transmission pixel number and bit number per pixel under different coding parameters, carrying out optimization problem solving according to the bandwidth estimation value and the bandwidth limitation, and outputting the optimal coding parameters under the limited bandwidth;

8. A slice-based 360-degree video stream coding method, characterized in that the optimization method according to any one of claims 1-6 is adopted; in the 360-degree video stream coding process, transmission efficiency is obtained by extracting the user viewpoint area and estimating the average number of transmitted pixels under different coding parameters, coding efficiency is obtained by predicting the number of bits per pixel, the optimal coding parameters under the bandwidth limit are obtained by solving the optimization problem defined by the bandwidth estimate and the bandwidth limit, and the 360-degree video stream is finally encoded with the obtained optimal coding parameters.

9. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor is operative to perform the optimization method of any one of claims 1-6 or the encoding method of claim 8 when executing the program.

10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor is operative to perform the optimization method of any one of claims 1-6, or to perform the encoding method of claim 8.