CN116248955A - VR cloud rendering image enhancement method based on AI frame extraction and frame supplement
- Publication number: CN116248955A
- Application number: CN202211722376.4A
- Authority: CN
- Prior art keywords: frame, image, pixel points, cloud, image enhancement
- Prior art date: 2022-12-30
- Legal status: Pending
Classifications
- H04N21/44012 — Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs (H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N21/40 Client devices; H04N21/43 Processing of content or additional data; H04N21/44 Processing of video elementary streams)
- H04N5/202 — Gamma control (H04N5/00 Details of television systems; H04N5/14 Picture signal circuitry for video frequency region; H04N5/20 Circuitry for controlling amplitude response)
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a VR cloud rendering image enhancement method based on AI frame extraction and frame supplement, belonging to the technical field of VR panoramic video. According to the invention, the VR application is cloud-rendered using cloud GPU capability; the server side uses an AI algorithm to extract frames from the push-streamed VR picture images, reducing the number of data packets transmitted; after receiving the data packets over the pulled stream, the client side uses a deep-learning neural network algorithm to supplement frames in the pulled pictures, ensuring the completeness and smoothness of the displayed pictures; the pictures generated after frame supplementation are then image-enhanced using techniques such as gray-level transformation, image sharpening, image smoothing and frequency-domain filtering. This solves the problems of picture smoothness and completeness when the client pulls cloud VR video resources, and greatly improves the experience of VR clients accessing cloud VR applications.
Description
Technical Field
The invention relates to the technical field of VR panoramic video, in particular to a VR cloud rendering image enhancement method based on AI frame extraction and frame supplement.
Background
With the continuous development of information technology, VR (virtual reality) technology has gradually entered daily life: by generating a virtual three-dimensional space, it lets people observe things in three dimensions without restriction. VR technology is now very mature and widely applied, and digital products keep emerging; it can increase the novelty and interest consumers experience with materialized products and help them better appreciate the cultural knowledge and values behind those products, so it has very important application value in meeting consumers' cultural and spiritual pursuits. Traditional VR technology requires VR panoramic video resources to be stored locally at the client, which loads the local VR scene resources; as the spatial scenes of VR panoramic video and the demands of the viewing experience grow, the configuration requirements on the client become higher and higher, and the VR experience suffers when the client configuration is insufficient. Therefore, rendering the VR panoramic video on a cloud server and displaying it on the client through technologies such as ffmpeg is a future development trend: it removes the client configuration requirement and enhances the client's VR experience.
With the vigorous development of VR panoramic video technology, the size of VR panoramic video also grows with its spatial scenes and viewing-experience demands, and traditional VR glasses need continual configuration upgrades to provide a better experience. Cloud-rendering the VR panoramic video with cloud GPU capability and displaying it at the client is a better future solution, but when the client pulls the cloud VR video resources, the picture smoothness is low and the picture completeness is poor. For this reason, a VR cloud rendering image enhancement method based on AI frame extraction and frame supplement is proposed.
Disclosure of Invention
The technical problem to be solved by the invention is how to address the problems of picture smoothness and completeness when a client pulls cloud VR video resources. To this end, a VR cloud rendering image enhancement method based on AI frame extraction and frame supplement is provided: frames are extracted from the video stream at the server side on the GPU, an optical-flow algorithm is used at the client side to supplement frames in the video stream, image enhancement is then performed, and the experience of the VR client accessing cloud VR applications is improved.
The invention solves the above technical problem through the following technical solution, which comprises the following steps:
s1: packaging the VR application as a Docker image
Packaging the VR application scene to be used into an image through Docker containerization technology, packaging the push-stream server into the same image, and opening the UDP/TCP ports required by the push-stream server;
s2: the client enters the VR application
The client opens the prepared front-end website page, opens the application, and clicks the VR application to be accessed; the server interface is called to start the container, the VR application inside the container is opened, the screen inside the container is captured, the VR picture inside the container is displayed by push-streaming through the push-stream server, and the pull-stream address of the VR application scene is returned to the client through internal UDP communication;
s3: the server side extracts frames from the video stream
The server side extracts frames from the video stream through video decoding, frame color-space conversion and JPEG encoding, performed in sequence;
s4: the client side supplements frames in the video stream
The client analyzes the video stream using a video-stream frame-insertion algorithm based on an optical-flow algorithm and performs the frame-supplementing operation;
s5: enhancing the frame-supplemented picture
Carrying out image enhancement on the frame-supplemented picture using an image enhancement algorithm.
Furthermore, in the step S3, the frame color-space conversion and the JPEG encoding operations both use GPU CUDA kernels for parallel computation acceleration.
Further, in the step S4, the video-stream frame-insertion algorithm based on the optical-flow algorithm uses the optical-flow algorithm, in the form of mathematical modeling, to calculate how the pixel points change between frame t-1 and frame t+1, and from the optical flow calculates the pixel-point positions at frame t, so as to perform the frame-supplementing operation.
Further, the step S4 includes the following steps:
s41: obtaining the current frame image of the VR video stream, namely capturing the current frame image from the pulled stream;
s42: calculating the position of a group of pixel points of the current frame in the next frame image according to an optical flow algorithm;
s43: establishing a global parameter model of two adjacent frames of images to sample pixel points;
s44: calculating global motion parameters from a current frame to a next frame through mathematical modeling;
s45: and performing frame inserting operation on the intermediate frames according to the global motion parameters.
Further, the step S42 includes the following steps:
s421: dividing the current frame image into k×n macro blocks, and taking the center pixel point of each macro block to obtain k×n pixel points;
s422: calculating a local motion vector of each pixel point in the next frame of image through an optical flow algorithm;
s423: and calculating the position coordinates in the next frame image according to the coordinates of the current frame of each pixel point and the local motion vector.
Further, the step S43 includes the following steps:
s431: substituting the positions of several randomly selected pixel points in the current frame image, together with their position coordinates in the next frame image, into a global parameter model to obtain the global motion parameters;
s432: using a random sample consensus (RANSAC) approach, several pairs of pixel points are drawn at random from the k×n pairs of pixel points of the current frame image and the next frame image; 2 pairs of pixel points are then extracted arbitrarily from these pairs and their coordinates substituted into the global parameter model to obtain four equations, from which the four global motion parameters w, α, l_x, l_y are solved;
s433: inputting the position coordinates, in the current frame image, of the remaining pixel points of the group into the global parameter model to obtain their position coordinates in the next frame image;
s434: counting the pixel coordinates falling within a preset range, and judging from the recorded number of pixel points screened out each time whether the accuracy of the sampling reaches the standard and whether the result calculated from the sampling is valid.
Further, in the step S431, the global parameter model is as follows:

$$\begin{cases} x_{i+1} = w\,(x_i\cos\alpha - y_i\sin\alpha) + l_x \\ y_{i+1} = w\,(x_i\sin\alpha + y_i\cos\alpha) + l_y \end{cases}$$

wherein α is the rotation-angle parameter of the frame image, w is the scaling (telescopic motion) parameter of the frame image, l_x and l_y are the x-axis and y-axis translational motion parameters respectively, (x_i, y_i) are the position coordinates of a pixel point in the frame-i image, and (x_{i+1}, y_{i+1}) are its position coordinates in the frame-(i+1) image.
Further, in the step S44, global motion parameters are calculated according to the position coordinates of the filtered pixel points in the current frame image and the next frame image.
Further, in the step S45, the flow of the pixel points at the intermediate moment is obtained from the global motion parameters calculated from the current frame to the next frame; the whole process is regarded as taking place at constant speed, so the position of every pixel point at the intermediate moment is calculated from its position in the current frame and the global motion parameters w, α, l_x, l_y, and the frame-insertion operation is performed at the intermediate moment.
Further, in the step S5, the image enhancement algorithm includes image sharpening, image smoothing, gray scale adjustment, and histogram equalization.
Compared with the prior art, the invention has the following advantages: the VR application is cloud-rendered using cloud GPU capability; the server side uses an AI algorithm to extract frames from the push-streamed VR picture images, reducing the number of data packets transmitted; after receiving the data packets over the pulled stream, the client uses a deep-learning neural network algorithm to supplement frames in the pulled pictures, guaranteeing the completeness and smoothness of the displayed pictures; the pictures generated after frame supplementation are then image-enhanced using techniques such as gray-level transformation, image sharpening, image smoothing and frequency-domain filtering. This solves the problems of picture smoothness and completeness when the client pulls cloud VR video resources, and greatly improves the experience of VR clients accessing cloud VR applications.
Drawings
Fig. 1 is a flow chart of the VR cloud rendering image enhancement method based on AI frame extraction and frame supplement in an embodiment of the invention;
fig. 2 is a schematic flow chart of a client for supplementing frames to a video stream according to an embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementations and specific operation procedures are given, but the scope of protection of the present invention is not limited to the following embodiments.
As shown in fig. 1, this embodiment provides a technical solution: a VR cloud rendering image enhancement method based on AI frame extraction and frame supplement, comprising the following steps:
s1, manufacturing VR application as mirror image: and packaging the VR application scene to be used into the mirror image through a Docker containerization technology, packaging the push server into the mirror image, and opening a UDP/TCP port required by the push server.
S2, the client enters the VR application: the client opens the prepared front-end website page, opens the application, and clicks the VR application to be accessed; the server interface is called to start the container, the VR application inside the container is opened, the screen inside the container is captured, the VR picture inside the container is displayed by push-streaming through the push-stream server, and the pull-stream address of the VR application scene is returned to the client through internal UDP communication.
S3, the server side extracts frames from the video stream: the process of extracting frames from the video stream at the server side generally comprises video decoding, frame color-space conversion and JPEG encoding. The frame color-space conversion and JPEG encoding both involve pixel-level computation, so GPU CUDA kernels are used for parallel computation acceleration. In addition, the frames obtained after video decoding are uncompressed raw data with a large data volume; if decoding is performed on the CPU, or performed on the GPU with the frames automatically transferred back to the CPU, the raw frame data must be copied back and forth frequently between the device (video memory) and the host (main memory), the IO is time-consuming, the data bandwidth becomes congested, and latency increases noticeably. Therefore, a main objective of the invention is to reduce the host-device data IO exchange as much as possible, realize whole-pipeline heterogeneous GPU computation for the frame-extraction process, make full use of the NVDEC hardware decoding unit of the NVIDIA GPU, minimize the occupation of CPU and GPU CUDA cores by video decoding, and at the same time handle video frame extraction and the subsequent video frame-insertion processing with low latency and as high throughput as possible.
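As a hedged sketch of this step, the ffmpeg invocation below decodes the pushed stream on the GPU and writes JPEG frames at a chosen sampling rate; the stream URL and frame rate are assumptions, and which hardware-acceleration options are available depends on the ffmpeg build and the GPU.

```python
import subprocess

# Sketch of step S3 on the server: GPU (NVDEC) decoding of the pushed
# stream, frame extraction at a fixed rate, JPEG encoding of each frame.
def extract_frames(stream_url: str = "rtmp://localhost/live/vr", fps: int = 30) -> None:
    subprocess.run([
        "ffmpeg",
        "-hwaccel", "cuda",   # decode on the GPU's hardware decoder
        "-i", stream_url,
        "-vf", f"fps={fps}",  # frame-extraction rate
        "-q:v", "2",          # JPEG quality (lower is better)
        "frame_%05d.jpg",
    ], check=True)
```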
S4, the client side supplements frames in the video stream: the client analyzes the video stream with a frame-insertion algorithm to raise the frame rate of the video stream and the fluency of the video. Generating frames that do not exist from a continuous video sequence has long been a challenging problem in the video-processing field. Typical kernel-based interpolation methods predict pixels by a single convolution process that convolves the source frames with a spatially adaptive local kernel, thereby avoiding time-consuming explicit motion estimation in the form of optical flow. The invention specifically uses an optical-flow algorithm, in the form of mathematical modeling, to calculate how the pixel points change between frame t-1 and frame t+1, and from the optical flow calculates the pixel-point positions at frame t, so as to perform the frame-supplementing operation.
S5, enhancing the frame-supplemented picture with gray-level transformation, image sharpening, image smoothing, frequency-domain filtering and similar techniques: because the sharpness and contrast of the frame-supplemented picture are low, the important content in the picture is not well highlighted. For small-sample image datasets, image enhancement is also commonly used to expand the data volume and make experiments more sound. An image enhancement algorithm can improve the contrast of the whole image and of local regions, bring out the detail information of the image, and make the image better match the visual characteristics of the human eye and easier for a machine to recognize. The specific image enhancement algorithms include: image sharpening, image smoothing (denoising), gray-scale adjustment (contrast enhancement), histogram equalization, and the like.
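A minimal sketch of these four operations with standard OpenCV primitives follows; the kernel size, sharpening gain and contrast gain are illustrative assumptions rather than values prescribed by the method.

```python
import cv2
import numpy as np

# Sketch of step S5: smoothing, sharpening, gray-scale (contrast) adjustment
# and histogram equalization applied to one frame-supplemented picture.
def enhance(frame_bgr: np.ndarray) -> np.ndarray:
    # image smoothing (denoising)
    smooth = cv2.GaussianBlur(frame_bgr, (3, 3), 0)
    # image sharpening by unsharp masking: 1.5*original - 0.5*blurred
    sharp = cv2.addWeighted(frame_bgr, 1.5, smooth, -0.5, 0)
    # gray-scale adjustment: simple linear contrast stretch
    adjusted = cv2.convertScaleAbs(sharp, alpha=1.2, beta=0)
    # histogram equalization on the luminance channel only
    ycrcb = cv2.cvtColor(adjusted, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```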
In this embodiment, as shown in fig. 2, the frame inserting algorithm in step S4 is a video stream frame inserting algorithm based on an optical flow algorithm, and the specific processing procedure is as follows:
s41: obtaining a current frame image of a VR video stream;
s42: calculating the position of a group of pixel points of the current frame in the next frame image according to an optical flow algorithm;
s43: establishing a global parameter model of two adjacent frames of images to sample pixel points;
s44: calculating global motion parameters from a current frame to a next frame through mathematical modeling;
s45: and performing frame inserting operation on the intermediate frames according to the global motion parameters.
In the present embodiment, in step S41, the current frame image is obtained by capturing it from the pulled stream.
In this embodiment, in step S42, the current frame image is first divided into k×n macro blocks and the center pixel point of each macro block is taken, yielding k×n pixel points; the local motion vector of each pixel point into the next frame image is then calculated with the optical-flow algorithm; finally, the position coordinates in the next frame image are calculated from each pixel point's current-frame coordinates and its local motion vector.
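A minimal sketch of S42 with OpenCV's pyramidal Lucas-Kanade tracker is shown below; the block counts k and n are illustrative assumptions.

```python
import cv2
import numpy as np

# Sketch of step S42: take the centre pixel of each of the k*n macro blocks
# of the current frame and track those pixels into the next frame with
# pyramidal Lucas-Kanade optical flow.
def track_block_centres(curr_gray: np.ndarray, next_gray: np.ndarray,
                        k: int = 16, n: int = 9):
    """curr_gray, next_gray: uint8 grayscale images of identical size."""
    h, w = curr_gray.shape
    bh, bw = h // n, w // k
    centres = np.array([[(j * bw + bw // 2, i * bh + bh // 2)]
                        for i in range(n) for j in range(k)], dtype=np.float32)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(curr_gray, next_gray,
                                                      centres, None)
    good = status.ravel() == 1  # keep only pixels the tracker could follow
    return centres[good].reshape(-1, 2), next_pts[good].reshape(-1, 2)
```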
The optical-flow algorithm in this embodiment mainly refers to the L-K (Lucas-Kanade) optical-flow algorithm, an optical-flow estimation algorithm based on the difference between two frames. For a pixel at coordinates (x, y) in the current frame image at time t, the constraint equation relating the two adjacent frame images is:

$$I(x, y, t) = I(x + \Delta x,\; y + \Delta y,\; t + \Delta t)$$

wherein Δx is the position offset of the pixel point (x, y) along the x-axis within time Δt, and Δy is its position offset along the y-axis within time Δt.

The constraint equation is applied to the k×n pixel points, and by Taylor-series expansion the local motion vector of each pixel point can be obtained.
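For reference, first-order Taylor expansion of this constraint (the standard step in the L-K derivation, not reproduced explicitly here) yields the brightness-constancy equation from which the local motion vector (u, v) of each pixel point is solved:

$$I_x\,u + I_y\,v + I_t = 0, \qquad u = \frac{\Delta x}{\Delta t}, \quad v = \frac{\Delta y}{\Delta t}$$

where I_x, I_y and I_t are the partial derivatives of the image intensity with respect to x, y and t.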
In this embodiment, in step S43, the positions of several randomly selected pixel points in the current frame image and their position coordinates in the next frame image are substituted into the global parameter model to obtain the global motion parameters. Because the frame image may undergo operations such as translation and rotation during motion, the global parameter model is built as:

$$\begin{cases} x_{i+1} = w\,(x_i\cos\alpha - y_i\sin\alpha) + l_x \\ y_{i+1} = w\,(x_i\sin\alpha + y_i\cos\alpha) + l_y \end{cases}$$

wherein α is the rotation-angle parameter of the frame image, w is the scaling (telescopic motion) parameter of the frame image, l_x and l_y are the x-axis and y-axis translational motion parameters respectively, (x_i, y_i) are the position coordinates of a pixel point in the frame-i image, and (x_{i+1}, y_{i+1}) are its position coordinates in the frame-(i+1) image.
Then several pairs of pixel points are drawn arbitrarily, by random sample consensus (RANSAC), from the k×n pairs of pixel points of the current frame image and the next frame image. Because the four global motion parameters in the global parameter model are the unknowns to be solved, 4 equations are needed for the 4 unknowns: 2 pairs of pixel points are extracted arbitrarily from the drawn pairs, and substituting their coordinates into the global parameter model yields four equations, from which the four global motion parameters w, α, l_x, l_y can be solved;
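A minimal sketch of this 2-pair solve, assuming the similarity form of the global parameter model given above: writing a = w·cos α and b = w·sin α makes the model linear in (a, b, l_x, l_y), so two point pairs give exactly four linear equations.

```python
import numpy as np

# Sketch of step S432: solve the four global motion parameters from two
# pixel-point correspondences, assuming the model
#   x' = a*x - b*y + lx,  y' = b*x + a*y + ly,
# with a = w*cos(alpha), b = w*sin(alpha).
def solve_global_motion(p: np.ndarray, q: np.ndarray):
    """p, q: shape (2, 2) -- the same two pixel points in frame i and frame i+1."""
    A, rhs = [], []
    for (x, y), (x1, y1) in zip(p, q):
        A.append([x, -y, 1, 0]); rhs.append(x1)  # x-equation of the model
        A.append([y,  x, 0, 1]); rhs.append(y1)  # y-equation of the model
    a, b, lx, ly = np.linalg.solve(np.array(A, float), np.array(rhs, float))
    w = np.hypot(a, b)          # scaling parameter
    alpha = np.arctan2(b, a)    # rotation-angle parameter
    return w, alpha, lx, ly
```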
Inputting the position coordinates of the rest pixel points in a group of pixel points in the current frame image into the global parameter model to obtain the position coordinates of the rest pixel points in the next frame image;
counting pixel point coordinates in a preset range, and judging whether the accuracy of the sampling reaches the standard or not and whether the result calculated by the sampling is effective or not according to the recorded number of the pixel points screened out each time;
the number of samples M can be found according to an empirical formula:

$$M = \frac{\log(1 - z)}{\log\left(1 - w^{l}\right)}$$

wherein z is the required accuracy (confidence) of the algorithm result, w is the proportion of the selected pixel points among all pixel points (to improve accuracy, w >= 0.7 can be assumed), and l is the number of points taken from the set each time. From this information it can be calculated how many random samples are needed to guarantee the accuracy.
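As a worked example under this formula, assuming confidence z = 0.99, inlier proportion w = 0.7 and l = 2 points per draw:

$$M = \frac{\ln(1-0.99)}{\ln(1-0.7^{2})} = \frac{\ln 0.01}{\ln 0.51} \approx 6.9,$$

so about 7 random samples already reach the target accuracy.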
In this embodiment, in step S44, the global motion parameter may be calculated according to the position coordinates of the filtered pixel point in the current frame image and the next frame image.
In this embodiment, in step S45, the flow of the pixel points at the intermediate moment is obtained from the global motion parameters calculated from the current frame to the next frame; the whole process can be regarded as taking place at constant speed, so the position of every pixel point at the intermediate moment is calculated from its position in the current frame and the global motion parameters w, α, l_x, l_y, and the frame-insertion operation is performed at the intermediate moment.
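A minimal sketch of this constant-speed midpoint step, reusing the assumed similarity model from above: each pixel point is advanced half-way along the displacement that the global motion parameters predict for it.

```python
import numpy as np

# Sketch of step S45: positions of the pixel points in the inserted middle
# frame, taking motion between the two frames to be at constant speed.
def midpoint_positions(pts: np.ndarray, w: float, alpha: float,
                       lx: float, ly: float) -> np.ndarray:
    """pts: (N, 2) pixel-point positions in the current frame."""
    c, s = w * np.cos(alpha), w * np.sin(alpha)
    x, y = pts[:, 0], pts[:, 1]
    nxt = np.stack([c * x - s * y + lx,   # predicted position in the next frame
                    s * x + c * y + ly], axis=1)
    return 0.5 * (pts + nxt)              # half-way point = middle-frame position
```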
In summary, in the VR cloud rendering image enhancement method based on AI frame extraction and frame supplement of the above embodiment, the VR application is cloud-rendered using cloud GPU capability; the server side uses an AI algorithm to extract frames from the push-streamed VR picture images, reducing the number of data packets transmitted; after receiving the data packets over the pulled stream, the client uses a deep-learning neural network algorithm to supplement frames in the pulled pictures, ensuring the completeness and smoothness of the displayed pictures; the pictures generated after frame supplementation are then image-enhanced using techniques such as gray-level transformation, image sharpening, image smoothing and frequency-domain filtering, solving the problems of picture smoothness and completeness when the client pulls cloud VR video resources and greatly improving the experience of VR clients accessing cloud VR applications.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention; variations, modifications, alternatives and equivalents may be made to the above embodiments by those of ordinary skill in the art within the scope of the invention.
Claims (10)
1. The VR cloud rendering image enhancement method based on AI frame extraction and frame supplement is characterized by comprising the following steps:
s1: packaging the VR application as a Docker image
Packaging the VR application scene to be used into an image through Docker containerization technology, packaging the push-stream server into the same image, and opening the UDP/TCP ports required by the push-stream server;
s2: the client enters the VR application
The client opens the prepared front-end website page, opens the application, and clicks the VR application to be accessed; the server interface is called to start the container, the VR application inside the container is opened, the screen inside the container is captured, the VR picture inside the container is displayed by push-streaming through the push-stream server, and the pull-stream address of the VR application scene is returned to the client through internal UDP communication;
s3: the server side extracts frames from the video stream
The server side extracts frames from the video stream through video decoding, frame color-space conversion and JPEG encoding, performed in sequence;
s4: the client side supplements frames in the video stream
The client analyzes the video stream using a video-stream frame-insertion algorithm based on an optical-flow algorithm and performs the frame-supplementing operation;
s5: enhancing the frame-supplemented picture
Carrying out image enhancement on the frame-supplemented picture using an image enhancement algorithm.
2. The VR cloud rendering image enhancement method based on AI frame extraction and frame supplement of claim 1, wherein: in the step S3, the frame color-space conversion and JPEG encoding operations both use GPU CUDA kernels for parallel computation acceleration.
3. The VR cloud rendering image enhancement method based on AI frame extraction and frame supplement of claim 1, wherein: in the step S4, the video-stream frame-insertion algorithm based on the optical-flow algorithm uses the optical-flow algorithm, in the form of mathematical modeling, to calculate how the pixel points change between frame t-1 and frame t+1, and from the optical flow calculates the pixel-point positions at frame t, so as to perform the frame-supplementing operation.
4. The VR cloud rendering image enhancement method based on AI frame extraction and frame supplement of claim 1, wherein the step S4 comprises the following steps:
s41: obtaining the current frame image of the VR video stream, namely capturing the current frame image from the pulled stream;
s42: calculating the position of a group of pixel points of the current frame in the next frame image according to an optical flow algorithm;
s43: establishing a global parameter model of two adjacent frames of images to sample pixel points;
s44: calculating global motion parameters from a current frame to a next frame through mathematical modeling;
s45: and performing frame inserting operation on the intermediate frames according to the global motion parameters.
5. The VR cloud rendering image enhancement method based on AI frame extraction and frame supplement of claim 4, wherein the step S42 comprises the following steps:
s421: dividing the current frame image into k×n macro blocks, and taking the center pixel point of each macro block to obtain k×n pixel points;
s422: calculating a local motion vector of each pixel point in the next frame of image through an optical flow algorithm;
s423: and calculating the position coordinates in the next frame image according to the coordinates of the current frame of each pixel point and the local motion vector.
6. The VR cloud rendering image enhancement method based on AI frame extraction and frame supplement of claim 5, wherein the step S43 comprises the following steps:
s431: substituting the positions of several randomly selected pixel points in the current frame image, together with their position coordinates in the next frame image, into a global parameter model to obtain the global motion parameters;
s432: using a random sample consensus (RANSAC) approach, several pairs of pixel points are drawn at random from the k×n pairs of pixel points of the current frame image and the next frame image; 2 pairs of pixel points are then extracted arbitrarily from these pairs and their coordinates substituted into the global parameter model to obtain four equations, from which the four global motion parameters w, α, l_x, l_y are solved;
s433: inputting the position coordinates, in the current frame image, of the remaining pixel points of the group into the global parameter model to obtain their position coordinates in the next frame image;
s434: counting the pixel coordinates falling within a preset range, and judging from the recorded number of pixel points screened out each time whether the accuracy of the sampling reaches the standard and whether the result calculated from the sampling is valid.
7. The VR cloud rendering image enhancement method based on AI frame extraction and frame supplement of claim 6, wherein in the step S431 the global parameter model is as follows:

$$\begin{cases} x_{i+1} = w\,(x_i\cos\alpha - y_i\sin\alpha) + l_x \\ y_{i+1} = w\,(x_i\sin\alpha + y_i\cos\alpha) + l_y \end{cases}$$

wherein α is the rotation-angle parameter of the frame image, w is the scaling (telescopic motion) parameter of the frame image, l_x and l_y are the x-axis and y-axis translational motion parameters respectively, (x_i, y_i) are the position coordinates of a pixel point in the frame-i image, and (x_{i+1}, y_{i+1}) are its position coordinates in the frame-(i+1) image.
8. The VR cloud rendering image enhancement method based on AI frame extraction and frame supplement of claim 7, wherein: in the step S44, the global motion parameters are calculated according to the position coordinates of the screened pixel points in the current frame image and the next frame image.
9. The VR cloud rendering image enhancement method based on AI frame extraction and frame supplement of claim 8, wherein: in the step S45, the flow of the pixel points at the intermediate moment is obtained from the global motion parameters calculated from the current frame to the next frame; the whole process is regarded as taking place at constant speed, the position of every pixel point at the intermediate moment is calculated from its position in the current frame and the global motion parameters w, α, l_x, l_y, and the frame-insertion operation is performed at the intermediate moment.
10. The VR cloud rendering image enhancement method based on AI frame extraction and frame supplement of claim 9, wherein: in the step S5, the image enhancement algorithm comprises image sharpening, image smoothing, gray-scale adjustment and histogram equalization.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211722376.4A CN116248955A (en) | 2022-12-30 | 2022-12-30 | VR cloud rendering image enhancement method based on AI frame extraction and frame supplement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116248955A true CN116248955A (en) | 2023-06-09 |
Family
ID=86632210
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211722376.4A Pending CN116248955A (en) | 2022-12-30 | 2022-12-30 | VR cloud rendering image enhancement method based on AI frame extraction and frame supplement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116248955A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117115210A (en) * | 2023-10-23 | 2023-11-24 | 黑龙江省农业科学院农业遥感与信息研究所 | Intelligent agricultural monitoring and adjusting method based on Internet of things |
CN117115210B (en) * | 2023-10-23 | 2024-01-26 | 黑龙江省农业科学院农业遥感与信息研究所 | Intelligent agricultural monitoring and adjusting method based on Internet of things |
CN117281616A (en) * | 2023-11-09 | 2023-12-26 | 武汉真彩智造科技有限公司 | Operation control method and system based on mixed reality |
CN117281616B (en) * | 2023-11-09 | 2024-02-06 | 武汉真彩智造科技有限公司 | Operation control method and system based on mixed reality |
CN117523061A (en) * | 2024-01-04 | 2024-02-06 | 深圳市黑金工业制造有限公司 | Electronic image rendering control system and method based on artificial intelligence |
CN117523061B (en) * | 2024-01-04 | 2024-04-12 | 深圳市黑金工业制造有限公司 | Electronic image rendering control system and method based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |