CN116912385B - Video frame adaptive rendering processing method, computer device and storage medium - Google Patents

Video frame adaptive rendering processing method, computer device and storage medium

Info

Publication number
CN116912385B
CN116912385B
Authority
CN
China
Prior art keywords
video
video frame
frame
rendering
tile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311190963.8A
Other languages
Chinese (zh)
Other versions
CN116912385A (en)
Inventor
Name withheld at the inventor's request (请求不公布姓名)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yuntian Changxiang Information Technology Co ltd
Original Assignee
Shenzhen Yuntian Changxiang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yuntian Changxiang Information Technology Co ltd filed Critical Shenzhen Yuntian Changxiang Information Technology Co ltd
Priority to CN202311190963.8A priority Critical patent/CN116912385B/en
Publication of CN116912385A publication Critical patent/CN116912385A/en
Application granted granted Critical
Publication of CN116912385B publication Critical patent/CN116912385B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 — 3D [Three Dimensional] image rendering
    • G06T15/02 — Non-photorealistic rendering
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 — General purpose image data processing
    • G06T1/20 — Processor architectures; Processor configuration, e.g. pipelining
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 — Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 — Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 — Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 — Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381 — Processing of video elementary streams involving reformatting operations by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a video frame adaptive rendering processing method, a computer device and a storage medium. The method comprises: acquiring continuous online video as video frame training data and dividing it into a plurality of Tile regions with a Tile divider; encoding video frame sequence numbers according to video frame quality levels; writing the video frame rendering tasks of the corresponding Tile regions with a parallel algorithm; scheduling the rendering tasks through the GPU computing architecture of a server cluster to obtain the cluster computing capability with optimal performance; and selecting macro modules according to video frame type parameters to obtain an MPD (Media Presentation Description) file. A corresponding data structure is established for the MPD description file according to the MPD structure, the video frames are parsed to obtain the information of each video slice, and the resource directory of the server is taken as the output directory of the decoding-stream generation model for multi-path parallel, efficient rendering. The method enables fast and efficient game management, reduces deployment cost, supports rapid reset, uses resources efficiently, and improves the user's sense of immersion and realism.

Description

Video frame adaptive rendering processing method, computer device and storage medium
Technical Field
The present application relates to the field of video rendering technologies, and in particular, to a video frame adaptive rendering processing method, a computer device, and a storage medium.
Background
Panoramic video is a video form that has become popular with the development of virtual reality technology. All directions in a space are shot and recorded with equipment such as multi-camera rigs or professional panoramic cameras, and a user can switch the viewing angle by turning the head or through input devices such as a mouse or a controller, as if present inside the video scene. As a form of virtual reality application, the transmission and rendering of panoramic video are key factors affecting the user's subjective experience; because panoramic video has high resolution and a large data volume, transmitting it smoothly under existing network conditions still faces many challenges.
When panoramic video runs in cloud gaming mode, all game pictures must be computed on a cloud server. The panoramic video must first be projected from the sphere onto a plane and sent to an encoder for processing; during playback it must be mapped back from the plane onto the sphere to recover the stereoscopic effect of the video, so the choice of projection mapping format affects both the video encoding/decoding process and the video rendering process. Because cloud-game panoramic video has high picture resolution and a large data volume, transmitting it over the network requires considerable bandwidth, and rendering, display and related processes place high demands on the processing capability of the cloud-game panoramic video playback client.
Existing video frame rendering methods are mainly panoramic video rendering methods based on the ERP or CMP format. When a game interface is mapped, ERP directly maps the lines of longitude and latitude on the sphere to parallel, equidistant straight lines on a plane; its simple mapping formula has made ERP the most widely used panoramic video projection format. CMP unfolds the sphere onto a plane by projecting it onto the six faces of the sphere's circumscribed cube. Both formats require the plane video to be encoded and compressed after mapping before it is transmitted to a network server, so these two common video frame rendering methods have the following problems: (1) during mapping, a large number of redundant pixels easily accumulate in the polar regions, which tends to create a rendering performance bottleneck, and the edges of each face of the mapped cube are oversampled, so seams appear at the face joints in the panoramic video, reducing the user's sense of immersion and realism; (2) after the video frame data is mapped and encoded, the client can independently receive and decode the code stream of each Tile using the Tile partitioning feature of H.265/HEVC, and each Tile must be rendered back to its position in the original video frame; however, Tile partitioning destroys the correlation of information near the boundaries, which lowers coding performance, wastes bandwidth during video transmission and challenges the client's processing capability.
Disclosure of Invention
The application aims to provide a video frame adaptive rendering processing method, a computer device and a storage medium, so as to solve the technical problems of prior-art panoramic video rendering methods: heavy pixel redundancy, limited transmission bandwidth, and high demands on the client's decoding and rendering capability.
In order to solve the technical problems, the application specifically provides the following technical scheme:
in a first aspect of the present application, there is provided a video frame adaptive rendering processing method, comprising the steps of:
acquiring continuous online video as video frame training data, dividing the video frame training data into a plurality of Tile regions with a Tile divider, detecting the video frame code rate of the training data in each Tile region, writing video frame sequence numbers onto the video frame data frame by frame according to the video frame code rate, and encoding the video frame sequence numbers according to video frame quality levels;
writing the encoded video frame sequence numbers into the video frame rendering tasks of the corresponding Tile regions with a parallel algorithm, and intelligently dividing the video frame rendering tasks into task slices according to the instantaneous resource state and communication capability of the rendering server side;
distributing the task slices onto the cluster formed by the rendering servers, scheduling the rendering tasks through the GPU computing architecture of the server cluster to obtain the cluster computing capability with optimal performance, selecting macro modules according to video frame type parameters, and encoding the video frames in parallel to obtain an MPD (Media Presentation Description) file;
and establishing a corresponding data structure for the MPD description file according to the MPD structure, parsing the video frames to obtain the information of each video slice, and taking the resource directory of the server as the output directory of the decoding-stream generation model for multi-path parallel, efficient rendering.
As a preferred scheme of the present application, the method for writing video frame sequence numbers onto the video frame data frame by frame according to the video frame code rate includes:
the video frame training data is stored in YUV format and decoded into a raw YUV video sequence by FFmpeg;
the raw YUV video sequence is input into the Tile divider to obtain a plurality of Tile regions; to detect the frame numbers of the video frames in a Tile region, the data is read in grayscale mode, the video data of the Tile region is cropped as the basic data for frame-number detection, and the basic data is binarized to obtain the binary value of every pixel in the Tile's line-length regions;
if more than 50% of the binarized pixels in a line-length region are white, 1 is added to the frame number and the white-point count of the remaining binarized line-length regions is checked in turn; otherwise, the corresponding frame number is output.
As a preferred embodiment of the present application, the method for encoding the video frame sequence numbers according to the video frame quality level includes:
the quality level of the current coding output is measured through the average quality evaluation of the nvENC encoder's output frames, and the nvENC encoder encodes the video frames of each Tile region at the code rate corresponding to the video frame sequence numbers in that Tile region.
As a preferred solution of the present application, the nvENC encoder encodes the video frames of a Tile region according to the video frame type parameters and the frame data motion vectors. The specific method includes:
establishing L parallel processing threads for the video frame sequence numbers of the Tile region, determining the coding frame type of each frame in the video, performing motion estimation against the original-image reference frames corresponding to each coding frame type, and establishing parallel whole-pixel motion estimation threads from the motion vectors produced by the motion estimation; establishing coding threads from the parallel motion estimation threads; dividing the video frame coding of each Tile region into intra-frame coding and inter-frame coding, wherein each intra-frame coding is organized into parallel fields, each field serving as a slice that is coded independently in parallel, each inter-frame coding adopts a parallel coding mode with interdependence between the original video reference frames, and each coding thread performs a pixel search according to the whole-pixel motion vector obtained from its parallel motion estimation thread to obtain motion-vector-based video frame coding data.
As a preferred scheme of the application, the encoded video frame sequence numbers are written into the video rendering tasks through parallel processing threads, and task slices are divided according to the thread length of the video rendering task in the corresponding Tile region, comprising:
connecting the video frame coding data to the server cluster through parallel processing threads, wherein the server cluster is connected to the rendering servers through a cluster-internal high-speed network;
distributing the video frame rendering tasks of the Tile regions within the rendering servers according to the slice progress of intra-frame coding and the thread progress of inter-frame coding;
and dividing each video frame rendering task into a plurality of task slices according to the duration of the nvENC encoder's output code stream.
As a preferred embodiment of the present application, the task slices are distributed over the server cluster in the form of time nodes, comprising:
reading the task slice data in Concurrent mode via the MapTask slicing mechanism, wherein the time point of the Tile region corresponding to a task slice serves as the time node of the FileInputFormat slice;
and scheduling the task slices according to their time nodes with a Lanes priority model, and distributing them onto the cluster formed by the rendering servers with the time tangent points as the MapTask queue time axis.
As a preferred solution of the present application, after the task slices are distributed to the server cluster, they are scheduled to execute rendering tasks, comprising:
scheduling rendering tasks according to the slice Task time nodes divided along the MapTask queue time axis, forming a GPU computing architecture on the cluster through the rendering servers, running an MPD generation module to generate an MPD description file whose time nodes follow the Task queue time axis, describing the frame information of the Tile-region video frames in sequence with the MPD description file, and describing the relative positional relationships among the Tile regions with SRD labels.
As a preferable scheme of the application, the MPD description file uses SRD labels to segment the video fragments within a task slice through an underlying Segment hierarchical data model, and the corresponding video content of a Tile region is requested from the server through the URL address in the Segment hierarchical data model.
As a preferred scheme of the application, the MPD description file uses AdaptationSet elements to represent the sliced video information of the different Tile regions; the sliced video information is parsed with the MPD data structure, and, in Adreno GPU direct-connection mode, the resource directory of the terminal server is set as the output directory of the decoding-stream generation model for multi-path parallel, efficient rendering.
In a second aspect of the application, a computer apparatus is provided,
comprising the following steps: at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to perform the video frame adaptive rendering processing method described above.
In a third aspect of the present application, a computer-readable storage medium is provided,
the computer-readable storage medium has stored therein computer-executable instructions which, when executed by a processor, implement the video frame adaptive rendering processing method described above.
Compared with the prior art, the application has the following beneficial effects:
the application adopts a parallel algorithm, the video frame data is divided into areas by the Tile divider, and the frame sequence number is added to the video frames in the Tile area, so that the video decoding is convenient, the video positions of the source end are corresponding, the occurrence of repeated frames and lost frames is prevented, and the immersion and realism of the user are improved.
The video frame rendering tasks of the Tile regions are scheduled by the rendering servers, and rendering task scheduling through the GPU computing architecture of the server cluster obtains the cluster computing capability with optimal performance, enabling larger-scale rendering and task acceleration; this suits scenes where a rendering task demands more computing resources than a single GPU provides. With the rendering server's internal distribution mechanism, fast and efficient game management can be achieved without configuring external storage for the rendering server, which reduces deployment cost, supports rapid reset, uses resources efficiently, and improves the user's sense of immersion and realism.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It will be apparent to those of ordinary skill in the art that the following drawings are merely exemplary, and that other implementations can be derived from them without inventive effort.
Fig. 1 is a flowchart of a video frame adaptive rendering processing method according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
As shown in fig. 1, the present application provides a video frame adaptive rendering processing method, which includes the following steps:
acquiring continuous online video as video frame training data, dividing the video frame training data into a plurality of Tile regions with a Tile divider, detecting the video frame code rate of the training data in each Tile region, writing video frame sequence numbers onto the video frame data frame by frame according to the video frame code rate, and encoding the video frame sequence numbers according to video frame quality levels;
writing the encoded video frame sequence numbers into the video frame rendering tasks of the corresponding Tile regions with a parallel algorithm, and intelligently dividing the video frame rendering tasks into task slices according to the instantaneous resource state and communication capability of the rendering server side;
distributing the task slices onto the cluster formed by the rendering servers, scheduling the rendering tasks through the GPU computing architecture of the server cluster to obtain the cluster computing capability with optimal performance, selecting macro modules according to video frame type parameters, and encoding the video frames in parallel to obtain an MPD (Media Presentation Description) file;
and establishing a corresponding data structure for the MPD description file according to the MPD structure, parsing the video frames to obtain the information of each video slice, and taking the resource directory of the server as the output directory of the decoding-stream generation model for multi-path parallel, efficient rendering.
In this embodiment, the video frame area is divided on a Tile basis, and physical resources are divided into multiple virtual resources through virtualization, enabling flexible allocation, scheduling and management of resources. The aggregate computing power of a distributed cluster built from single physical nodes with differentiated computing performance is exploited; multiple clusters are connected by a high-speed proprietary network, so cross-cluster distributed processing is supported and the overall computing-power ceiling is raised. Using a parallel algorithm, large-scale rendering tasks are intelligently divided into task slices according to the back-end's instantaneous resource state and communication capability, and the task slices are distributed over the cluster formed by the rendering servers, obtaining the cluster computing capability with optimal performance and greatly improving the video rendering effect.
The method for writing the video frame sequence numbers onto the video frame data frame by frame according to the video frame code rate comprises the following steps:
the video frame training data is stored in YUV format and decoded into a raw YUV video sequence by FFmpeg;
the raw YUV video sequence is input into the Tile divider to obtain a plurality of Tile regions; to detect the frame numbers of the video frames in a Tile region, the data is read in grayscale mode, the video data of the Tile region is cropped as the basic data for frame-number detection, and the basic data is binarized to obtain the binary value of every pixel in the Tile's line-length regions;
if more than 50% of the binarized pixels in a line-length region are white, 1 is added to the frame number and the white-point count of the remaining binarized line-length regions is checked in turn;
otherwise, the corresponding frame number is output.
In this embodiment, a frame sequence number is added to the video frames in each Tile region: FFmpeg's drawtext filter is used directly to write the frame number value onto the original video frame by frame, which makes it easy to match frames to their source-side positions during video decoding and prevents repeated and lost frames.
In this embodiment, the video frames in the Tile regions are encoded at different code rates, so that when the video is transmitted, Tiles inside the user's current view range are sent as a high-code-rate, high-quality version, while Tiles outside the current view range are sent as a low-code-rate, low-quality version, reducing the demand on network bandwidth.
The method for encoding the video frame sequence numbers according to the video frame quality level comprises the following steps:
the quality level of the current coding output is measured through the average quality evaluation of the nvENC encoder's output frames, and the nvENC encoder encodes the video frames of each Tile region at the code rate corresponding to the video frame sequence numbers in that Tile region.
In this embodiment, the nvENC encoder computes the sub-image numbers of the Tile regions corresponding to the different viewing angles, requests the new Tile code streams from the server, establishes the correspondence between the current video and the sub-image numbers, records the numbers of the Tile sub-regions within the current view range, and sends them to the server.
In this embodiment, the steps by which the nvENC encoder establishes the TileMask task code are:
first, the pose information matrix of the user's current video is acquired through a function provided by OpenVR, and the latitude and longitude of the center point of the user's current video are calculated from the user's pose matrix;
second, from the sub-image numbers corresponding to the TileMask, the Tile region numbered j containing the user's current video center point is calculated, and the set M of Tile regions overlapping the user's current view range is obtained from the user's view-range setting within the region surrounding Tile j;
and finally, the set M is traversed and, for each Tile in it, the bit at the corresponding TileMask position is set to 1, so that the TileMask corresponding to the current view range is readily determined; the TileMask value is then passed to the code-stream selection module and the code-stream fusion module for further processing.
The nvENC encoder encodes the video frames of a Tile region according to the video frame type parameters and the frame data motion vectors. The specific method is:
establishing L parallel processing threads for the video frame sequence numbers of the Tile region, determining the coding frame type of each frame in the video, performing motion estimation against the original-image reference frames corresponding to each coding frame type, and establishing parallel whole-pixel motion estimation threads from the motion vectors produced by the motion estimation;
in the embodiment, the characteristic that the frame can be buffered by utilizing the coding delay is utilized, the frame type parameters obtained by the preprocessing module are fully utilized, so that each frame of data in the parallel motion estimation thread can independently run in parallel, the video frames subjected to motion estimation are input and output into the coder coding module for coding, the parallelism of the correction coding frame is greatly improved by applying the parallel motion estimation method, and the utilization rate of multi-core hardware computing resources is also greatly improved.
Coding threads are then established from the parallel motion estimation threads. The video frame coding of each Tile region is divided into intra-frame coding and inter-frame coding: each intra-frame coding is organized into parallel fields, each field serving as a slice that is coded independently in parallel; each inter-frame coding adopts a parallel coding mode with interdependence between the original video reference frames; and each coding thread performs a pixel search according to the whole-pixel motion vector obtained from its parallel motion estimation thread, yielding motion-vector-based video frame coding data.
In this embodiment, pixels are searched in the form of motion vectors, and the residual images are integer-transformed, quantized and encoded via motion compensation to obtain the video frame code stream. Whole-pixel estimation over the entire video is therefore unnecessary, which increases computation speed, improves inter-frame prediction efficiency and reduces the number of parallel frames.
The encoded video frame sequence numbers are written into the video rendering tasks through parallel processing threads, and task slices are divided according to the thread length of the video rendering task in the corresponding Tile region, as follows:
the video frame coding data is connected to the server cluster through parallel processing threads, and the server cluster is connected to the rendering servers through a cluster-internal high-speed network;
the video frame rendering tasks of the Tile regions are distributed within the rendering servers according to the slice progress of intra-frame coding and the thread progress of inter-frame coding;
and each video frame rendering task is divided into a plurality of task slices according to the duration of the nvENC encoder's output code stream.
In this embodiment, parallel coding threads implement rate control: after a motion estimation thread completes motion estimation, it yields not only the motion vectors of the coded picture but also its absolute differences, from which the SAD is obtained as a complexity reference for the coded picture, providing a basis for its bit allocation.
The task slices are distributed over the server cluster in the form of time nodes, as follows:
the task slice data is read in Concurrent mode via the MapTask slicing mechanism, with the time point of the Tile region corresponding to a task slice serving as the time node of the FileInputFormat slice;
and the task slices are scheduled according to their time nodes with a Lanes priority model and distributed onto the cluster formed by the rendering servers, with the time tangent points as the MapTask queue time axis.
In this embodiment, the rendering server has extremely high graphics rendering capability, with intra-device data transmission latency as low as the nanosecond level; it supports an intelligent platform management interface and can provide graphics rendering services with ultra-low latency and ultra-high performance.
After the task slices are distributed to the server cluster, they are scheduled to execute rendering tasks, as follows:
rendering tasks are scheduled according to the slice Task time nodes divided along the MapTask queue time axis; the rendering servers form a GPU computing architecture on the cluster and run an MPD generation module, generating an MPD description file whose time nodes follow the Task queue time axis; the MPD description file describes the frame information of the Tile-region video frames in sequence, and SRD labels describe the relative positional relationships among the Tile regions.
In this embodiment, the GPU computing architecture provides the low-level programs that drive the GPU and CPU for graphics rendering; multithreaded scheduling, multi-channel compressed transmission of graphics data, shader precompilation and similar functions are optimized for visual scene depth, greatly improving the efficiency of video data transmission and of video data processing in the hardware channels.
The MPD description file is divided, according to the SRD labels, into fixed-length video fragments by the underlying Segment hierarchical data model, and the corresponding video content of a Tile region is requested from the server through the URL address in the Segment hierarchical data model.
The MPD description file uses AdaptationSet elements to represent the sliced video information of the different Tile regions; the sliced video information is parsed with the MPD data structure, and, in Adreno GPU direct-connection mode, the resource directory of the terminal server is set as the output directory of the decoding-stream generation model for multi-path parallel, efficient rendering.
In this embodiment, the ARM array server adopts the H.264/H.265 coding standards; its efficient coding achieves real-time 4K encoding at 144 fps with an encoding time under 4 ms and flexible multi-resolution switching. Multi-channel encoding is also supported to better serve interactive scenarios such as multi-player cloud-game battles, assistance and live streaming, encoding simultaneously under different coding standards or different network states, code rates or resolutions, with real-time code-rate regulation based on game content.
Second embodiment: a computer device,
comprising the following steps: at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to perform the video frame adaptive rendering processing method described above.
In this embodiment, the vrvie streaming media automation platform is used to monitor the real-time running state of the hardware, system, virtual machines, containers and processes at any time, to monitor the latency, image quality and smoothness of the rendering presented by the SDK according to customer requirements, and to issue cloud instructions in real time for adjustment, ensuring that the running state of the whole platform is known and controllable.
Third embodiment: a computer-readable storage medium,
the computer-readable storage medium has stored therein computer-executable instructions which, when executed by a processor, implement the video frame adaptive rendering processing method described above.
In this embodiment, the storage medium supports the coexistence of multiple storage schemes. The cloud-game-platform storage architecture divides the storage system, from top to bottom, into four levels: central storage, regional root storage, sub-regional storage clusters, and cloud-game-instance local storage, balancing performance, disaster recovery and cost overall; it also has corresponding management and scheduling systems such as image distribution, storage scheduling and archive scheduling, making it easy to meet the various demands of the service.
In this embodiment, computing and storage are separated in the form of cloud clusters; the operating system is booted through a dedicated storage cluster, where games are updated, managed and their content deployed. Based on the internal distribution mechanism, fast and efficient game management can be achieved, the rendering servers need no external storage, deployment cost is reduced, rapid reset is supported, and resources are used efficiently.
The application adopts a parallel algorithm: the video frame data is divided into regions by the Tile divider, and frame sequence numbers are added to the video frames in each Tile region, which lets video decoding match the source-side video positions, prevents repeated and lost frames, and improves the user's sense of immersion and realism.
The video frame rendering tasks of the Tile regions are scheduled by the rendering servers, and rendering task scheduling through the GPU computing architecture of the server cluster obtains the cluster computing capability with optimal performance, enabling larger-scale rendering and task acceleration; this suits scenes where a rendering task demands more computing resources than a single GPU provides. With the rendering server's internal distribution mechanism, fast and efficient game management can be achieved without configuring external storage for the rendering server, which reduces deployment cost, supports rapid reset, uses resources efficiently, and improves the user's sense of immersion and realism.
The above embodiments are merely exemplary embodiments of the present application and are not intended to limit the present application. Various modifications and equivalent arrangements of this application will occur to those skilled in the art, and are intended to be within the spirit and scope of the application.

Claims (10)

1. A video frame adaptive rendering processing method, characterized by comprising the following steps:
acquiring continuous online video as video frame training data, dividing the video frame training data into a plurality of Tile regions with a Tile divider, detecting the video frame code rate of the training data in each Tile region, writing video frame sequence numbers onto the video frame data frame by frame according to the video frame code rate, and encoding the video frame sequence numbers according to video frame quality levels;
writing the encoded video frame sequence numbers into the video frame rendering tasks of the corresponding Tile regions with a parallel algorithm, and intelligently dividing the video frame rendering tasks into task slices according to the instantaneous resource state and communication capability of the rendering server side;
distributing the task slices onto the cluster formed by the rendering servers, scheduling the rendering tasks through the GPU computing architecture of the server cluster to obtain the cluster computing capability with optimal performance, selecting macro modules according to video frame type parameters, and encoding the video frames in parallel to obtain an MPD (Media Presentation Description) file;
and establishing a corresponding data structure for the MPD description file according to the MPD structure, parsing the video frames to obtain the information of each video slice, and taking the resource directory of the server as the output directory of the decoding-stream generation model for multi-path parallel, efficient rendering.
2. The video frame adaptive rendering processing method according to claim 1, wherein
writing the video frame sequence numbers onto the video frame data frame by frame according to the video frame code rate comprises:
the video frame training data is stored in YUV format and decoded into a raw YUV video sequence by FFmpeg;
inputting the raw YUV video sequence into the Tile divider to obtain a plurality of Tile regions, and, to detect the frame numbers of the video frames in a Tile region, reading the data in grayscale mode, cropping the video data of the Tile region as the basic data for frame-number detection, and binarizing the basic data to obtain the binary value of every pixel in the Tile's line-length regions;
if more than 50% of the binarized pixels in a line-length region are white, adding 1 to the frame number and continuing to check the white-point count of the remaining binarized line-length regions;
otherwise, outputting the corresponding frame number.
3. The video frame adaptive rendering processing method according to claim 2, wherein
encoding the video frame sequence numbers according to the video frame quality level comprises:
measuring the quality level of the current coding output through the average quality evaluation of the nvENC encoder's output frames, and encoding, with the nvENC encoder, the video frames of each Tile region at the code rate corresponding to the video frame sequence numbers in that Tile region.
4. The video frame adaptive rendering processing method according to claim 3, wherein
the nvENC encoder encodes the video frames of a Tile region according to the video frame type parameters and the frame data motion vectors, comprising:
establishing L parallel processing threads for the video frame sequence numbers of the Tile region, determining the coding frame type of each frame in the video, performing motion estimation against the original-image reference frames corresponding to each coding frame type, and establishing parallel whole-pixel motion estimation threads from the motion vectors produced by the motion estimation;
establishing coding threads from the parallel motion estimation threads, and dividing the video frame coding of each Tile region into intra-frame coding and inter-frame coding, wherein each intra-frame coding is organized into parallel fields, each field serving as a slice that is coded independently in parallel, each inter-frame coding adopts a parallel coding mode with interdependence between the original video reference frames, and each coding thread performs a pixel search according to the whole-pixel motion vector obtained from its parallel motion estimation thread to obtain motion-vector-based video frame coding data.
5. The video frame adaptive rendering processing method according to claim 3, wherein
writing the encoded video frame sequence numbers into the video rendering tasks through parallel processing threads and dividing task slices according to the thread length of the video rendering task in the corresponding Tile region comprises:
connecting the video frame coding data to the server cluster through parallel processing threads, wherein the server cluster is connected to the rendering servers through a cluster-internal high-speed network;
distributing the video frame rendering tasks of the Tile regions within the rendering servers according to the slice progress of intra-frame coding and the thread progress of inter-frame coding;
and dividing each video frame rendering task into a plurality of task slices according to the duration of the nvENC encoder's output code stream.
6. The video frame adaptive rendering processing method according to claim 5, wherein
distributing the task slices over the server cluster in the form of time nodes comprises:
reading the task slice data in Concurrent mode via the MapTask slicing mechanism, wherein the time point of the Tile region corresponding to a task slice serves as the time node of the FileInputFormat slice;
and scheduling the task slices according to their time nodes with a Lanes priority model, and distributing them onto the cluster formed by the rendering servers with the time tangent points as the MapTask queue time axis.
7. The video frame adaptive rendering processing method according to claim 6, wherein,
after the task slices are distributed to the server cluster, scheduling them to execute rendering tasks comprises:
scheduling rendering tasks according to the slice Task time nodes divided along the MapTask queue time axis, forming a GPU computing architecture on the cluster through the rendering servers, running an MPD generation module to generate an MPD description file whose time nodes follow the Task queue time axis, describing the frame information of the Tile-region video frames in sequence with the MPD description file, and describing the relative positional relationships among the Tile regions with SRD labels.
8. The video frame adaptive rendering processing method according to claim 7, wherein
the MPD description file uses SRD labels to segment the video fragments within a task slice through an underlying Segment hierarchical data model, and the corresponding video content of a Tile region is requested from the server through the URL address in the Segment hierarchical data model;
and the MPD description file uses AdaptationSet elements to represent the sliced video information of the different Tile regions, the sliced video information is parsed with the MPD data structure, and, in Adreno GPU direct-connection mode, the resource directory of the terminal server is set as the output directory of the decoding-stream generation model for multi-path parallel, efficient rendering.
9. A computer device, characterized in that,
comprising the following steps: at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the instructions, when executed by the at least one processor, causing the processor to perform the method of any one of claims 1-8.
10. A computer-readable storage medium, characterized in that
the computer-readable storage medium has stored therein computer-executable instructions which, when executed by a processor, implement the method of any one of claims 1-8.
CN202311190963.8A 2023-09-15 2023-09-15 Video frame adaptive rendering processing method, computer device and storage medium Active CN116912385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311190963.8A CN116912385B (en) 2023-09-15 2023-09-15 Video frame adaptive rendering processing method, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311190963.8A CN116912385B (en) 2023-09-15 2023-09-15 Video frame adaptive rendering processing method, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN116912385A (en) 2023-10-20
CN116912385B (en) 2023-11-17

Family

ID=88360766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311190963.8A Active CN116912385B (en) 2023-09-15 2023-09-15 Video frame adaptive rendering processing method, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN116912385B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117221654B (en) * 2023-11-09 2024-04-30 深圳市达瑞电子科技有限公司 Video rendering method and system based on video frame analysis

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448092A (en) * 2018-11-13 2019-03-08 天津津航计算技术研究所 A kind of load balancing cluster rendering method based on dynamic task granularity
WO2019141907A1 (en) * 2018-01-22 2019-07-25 Nokia Technologies Oy An apparatus, a method and a computer program for omnidirectional video
KR20200062891A (ko) * 2018-11-27 2020-06-04 Seoul National University of Science and Technology Industry-Academic Cooperation Foundation (서울과학기술대학교 산학협력단) System and method for predicting user viewpoint using location information of sound source in 360 VR contents
CN114666658A (en) * 2022-03-28 2022-06-24 北京京东乾石科技有限公司 Cloud rendering method, device and system and user terminal

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3338454A1 (en) * 2015-08-20 2018-06-27 Koninklijke KPN N.V. Forming one or more tile streams on the basis of one or more video streams
KR102248185B1 * 2016-02-02 2021-05-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Processing sections and regions of interest in video streaming
CN109076255B (en) * 2016-04-26 2021-10-08 Lg电子株式会社 Method and equipment for sending and receiving 360-degree video
US20190362151A1 (en) * 2016-09-14 2019-11-28 Koninklijke Kpn N.V. Streaming virtual reality video
CN108632674B (en) * 2017-03-23 2021-09-21 华为技术有限公司 Panoramic video playing method and client
US20230224512A1 (en) * 2022-01-12 2023-07-13 Mediatek Singapore Pte. Ltd. System and method of server-side dynamic adaptation for split rendering

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019141907A1 (en) * 2018-01-22 2019-07-25 Nokia Technologies Oy An apparatus, a method and a computer program for omnidirectional video
CN109448092A (en) * 2018-11-13 2019-03-08 天津津航计算技术研究所 A kind of load balancing cluster rendering method based on dynamic task granularity
KR20200062891A (ko) * 2018-11-27 2020-06-04 Seoul National University of Science and Technology Industry-Academic Cooperation Foundation (서울과학기술대학교 산학협력단) System and method for predicting user viewpoint using location information of sound source in 360 VR contents
CN114666658A (en) * 2022-03-28 2022-06-24 北京京东乾石科技有限公司 Cloud rendering method, device and system and user terminal

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Adaptive Streaming of HEVC Tiled Videos Using MPEG-DASH; Cyril Concolato et al.; IEEE Transactions on Circuits and Systems for Video Technology; Vol. 28, No. 8; pp. 1981-1992 *
Research Progress on VR Panoramic Video Transmission (VR全景视频传输研究进展); Ye Chengying et al.; Application Research of Computers (计算机应用研究); Vol. 39, No. 6; pp. 1601-1608 *
Research on QoE-Oriented Adaptive Transmission Technology for VR Video (面向QoE的VR视频自适应传输技术研究); Zhang Jie; China Master's Theses Full-text Database, No. 02; I136-665 *

Also Published As

Publication number Publication date
CN116912385A (en) 2023-10-20

Similar Documents

Publication Publication Date Title
KR102506000B1 (en) Systems and method for virtual reality video conversion and streaming
US10979663B2 (en) Methods and apparatuses for image processing to optimize image resolution and for optimizing video streaming bandwidth for VR videos
JP6501904B2 (en) Spherical video streaming
Chiariotti A survey on 360-degree video: Coding, quality of experience and streaming
US11653065B2 (en) Content based stream splitting of video data
CN111355954B (en) Processing video data for a video player device
El-Ganainy et al. Streaming virtual reality content
US10242462B2 (en) Rate control bit allocation for video streaming based on an attention area of a gamer
US9491433B2 (en) Moving image distribution server, moving image playback apparatus, control method, and recording medium
US10672102B2 (en) Conversion and pre-processing of spherical video for streaming and rendering
US20220014725A1 (en) Depth codec for real-time, high-quality light field reconstruction
CN116912385B (en) Video frame adaptive rendering processing method, computer device and storage medium
WO2020055655A1 (en) Scalability of multi-directional video streaming
Gül et al. Cloud rendering-based volumetric video streaming system for mixed reality services
WO2024037137A1 (en) Data processing method and apparatus for immersive media, and device, medium and product
CA3057894C (en) Video compression using down-sampling patterns in two phases
JP2023139163A (en) Supporting multi-view video operations with disocclusion atlas
CN111464812B (en) Method, system, device, storage medium and processor for encoding and decoding
Alkhalili et al. A survey of volumetric content streaming approaches
US20230388542A1 (en) A method and apparatus for adapting a volumetric video to client devices
CN113996056A (en) Data sending and receiving method of cloud game and related equipment
Kopczynski Optimizations for fast wireless image transfer using H.264 codec to Android mobile devices for virtual reality applications
CN116320551B (en) Multi-view video self-adaptive transmission method based on multiple multi-spherical images
Li et al. Research on the Methods of Panoramic Video Projection Mapping
Le et al. GameCodec: Neural Cloud Gaming Video Codec.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant