CN116017003A - Self-adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods

Self-adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods

Info

Publication number
CN116017003A
CN116017003A CN202310028902.5A
Authority
CN
China
Prior art keywords
video
adaptive
network
code rate
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310028902.5A
Other languages
Chinese (zh)
Inventor
闫彩霞
张凯喆
刘汇川
郑庆华
杜海鹏
王志文
曹坚翔
袁慕遥
王洋
张志浩
张未展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202310028902.5A priority Critical patent/CN116017003A/en
Publication of CN116017003A publication Critical patent/CN116017003A/en
Pending legal-status Critical Current

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides an adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods. A generative adversarial network is used to perform saliency detection on the original video, and according to the detection results the original video is dynamically divided into several spatial blocks and stored on a server. When a video is requested and watched, a long short-term memory (LSTM) network is used to build an extraction model of network trace features and predict the bandwidth at future moments. The predicted bandwidth information and the past viewport trajectory information are taken as the state input of the code rate decision, and the PPO algorithm is used to train an A3C network to decide the corresponding optimal code rate; the corresponding video blocks are downloaded and played according to the code rate decision result. The generative adversarial network is thereby guaranteed to divide the video region effectively and to the maximum extent; the network state can be fully extracted to predict the bandwidth, providing effective input for the code rate adaptive decision; and the viewport-prediction-based method makes maximal use of the network for effective transmission, reduces bandwidth waste, and effectively improves users' viewing quality.

Description

Self-adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods
Technical Field
The invention belongs to the technical field of video transmission, and particularly relates to a self-adaptive VR360 video-on-demand method and system based on a plurality of artificial intelligence methods.
Background
With the widespread use of multimedia technology and intelligent terminals, video services have become one of the main modes of study, work and entertainment. 360-degree video is becoming increasingly popular thanks to tremendous advances in panoramic cameras and head-mounted devices. However, since 360-degree video is typically high resolution, its transmission requires extremely high bandwidth. To protect the quality of experience (QoE) of users, 360-degree video streaming systems based on spatial blocking have been proposed, assigning high or low bit rates to the corresponding spatial blocks so as to give users the highest viewing quality within a limited bandwidth. However, different videos have different points of emphasis: in person-centered video the focus is usually at the center of the frame, while scene-centered video often draws attention to the edges. Applying the same spatial blocking to different videos can therefore lead to inaccurate transmission and wasted bandwidth. Hence dynamic spatial-block division strategies for different videos have been developed: so that the positions a user may focus on fall within the same spatial block, the blocks are divided according to the learned saliency, thereby saving bandwidth and improving user QoE.
To reduce video quality switching delay and improve user QoE, the future network bandwidth at the user side must be predicted so that video blocks of the relevant code rate versions can be pre-fetched in combination with the current network conditions; this is a time-series prediction problem. In bandwidth prediction, the bandwidth change over a period of time can be predicted from the preceding bandwidth time series with a seq2seq model, and long-term prediction performance is further improved by learning feature weights with an attention mechanism, providing better bandwidth estimates for the subsequent adaptive video transmission and playback and ensuring good user experience quality. The FoV (field of view) prediction problem is in essence also a time-series prediction problem, so a similar approach is used to learn both.
According to the applicant's search, the following patents related to the invention in the field of video transmission were found:
CN108063961B, a self-adaptive code rate video transmission method and system based on reinforcement learning.
CN1594307A, subscriber video-on-demand delivery.
The above patent 1 provides an adaptive code rate video transmission method and system based on reinforcement learning. The method performs code rate prediction with a deep neural network: the state space corresponding to the video block to be downloaded is input into the code rate prediction neural network, which outputs a code rate policy; the video block to be downloaded is then fetched according to the code rate policy output by the network. After each video block is downloaded, the video playback quality index corresponding to that block is calculated and returned to the code rate prediction neural network, which trains on the returned quality index and the state space of the most recently downloaded video block. That invention reduces the labor and time cost of rule setting and parameter tuning and greatly improves the video quality experience.
The above patent 2 provides a video delivery system in which video-on-demand content is stored at least partially near the user location. The video delivery system has a large number of generally viewable channels at the user's location and content receivers connected to those channels. One of the channels is used to transmit a hidden video stream that cannot be viewed when it arrives at the user location. The content receiver, located near the user, includes a storage device connected to the hidden channel and a video reproduction circuit.
Patent 1 above uses deep reinforcement learning for prediction: the state space corresponding to the video block to be downloaded is input into a code rate prediction neural network, a code rate policy is output, and the required video block is downloaded according to that policy. The state space in patent 1 includes information such as video block throughput and download time but ignores the influence of network bandwidth information on viewing quality and its accurate measurement, so when the network bandwidth fluctuates severely the method can hardly give a good code rate policy, hurting user QoE. Moreover, the method is designed for traditional video and lacks factors essential to 360 video, such as saliency regions and the FoV. Patent 2 provides only an optimization for ordinary video-on-demand transmission and is not fully applicable to 360 video.
Summary of the Invention
In order to solve the problems in the prior art, the invention provides an adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods, which combine several advanced techniques (saliency region detection, dynamic spatial-block division, bandwidth prediction, viewport change prediction and code rate adaptive decision) to provide, for the first time, a method addressing the high bandwidth consumption of VR360 video.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: an adaptive VR360 video-on-demand method based on multiple artificial intelligence methods, comprising the steps of:
step 1, processing the original video with an attention-based generative adversarial network to generate saliency regions, performing dynamic spatial-block division according to the saliency regions, and storing the generated new video on a server;
step 2, building an extraction model of network trace features with a long short-term memory network to measure the network bandwidth;
and step 3, taking the bandwidth prediction result and the viewport change trajectory as the state input of the code rate adaptive decision, training an A3C network with the PPO algorithm to decide the corresponding optimal code rate, selecting the video file of the corresponding code rate based on the server's optimal code rate adaptation result, and downloading it to the buffer for decoding.
In step 1, the video is extracted frame by frame, and salient region generation is then performed on the frames using a generative adversarial network model; the specific steps are as follows:
step 1.1, salient region identification is performed through an attention-based generative adversarial network; the overall structure of the generator in the adversarial network is an encoder-decoder that generates the salient region map, while the discriminator judges whether its input is a predicted map or a real map, with the goal of making the predicted and real maps indistinguishable, so that a predicted map close to the real map is output;
and step 1.2, the generated salient region map is further processed: the obtained salient regions are processed with the Minimal Overlapping Cover algorithm, and different dynamic spatial blocks are divided.
In step 1.1, the encoder uses a VGG-16 model, with initial parameters obtained through ImageNet pre-training; the decoder generates feature maps whose sizes correspond layer by layer to those of the encoder, progressively upsampling the features obtained by the encoder; the discriminator consists of three convolutional layers, the last of which uses a sigmoid function for classification.
In step 1.2, the region is divided into three parts: a core region, an edge region and an irrelevant region; dynamic spatial blocks are generated according to this region division, the original video file is processed according to the generated dynamic spatial block positions, and a new DASH video is generated, stored on the server side, and awaits the client's request and playback.
When the extraction model of network trace features is built with the long short-term memory network to measure the network bandwidth, a bandwidth prediction model is built on the server side and the network bandwidth is predicted from bandwidth history data; the bandwidth prediction model adopts a seq2seq model with an attention mechanism, where the encoder layer is a single-layer bidirectional GRU, the decoder layer is a single-layer unidirectional GRU, and the attention layer is a fully connected neural network.
Step 3 specifically comprises the following steps:
the client interacts with the server, and the server obtains the view angle changes sent by the client, thereby deriving the expected view angle;
the server takes the bandwidth prediction result, the buffer and the expected view angle as the state space of the code rate adaptive decision; code rate selection is implemented with the reinforcement learning algorithm PPO on the A3C framework, and through the interaction of the three elements of environment state, action and reward function, the optimal code rate adaptation policy π is finally obtained;
The client selects the video file corresponding to the code rate to download to the buffer area and decode; and rendering to the Unity system for playing according to the corresponding synchronous rendering logic, and continuously collecting the visual angle change of the user.
When the client interacts with the server, the user watches using a mobile VR device, and the mobile VR device, acting as the client, collects viewport change data in real time and sends it back to the server.
The environment state comprises the estimated viewport position when the current video block is requested, the estimated bandwidth value, the buffer occupancy, and the saliency region positions of the current video block; the action space is the code rate allocation policy for the different spatial blocks of the current video block; and the reward is the QoE metric value after a playback segment ends;
when the server generates the dynamic spatial-block video, each spatial block is numbered by (i, j), the size of the spatial block of the c-th video block at (i, j) under code rate r is denoted d_c,ij(r), and the size of the c-th video block is z(c), with

z(c) = Σ_(i,j) d_c,ij(r)

a matrix V = (v_ij), v_ij ∈ {0, 1}, is defined to represent whether the (i, j) spatial block is in the viewport; the sum of the code rates in the viewport is then:

R_c = Σ_(i,j) v_ij · d_c,ij(r)
The bandwidth at timestamp t is N(t); the user initiates the request for the c-th video block at time t_c, the average download speed of this block is N_c, and there is a delay Δt_c between the c-th and the (c+1)-th video block; then

t_(c+1) = t_c + z(c)/N_c + Δt_c

N_c = ( ∫ from t_c to t_(c+1)-Δt_c of N(t) dt ) / ( t_(c+1) - Δt_c - t_c )
The video blocks are stored in a playback buffer after being downloaded; the buffer is divided into several slots, each holding one data block. B(t) ∈ [0, B_max] is defined as the buffer occupancy at timestamp t, where B_max denotes the buffer capacity; B_max is set to a few seconds of video content. In the start-up phase, c video blocks are downloaded to ensure the buffer is not empty, after which the buffer occupancy is B(t_(c+1)) = c·T, where T is the playback duration of one block. After the start-up phase, each time a new block is downloaded, the buffer is updated as:

B(t_(c+1)) = ρ( B(t_c) - z(c)/N_c - Δt_c ) + T

where ρ(x) = max{x, 0};
the QoE metric is determined from three factors that influence user experience: average viewport quality, rebuffering time, and average viewport quality variation; the QoE metric is obtained as their weighted sum.
When the video file of the corresponding code rate is selected, downloaded to the buffer and decoded, the segmented video is downloaded according to the mpd file, the video is rendered onto a Unity material sphere for playing, and the newly collected viewport change trajectory is fed back to the server side for subsequent prediction.
The invention also provides an adaptive VR360 video-on-demand system based on multiple artificial intelligence methods, comprising a client and a server that transmit video data over a network, the client being a VR device; during video-on-demand, the client and the server operate according to the method of the invention.
Compared with the prior art, the invention has at least the following beneficial effects: the invention aims at a truly immersive experience, with the user selecting and watching videos on a VR head-mounted display; several techniques for improving user experience are used, such as dynamic spatial-block division for VR360 video, bandwidth prediction and code rate adaptive decision, so that user experience is maximized; the method ensures that the generative adversarial network effectively generates salient regions and divides the video region to the maximum extent according to the generation result; the network state can be fully extracted to predict the bandwidth, providing effective input for the code rate adaptive decision; and the viewport-prediction-based method makes maximal use of the network for effective transmission, reduces bandwidth waste and effectively improves users' viewing quality.
Drawings
Fig. 1 is a schematic diagram of the structure of the present invention.
Fig. 2 is a flow chart of the algorithm of the present invention.
Fig. 3 is a schematic diagram of processing a salient region.
Fig. 4 is a schematic diagram of a bandwidth prediction model structure.
Detailed Description
The technical scheme of the invention is described in detail below with reference to specific application examples.
Referring to fig. 1, 2 and 4, the present invention provides an adaptive VR360 video-on-demand method based on multiple artificial intelligence methods, comprising the following steps:
Step 1, the original video is processed with an attention-based generative adversarial network to generate saliency regions, dynamic spatial-block division is performed according to the saliency regions, and the generated new video is stored on a server for later request and playback;
Step 1.1, salient region identification is performed through an attention-based generative adversarial network. The overall structure of the generator in the adversarial network is an encoder-decoder responsible for generating the saliency map; the discriminator judges whether its input is a predicted map or a real map, aiming to make the predicted and real maps indistinguishable, so that a predicted map close to the real map is output. The encoder part of the generator uses a VGG-16 model, with initial parameters obtained through ImageNet pre-training; the decoder part progressively generates feature maps whose sizes correspond to the encoder's layers, progressively upsampling the features obtained by the encoder. The discriminator consists of three convolutional layers, the last of which uses a sigmoid function for classification.
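For illustration only, a minimal PyTorch sketch of the discriminator described above is given below: three convolutional layers ending in a sigmoid classifier. The channel widths, kernel sizes and the choice to condition on the frame alongside the saliency map are assumptions; the text fixes only the layer count and the sigmoid output.

```python
import torch
import torch.nn as nn

class SaliencyDiscriminator(nn.Module):
    """Three convolutional layers with a sigmoid classifier at the end,
    judging whether a saliency map is real (ground truth) or predicted.
    Widths, kernels and frame conditioning are illustrative assumptions."""

    def __init__(self, in_channels: int = 4):  # RGB frame + 1-channel map
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, frame: torch.Tensor, sal_map: torch.Tensor) -> torch.Tensor:
        x = torch.cat([frame, sal_map], dim=1)   # condition on the frame
        score = self.net(x).mean(dim=(1, 2, 3))  # pool to one score per sample
        return torch.sigmoid(score)              # probability "real"
```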
Step 1.2, the generated saliency map is further processed to divide the different spatial blocks. The obtained salient regions are processed with the Minimal Overlapping Cover (MNC) algorithm. The region is divided into three parts: core region, edge region and irrelevant region, and spatial blocks are generated according to this region division. The generation process is shown in fig. 3, where a is the original video picture, b is the picture after saliency processing, and c is the fine-grained division of the salient regions into many small spatial tiles; a sketch of this tile classification is given below. Finally, the original video file is processed according to the generated spatial block positions, and a new DASH video is generated, stored on the server side, and awaits the client's request and playback.
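The internals of the MNC step are not spelled out in the text, but the classification that feeds it can be sketched as follows: fine-granularity tiles are labeled core, edge or irrelevant by their mean saliency. This is a hedged illustration in which the grid size and the two thresholds are assumptions.

```python
import numpy as np

def classify_tiles(saliency: np.ndarray, rows: int, cols: int,
                   core_thr: float = 0.6, edge_thr: float = 0.3) -> np.ndarray:
    """Split a saliency map into a rows x cols grid of fine tiles and label
    each tile core / edge / irrelevant by its mean saliency. The thresholds
    are illustrative assumptions; the text does not fix them."""
    h, w = saliency.shape
    th, tw = h // rows, w // cols
    labels = np.empty((rows, cols), dtype=object)
    for i in range(rows):
        for j in range(cols):
            m = saliency[i * th:(i + 1) * th, j * tw:(j + 1) * tw].mean()
            labels[i, j] = ("core" if m >= core_thr
                            else "edge" if m >= edge_thr
                            else "irrelevant")
    return labels
```

Adjacent tiles carrying the same label would then be merged into the larger dynamic spatial blocks by the Minimal Overlapping Cover step.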
Step 2, a bandwidth prediction model is built on the server side, and the network bandwidth is predicted from bandwidth history data; the bandwidth prediction model is a seq2seq model with an attention mechanism. The encoder layer is a single-layer bidirectional GRU, the decoder layer is a single-layer unidirectional GRU, and the attention layer is a fully connected neural network. The network model structure is shown in fig. 4.
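A minimal PyTorch sketch of this predictor follows, assuming one scalar bandwidth sample per time step; the hidden size, prediction horizon and exact form of the attention scorer are assumptions beyond what the text specifies (single-layer bidirectional GRU encoder, single-layer unidirectional GRU decoder, fully connected attention layer).

```python
import torch
import torch.nn as nn

class BandwidthSeq2Seq(nn.Module):
    """Seq2seq bandwidth predictor: bidirectional GRU encoder, unidirectional
    GRU decoder, fully connected attention. Sizes are assumed values."""

    def __init__(self, hidden: int = 64, horizon: int = 5):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRU(input_size=1, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
        self.decoder = nn.GRUCell(input_size=1, hidden_size=hidden)
        self.attn = nn.Linear(3 * hidden, 1)  # FC scorer over decoder+encoder states
        self.out = nn.Linear(3 * hidden, 1)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, T, 1) past bandwidth samples
        enc_out, _ = self.encoder(history)            # (batch, T, 2*hidden)
        h = history.new_zeros(history.size(0), self.decoder.hidden_size)
        x = history[:, -1, :]                         # seed with the last sample
        preds = []
        for _ in range(self.horizon):
            h = self.decoder(x, h)
            h_exp = h.unsqueeze(1).expand(-1, enc_out.size(1), -1)
            scores = self.attn(torch.cat([h_exp, enc_out], dim=-1))
            weights = torch.softmax(scores, dim=1)    # attention over time steps
            context = (weights * enc_out).sum(dim=1)  # (batch, 2*hidden)
            x = self.out(torch.cat([h, context], dim=-1))  # next bandwidth value
            preds.append(x)
        return torch.stack(preds, dim=1)              # (batch, horizon, 1)
```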
Step 3, the bandwidth prediction result and the viewport change trajectory are input as the state of the code rate adaptive decision, and the client selects the video file of the corresponding code rate based on the server's code rate adaptation result, downloads it to the buffer and decodes it.
Step 3.1, the client interacts with the server, and the server obtains the view angle changes sent by the client. The user watches on a mobile VR device, and the device collects viewport change data in real time and sends it back to the server.
Step 3.2, the server takes the bandwidth prediction result, the buffer and the expected view angle as the state space of the code rate adaptive decision; the selection of the code rate is implemented with the reinforcement learning algorithm PPO (Proximal Policy Optimization) on the A3C framework, and through the interaction of the three elements of environment state, action and reward function, the optimal code rate adaptation policy π is finally obtained.
The environment state comprises the estimated viewport position when the current video block is requested, the estimated bandwidth value, the buffer occupancy, and the saliency region positions of the current video block. The action space is the code rate allocation policy for the different spatial blocks of the current video block. The reward is the QoE metric value after a playback segment ends.
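The text names PPO on the A3C framework without giving the update rule; for reference, a sketch of the standard clipped PPO surrogate that such an actor would minimize is shown below. The clipping constant 0.2 is the common default, not a value taken from the text.

```python
import torch

def ppo_clip_loss(new_logp: torch.Tensor, old_logp: torch.Tensor,
                  advantage: torch.Tensor, clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped PPO surrogate for the actor that maps the state (predicted
    bandwidth, buffer occupancy, expected viewport, saliency positions)
    to a code rate allocation action."""
    ratio = torch.exp(new_logp - old_logp)        # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()  # ascend reward -> minimize negative
```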
The following characteristic indices are defined: when the server generates the dynamic spatial-block video, the spatial blocks are numbered by (i, j), the size of the spatial block of the c-th video block at (i, j) under code rate r is denoted d_c,ij(r), and the size of the c-th video block is z(c), with

z(c) = Σ_(i,j) d_c,ij(r)

A matrix V = (v_ij), v_ij ∈ {0, 1}, is defined to represent whether the (i, j) spatial block is in the viewport. The sum of the code rates in the viewport is then defined as:

R_c = Σ_(i,j) v_ij · d_c,ij(r)
The bandwidth at timestamp t is defined as N(t); assuming the user initiates the request for the c-th video block at time t_c, the average download speed of this block is N_c, and there is a short delay Δt_c between the c-th and the (c+1)-th video block, then

t_(c+1) = t_c + z(c)/N_c + Δt_c

N_c = ( ∫ from t_c to t_(c+1)-Δt_c of N(t) dt ) / ( t_(c+1) - Δt_c - t_c )
The video blocks are stored in a playback buffer after being downloaded; the buffer is divided into several slots, each holding one data block. B(t) ∈ [0, B_max] is defined as the buffer occupancy at timestamp t, where B_max denotes the buffer capacity. Because viewing 360-degree video requires a large amount of interaction, B_max is typically set to a few seconds of video content. The start-up phase requires downloading c video blocks to ensure that the buffer is not empty; the buffer occupancy at this point is B(t_(c+1)) = c·T, where T is the playback duration of one block. After the start-up phase, each time a new block is downloaded, the buffer is updated as follows:

B(t_(c+1)) = ρ( B(t_c) - z(c)/N_c - Δt_c ) + T

where ρ(x) = max{x, 0}.
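The buffer recursion above can be exercised with a small helper. Treating z(c)/N_c + Δt_c as the download time of one block and T as its playback duration follows the definitions in the text; clamping the result at B_max is an assumption implied by B(t) ∈ [0, B_max].

```python
def buffer_after_download(buffer_s: float, z_c_bits: float, n_c_bps: float,
                          delay_s: float, chunk_dur_s: float,
                          b_max_s: float) -> float:
    """B(t_{c+1}) = rho(B(t_c) - z(c)/N_c - delta_t_c) + T, rho(x) = max(x, 0).
    A rho(...) of zero means playback stalled (rebuffering) during download."""
    rho = lambda x: max(x, 0.0)
    download_s = z_c_bits / n_c_bps + delay_s   # z(c)/N_c + delta_t_c
    return min(rho(buffer_s - download_s) + chunk_dur_s, b_max_s)
```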
Next, the QoE metric is defined. Considering the video blocks c = 1, …, C of a playback segment, several factors across video blocks can affect the user experience:
Average viewport quality: since during VR360 video viewing only the video blocks in the viewport are seen by the user, only the video in the viewport is considered to affect the user's experience, and the average viewport quality is:

Q_v = (1/C) Σ_(c=1..C) R_c
Rebuffering time: rebuffering is an event that obviously degrades the user experience; it can be calculated as:

T_rebuf = Σ_(c=1..C) ρ( z(c)/N_c - B(t_c) )
Average viewport quality variation: a stable code rate within the viewport is considered to give the user a more comfortable experience, while frequent code rate switching causes discomfort, so this index is quantified as:

Q_s = ( 1/(C-1) ) Σ_(c=1..C-1) | R_(c+1) - R_c |
Finally, QoE is defined as the weighted sum of the three indices above, with the weights adjusted according to the requirements of the specific scenario; since rebuffering and code rate switching degrade the experience, they enter with negative weight:

QoE = w_1·Q_v - w_2·T_rebuf - w_3·Q_s
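Putting the three indices together, the QoE computation might look like the following sketch; the weight values are illustrative assumptions, and the sign convention (rebuffering and switching act as penalties) follows the discussion above.

```python
import numpy as np

def viewport_rate(v: np.ndarray, d: np.ndarray) -> float:
    """R_c = sum_ij v_ij * d_c,ij(r): code rate delivered inside the viewport,
    with v a 0/1 viewport mask and d the per-tile code rate sizes."""
    return float((v * d).sum())

def qoe(rates, rebuffer_s: float, w1: float = 1.0, w2: float = 4.0,
        w3: float = 1.0) -> float:
    """Weighted sum of average viewport quality, rebuffering time and
    viewport quality variation. Weights are scenario-dependent assumptions."""
    rates = np.asarray(rates, dtype=float)    # R_1 .. R_C per video block
    q_view = rates.mean()
    q_var = np.abs(np.diff(rates)).mean() if rates.size > 1 else 0.0
    return w1 * q_view - w2 * rebuffer_s - w3 * q_var
```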
Step 3.3, the client selects the video file of the corresponding code rate, downloads it to the buffer and decodes it; the video is rendered to the Unity system for playing according to the corresponding synchronous rendering logic while the user's view angle changes are continuously collected. The client downloads the segmented video according to the mpd file, renders the video onto a Unity material sphere for playing, and feeds the newly collected viewport change trajectory back to the server for subsequent prediction.
The method of the present invention is described in detail through the following steps.
Step 1, the original video is placed on the server, frame extraction is implemented with ffmpeg, salient regions are detected for each segment of video, and the salient regions are divided for the video frames. Small-granularity spatial tiles are divided for the video and then merged according to the saliency regions to generate large-granularity spatial blocks. An mpd file is generated from the resulting spatial blocks and stored on the server side, awaiting calls.
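The frame extraction could be driven from Python as sketched below; the output naming scheme and the one-frame-per-second sampling rate are assumptions, since the text only states that ffmpeg performs the frame extraction.

```python
import subprocess
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, fps: int = 1) -> None:
    """Extract frames with ffmpeg for saliency detection; fps=1 keeps one
    frame per second, an assumed sampling rate."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}",
         str(Path(out_dir) / "frame_%05d.png")],
        check=True,
    )
```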
Step 2, the client and the server establish a connection. To play a video, the client first downloads the mpd file, then downloads the initial spatial blocks, and playback starts once the buffer meets the playback threshold. The client begins to track changes in the user's viewport and sends them back to the server.
Step 3, the server monitors bandwidth changes in real time and uses the seq2seq model to predict the bandwidth.
Step 4, the server receives the user viewport changes fed back by the client, makes code rate decisions with the code rate adaptive decision model in combination with the bandwidth prediction values, saliency regions and other information, and distributes the corresponding video blocks to the client. Return to step 2 until the video ends.
With the above technical scheme, the invention provides an adaptive VR360 video-on-demand system based on multiple artificial intelligence methods, so as to improve the QoE of users viewing bandwidth-hungry VR360 immersive video. In this method, a generative adversarial network first performs the generation task, segmenting the initial VR360 video to produce salient regions. Spatial blocks are then generated for the video with the MNC algorithm according to the saliency regions, so as to achieve maximal bandwidth savings. During playback, the client transmits the user's viewport changes to the server in real time. The server uses seq2seq network models to predict the bandwidth and the viewport changes respectively, producing a predicted bandwidth and a predicted viewport change value. The server side then makes the code rate adaptive decision from the various state information and distributes video to the client according to the decision results.

Claims (10)

1. An adaptive VR360 video-on-demand method based on a plurality of artificial intelligence methods, comprising the steps of:
step 1, processing the original video with an attention-based generative adversarial network to generate saliency regions, performing dynamic spatial-block division according to the saliency regions, and storing the generated new video on a server;
step 2, building an extraction model of network trace features with a long short-term memory network to measure the network bandwidth;
and step 3, taking the bandwidth prediction result and the viewport change trajectory as the state input of the code rate adaptive decision, training an A3C network with the PPO algorithm to decide the corresponding optimal code rate, selecting the video file of the corresponding code rate based on the server's optimal code rate adaptation result, and downloading it to the buffer for decoding.
2. The adaptive VR360 video on demand method based on multiple artificial intelligence methods of claim 1, wherein in step 1 the video is extracted frame by frame and salient region generation is then performed on the frames using a generative adversarial network model, with the following specific steps:
step 1.1, salient region identification is performed through an attention-based generative adversarial network; the overall structure of the generator in the adversarial network is an encoder-decoder that generates the salient region map, while the discriminator judges whether its input is a predicted map or a real map, with the goal of making the predicted and real maps indistinguishable, so that a predicted map close to the real map is output;
and step 1.2, the generated salient region map is further processed: the obtained salient regions are processed with the Minimal Overlapping Cover algorithm, and different dynamic spatial blocks are divided.
3. The adaptive VR360 video on demand method based on multiple artificial intelligence methods of claim 2, wherein in step 1.1 the encoder uses a VGG-16 model, with initial parameters obtained through ImageNet pre-training; the decoder generates feature maps whose sizes correspond layer by layer to those of the encoder, progressively upsampling the features obtained by the encoder; the discriminator consists of three convolutional layers, the last of which uses a sigmoid function for classification.
4. The adaptive VR360 video on demand method based on multiple artificial intelligence methods of claim 2, wherein in step 1.2 the region is divided into three parts: a core region, an edge region and an irrelevant region; dynamic spatial blocks are generated according to this region division, the original video file is processed according to the generated dynamic spatial block positions, and a new DASH video is generated, stored on the server side, and awaits the client's request and playback.
5. The adaptive VR360 video on demand method based on multiple artificial intelligence methods of claim 1, wherein when the extraction model of network trace features is built with the long short-term memory network to measure the network bandwidth, a bandwidth prediction model is built on the server side and the network bandwidth is predicted from bandwidth history data; the bandwidth prediction model adopts a seq2seq model with an attention mechanism, where the encoder layer is a single-layer bidirectional GRU, the decoder layer is a single-layer unidirectional GRU, and the attention layer is a fully connected neural network.
6. The adaptive VR360 video on demand method based on multiple artificial intelligence methods of claim 1, wherein step 3 specifically comprises the following steps:
the client interacts with the server, and the server obtains the view angle changes sent by the client, thereby deriving the expected view angle;
the server takes the bandwidth prediction result, the buffer and the expected view angle as the state space of the code rate adaptive decision; code rate selection is implemented with the reinforcement learning algorithm PPO on the A3C framework, and through the interaction of the three elements of environment state, action and reward function, the optimal code rate adaptation policy π is finally obtained;
The client selects the video file corresponding to the code rate to download to the buffer area and decode; and rendering to the Unity system for playing according to the corresponding synchronous rendering logic, and continuously collecting the visual angle change of the user.
7. The adaptive VR360 video on demand method based on multiple artificial intelligence methods of claim 6, wherein when the client and the server interact, the user watches using a mobile VR device, and the mobile VR device, acting as the client, collects viewport change data in real time and sends it back to the server.
8. The adaptive VR360 video on demand method based on multiple artificial intelligence methods of claim 6, wherein the environment state includes the estimated viewport position when the current video block is requested, the estimated bandwidth value, the buffer occupancy, and the saliency region positions of the current video block; the action space is the code rate allocation policy for the different spatial blocks of the current video block; and the reward is the QoE metric value after a playback segment ends;
when the server generates the dynamic spatial-block video, the spatial blocks are numbered by (i, j), the size of the spatial block of the c-th video block at (i, j) under code rate r is denoted d_c,ij(r), and the size of the c-th video block is z(c), with

z(c) = Σ_(i,j) d_c,ij(r)

a matrix V = (v_ij), v_ij ∈ {0, 1}, is defined to represent whether the (i, j) spatial block is in the viewport; the sum of the code rates in the viewport is then:

R_c = Σ_(i,j) v_ij · d_c,ij(r)
the bandwidth at timestamp t is N(t); the user initiates the request for the c-th video block at time t_c, the average download speed of this block is N_c, and there is a delay Δt_c between the c-th and the (c+1)-th video block; then

t_(c+1) = t_c + z(c)/N_c + Δt_c

N_c = ( ∫ from t_c to t_(c+1)-Δt_c of N(t) dt ) / ( t_(c+1) - Δt_c - t_c )
the video blocks are stored in a playback buffer after being downloaded; the buffer is divided into several slots, each holding one data block; B(t) ∈ [0, B_max] is defined as the buffer occupancy at timestamp t, where B_max denotes the buffer capacity and B_max is set to a few seconds of video content; in the start-up phase, c video blocks are downloaded to ensure the buffer is not empty, after which the buffer occupancy is B(t_(c+1)) = c·T, where T is the playback duration of one block; after the start-up phase, each time a new block is downloaded, the buffer is updated as:

B(t_(c+1)) = ρ( B(t_c) - z(c)/N_c - Δt_c ) + T

where ρ(x) = max{x, 0};
the QoE metric is determined from three factors that influence user experience: average viewport quality, rebuffering time, and average viewport quality variation; the QoE metric is obtained as their weighted sum.
9. The adaptive VR360 video on demand method based on multiple artificial intelligence methods of claim 6, wherein when the video file of the corresponding code rate is selected, downloaded to the buffer and decoded, the segmented video is downloaded according to the mpd file, the video is rendered onto a Unity material sphere for playing, and the newly collected viewport change trajectories are fed back to the server for subsequent prediction.
10. An adaptive VR360 video-on-demand system based on multiple artificial intelligence methods, characterized by comprising a client and a server that transmit video data over a network, the client being a VR device; during video-on-demand, the client and the server operate according to the method of any one of claims 1 to 9.
CN202310028902.5A 2023-01-09 2023-01-09 Self-adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods Pending CN116017003A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310028902.5A CN116017003A (en) 2023-01-09 2023-01-09 Self-adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310028902.5A CN116017003A (en) 2023-01-09 2023-01-09 Self-adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods

Publications (1)

Publication Number Publication Date
CN116017003A true CN116017003A (en) 2023-04-25

Family

ID=86019536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310028902.5A Pending CN116017003A (en) 2023-01-09 2023-01-09 Self-adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods

Country Status (1)

Country Link
CN (1) CN116017003A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116996661A (en) * 2023-09-27 2023-11-03 中国科学技术大学 Three-dimensional video display method, device, equipment and medium
CN116996661B (en) * 2023-09-27 2024-01-05 中国科学技术大学 Three-dimensional video display method, device, equipment and medium
CN117156175A (en) * 2023-10-30 2023-12-01 山东大学 Panoramic video stream QoE optimization method based on visual port prediction distance control
CN117156175B (en) * 2023-10-30 2024-01-30 山东大学 Panoramic video stream QoE optimization method based on visual port prediction distance control

Similar Documents

Publication Publication Date Title
CN116017003A (en) Self-adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods
Zhang et al. DRL360: 360-degree video streaming with deep reinforcement learning
Sun et al. Flocking-based live streaming of 360-degree video
CN112291620A (en) Video playing method and device, electronic equipment and storage medium
CN101262443B (en) A self-adapted real-time transmission method for mobile phone stream media
Wu et al. Dynamic resource allocation via video content and short-term traffic statistics
CN103905820A (en) Client side video quality self-adaption method and system based on SVC
Park et al. Navigation graph for tiled media streaming
Fu et al. Sequential reinforced 360-degree video adaptive streaming with cross-user attentive network
Jiang et al. A hierarchical buffer management approach to rate adaptation for 360-degree video streaming
CN113783944B (en) Video data processing method, device, system and equipment based on cloud edge cooperation
CN112714315A (en) Layered buffering method and system based on panoramic video
CN114584801A (en) Video resource caching method based on graph neural network recommendation algorithm
US20220408097A1 (en) Adaptively encoding video frames using content and network analysis
CN108810468B (en) Video transmission device and method for optimizing display effect
Feng et al. Perceptual quality aware adaptive 360-degree video streaming with deep reinforcement learning
CN113162895B (en) Dynamic coding method, streaming media quality determination method and electronic equipment
CN112672227B (en) Service processing method, device, node and storage medium based on edge node
CN112751865B (en) Data uplink optimization method and device
Huang et al. QoE-driven mobile 360 video streaming: Predictive view generation and dynamic tile selection
Zhang et al. Deep reinforcement learning based adaptive 360-degree video streaming with field of view joint prediction
CN111586414B (en) SVC and DASH-based 360-degree video stream scheduling method
Liu et al. Throughput Prediction-Enhanced RL for Low-Delay Video Application
Zeynali et al. BOLA360: Near-optimal View and Bitrate Adaptation for 360-degree Video Streaming
Li et al. CAST: An Intricate-Scene Aware Adaptive Bitrate Approach for Video Streaming via Parallel Training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination