CN115037962A - Video adaptive transmission method, device, terminal equipment and storage medium - Google Patents

Video adaptive transmission method, device, terminal equipment and storage medium

Info

Publication number
CN115037962A
Authority
CN
China
Prior art keywords
video
result
user
decision
video content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210609323.5A
Other languages
Chinese (zh)
Other versions
CN115037962B (en)
Inventor
王�琦
程志鹏
李康敬
杨忠尧
张志浩
张源鸿
张未展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Xian Jiaotong University
MIGU Video Technology Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Xian Jiaotong University
MIGU Video Technology Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, Xian Jiaotong University, MIGU Video Technology Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202210609323.5A priority Critical patent/CN115037962B/en
Publication of CN115037962A publication Critical patent/CN115037962A/en
Application granted granted Critical
Publication of CN115037962B publication Critical patent/CN115037962B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866Management of end-user data

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a video adaptive transmission method, apparatus, terminal device and storage medium. Panoramic video data are acquired; an adaptive decision is made based on a pre-obtained user view-angle prediction result to obtain a decision result, the panoramic video data are adjusted according to the decision result to obtain video content, and the video content and the decision result are sent to a player so that the player determines, according to the decision result, whether to perform video reconstruction on the video content to obtain a target video. By predicting the user view angle of the video data in advance, user preference can be identified accurately over the long term; by making an adaptive decision on the video data based on the user view-angle prediction result, adjusting the panoramic video data according to the decision result to obtain the video content, and sending the video content and the decision result to the player, video quality can be improved while adapting to bandwidth resources, thereby improving the user's immersive viewing experience of panoramic video.

Description

Video adaptive transmission method, device, terminal equipment and storage medium
Technical Field
The present invention relates to the field of video transmission technologies, and in particular, to a video adaptive transmission method and apparatus, a terminal device, and a storage medium.
Background
360-degree panoramic video, as an emerging video application, gives people an immersive viewing experience. As HTTP Adaptive Streaming (HAS) gradually becomes the mainstream technology for streaming media distribution, adaptive transmission of 360-degree panoramic video streams plays a crucial role in ensuring a good viewing experience for users.
Among current 360-degree panoramic video stream adaptive transmission strategies, one approach uses mathematical methods to analyze salient feature points in the 360-degree panoramic video and then determines the position of the user's view angle from the positions of these feature points; however, it ignores user preference, and its prediction accuracy can fluctuate considerably as users switch their attention. A second approach uses a recurrent neural network to predict the user's future view angle by learning the temporal relationship between the user's viewing points at different times; however, it ignores the influence of changes in the 360-degree video content on the user's view angle, so when the content changes over a large range, this approach cannot accurately predict the change in the user's view angle.
Therefore, a solution that improves the user's immersive viewing experience of panoramic video is needed.
Disclosure of Invention
The main object of the present invention is to provide a video adaptive transmission method, apparatus, terminal device and storage medium, aiming to improve the user's immersive viewing experience of panoramic video.
To achieve the above object, the present invention provides a video adaptive transmission method. The video adaptive transmission method is applied to a server and includes the following steps:
acquiring panoramic video data;
performing an adaptive decision based on a pre-obtained user view-angle prediction result to obtain a decision result, adjusting the panoramic video data according to the decision result to obtain video content, and sending the video content and the decision result to a player so that the player determines, according to the decision result, whether to perform video reconstruction on the video content to obtain a target video.
Optionally, before the step of performing an adaptive decision based on a pre-obtained user view-angle prediction result to obtain a decision result, adjusting the panoramic video data according to the decision result to obtain video content, and sending the video content to a player, the method further includes:
acquiring the user's head-motion trajectory and a panoramic video image;
encoding the user's head-motion trajectory to obtain temporal feature information;
extracting saliency features from the panoramic video image to obtain user preference features;
and obtaining the user view-angle prediction result according to the temporal feature information and the user preference features.
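As an illustration only, the two feature-extraction steps above can be sketched with hypothetical stand-in functions; the patent's embodiment uses an LSTM encoder for the head-motion trajectory and a saliency model for the panoramic frames, which these simple summaries merely mark the place of:

```python
# Hypothetical stand-ins for the two feature extractors described above.
# The actual method uses an LSTM encoder and a saliency network; here
# simple deterministic summaries mark where each component fits.

def encode_head_trajectory(trace):
    """Stand-in temporal encoder: summarize (yaw, pitch) samples into
    a fixed-size feature (here: last position + mean velocity)."""
    last = trace[-1]
    vel = [(b - a) for a, b in zip(trace[0], trace[-1])]
    n = max(len(trace) - 1, 1)
    return [last[0], last[1], vel[0] / n, vel[1] / n]

def extract_preference_features(saliency_map):
    """Stand-in saliency pooling: return the (row, col) of the most
    salient cell of a coarse saliency grid."""
    best = max(
        (v, r, c)
        for r, row in enumerate(saliency_map)
        for c, v in enumerate(row)
    )
    return [best[1], best[2]]

trace = [(10.0, 0.0), (12.0, 1.0), (14.0, 2.0)]  # yaw, pitch in degrees
saliency = [[0.1, 0.3], [0.9, 0.2]]              # coarse 2x2 saliency grid
time_feat = encode_head_trajectory(trace)
pref_feat = extract_preference_features(saliency)
print(time_feat, pref_feat)
```

The two feature vectors would then be fused downstream to produce the view-angle prediction result.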
Optionally, the step of obtaining the user view-angle prediction result according to the temporal feature information and the user preference features includes:
decoding the temporal feature information with a decoder to obtain the predicted user view-angle motion trajectory for the current frame image;
and integrating the predicted user view-angle motion trajectory and the user preference features through a fully connected neural network to obtain the user view-angle prediction result for the current frame image.
Optionally, after the step of integrating the predicted user view-angle motion trajectory and the user preference features through a fully connected neural network to obtain the user view-angle prediction result for the current frame image, the method further includes:
inputting the user view-angle prediction result of the current frame image into the decoder for decoding to obtain the predicted user view-angle motion trajectory for the next frame image, taking the next frame image as the current frame image, and returning to the step of integrating the predicted user view-angle motion trajectory and the user preference features through a fully connected neural network to obtain the user view-angle prediction result for the current frame image, until all video images have been processed.
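The returned-execution loop above can be sketched as follows; `decoder_step` and `fuse` are hypothetical toy stand-ins for the LSTM decoder and the fully connected fusion network, not the patent's trained models:

```python
# Sketch of the autoregressive loop: decode a trajectory point, correct it
# with the per-frame preference feature, then feed the corrected prediction
# back as the next decoder input (toy stand-in functions throughout).

def decoder_step(prev_view):
    """Stand-in for the LSTM decoder: naive constant-velocity step."""
    yaw, pitch = prev_view
    return (yaw + 2.0, pitch + 1.0)

def fuse(pred_view, pref_view, alpha=0.25):
    """Stand-in for the fully connected fusion: pull the trajectory
    prediction toward the salient region by weight alpha."""
    return tuple((1 - alpha) * p + alpha * s
                 for p, s in zip(pred_view, pref_view))

view = (10.0, 0.0)        # last observed view angle (yaw, pitch)
salient = (20.0, 4.0)     # per-frame preference target (assumed fixed here)
predictions = []
for _ in range(3):        # predict three future frames
    raw = decoder_step(view)    # decoder output for this frame
    view = fuse(raw, salient)   # corrected prediction, fed back next step
    predictions.append(view)
print(predictions)
```

The key design point mirrored here is that the decoder never consumes its own raw output; only the fused, preference-corrected prediction is fed back, which limits error accumulation over the prediction horizon.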
Optionally, the step of performing an adaptive decision based on a pre-obtained user view-angle prediction result to obtain a decision result, adjusting the panoramic video data according to the decision result to obtain video content, and sending the video content to a player includes:
tiling and slicing the panoramic video data to obtain tile videos, and acquiring a network state estimate from the player;
performing an adaptive decision with a multi-decision reinforcement learning model according to the network state estimate and the user view-angle prediction result to obtain a decision result, the decision result including a target bitrate and a reconstruction strategy;
repackaging each tile video according to the target bitrate to obtain the video content;
and sending the video content to the player.
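A minimal server-side sketch of these steps follows; a hypothetical rule-based policy stands in for the multi-decision reinforcement learning model, and the tile grid, bitrates, and thresholds are illustrative assumptions:

```python
# Server-side sketch (assumptions throughout): tiles inside the predicted
# view angle get the target bitrate; out-of-view tiles get the minimum
# bitrate and are flagged for client-side super-resolution reconstruction.

TILE_GRID = [(r, c) for r in range(2) for c in range(4)]  # 2x4 tiling

def make_decision(throughput_kbps, fov_tiles):
    """Return {tile: (bitrate_kbps, reconstruct_flag)} for every tile."""
    hi = 8000 if throughput_kbps > 10000 else 4000  # target bitrate
    lo = 1000                                       # minimum bitrate
    decision = {}
    for tile in TILE_GRID:
        if tile in fov_tiles:
            decision[tile] = (hi, False)
        else:
            # ship cheap, let the player reconstruct via super-resolution
            decision[tile] = (lo, True)
    return decision

fov = {(0, 1), (0, 2)}            # tiles covered by the predicted FoV
plan = make_decision(12000, fov)  # network state estimate: 12 Mbps
print(plan[(0, 1)], plan[(1, 3)])
```

In the patent's actual pipeline this decision is produced by the reinforcement learning model jointly from the network state estimate and the view-angle prediction; the rule above only illustrates the shape of its output (target bitrate plus reconstruction strategy per tile).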
Optionally, the video adaptive transmission method is applied to a player and includes the following steps:
receiving the video content and the decision result sent by a server, and determining, according to the decision result, whether to perform video reconstruction on the video content to obtain a target video.
Optionally, the step of receiving the video content and the decision result sent by the server and determining, according to the decision result, whether to perform video reconstruction on the video content to obtain the target video includes:
receiving the video content and the decision result sent by the server, wherein the decision result includes a reconstruction strategy, the player includes a first buffer and a second buffer, the first buffer is used for caching the video content, and the second buffer is used for caching a reconstructed video;
judging, according to the reconstruction strategy, whether video reconstruction of the video content is needed;
if video reconstruction of the video content is needed, performing video reconstruction through a pre-trained super-resolution reconstruction model to obtain a reconstructed video, and taking the reconstructed video as the target video;
and if video reconstruction of the video content is not needed, taking the video content as the target video.
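The two-buffer player logic above can be sketched as follows; `upscale` is a hypothetical placeholder for the pre-trained super-resolution reconstruction model:

```python
# Player-side sketch: a receive buffer for video content as delivered and
# a play buffer for the target video, with the server's per-chunk
# reconstruction decision choosing between passthrough and upscaling.

def upscale(chunk):
    """Stand-in for the super-resolution reconstruction model."""
    return {**chunk, "reconstructed": True}

class Player:
    def __init__(self):
        self.recv_buffer = []  # first buffer: video content as received
        self.play_buffer = []  # second buffer: target video to render

    def on_chunk(self, chunk, reconstruct):
        self.recv_buffer.append(chunk)
        # apply the server's reconstruction decision for this chunk
        target = upscale(chunk) if reconstruct else chunk
        self.play_buffer.append(target)

p = Player()
p.on_chunk({"id": 0}, reconstruct=False)  # high-bitrate tile: play as-is
p.on_chunk({"id": 1}, reconstruct=True)   # low-bitrate tile: super-resolve
print(p.play_buffer)
```

Keeping received and reconstructed chunks in separate buffers matches the first-buffer/second-buffer split described above and lets reconstruction run ahead of the playhead.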
In addition, to achieve the above object, the present invention also provides a video adaptive transmission apparatus, including:
the acquisition module is used for acquiring panoramic video data;
the transmission module is used for performing an adaptive decision based on a pre-obtained user view-angle prediction result to obtain a decision result, adjusting the panoramic video data according to the decision result to obtain video content, and sending the video content and the decision result to a player so that the player determines, according to the decision result, whether to perform video reconstruction on the video content to obtain a target video.
In addition, to achieve the above object, the present invention further provides a terminal device, where the terminal device includes a memory, a processor, and a video adaptive transmission program stored in the memory and capable of running on the processor, and when the video adaptive transmission program is executed by the processor, the terminal device implements the steps of the video adaptive transmission method as described above.
Further, to achieve the above object, the present invention also provides a computer readable storage medium having a video adaptive transmission program stored thereon, which when executed by a processor implements the steps of the video adaptive transmission method as described above.
Embodiments of the present invention provide a video adaptive transmission method, apparatus, terminal device and storage medium. Panoramic video data are acquired; an adaptive decision is made based on a pre-obtained user view-angle prediction result to obtain a decision result, the panoramic video data are adjusted according to the decision result to obtain video content, and the video content and the decision result are sent to a player so that the player determines, according to the decision result, whether to perform video reconstruction on the video content to obtain a target video. By predicting the user view angle of the video data in advance, user preference can be identified accurately over the long term; by making an adaptive decision on the video data based on the user view-angle prediction result, adjusting the panoramic video data according to the decision result to obtain the video content, and sending the video content and the decision result to the player, video quality can be improved while adapting to bandwidth resources, thereby improving the user's immersive viewing experience of panoramic video.
Drawings
Fig. 1 is a functional block diagram of a terminal device to which the video adaptive transmission apparatus of the present invention belongs;
FIG. 2 is a flowchart illustrating an exemplary embodiment of a video adaptive transmission method according to the present invention;
FIG. 3 is a schematic diagram of a user perspective prediction framework in an embodiment of the invention;
FIG. 4 is a flowchart illustrating a video adaptive transmission method according to another exemplary embodiment of the present invention;
FIG. 5 is a flowchart illustrating the step S114 in the embodiment of FIG. 4;
FIG. 6 is a flowchart illustrating the step S20 in the embodiment of FIG. 2;
FIG. 7 is a flowchart illustrating a method for adaptive video transmission according to another exemplary embodiment of the present invention;
FIG. 8 is a detailed flowchart of step A10 in the embodiment of FIG. 7;
FIG. 9 is a schematic diagram illustrating a principle of a super-resolution-based panoramic video bitrate adaptive transmission method in an embodiment of the present invention;
fig. 10 is a schematic diagram of an adaptive transmission principle of the simulation system in the embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiments of the invention is as follows: video data are acquired; a bitrate decision is made on the video data based on a pre-obtained user view-angle prediction result to obtain video content, and the video content is sent to a player. By predicting the user view angle of the video data in advance, user preference can be identified accurately over the long term; by making a bitrate decision on the video data based on the user view-angle prediction result and sending the video content to the player, video quality can be improved while adapting to bandwidth resources, thereby improving the user's immersive viewing experience of panoramic video.
The technical terms involved in the embodiments of the invention are as follows:
Quality of Experience (QoE): the user's subjective perception of the quality and performance of devices, networks, systems, applications or services; broadly, how smoothly the user can complete the entire process;
HTTP Adaptive Streaming (HAS): a technology that intelligently senses the user's download speed, dynamically adjusts the video's encoding bitrate accordingly, and presents the user with high-quality, smoother video;
Adaptive Bitrate Streaming (ABR): senses changes in the network environment, or automatically makes a reasonable bitrate adjustment according to the client's buffering and playback state, to improve (maximize) the quality of experience of users watching video online;
Long Short-Term Memory network (LSTM): a recurrent neural network specifically designed to address the long-term dependence problem of ordinary RNNs (recurrent neural networks).
360-degree panoramic video, as an emerging video application, gives people an immersive viewing experience. As HTTP Adaptive Streaming (HAS) gradually becomes the mainstream technology for streaming media distribution, adaptive transmission of 360-degree panoramic video streams can not only greatly reduce transmission bandwidth consumption but also ensure a good viewing experience (QoE) for users. In 360-degree panoramic video stream adaptive transmission strategies, the major current difficulties and challenges are how to predict the user's view angle (FoV) accurately over the long term, and how to make an optimal adaptive bitrate (ABR) transmission strategy that saves network bandwidth while guaranteeing the user a good immersive viewing experience.
At present, among 360-degree panoramic video stream adaptive transmission strategies, one approach is a dynamic adaptive streaming bitrate allocation method that maintains the spatio-temporal consistency of 360-degree video, comprising a bitrate adaptation algorithm, a field-of-view (FoV) conversion model, a tile priority calculation model and a tile bitrate allocation algorithm: a Gaussian model and a Zipf model are used to estimate the FoV, the priorities of all tiles of the 360-degree video are calculated, and the bitrate adaptation algorithm then determines the segment bitrate for downloading the current video segment by jointly considering buffer length and video quality. Another approach is a viewpoint prediction model based on a recurrent neural network, with a viewpoint tracking module based on a correlation filter and a fusion module; the temporal relationship between the user's viewing points at different times is learned through training so as to predict the viewpoint sequence at several future times.
However, the first approach analyzes the salient feature points in the 360-degree panoramic video by mathematical methods and then determines the position of the user's view angle from the positions of these feature points; it ignores user preference, the obtained FoV is based only on the video information, and the resulting prediction accuracy fluctuates considerably as users switch their attention. The second approach uses a recurrent neural network to predict the user's future view angle by learning the temporal relationship between viewing points at different times, but ignores the influence of changes in the 360-degree video content on the user's view angle; when the content changes over a large range, the model cannot accurately predict the change in the user's view angle, which is a clear limitation. Problems therefore remain in present-stage user view-angle prediction, resulting in a poor experience when users watch 360-degree video.
As a solution, the present invention proposes a novel super-resolution-based bitrate adaptive transmission method, which mainly comprises two parts: a user view-angle prediction method and a super-resolution-based 360-degree video transmission method. Finally, a 360-degree video adaptive transmission prototype simulation system verifies that the method adopted in this proposal can effectively improve the user's viewing experience.
Specifically, referring to fig. 1, fig. 1 is a schematic diagram of the functional modules of a terminal device to which the video adaptive transmission apparatus of the present invention belongs. The video adaptive transmission apparatus may be a device independent of the terminal device that performs video adaptive transmission; it may be carried on the terminal device in hardware or software form. The terminal device may be an intelligent mobile terminal with data-processing capability, such as a mobile phone or a tablet computer, or a fixed terminal device or server with data-processing capability.
In this embodiment, the terminal device to which the video adaptive transmission apparatus belongs at least includes an output module 110, a processor 120, a memory 130, and a communication module 140.
The memory 130 stores an operating system and a video adaptive transmission program. The video adaptive transmission apparatus may make an adaptive decision on the acquired panoramic video data based on a pre-obtained user view-angle prediction result to obtain a decision result, adjust the panoramic video data according to the decision result, and store the resulting video content and other information in the memory 130; the output module 110 may be a display screen or the like. The communication module 140 may include a WIFI module, a mobile communication module, a Bluetooth module, and the like, through which the device communicates with external devices or servers.
Wherein the video adaptive transmission program in the memory 130 when executed by the processor implements the steps of:
acquiring panoramic video data;
performing an adaptive decision based on a pre-obtained user view-angle prediction result to obtain a decision result, adjusting the panoramic video data according to the decision result to obtain video content, and sending the video content and the decision result to a player so that the player determines, according to the decision result, whether to perform video reconstruction on the video content to obtain a target video.
Further, the video adaptive transmission program in the memory 130 when executed by the processor further implements the steps of:
acquiring the user's head-motion trajectory and a panoramic video image;
encoding the user's head-motion trajectory to obtain temporal feature information;
extracting saliency features from the panoramic video image to obtain user preference features;
and obtaining the user view-angle prediction result according to the temporal feature information and the user preference features.
Further, the video adaptive transmission program in the memory 130 when executed by the processor further implements the steps of:
decoding the temporal feature information with a decoder to obtain the predicted user view-angle motion trajectory for the current frame image;
and integrating the predicted user view-angle motion trajectory and the user preference features through a fully connected neural network to obtain the user view-angle prediction result for the current frame image.
Further, the video adaptive transmission program in the memory 130 when executed by the processor further implements the steps of:
inputting the user view-angle prediction result of the current frame image into the decoder for decoding to obtain the predicted user view-angle motion trajectory for the next frame image, taking the next frame image as the current frame image, and returning to the step of integrating the predicted user view-angle motion trajectory and the user preference features through a fully connected neural network to obtain the user view-angle prediction result for the current frame image, until all video images have been processed.
Further, the video adaptive transmission program in the memory 130 when executed by the processor further implements the steps of:
tiling and slicing the panoramic video data to obtain tile videos, and acquiring a network state estimate from the player;
performing an adaptive decision with a multi-decision reinforcement learning model according to the network state estimate and the user view-angle prediction result to obtain a decision result, the decision result including a target bitrate and a reconstruction strategy;
repackaging each tile video according to the target bitrate to obtain the video content;
and sending the video content to the player.
Further, the video adaptive transmission program in the memory 130 when executed by the processor further implements the steps of:
receiving the video content and the decision result sent by a server, and determining, according to the decision result, whether to perform video reconstruction on the video content to obtain a target video.
Further, the video adaptive transmission program in the memory 130 when executed by the processor further implements the steps of:
receiving the video content and the decision result sent by the server, wherein the decision result includes a reconstruction strategy, the player includes a first buffer and a second buffer, the first buffer is used for caching the video content, and the second buffer is used for caching a reconstructed video;
judging, according to the reconstruction strategy, whether video reconstruction of the video content is needed;
if video reconstruction of the video content is needed, performing video reconstruction through a pre-trained super-resolution reconstruction model to obtain a reconstructed video, and taking the reconstructed video as the target video;
and if video reconstruction of the video content is not needed, taking the video content as the target video.
According to the above scheme, specifically, panoramic video data are acquired; an adaptive decision is made based on a pre-obtained user view-angle prediction result to obtain a decision result, the panoramic video data are adjusted according to the decision result to obtain video content, and the video content and the decision result are sent to a player so that the player determines, according to the decision result, whether to perform video reconstruction on the video content to obtain a target video. By predicting the user view angle of the video data in advance, user preference can be identified accurately over the long term; by making an adaptive decision on the video data based on the user view-angle prediction result, adjusting the panoramic video data according to the decision result to obtain the video content, and sending the video content and the decision result to the player, video quality can be improved while adapting to bandwidth resources, thereby improving the user's immersive viewing experience of panoramic video.
Based on the above terminal device architecture but not limited to the above architecture, embodiments of the method of the present invention are presented.
The execution body of the method of this embodiment may be a video adaptive transmission apparatus or a terminal device; this embodiment takes the video adaptive transmission apparatus as an example.
Referring to fig. 2, fig. 2 is a flowchart illustrating an exemplary embodiment of a video adaptive transmission method according to the present invention. The video self-adaptive transmission method comprises the following steps:
step S10, acquiring panoramic video data;
the panoramic video is also called a 360-degree video and is a spherical video, the panoramic video covers horizontal 360 degrees and vertical 180-degree picture contents, a user can watch the picture contents in different areas by rotating the head after wearing the head-mounted display, the visual angle of the user watched by human eyes is about 110 degrees, and the visual angle area of the user only occupies one part of the panoramic video, so that a large amount of bandwidth resources can be wasted in panoramic transmission, video playing blockage and high time delay are easily brought, the video watching experience of the user cannot be ensured, therefore, the acquired video data needs to be subjected to user visual angle prediction and code rate decision, so that the bandwidth resources are effectively saved, and the user experience is improved. Heretofore, the video data may be acquired by receiving the video data from a video server through a gateway device and processing the video data.
Step S20, performing adaptive decision based on a user view prediction result obtained in advance to obtain a decision result, adjusting the panoramic video data according to the decision result to obtain video content, and sending the video content and the decision result to a player so that the player can determine whether to perform video reconstruction on the video content according to the decision result to obtain a target video.
After the panoramic video data is obtained, the user view angle can be predicted based on a pre-constructed user view angle prediction model, then adaptive decision is carried out on the video data based on the user view angle prediction result to obtain a decision result, and the panoramic video data is adjusted according to the decision result, so that video content is obtained and sent to a player.
Specifically, referring to fig. 3, fig. 3 is a schematic diagram illustrating the principle of the user view prediction framework in an embodiment of the present invention. As shown in fig. 3, based on the concept of video content understanding, each frame of the 360-degree panoramic video is analyzed to extract the objects that may interest the user. A Long Short-Term Memory network (LSTM) is then used as the basic model for extracting temporal features, a model for extracting content-information features of the 360-degree panoramic video images is added, and the user view is predicted by combining the temporal feature information with the user preference information of the 360-degree panoramic video in the spatial dimension.
The encoder-decoder model can solve the problem that the probability distribution is inconsistent before and after the prediction of the user view motion trajectory time series. However, this structure still predicts the future view from timing features alone, so prediction robustness decreases as the prediction step length grows; in particular, a deviation in the predicted value at one moment propagates into all subsequent predictions. Therefore, user preference information based on video content is integrated on top of the encoder-decoder structure: the decoder's input is no longer its own raw prediction output; instead, the timing features and the content-based user preference features are fused through a fully connected neural network, and the corrected user view prediction result is used as the input of the next prediction step.
The integrated video-content-based user preference information mainly comprises the semantic information of the image content (features of the region the user is interested in) and the position of that content within the video image. As one implementation, the video content of the user's viewing region can be extracted according to the user view coordinates, and the saliency features of that region's video content are then extracted. The video content features can also be regarded as a time series: the user preference feature at each moment is fused through a fully connected neural network and then used as the input of the LSTM decoder network, and the fused result of the fully connected network is in fact the output prediction result.
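The fusion step described above can be sketched as follows. This is a minimal pure-Python stand-in in which the layer sizes and (random) weights are illustrative assumptions; a real implementation would use a trained fully connected layer inside the LSTM decoder loop:

```python
import random

def linear(x, W, b):
    """y = W x + b for nested-list weights (a stand-in for a trained FC layer)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def fuse(temporal_feat, content_feat, W, b):
    """Concatenate the LSTM temporal feature with the content-saliency
    (user preference) feature and map them to a corrected 2-D viewpoint
    (yaw, pitch) prediction, which is then fed back as the decoder's next input."""
    return linear(temporal_feat + content_feat, W, b)

random.seed(0)
T_DIM, C_DIM, OUT = 4, 3, 2          # illustrative sizes, not from the patent
W = [[random.uniform(-0.1, 0.1) for _ in range(T_DIM + C_DIM)] for _ in range(OUT)]
b = [0.0, 0.0]

corrected = fuse([0.2, -0.1, 0.4, 0.0], [0.7, 0.1, 0.3], W, b)
print(corrected)  # a 2-element (yaw, pitch) vector
```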
Furthermore, performing code rate adaptive transmission for the user view on the basis of user view prediction can effectively improve the performance of the 360-degree panoramic video transmission system. High-definition images are reconstructed locally using the computing power of the video playback client, which reduces the dependence on network bandwidth while maintaining video image quality and improves the quality of experience of users watching videos. A deep reinforcement learning model is applied to decide both the download code rate and the image super-resolution reconstruction, so as to achieve the optimal user quality of experience.
In this embodiment, panoramic video data is acquired; an adaptive decision is made based on a user view prediction result obtained in advance to obtain a decision result, the panoramic video data is adjusted according to the decision result to obtain video content, and the video content and the decision result are sent to the player so that the player can determine, according to the decision result, whether to perform video reconstruction on the video content to obtain the target video. Predicting the user view in advance allows user preferences to be identified accurately over a long horizon; making an adaptive decision based on the prediction, adjusting the panoramic video data accordingly, and sending the resulting video content and decision result to the player improves video quality while adapting to the available bandwidth resources, thereby improving the user's immersive viewing experience of the panoramic video.
Referring to fig. 4, fig. 4 is a flowchart illustrating a video adaptive transmission method according to another exemplary embodiment of the present invention. Based on the embodiment shown in fig. 2, in this embodiment, before performing an adaptive decision based on a pre-obtained user view prediction result to obtain a decision result, adjusting the panoramic video data according to the decision result to obtain video content, and sending the video content to a player, the video adaptive transmission method further includes:
step S111, acquiring a head motion track and a panoramic video image of a user;
The user view may also be called the viewpoint; accurately predicting viewpoint changes is key to improving the user experience, since prediction errors reduce the quality of the video picture the user watches or cause picture loss. Before performing user view prediction, a user view prediction model needs to be constructed: a head motion data set can be obtained from an open-source database, and the model is trained on the data in that set.
In the process of predicting the user view angle by applying the user view angle prediction model, firstly, a head motion track of a user and a panoramic video image of a user watching area are collected, time characteristic information can be obtained by coding the head motion track of the user, and user preference characteristics can be obtained by extracting the panoramic video image of the user watching area.
Step S112, encoding the head movement track of the user to obtain time characteristic information;
Specifically, the seq2seq model adopted in the embodiment of the present invention comprises an Encoder and a Decoder: the encoder encodes the whole input sequence into a unified semantic vector, which the decoder then decodes. In the embodiment of the invention, an LSTM model encodes the historical trajectory x_t over the times t ∈ {1, 2, ..., T}, and another LSTM network serves as the decoder to predict the future user view motion trajectory. The decoder is initialized with the encoder's latest hidden state h_t and memory state c_t, and the most recent history of the user viewing trajectory is used as the decoder's initial input. Based on this encoder-decoder model, the problem of inconsistent probability distributions before and after the prediction of the user view motion trajectory time series can be solved.
Step S113, performing significance characteristic extraction on the panoramic video image to obtain user preference characteristics;
further, video content of a user viewing area is extracted according to the user view angle coordinates, and then salient feature extraction is performed according to the extracted video content of the user view angle area (here, image content without distortion in the user view angle).
And finally, performing down-sampling on the saliency map, wherein each pixel point represents the saliency feature of a small region of the image. The video content feature can also be regarded as a time sequence, and the user preference feature at each time is integrated through a fully-connected neural network and then used as the input of the LSTM decoder network, and the integrated result of the fully-connected neural network is actually the output prediction result.
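The down-sampling of the saliency map can be sketched as a simple block-average pooling, in which each output pixel summarizes one small region of the image. The 4 x 4 map and 2 x 2 block size below are illustrative:

```python
def downsample(saliency, block):
    """Average-pool a 2-D saliency map so that each output pixel represents
    the saliency of one block x block region of the image."""
    h, w = len(saliency), len(saliency[0])
    assert h % block == 0 and w % block == 0
    out = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            vals = [saliency[i + di][j + dj]
                    for di in range(block) for dj in range(block)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

sal = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
print(downsample(sal, 2))  # [[1.0, 0.0], [0.0, 1.0]]
```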
Step S114, obtaining the user view angle prediction result according to the time characteristic information and the user preference characteristics.
And inputting the user view prediction result into the decoder for decoding to obtain a next user view prediction motion track for predicting a next user view.
Furthermore, after the time feature information is obtained by encoding the user's head motion trajectory and the user preference features are obtained by feature extraction from the panoramic video image of the user's viewing region, the time feature information and the content-based user preference features can be fused through the fully connected neural network to obtain the user view prediction result; in addition, the corrected user view prediction result can be used as the input of the next prediction.
According to the scheme, the head movement track and the panoramic video image of the user are obtained; coding the head movement track of the user to obtain time characteristic information; extracting the saliency characteristics of the panoramic video image to obtain user preference characteristics; and obtaining the user visual angle prediction result according to the time characteristic information and the user preference characteristics. The time characteristic information is obtained by encoding the head movement track of the user, and is combined with the user preference characteristics extracted according to the panoramic video image to obtain the user visual angle prediction result, so that the long-term effective comprehensive prediction of the user visual angle from the time dimension to the space dimension is realized, and the immersive watching experience of the user on the panoramic video is promoted.
Referring to fig. 5, fig. 5 is a schematic specific flowchart of step S114 in the embodiment of fig. 4. This embodiment is based on the embodiment shown in fig. 4, in this embodiment, the step S114 includes:
step S1141, decoding the time characteristic information through a decoder to obtain a user view angle predicted motion track of the current frame image;
Specifically, in order to solve the problem of inconsistent distributions between the historical data and the future predicted values, the present invention adopts a seq2seq model rather than a single LSTM model. The seq2seq model comprises an Encoder and a Decoder: the encoder encodes the whole input sequence into a unified semantic vector, which is then decoded by the decoder. In the embodiment of the invention, an LSTM model encodes the historical trajectory x_t over the times t ∈ {1, 2, ..., T}, and another LSTM network serves as the decoder to predict the future user view motion trajectory; the decoder is initialized with the encoder's latest hidden state h_t and memory state c_t, and the most recent history of the user viewing trajectory is used as its initial input. The LSTM decoder feeds its prediction y_{t'-1} at time t'-1 back as the input for predicting the view at time t', and the length of the decoder's cyclic output can be adjusted according to the required prediction step length. Based on this encoder-decoder model, the problem of inconsistent probability distributions before and after the prediction of the user view motion trajectory time series can be solved. As one implementation, the LSTM has 2 hidden layers with 128 neurons per layer.
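The compounding-error behaviour that motivates the corrected decoder input can be illustrated with a toy autoregressive rollout. The linear "decoder step" and its bias below are stand-ins, not the LSTM of the embodiment:

```python
def decoder_step(prev, model_bias=0.01):
    # Toy stand-in for one LSTM decoder step: predict the next viewpoint
    # from the previous one, with a small systematic model error.
    return prev + 0.1 + model_bias

def rollout(x0, steps):
    """Autoregressive decoding: each prediction y_{t'-1} is fed back as the
    input for step t', so per-step errors compound with the horizon."""
    preds, x = [], x0
    for _ in range(steps):
        x = decoder_step(x)
        preds.append(x)
    return preds

true = [0.1 * (t + 1) for t in range(5)]   # ground-truth trajectory from x0 = 0
pred = rollout(0.0, 5)
errors = [abs(p - t) for p, t in zip(pred, true)]
print(errors)   # error grows with the prediction step
```

This is exactly the degradation the fused video-content preference feature is meant to correct: by re-grounding the decoder input at each step, the deviation no longer propagates unchecked.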
Step S1142, the predicted motion trajectory of the user view angle and the user preference feature are integrated through a full-connection neural network, and a user view angle prediction result of the current frame image is obtained.
Inputting the user view prediction result of the current frame image into the decoder for decoding to obtain the user view prediction motion track of the next frame image, taking the next frame image as the current frame image, and returning to the execution step: and integrating the user view angle predicted motion track and the user preference characteristics through a fully connected neural network to obtain a user view angle predicted result of the current frame image until all video images are processed.
Furthermore, after the predicted user view motion trajectory is obtained by decoding, the user preference features are obtained by extracting the saliency features of the panoramic video image of the user's viewing region. These preference features can also be regarded as a time series: the preference feature at each moment is fused through a fully connected neural network, the fused result serves as the input of the decoder LSTM network, and that fused result is in fact the output user view prediction. Because the input processed by the whole decoder structure is a temporally continuous sequence, the prediction result can reflect the real trajectory of the user view.
In this embodiment, with the above scheme, the decoder specifically decodes the time characteristic information to obtain a predicted motion trajectory of the user view of the current frame image; and integrating the user view angle predicted motion trail and the user preference characteristics through a fully connected neural network to obtain a user view angle predicted result of the current frame image. The user visual angle prediction motion track is obtained through decoding, the user preference feature is obtained through extracting the significance feature of the panoramic video image of the user watching area, the panoramic video image is integrated through the fully-connected neural network, the integrated result is used as the input of the LSTM network of the subsequent decoder, and the user visual angle prediction result is finally output.
Referring to fig. 6, fig. 6 is a schematic specific flowchart of step S20 in the embodiment of fig. 2. This embodiment is based on the embodiment shown in fig. 2, and in this embodiment, the step S20 includes:
step S201, partitioning and slicing the panoramic video data to obtain each sliced video, and acquiring a network state estimation result of the player;
Specifically, the main functions of the server in the embodiment of the present invention are to perform DASH conversion of videos, train the video super-resolution network, store the videos and SR networks, and process requests from the clients, i.e., the players. The tools mainly used for DASH conversion are Kvazaar, GPAC and FFmpeg. The main role of Kvazaar is to re-encode the video with motion-constrained tiles so that each tile region is encoded independently, that is, each tile of the video can be played independently. GPAC is then used to partition the video and repackage it into DASH-format video content that adapts to the player's network state.
Since the network status of the player is crucial to the viewing experience of the user, the network status of the player needs to be accurately estimated. The network state of the player is estimated mainly by adopting a bandwidth prediction algorithm, and a single-value method, an average method or a weighted average method and the like can be adopted. As one implementation mode, the downloading rate of the video segments can be calculated according to the downloading data volume and the downloading time, the downloading rate represents the size of the available bandwidth to a certain extent, the single-value method directly takes the downloading rate of the current video segment as the predicted bandwidth of the next moment, and the average method takes the downloading rate of all historical video segments as the predicted bandwidth of the next moment.
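The three bandwidth prediction methods mentioned above can be sketched as follows; the sample download rates and the decay value of the weighted average are illustrative assumptions:

```python
def single_value(rates):
    """Single-value method: next-step bandwidth = latest segment download rate."""
    return rates[-1]

def average(rates):
    """Average method: next-step bandwidth = mean of historical download rates."""
    return sum(rates) / len(rates)

def weighted_average(rates, decay=0.5):
    """Weighted-average method: more recent segments get larger weights
    (the decay value here is an illustrative assumption)."""
    weights = [decay ** i for i in range(len(rates) - 1, -1, -1)]
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)

# Download rate of each past segment in Mbps: downloaded bytes / download time.
rates = [8.0, 6.0, 10.0, 4.0]
print(single_value(rates))      # 4.0
print(average(rates))           # 7.0
print(weighted_average(rates))  # between the two, biased toward recent segments
```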
Step S202, carrying out self-adaptive decision making according to the network state estimation result and the user visual angle prediction result through a multi-decision reinforcement learning model to obtain a decision making result, wherein the decision making result comprises a target code rate and a reconstruction strategy;
Further, after the network state estimation result is obtained, an appropriate code rate version can be allocated to the tiles within the 360-degree panoramic video. By predicting the user's view and dynamically selecting the most suitable code rate version according to the player's state, the user's viewing experience (including video picture quality, playback fluency and so on) is improved by exploiting the fact that streaming media can be encoded into small files at multiple code rate versions, each of which can be played independently. Meanwhile, when bandwidth resources are severely insufficient, even highly accurate bandwidth estimation cannot make high-definition, high-code-rate content transmittable; the player can then only guarantee playback fluency, and fluency alone does not compensate for the drop in viewing experience caused by reduced video quality. For this reason, the embodiment of the present invention adds an image super-resolution technique to the adaptive transmission system, so that the player can reconstruct high-definition video content using the client's computing power when the network bandwidth is insufficient.
Specifically, in order to solve the multi-decision problem of bitrate decision and super-resolution reconstruction, a 360-panorama video bitrate adaptive transmission frame with a double-buffer mechanism (double-buffer) is further constructed in the embodiment of the present invention, wherein one buffer is used for caching downloaded video content, and the other buffer is used for caching video content reconstructed by SR.
With the double-buffer mechanism, local video content is reconstructed through SR: whether the picture quality of the user view drops because of a view prediction error or because lower-code-rate content was downloaded due to insufficient bandwidth, the client can still improve picture quality by reconstructing high-definition content. However, video stalls may also occur when the client's local computing resources are insufficient to meet the real-time reconstruction requirement. Deciding on downloading and reconstruction at the same time is therefore very challenging, since the two decision actions compete with each other for time.
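A minimal sketch of the double-buffer idea, assuming chunk indices and integer quality levels as stand-ins for real video data (the class and field names are illustrative, not from the embodiment):

```python
from collections import deque

class DoubleBufferPlayer:
    """Minimal sketch: one queue holds downloaded chunks, the other holds
    chunks already upscaled by the local SR model."""

    def __init__(self):
        self.download_buf = deque()   # chunks as (index, quality)
        self.sr_buf = deque()

    def on_download(self, idx, quality):
        self.download_buf.append((idx, quality))

    def reconstruct_next(self, sr_gain=1):
        # Run SR on the oldest downloaded chunk, if local compute kept up.
        if self.download_buf:
            idx, q = self.download_buf.popleft()
            self.sr_buf.append((idx, q + sr_gain))

    def play_next(self):
        # Prefer the reconstructed (higher-quality) copy when one exists.
        if self.sr_buf:
            return self.sr_buf.popleft()
        if self.download_buf:
            return self.download_buf.popleft()
        return None                   # rebuffering: nothing to play

p = DoubleBufferPlayer()
p.on_download(0, quality=2)
p.on_download(1, quality=2)
p.reconstruct_next()                  # SR only kept up with chunk 0
print(p.play_next())                  # (0, 3)  reconstructed chunk
print(p.play_next())                  # (1, 2)  raw downloaded chunk
```

The sketch makes the competition visible: every chunk sent through `reconstruct_next` costs compute time that could otherwise widen the download buffer, which is why the two decisions must be coordinated.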
Specifically, a state space is defined first. The environment in reinforcement learning refers to everything that interacts with the agent; in the environment of 360-degree panoramic video code rate adaptive transmission, the state space comprises all information related to video code rate decision control. Specifically, the state space includes the network throughput prediction, the player's current buffer occupancy, the code rate of the last chunk, the number of remaining chunks, the historical chunk download times, the sizes of the next chunk at the different code rates, and the user view position. The state may be expressed as shown in formula (1):

S_k = (X_k, τ_k, n_k, B_k, C_k, d_k, v_k)    (1)

wherein:
S_k - the state when the player has downloaded the k-th video slice;
X_k - the network throughput measurements of the past k video slices;
τ_k - the download times of the past k video slices, i.e., the time intervals over which the network throughput was measured;
n_k - the size of each tile in the next video slice at the different code rates;
B_k - the current buffer occupancy;
C_k - the number of video slices still remaining;
d_k - the download rate of each tile in the previous video slice;
v_k - the predicted user view of the next chunk.
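The state components listed above can be gathered into a simple container; the field names and sample values here are illustrative assumptions rather than the embodiment's actual data layout:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class StreamingState:
    """State observed after downloading the k-th video slice (illustrative
    field names mirroring the components listed in the text)."""
    throughput_history: List[float]        # past network throughput measurements
    download_times: List[float]            # download time of past slices
    next_tile_sizes: List[List[float]]     # size of each tile at each code rate
    buffer_occupancy: float                # seconds of video currently buffered
    chunks_remaining: int                  # slices left in the session
    last_tile_rates: List[float]           # download rate of tiles in last slice
    predicted_view: Tuple[float, float]    # predicted (yaw, pitch) of next chunk

s = StreamingState(
    throughput_history=[7.2, 6.8, 8.1],
    download_times=[0.9, 1.1, 0.8],
    next_tile_sizes=[[0.2, 0.5, 1.1]],
    buffer_occupancy=4.0,
    chunks_remaining=42,
    last_tile_rates=[6.5],
    predicted_view=(30.0, -10.0),
)
print(s.chunks_remaining)  # 42
```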
For the actions in the state space, the 360-degree panoramic video code rate adaptive transmission framework based on image super-resolution needs to decide simultaneously the video code rate inside the user's view and whether to apply SR locally to reconstruct high-resolution video content. The action that decides the video code rate is A_1 = {1M, 2M, 5M, 10M, 20M} bps, and the action that decides whether to perform local SR reconstruction is A_2 = {0, 1}, wherein 1 means that image super-resolution reconstruction is performed and 0 means that it is not. A reward is what the agent obtains from the environment after an action is performed and applied to the environment. In this model, the video chunk quality representation model constructed above is used to define the reward function, specifically:

r_k = QoE_k

in the formula: QoE_k - the quality of experience of the k-th video chunk.
Since reinforcement learning focuses on the long-term cumulative return obtained for a strategy, the introduction of a discount factor γ may better describe the impact of the reward on the cumulative reward in the time dimension, resulting in a cumulative discount reward, as follows:
Figure BDA0003672574510000161
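The cumulative discounted reward can be computed directly from per-chunk QoE values; the γ = 0.9 default below is an illustrative choice:

```python
def cumulative_discounted_reward(rewards, gamma=0.9):
    """R_k = sum_i gamma^i * r_{k+i}: later chunk QoE contributes less."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

qoe = [1.0, 1.0, 1.0]
print(cumulative_discounted_reward(qoe))       # 1 + 0.9 + 0.81 ≈ 2.71
print(cumulative_discounted_reward(qoe, 0.0))  # only the immediate QoE: 1.0
```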
Since there are two decisions, in order to better coordinate the competition between them, the strategy is treated as a game between two agents parameterized by θ = {θ_1, θ_2}, with the set of agent policies π = {π_1, π_2}. The gradient of the expected return J(θ_i) = E[R_i] of agent i is then:

∇_{θ_i} J(θ_i) = E[ ∇_{θ_i} log π_i(a_i | s_i) · Q_i^π(s, a_1, a_2) ]

wherein Q_i^π(s, a_1, a_2) is a centralized action-value function that takes the actions of both agents plus the state information as input and outputs the Q value of agent i. To simplify the calculation, s_i is an observation that covers all agents. Since the optimization goal of both agents is to maximize the overall quality of experience of video playback, both agents here use the same reward (any reward scheme can be used). An experience replay buffer stores tuples <x, x′, a_1, a_2, r_1, r_2> recording the experience of all agents. The centralized action-value function Q_i^μ is updated according to:

L(θ_i) = E[ (Q_i^μ(x, a_1, a_2) − y)^2 ],  y = r_i + γ · Q_i^{μ′}(x′, a_1′, a_2′)

in the formula: μ′ = {μ′_1, μ′_2} - the set of target policies used to interact with the environment.
the above scheme is implemented using multiple processes, each representing an agent, and each agent representing a decision policy in the environment. Each agent directly interacts with the environment where the system is located through an independent observation value, and a large number of action-state tuples are obtained in a short time. Since each agent has a state concerned by itself, the agent only sees an environment variable (observed value) which affects itself in the action execution process, but the reward acquisition is a global reward brought by the action execution, that is, the exploration result is a change of the whole environment caused by the action made by itself, and the exploration process of each agent is independent of other agents. In the decision network training process, all observed values obtained by the search of the agents are considered. The rewards earned by integrating decisions made by multiple agents may evaluate the effectiveness of the decisions made by the agents.
Each policy network selects an action with a certain probability; it evaluates the score of the current action based on the actions in the experience replay buffer and then modifies the probability of the selected action based on that score, i.e., updates the action policy. The 1D-CNN layer in the policy network contains 128 filters, each of size 4; the fully connected FC layer contains 128 units.
Applying super-resolution in the adaptive transmission system reduces the system's dependence on network bandwidth resources. The code rate decision and the super-resolution reconstruction decision are in a relationship of both competition and cooperation: both aim to improve video playback quality, yet they affect each other's perception of the environment, which prevents either from reaching an optimal decision on its own. To solve this multi-decision problem, a multi-decision reinforcement learning model is introduced on top of the super-resolution-based 360-degree panoramic video code rate adaptive transmission method to improve the effectiveness and robustness of the algorithm. The decision result obtained by the adaptive decision comprises a target code rate and a reconstruction strategy: the target code rate yields video content adapted to the player's network state, while the reconstruction strategy lets the player judge whether the video content needs to be reconstructed.
Step S203, repackaging each of the cut videos according to the target code rate to obtain the video content;
Video content with motion-constrained, mutually independent tile regions can be generated by DASH conversion of the video, so that each tile of the video can be played independently. GPAC is then used to partition the video and repackage it into DASH-format video content. In addition, FFmpeg can be used to re-encode the video into multiple code rate versions; in the embodiment of the present invention this re-encoding step is arranged before the DASH conversion performed by Kvazaar, finally yielding video content suitable for the player.
And step S204, sending the video content to the player.
Furthermore, after the video content of the optimal code rate version is obtained through the 360-panorama video code rate adaptive transmission framework, the corresponding video content can be sent to the player, and the code rate is dynamically selected according to network change and a user view angle prediction result by the adaptive algorithm, so that the playing jam can be avoided, and smooth and high-definition watching experience is provided for the user.
According to the above scheme, the panoramic video data is partitioned and sliced to obtain the tile videos, and the network state estimation result of the player is obtained; an adaptive decision is made through the multi-decision reinforcement learning model according to the network state estimation result and the user view prediction result to obtain a decision result comprising a target code rate and a reconstruction strategy; each tile video is repackaged according to the target code rate to obtain the video content; and the video content is sent to the player. Performing code rate adaptive transmission for the user view on the basis of user view prediction effectively improves the performance of the panoramic video transmission system and the user's viewing experience, and applying the deep reinforcement learning model to decide the download code rate and the image super-resolution reconstruction achieves the optimal user quality of experience, thereby improving the user's immersive viewing experience of the panoramic video.
Referring to fig. 7, fig. 7 is a flowchart illustrating a video adaptive transmission method according to another exemplary embodiment of the present invention, where the video adaptive transmission method is applied to a player, and the video adaptive transmission method includes:
step A10, receiving the video content and the decision result sent by the server, and determining whether to perform video reconstruction on the video content according to the decision result to obtain a target video.
When the server performs self-adaptive decision-making based on a user view angle prediction result obtained in advance to obtain a decision-making result, the panoramic video data is adjusted according to the decision-making result to obtain video content, the video content and the decision-making result are sent to the player, the player serves as a client to receive the video content and the decision-making result, whether the video content is reconstructed or not is judged according to a reconstruction strategy in the decision-making result, and then a target video is obtained for a user to watch the target video.
Referring to fig. 8, fig. 8 is a schematic flowchart illustrating a specific process of step a10 in the embodiment of fig. 7. This embodiment is based on the embodiment shown in fig. 7, and in this embodiment, the step a10 includes:
step A101, receiving video content and a decision result sent by the server, wherein the decision result comprises a reconstruction policy, the player comprises a first buffer area and a second buffer area, the first buffer area is used for caching the video content, and the second buffer area is used for caching a reconstructed video;
step A102, judging whether video reconstruction needs to be carried out on the video content according to the reconstruction strategy;
step A103, if video reconstruction is needed to be carried out on the video content, carrying out video reconstruction through a pre-trained super-resolution reconstruction model to obtain a reconstructed video, and taking the reconstructed video as the target video;
And step A104, if video reconstruction of the video content is not needed, taking the video content as the target video.

In addition, whether the network state estimation result is lower than a preset threshold is judged.
Specifically, after the network state estimation result of the player is obtained, it must be judged whether the estimate is below a preset threshold, which can be adjusted according to the actual application. When bandwidth resources are severely insufficient, high-definition, high-code-rate video content cannot be transmitted and the player can only guarantee playback fluency, which hardly compensates for the drop in viewing experience caused by reduced video quality; the network state estimate is therefore used to judge whether video reconstruction through the image super-resolution technique is required, so as to improve the user's viewing experience.
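The threshold test can be sketched as follows. The 5 Mbps threshold and the compute-availability check are illustrative assumptions, since in the embodiment the reconstruction strategy ultimately comes from the server-side multi-decision reinforcement learning model:

```python
def should_reconstruct(bandwidth_estimate_mbps, threshold_mbps=5.0,
                       compute_available=True):
    """Decide whether the player should run local SR reconstruction.
    The 5 Mbps threshold and the compute check are illustrative assumptions,
    not values from the embodiment."""
    return compute_available and bandwidth_estimate_mbps < threshold_mbps

print(should_reconstruct(2.0))                           # True: bandwidth too low
print(should_reconstruct(12.0))                          # False: HD fits the link
print(should_reconstruct(2.0, compute_available=False))  # False: no local compute
```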
Furthermore, in order to maximize the viewing experience of the user, the embodiment of the invention adopts the image super-resolution reconstruction technology, so that the player can reconstruct the high-definition video content by utilizing the computing power of the client when the network bandwidth is insufficient. And if the video reconstruction is judged to be needed according to the reconstruction strategy in the decision result, performing the video reconstruction through the super-resolution model to obtain the target video, and before the target video is obtained, training based on the head motion data set to obtain the super-resolution model.
In the embodiment of the invention, the super-resolution model MDSR is modified to achieve a better effect. The SR model is trained with the 360-degree panoramic videos in an open-source head motion data set (including 195 4K videos): the 195 4K videos serve as the highest-resolution originals, the videos are re-encoded to generate lower-resolution data sets (2K, 1080p, 720p and 480p), each video image is divided into 20 × 10 regions, and each region serves as a video tile with which the SR model is trained.
More specifically, before the super-resolution model is used for video reconstruction, its training is completed. The training data used in the training process may be a head motion data set acquired from an open-source database, from which the 360-degree panoramic videos are selected.
Further, after the head motion data set is acquired, a number of panoramic videos meeting a preset definition are selected; in the embodiment of the present invention, 195 4K videos are selected as the original videos with the highest resolution, and these panoramic videos are then re-encoded.
Further, after the original panoramic videos with the highest resolution are selected, the encoder may re-encode them to generate lower-resolution video data sets; for example, in the embodiment of the present invention, data sets at 2K, 1080p, 720p and 480p resolutions are generated, and video tiling may then be performed on the lower-resolution video data sets.
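One plausible way to produce the lower-resolution ladder described above is to drive an encoder such as ffmpeg. The sketch below only builds the command strings; the codec choice, CRF value, output names, and the mapping of "2K" to a 1440-pixel height are assumptions, not details fixed by the embodiment.

```python
# Target ladder from the embodiment: 2K, 1080p, 720p and 480p.
# Heights are assumptions ("2K" taken as 1440p here).
LADDER = {"2k": 1440, "1080p": 1080, "720p": 720, "480p": 480}


def reencode_commands(source="original_4k.mp4"):
    """Build one ffmpeg re-encoding command per rung of the ladder;
    scale=-2:<h> keeps the aspect ratio with an even width."""
    return [
        f"ffmpeg -i {source} -vf scale=-2:{height} -c:v libx265 -crf 28 {name}.mp4"
        for name, height in LADDER.items()
    ]


for cmd in reencode_commands():
    print(cmd)
```

Each output of such a pass would then be tiled and segmented for DASH delivery as described below.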
After the lower-resolution video data sets are generated by the encoder, each video image can be divided into several regions; for example, the video image is divided into 20 × 10 regions, each region serves as a video tile, and the super-resolution model is trained using the video tiles.
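The 20 × 10 partition described above can be sketched as a pure bookkeeping function that yields each tile's pixel rectangle; the function name and rectangle convention are assumptions for illustration.

```python
def tile_grid(height: int, width: int, cols: int = 20, rows: int = 10):
    """Return (top, left, tile_height, tile_width) for each of the
    rows x cols tiles of an equirectangular frame, in row-major order.
    Assumes height and width are divisible by rows and cols."""
    tile_h, tile_w = height // rows, width // cols
    return [(r * tile_h, c * tile_w, tile_h, tile_w)
            for r in range(rows) for c in range(cols)]


# A 4K equirectangular frame (3840 x 2160) yields 200 tiles of 216 x 192.
tiles = tile_grid(2160, 3840)
print(len(tiles))  # 200
print(tiles[0])    # (0, 0, 216, 192)
```

The same grid applied to each rung of the resolution ladder yields co-located low-resolution/high-resolution tile pairs for SR training.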
After the video images are divided into regions to obtain the video tiles, the super-resolution model is trained with the lower-resolution tiles to generate the corresponding higher-resolution tiles. Once training is finished, the super-resolution model can reconstruct lower-resolution video into higher-resolution video when the bandwidth resources of the player are seriously insufficient, effectively improving the viewing experience of the user.
In this embodiment, the player receives the video content and the decision result sent by the server, where the decision result includes a reconstruction strategy, and the player includes a first buffer for caching the video content and a second buffer for caching a reconstructed video. Whether video reconstruction needs to be performed on the video content is judged according to the reconstruction strategy; if so, video reconstruction is performed through a pre-trained super-resolution reconstruction model to obtain a reconstructed video, which is taken as the target video; if not, the video content itself is taken as the target video. The quality of the panoramic video content is effectively improved through video reconstruction, thereby improving the immersive viewing experience of the user on the panoramic video. A method for adaptively selecting the video tile code rate through reinforcement learning is introduced, and higher-resolution versions of video tiles are generated through an image super-resolution technique, which effectively balances the quality of the video picture watched by the user against the computing resources and network bandwidth resources of the client, improving the quality of experience of the user watching streaming video.
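A minimal sketch of the player-side branch described in this embodiment, with the two buffers modeled as lists; `select_target_video` and `super_resolve` are assumed names, and the toy callable stands in for the trained super-resolution model rather than the disclosed implementation.

```python
def select_target_video(video_content, decision, super_resolve):
    """Follow the reconstruction strategy in the decision result: either
    super-resolve the downloaded content or pass it through unchanged."""
    first_buffer = [video_content]   # caches the received video content
    second_buffer = []               # caches any reconstructed video
    if decision.get("reconstruct"):
        reconstructed = super_resolve(video_content)
        second_buffer.append(reconstructed)
        return reconstructed         # reconstructed video is the target
    return first_buffer[0]           # video content itself is the target


# Toy stand-in for the SR model: relabel a 480p tile as an SR 4K tile.
target = select_target_video("tile_480p", {"reconstruct": True},
                             lambda v: v.replace("480p", "4k_sr"))
print(target)  # tile_4k_sr
```

Keeping downloaded and reconstructed content in separate buffers lets the playback loop drain whichever buffer the decision result designates without stalling the download path.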
In addition, an embodiment of the present invention further provides a video adaptive transmission apparatus, where the video adaptive transmission apparatus includes:
the acquisition module is used for acquiring panoramic video data;
the transmission module is used for carrying out self-adaptive decision-making based on a user visual angle prediction result obtained in advance to obtain a decision-making result, adjusting the panoramic video data according to the decision-making result to obtain video content, and sending the video content and the decision-making result to a player so that the player can determine whether to carry out video reconstruction on the video content according to the decision-making result to obtain a target video.
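The two modules above can be sketched end to end as follows; `transmit`, `predict_viewport`, `make_decision` and `package_tiles` are assumed helper names standing in for the view prediction, adaptive decision and code rate adjustment steps, not part of the disclosure.

```python
def transmit(panoramic_data, predict_viewport, make_decision, package_tiles):
    """Server-side flow: predict the user view angle, make the adaptive
    decision, adjust the panoramic data accordingly, and return what is
    sent to the player (video content plus decision result)."""
    viewport = predict_viewport(panoramic_data)        # view angle prediction
    decision = make_decision(viewport)                 # adaptive decision
    content = package_tiles(panoramic_data, decision)  # adjust per decision
    return content, decision


# Toy stand-ins for the three steps.
content, decision = transmit(
    ["tile0", "tile1"],
    predict_viewport=lambda data: {"yaw": 0.0, "pitch": 0.0},
    make_decision=lambda view: {"bitrate": "720p", "reconstruct": True},
    package_tiles=lambda data, dec: [f"{t}@{dec['bitrate']}" for t in data],
)
print(content)   # ['tile0@720p', 'tile1@720p']
print(decision)  # {'bitrate': '720p', 'reconstruct': True}
```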
Referring to fig. 9, fig. 9 is a schematic diagram illustrating the principle of a super-resolution-based panoramic video code rate adaptive transmission method in an embodiment of the present invention. As shown in fig. 9, the method mainly includes two parts: a user view angle prediction method and a super-resolution-based 360-degree video transmission method. Finally, a 360-degree video adaptive transmission prototype simulation system verifies that the method can effectively improve the viewing experience of the user.
Referring to fig. 10, fig. 10 is a schematic diagram illustrating the adaptive transmission principle of a simulation system according to an embodiment of the present invention. As shown in fig. 10, the simulation system is extended from Pensieve, with support for 360-degree panoramic video tiling and a super-resolution (SR) model module added. The simulation system comprises a server and a client. The main functions of the server are DASH conversion of the videos, training of the super-resolution network for the videos, storage of the videos and the SR network, and processing of requests from the client. Under the original adaptive transmission framework, the client adds a user view angle prediction module and an SR module adapted to the 360-degree panoramic video adaptive transmission scheme, and the content downloaded by the client comprises the video content watched by the user and the super-resolution model. The 360-degree video adaptive transmission prototype simulation system can obtain comparison results quickly within a short time.
In addition, an embodiment of the present invention further provides a terminal device, where the terminal device includes a memory, a processor, and a video adaptive transmission program that is stored in the memory and is executable on the processor, and when the video adaptive transmission program is executed by the processor, the steps of the video adaptive transmission method described above are implemented.
Since the video adaptive transmission program is executed by the processor, all technical solutions of all the foregoing embodiments are adopted, so that at least all the beneficial effects brought by all the technical solutions of all the foregoing embodiments are achieved, and details are not repeated herein.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a video adaptive transmission program is stored, and when being executed by a processor, the video adaptive transmission program implements the steps of the video adaptive transmission method as described above.
Since the video adaptive transmission program is executed by the processor, all technical solutions of all the foregoing embodiments are adopted, so that at least all the beneficial effects brought by all the technical solutions of all the foregoing embodiments are achieved, and details are not repeated herein.
Compared with the prior art, the video adaptive transmission method, apparatus, terminal device and storage medium provided by the embodiments of the invention acquire panoramic video data; make an adaptive decision based on a user view angle prediction result obtained in advance to obtain a decision result; adjust the panoramic video data according to the decision result to obtain video content; and send the video content and the decision result to a player, so that the player determines, according to the decision result, whether to perform video reconstruction on the video content to obtain a target video. By predicting the user's view angle on the video data in advance, the user's preference can be accurately identified over a long time horizon; and by making an adaptive decision based on the user view angle prediction result, adjusting the panoramic video data according to the decision result, and sending the resulting video content and the decision result to the player, the video quality can be improved while adapting to bandwidth resources, thereby improving the immersive viewing experience of the user on the panoramic video.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controlled terminal, or a network device) to execute the method of each embodiment of the present application.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A video adaptive transmission method is applied to a server side, and is characterized by comprising the following steps:
acquiring panoramic video data;
the method comprises the steps of carrying out self-adaptive decision making based on a user visual angle prediction result obtained in advance to obtain a decision making result, adjusting panoramic video data according to the decision making result to obtain video content, and sending the video content and the decision making result to a player so that the player can determine whether to carry out video reconstruction on the video content according to the decision making result to obtain a target video.
2. The method for adaptive video transmission according to claim 1, wherein the step of performing an adaptive decision based on a pre-obtained user view prediction result to obtain a decision result, adjusting the panoramic video data according to the decision result to obtain video content, and sending the video content to a player further comprises:
acquiring a head movement track and a panoramic video image of a user;
coding the head movement track of the user to obtain time characteristic information;
extracting the saliency characteristics of the panoramic video image to obtain user preference characteristics;
and obtaining the user view angle prediction result according to the time characteristic information and the user preference characteristics.
3. The adaptive video transmission method as claimed in claim 2, wherein the step of obtaining the user view prediction result according to the temporal characteristic information and the user preference characteristic comprises:
decoding the time characteristic information through a decoder to obtain a user view angle predicted motion track of the current frame image;
and integrating the user view angle predicted motion trail and the user preference characteristics through a fully connected neural network to obtain a user view angle predicted result of the current frame image.
4. The adaptive video transmission method according to claim 3, wherein the step of obtaining the user view prediction result of the current frame image by integrating the user view prediction motion trajectory with the user preference feature through a fully connected neural network further comprises:
inputting the user view prediction result of the current frame image into the decoder for decoding to obtain the user view prediction motion track of the next frame image, taking the next frame image as the current frame image, and returning to the execution step: and integrating the user view angle predicted motion track and the user preference characteristic through a fully-connected neural network to obtain a user view angle predicted result of the current frame image until all video images are processed.
5. The adaptive video transmission method according to claim 1, wherein the step of performing an adaptive decision based on a pre-obtained user view prediction result to obtain a decision result, adjusting the panoramic video data according to the decision result to obtain video content, and sending the video content to a player comprises:
partitioning and slicing the panoramic video data to obtain each sliced video, and acquiring a network state estimation result of the player;
performing self-adaptive decision making through a multi-decision reinforcement learning model according to the network state estimation result and the user view prediction result to obtain a decision result, wherein the decision result comprises a target code rate and a reconstruction strategy;
repackaging each of the cut-block videos according to the target code rate to obtain the video content;
and sending the video content to the player.
6. A video adaptive transmission method, wherein the video adaptive transmission method is applied to a player, and the video adaptive transmission method comprises the following steps:
and receiving video content and a decision result sent by a server, and determining whether to carry out video reconstruction on the video content according to the decision result to obtain a target video.
7. The video adaptive transmission method according to claim 6, wherein the step of receiving the video content and the decision result sent by the server and determining whether to perform video reconstruction on the video content according to the decision result to obtain the target video comprises:
receiving video content and a decision result sent by the server, wherein the decision result comprises a reconstruction strategy, the player comprises a first buffer area and a second buffer area, the first buffer area is used for caching the video content, and the second buffer area is used for caching a reconstructed video;
judging whether video reconstruction needs to be carried out on the video content according to the reconstruction strategy;
if the video content needs to be subjected to video reconstruction, performing video reconstruction through a pre-trained super-resolution reconstruction model to obtain a reconstructed video, and taking the reconstructed video as the target video;
and if video reconstruction of the video content is not needed, taking the video content as the target video.
8. A video adaptive transmission apparatus, characterized in that the video adaptive transmission apparatus comprises:
the acquisition module is used for acquiring panoramic video data;
the transmission module is used for carrying out self-adaptive decision based on a user visual angle prediction result obtained in advance to obtain a decision result, adjusting the panoramic video data according to the decision result to obtain video content, and sending the video content and the decision result to a player so that the player can determine whether to carry out video reconstruction on the video content according to the decision result to obtain a target video.
9. A terminal device, characterized in that the terminal device comprises a memory, a processor and a video adaptive transmission program stored on the memory and executable on the processor, the video adaptive transmission program when executed by the processor implementing the steps of the video adaptive transmission method according to any one of claims 1-5 or 6-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a video adaptive transmission program, which when executed by a processor implements the steps of the video adaptive transmission method according to any one of claims 1-5 or 6-7.
CN202210609323.5A 2022-05-31 2022-05-31 Video self-adaptive transmission method, device, terminal equipment and storage medium Active CN115037962B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210609323.5A CN115037962B (en) 2022-05-31 2022-05-31 Video self-adaptive transmission method, device, terminal equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115037962A true CN115037962A (en) 2022-09-09
CN115037962B CN115037962B (en) 2024-03-12

Family

ID=83123019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210609323.5A Active CN115037962B (en) 2022-05-31 2022-05-31 Video self-adaptive transmission method, device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115037962B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115589499A (en) * 2022-10-08 2023-01-10 刘兴 Remote education playing code stream distribution control system and method
CN116708843A (en) * 2023-08-03 2023-09-05 清华大学 User experience quality feedback regulation system in semantic communication process
CN117596376A (en) * 2024-01-18 2024-02-23 深圳大学 360-degree video intelligent edge transmission method, system, wearable device and medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170180800A1 (en) * 2015-09-09 2017-06-22 Vantrix Corporation Method and System for Selective Content Processing Based on a Panoramic Camera and a Virtual-Reality Headset
US20190364204A1 (en) * 2018-05-25 2019-11-28 Microsoft Technology Licensing, Llc Adaptive panoramic video streaming using composite pictures
US10560759B1 (en) * 2018-10-23 2020-02-11 At&T Intellectual Property I, L.P. Active network support on adaptive virtual reality video transmission
CN110827198A (en) * 2019-10-14 2020-02-21 唐山学院 Multi-camera panoramic image construction method based on compressed sensing and super-resolution reconstruction
CN112953922A (en) * 2021-02-03 2021-06-11 西安电子科技大学 Self-adaptive streaming media control method, system, computer equipment and application
CN113313123A (en) * 2021-06-11 2021-08-27 西北工业大学 Semantic inference based glance path prediction method
CN113395505A (en) * 2021-06-21 2021-09-14 河海大学 Panoramic video coding optimization algorithm based on user field of view
CN113573140A (en) * 2021-07-09 2021-10-29 西安交通大学 Code rate self-adaptive decision-making method supporting face detection and real-time super-resolution
CN113905221A (en) * 2021-09-30 2022-01-07 福州大学 Stereo panoramic video asymmetric transmission stream self-adaption method and system


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YONGKAI HUO ET AL.: "Unequal Error Protection Aided Region of Interest Aware Wireless Panoramic Video", 《IEEE ACCESS》, vol. 7, pages 80262 - 80276, XP011732705, DOI: 10.1109/ACCESS.2019.2921880 *
ZHANG BOWEN ET AL.: "Research Progress of Virtual Viewpoint Synthesis Technology for Three-Dimensional Video", 《Computer Engineering and Applications》, vol. 57, no. 2, pages 12 - 17 *
LI YARU: "Compression and Post-Processing of Panoramic Video", 《China Dissertation Full-Text Database》 *
DONG ZHEN ET AL.: "Virtual Reality Video Processing and Transmission Technology", 《Telecommunication Science》, no. 08, pages 51 - 58 *


Also Published As

Publication number Publication date
CN115037962B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN115037962B (en) Video self-adaptive transmission method, device, terminal equipment and storage medium
Yaqoob et al. A survey on adaptive 360 video streaming: Solutions, challenges and opportunities
Xie et al. 360ProbDASH: Improving QoE of 360 video streaming using tile-based HTTP adaptive streaming
US10666962B2 (en) Training end-to-end video processes
CN108833880B (en) Method and device for predicting viewpoint and realizing optimal transmission of virtual reality video by using cross-user behavior mode
Chiariotti A survey on 360-degree video: Coding, quality of experience and streaming
Zhang et al. Video super-resolution and caching—An edge-assisted adaptive video streaming solution
CN106537923B (en) The technology of adaptive video stream
CN109286855B (en) Panoramic video transmission method, transmission device and transmission system
CN110248212B (en) Multi-user 360-degree video stream server-side code rate self-adaptive transmission method and system
CN113905221B (en) Stereoscopic panoramic video asymmetric transport stream self-adaption method and system
CN107211193A (en) The intelligent adaptive video streaming method and system of sensory experience quality estimation driving
Park et al. Advancing user quality of experience in 360-degree video streaming
US20200404241A1 (en) Processing system for streaming volumetric video to a client device
WO2022000298A1 (en) Reinforcement learning based rate control
Li et al. A super-resolution flexible video coding solution for improving live streaming quality
KR102129115B1 (en) Method and apparatus for transmitting adaptive video in real time using content-aware neural network
CN117596376B (en) 360-Degree video intelligent edge transmission method, system, wearable device and medium
Nguyen et al. Super-resolution based bitrate adaptation for HTTP adaptive streaming for mobile devices
Xie et al. Perceptually optimized quality adaptation of viewport-dependent omnidirectional video streaming
CN112911347B (en) Virtual reality video transmission method, system, server side and client side
CN111277857B (en) Streaming media scheduling method and device
da Mata Liborio Filho et al. Super-resolution with perceptual quality for improved live streaming delivery on edge computing
Khan A Taxonomy for Generative Adversarial Networks in Dynamic Adaptive Streaming Over HTTP
CN114666620B (en) Self-adaptive streaming media method based on visual sensitivity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant