CN114040257A - Self-adaptive video stream transmission playing method, device, equipment and storage medium - Google Patents

Self-adaptive video stream transmission playing method, device, equipment and storage medium

Info

Publication number
CN114040257A
CN114040257A
Authority
CN
China
Prior art keywords
video
video segment
decision
playing
enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111421136.6A
Other languages
Chinese (zh)
Other versions
CN114040257B (en)
Inventor
杨俊彦
王朔遥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202111421136.6A priority Critical patent/CN114040257B/en
Publication of CN114040257A publication Critical patent/CN114040257A/en
Application granted granted Critical
Publication of CN114040257B publication Critical patent/CN114040257B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/637 Control signals issued by the client directed to the server or network components
    • H04N21/6377 Control signals issued by the client directed to the server or network components directed to server
    • H04N21/65 Transmission of management data between client and server
    • H04N21/654 Transmission by server directed to the client
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of the present application lie at the intersection of artificial intelligence and high-efficiency data transmission, and relate to an adaptive video streaming playback method, apparatus, computer device, and storage medium. After the first video segment of the target video has been obtained and a second video segment is needed, the client's playback feedback information and the average bandwidth observed while downloading the past several video segments are collected. When the next video segment is requested, this information is fed to a deep reinforcement learning algorithm based on the A3C framework, which jointly decides the bitrate of the requested segment and the enhancement factor determining whether the client's video enhancement module is triggered. This realizes dual adaptivity over the download bitrate and the terminal video-quality enhancement factor, and greatly improves the user's video playback experience.

Description

Self-adaptive video stream transmission playing method, device, equipment and storage medium
Technical Field
The present application relates to the field of data transmission technology in artificial intelligence, and in particular, to a method and an apparatus for adaptive video streaming transmission and playback, a computer device, and a storage medium.
Background
In recent years, network traffic has grown ever more rapidly: the spread of smart end-user devices and the rapid development of communication technology drive monthly mobile data traffic steadily upward. According to Cisco's Visual Networking Index report, monthly mobile data traffic is predicted to exceed 396 exabytes by 2022, compared with 122 exabytes per month in 2017; video traffic is expected to quadruple by 2022, rising from 75% to 82% of total traffic. Video service providers such as Tencent Video and Douyin have introduced their own adaptive-bitrate video streaming services to guarantee the desired QoE (Quality of Experience) for users. The purpose of adaptive video streaming is to dynamically adjust the bitrate of the transmitted video so as to maximize QoE.
An existing mainstream adaptive video streaming playback method is a QoE-improvement system based on DASH (Dynamic Adaptive Streaming over HTTP). Specifically, by introducing KKT conditions, mathematical tools such as Lagrangian functions are used to predict future network bandwidth, and an appropriate video bitrate is then selected in combination with the occupancy of the client buffer.
In recent years, another class of adaptive video streaming playback methods has emerged: systems based on reinforcement learning that improve QoE and, to some extent, exploit terminal computing capability to improve video quality through video super-resolution processing.
However, the applicant finds that existing adaptive video streaming playback methods generally suffer from shortcomings. Traditional methods depend heavily on network bandwidth, and dynamic or poor bandwidth may mislead the prediction of future bandwidth, thereby reducing the user's QoE. Moreover, traditional methods do not fully exploit the computing power of the terminal device to enhance the downloaded video and further improve its quality. Methods that do consider terminal computing power suffer from larger quality loss caused by resolution changes, complex structure, slow inference, and a low invocation rate of the terminal video super-resolution module. Therefore, the dual-adaptive video distribution mechanism combined with terminal processing provided by the present invention is an important solution for supporting the distribution and playback of massive video streams.
Disclosure of Invention
An object of the embodiments of the present application is to provide an adaptive video streaming playback method, apparatus, computer device, and storage medium that can dynamically adjust the video-segment download bitrate according to network bandwidth conditions and make full use of the limited terminal computing capability to perform quality-enhancement processing on the downloaded video, thereby further improving video quality and achieving dual adaptation of bitrate adjustment and terminal quality enhancement.
In order to solve the foregoing technical problem, an embodiment of the present application provides a method for adaptive video streaming transmission and playing, which adopts the following technical solutions:
receiving a video playing request carrying a target video identifier;
sending a first video segment download request corresponding to the target video identifier to a server over a wireless uplink;
receiving, from the server over a wireless downlink, a first video segment corresponding to the target video identifier;
playing the first video segment via a playing module and obtaining first feedback information corresponding to the first video segment;
when a second video segment needs to be obtained, computing a second-video-segment bitrate decision and a corresponding enhancement factor decision according to the average bandwidth, the first feedback information, and a deep reinforcement learning algorithm based on the A3C framework;
sending, to the server, a second video segment download request corresponding to the second-video-segment bitrate decision;
receiving the second video segment sent by the server;
performing an enhancement processing operation on the second video segment according to the enhancement factor decision to obtain a second video segment to be played; and
playing the second video segment to be played via the playing module.
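The claimed steps can be sketched as a single client-side control loop. The following Python sketch is purely illustrative: the names (`fetch_segment`, `StubAgent`, the bitrate-ladder constant) and the stand-in decision rule are assumptions, not the patent's implementation; the learned A3C policy would take the place of `StubAgent`.

```python
# Illustrative sketch of the claimed client-side loop (all names hypothetical).
from collections import deque

BITRATES_MBPS = [2.0, 2.5, 3.0, 3.5, 4.0]  # the five-copy ladder from the description

def fetch_segment(index, bitrate):
    """Stub for the HTTP download over the wireless downlink."""
    return {"index": index, "bitrate": bitrate}

class StubAgent:
    """Stand-in for the A3C policy: returns (bitrate index, enhancement flag)."""
    def decide(self, avg_bandwidth, feedback):
        # Trivial rule in place of the learned policy: pick the highest
        # bitrate not exceeding the average bandwidth; enhance when low.
        candidates = [i for i, r in enumerate(BITRATES_MBPS) if r <= avg_bandwidth]
        idx = candidates[-1] if candidates else 0
        return idx, int(avg_bandwidth < 3.0)

def stream(num_segments, bandwidth_trace, agent=None):
    agent = agent or StubAgent()
    played = []
    history = deque(maxlen=5)                    # bandwidths of recent downloads
    seg = fetch_segment(0, BITRATES_MBPS[0])     # first segment: default bitrate
    feedback = {"rebuffer_s": 0.0}               # would come from the player
    played.append(seg)
    for i in range(1, num_segments):
        history.append(bandwidth_trace[i - 1])
        avg_bw = sum(history) / len(history)
        rate_idx, enhance = agent.decide(avg_bw, feedback)  # joint decision
        seg = fetch_segment(i, BITRATES_MBPS[rate_idx])
        seg["enhanced"] = bool(enhance)          # enhancement factor: 1 = enhance
        played.append(seg)
    return played
```

In a real client the loop would also update `feedback` from the player and block on buffer occupancy; those details are omitted here.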
In order to solve the foregoing technical problem, an embodiment of the present application further provides an adaptive video streaming playing apparatus, which adopts the following technical solutions:
a playing request acquisition unit, configured to receive a video playing request carrying a target video identifier;
a first request sending unit, configured to send a first video segment download request corresponding to the target video identifier to a server over a wireless uplink;
a first video segment receiving unit, configured to receive, from the server over a wireless downlink, a first video segment corresponding to the target video identifier;
a first video segment playing unit, configured to play the first video segment via a playing module and obtain first feedback information corresponding to the first video segment;
a decision calculation unit, configured to, when a second video segment needs to be obtained, compute a second-video-segment bitrate decision and a corresponding enhancement factor decision according to the average bandwidth, the first feedback information, and a deep reinforcement learning algorithm based on the A3C framework;
a second request sending unit, configured to send, to the server, a second video segment download request corresponding to the second-video-segment bitrate decision;
a second video segment receiving unit, configured to receive the second video segment sent by the server;
an enhancement processing unit, configured to perform an enhancement processing operation on the second video segment according to the enhancement factor decision to obtain a second video segment to be played; and
a second video segment playing unit, configured to play the second video segment to be played via the playing module.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
The computer device comprises a memory in which computer-readable instructions are stored and a processor which, when executing the computer-readable instructions, implements the steps of the adaptive video streaming playback method described above.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
the computer readable storage medium has stored thereon computer readable instructions which, when executed by a processor, implement the steps of the adaptive video streaming playing method as described above.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
the application provides a self-adaptive video stream transmission and playing method, which comprises the following steps: receiving a video playing request carrying a target video identifier; sending a first video segment downloading request corresponding to the target video identifier to a server according to a wireless uplink; receiving a first video segment which is sent by the server according to a wireless downlink and corresponds to the target video identification; according to the playing of the first video segment by a playing module, first feedback information corresponding to the first video segment is obtained; when a second video segment needs to be obtained, calculating a second video segment code rate decision and an enhancement factor decision corresponding to the second video segment according to the average bandwidth, the first feedback information and a depth reinforcement learning algorithm based on an A3C frame; sending a second video segment downloading request corresponding to the second video segment code rate decision to the server according to the second video segment code rate decision; receiving a second video segment sent by the server; performing enhancement processing operation on the second video segment according to the enhancement factor decision to obtain a second video segment to be played; and playing the second video segment to be played according to the playing module. 
In this way, after the first video segment of the target video has been obtained and a second video segment is needed, the client's playback feedback information and the average bandwidth observed while downloading the past several video segments are collected. When the next video segment is requested, this information is fed to a deep reinforcement learning algorithm based on the A3C framework, which jointly decides the bitrate of the requested segment and the enhancement factor determining whether the client's video enhancement module is triggered. This achieves dual adaptivity over the download bitrate and the terminal video-quality enhancement factor and greatly improves the user's video playback experience.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
fig. 2 is a flowchart of an implementation of an adaptive video streaming playing method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a reinforcement learning algorithm based on the A3C framework according to an embodiment of the present application;
FIG. 4 is a graph illustrating a comparison of performance provided by the first embodiment of the present application;
fig. 5 is a schematic structural diagram of an adaptive video streaming playing apparatus according to a second embodiment of the present application;
FIG. 6 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, an exemplary system architecture diagram is shown in which the present application may be applied, and for ease of illustration, only relevant portions of the present application are shown.
In the embodiment of the present application, we consider the most typical adaptive video streaming scenario, namely a video-on-demand streaming system. In an on-demand streaming system, a video is compressed from a sequence of images, and the complete video file is divided into a series of video segments of fixed duration. Each video segment is encoded at the server into multiple bitrate copies, so that the user can select the most suitable bitrate for each segment according to their playback statistics and network bandwidth conditions. Specifically, we encode the video into copies at five different bitrates and store them on the server: 2 Mbps, 2.5 Mbps, 3 Mbps, 3.5 Mbps, and 4 Mbps.
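To make the ladder concrete, the sketch below estimates how many segments and how much server-side storage the five-copy encoding implies; the 4-second segment length and 10-minute video duration are assumed values for illustration, not taken from the patent.

```python
# Server-side storage implied by the five-bitrate ladder; the 4 s segment
# length and 10-minute duration are assumptions for illustration only.
BITRATES_MBPS = [2.0, 2.5, 3.0, 3.5, 4.0]

def storage_mb(duration_s, seg_len_s=4):
    n_segments = duration_s // seg_len_s
    # size of one full copy at rate r: r Mbit/s * duration / 8 bits per byte
    per_copy = [r * duration_s / 8 for r in BITRATES_MBPS]   # in megabytes
    return n_segments, sum(per_copy)
```

For a 10-minute video, `storage_mb(600)` gives 150 segments per copy and about 1.1 GB across all five copies, which is the price paid for letting the client switch rates per segment.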
In the embodiment of the present application, fig. 1 illustrates the adaptive video streaming system model we designed in conjunction with terminal video processing, in which solid arrows represent data flow and dotted arrows represent signal flow. The model consists of a mobile client and a server. The mobile client comprises five parts: a central processor, a download buffer, a video enhancement module, a playback buffer, and a player. Because a video enhancement module is integrated into the mobile client, a dual-buffer structure is needed to separately store the video segments downloaded from the server and the video segments processed by the enhancement module; the player plays the buffered video segments in order. The central processor collects the client's playback feedback information and the average bandwidth observed while downloading the last several video segments, and uses this information to make a decision when requesting the download of the next segment. The central processor's decision comprises two parts: 1) the bitrate of the requested video segment; and 2) the enhancement factor that triggers the client's video enhancement module. The enhancement factor is a binary value: when it is 1, the video enhancement module is invoked to process the downloaded segment and improve its quality; when it is 0, the module is not invoked. The server stores multiple bitrate copies of the video together with the corresponding MPD (Media Presentation Description) file.
When a user watches a video, the mobile client sends a request signal to the server over the wireless uplink to download the video segment at the most suitable bitrate; after receiving the request signal, the server transmits the requested segment over the wireless downlink.
With continuing reference to fig. 2, a flowchart of an implementation of an adaptive video streaming playing method provided in an embodiment of the present application is shown, and for convenience of description, only the portion related to the present application is shown.
The self-adaptive video stream transmission playing method comprises the following steps:
step S201: receiving a video playing request carrying a target video identifier;
step S202: sending a first video segment download request corresponding to the target video identifier to a server over the wireless uplink;
step S203: receiving, from the server over the wireless downlink, a first video segment corresponding to the target video identifier;
step S204: playing the first video segment via a playing module and acquiring first feedback information corresponding to the first video segment;
step S205: when a second video segment needs to be acquired, computing a second-video-segment bitrate decision and a corresponding enhancement factor decision according to the average bandwidth, the first feedback information, and a deep reinforcement learning algorithm based on the A3C framework;
step S206: sending, to the server, a second video segment download request corresponding to the second-video-segment bitrate decision;
step S207: receiving the second video segment sent by the server;
step S208: performing an enhancement processing operation on the second video segment according to the enhancement factor decision to obtain a second video segment to be played;
step S209: playing the second video segment to be played via the playing module.
In the embodiment of the application, the system is modeled mathematically, and the decision task of the central processor is realized by an adaptive video streaming policy algorithm based on deep reinforcement learning, so that the central processor's decisions achieve dual adaptivity over the download bitrate and the terminal video-quality enhancement factor; we name this algorithm ENAVS.
In the embodiment of the application, two buffers exist in the system's mobile client. While the player is playing, the client downloads the requested video segment into the download buffer; meanwhile, the video enhancement module extracts fully downloaded segments from the download buffer for quality enhancement and pushes them to the playback buffer once enhancement is finished. Thus, a video segment goes through three phases from being requested to finally being played on the client player: 1) segment download; 2) segment processing; 3) segment playback. For example, for the video segment numbered i: when the previously requested segment (numbered i-1) has finished downloading and the number of segments stored in the client download buffer has not reached its upper limit, the client sends the server a request to download segment i and begins downloading it; after segment i has been downloaded and the previously requested segment has completed quality enhancement and been pushed to the playback buffer, segment i begins quality-enhancement processing; and when segment i has completed quality enhancement, is stored in the playback buffer, and the previous segment has finished playing, segment i begins to play.
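The three-stage dependency described above (a segment is enhanced only after it is downloaded and the enhancer is free; it is played only after it is enhanced and the previous segment has finished playing) can be sketched as a simple timing recurrence. This is a simplification: the download-buffer upper limit and any parallel downloads are ignored.

```python
# Sketch of the three-stage pipeline (download -> enhance -> play) implied by
# the dual-buffer client; timings and the buffer-limit check are simplified.
def pipeline_times(dl_times, enh_times, seg_dur):
    """Return per-segment (play_start, rebuffer_time) under the stage rules:
    a segment is enhanced after it is downloaded AND the previous segment
    left the enhancer; it is played after it is enhanced AND the previous
    segment finished playing."""
    dl_end = enh_end = play_end = 0.0
    out = []
    for i in range(len(dl_times)):
        dl_end = dl_end + dl_times[i]                 # downloads are sequential
        enh_end = max(dl_end, enh_end) + enh_times[i]
        play_start = max(enh_end, play_end)
        rebuffer = play_start - play_end if i > 0 else 0.0
        play_end = play_start + seg_dur
        out.append((play_start, rebuffer))
    return out
```

With 4-second segments that download and enhance quickly, rebuffering is zero; a single slow download stalls the player by exactly the gap it creates.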
In the present embodiment, we assume that QoE consists of three parts:
1) Average video quality: in an adaptive video streaming system, video quality can be represented in two ways: directly by the average bitrate of the video, or by the average PSNR of the video frames. Because our system introduces a terminal video enhancement module whose enhancement algorithm can improve the downloaded segments, the bitrate alone no longer reflects video quality accurately; we therefore represent the average video quality by the average PSNR.
2) Average quality variation: generally, users prefer the bitrate change between adjacent video segments to be as smooth as possible, and a drastic bitrate change can degrade the viewing experience. We therefore take the quality variation between adjacent video segments into account as a penalty term.
3) Average rebuffering time: when there are no playable video segments in the playback buffer, the player enters a rebuffering state and playback pauses, which degrades the user experience. We therefore take the average rebuffering time into account as another penalty term.
Thus, the QoE equation can be expressed as:

QoE = (1/N) Σ_{i=1}^{N} [ α1·PSNR(a_i) - α2·|PSNR(a_i) - PSNR(a_{i-1})| - α3·T_i^rebuf ]

where N denotes the total number of video segments, a_i denotes the decision made for the segment numbered i (comprising its download bitrate and whether video enhancement processing is applied), R_i denotes the download bitrate of the segment numbered i, and T_i^rebuf denotes the rebuffering time of the segment numbered i before it plays.
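Under the three-part definition above, the QoE objective can be computed as in the following sketch; the default weights and the PSNR values in the usage example are illustrative placeholders, not values from the patent.

```python
def qoe(psnr, rebuf, alpha=(1.0, 1.0, 4.3)):
    """Average-per-segment QoE: quality minus smoothness and rebuffer penalties.
    psnr[i]  -- PSNR of segment i after any enhancement
    rebuf[i] -- rebuffering time (s) before segment i plays
    alpha    -- (quality, smoothness, rebuffer) weights; defaults are illustrative
    """
    a1, a2, a3 = alpha
    n = len(psnr)
    quality = a1 * sum(psnr) / n
    smooth = a2 * sum(abs(psnr[i] - psnr[i - 1]) for i in range(1, n)) / n
    stall = a3 * sum(rebuf) / n
    return quality - smooth - stall
```

For example, with PSNRs [40, 42, 41] dB, no stalls, and unit weights, the score is the mean PSNR (41) minus the mean adjacent variation (1).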
In the present embodiment, in the terminology of reinforcement learning, the decision maker is called an agent, and the agent continually interacts with the environment. At each decision step, the agent selects an action according to its current state, driven by some policy; after the agent acts on the environment, the environment returns the reward of that action and updates the agent's state. The goal of reinforcement learning is to find an optimal policy that maximizes the return of the decisions. Fig. 3 illustrates the structure of the reinforcement learning algorithm based on the A3C framework; the algorithm aims to maximize user QoE by learning an optimal policy from a large amount of data. The A3C algorithm contains two deep neural networks of the same structure, the actor network and the critic network, which share the same state as input: the output of the actor network is the learned policy, and the output of the critic network is the value of the current state. The actor network selects an action based on the current state and current policy, while the critic network assists the actor network in learning an optimal policy that maximizes the return.
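A minimal single-worker actor-critic in this spirit can be sketched in pure Python. This is not the patent's network: A3C additionally runs several such workers asynchronously against shared parameters, and the linear actor/critic, learning rate, and one-step advantage below are simplifications for illustration.

```python
import math

# Minimal actor-critic sketch (single worker; A3C runs several such workers
# asynchronously against shared parameters). Linear actor and critic over a
# feature vector; all hyperparameters are illustrative.
class ActorCritic:
    def __init__(self, n_features, n_actions, lr=0.01):
        self.w_pi = [[0.0] * n_features for _ in range(n_actions)]  # actor weights
        self.w_v = [0.0] * n_features                               # critic weights
        self.lr = lr

    def policy(self, s):
        logits = [sum(w * x for w, x in zip(row, s)) for row in self.w_pi]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        return [e / z for e in exps]          # softmax distribution over actions

    def value(self, s):
        return sum(w * x for w, x in zip(self.w_v, s))

    def update(self, s, a, reward):
        # one-step advantage = r - V(s) (no bootstrapped next state, for brevity)
        adv = reward - self.value(s)
        probs = self.policy(s)
        for i, row in enumerate(self.w_pi):   # policy-gradient step on the actor
            grad = (1.0 if i == a else 0.0) - probs[i]
            for j in range(len(row)):
                row[j] += self.lr * adv * grad * s[j]
        for j in range(len(self.w_v)):        # regression step on the critic
            self.w_v[j] += self.lr * adv * s[j]
```

Repeatedly rewarding one action shifts the softmax policy toward it, which is the mechanism by which the central processor's bitrate/enhancement choices would be learned.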
In the system, the central processor corresponds to the reinforcement-learning agent. The action selected by the agent is the decision the central processor needs to make, i.e. the download bitrate of the video segment and whether the segment undergoes enhancement processing at the client, and the return of the action is the QoE it produces. That is, the reward generated by requesting the download of the video segment numbered i is:

r_i = α1·PSNR(a_i) - α2·|PSNR(a_i) - PSNR(a_{i-1})| - α3·T_i^rebuf

where (α1, α2, α3) are the coefficients of the terms of the reward objective and can represent the user's preferences among the different targets; for example, when α3 is large, the user has little tolerance for rebuffering, and the rebuffering duration should be reduced as much as possible. When constructing the agent's state, we consider the network status and the client playback feedback over the recent past; for example, when the video segment numbered i is requested, the state of the agent comprises the following terms.
each term is specifically represented as follows:
·Pithe progress of video downloading is the proportion of the number of downloaded video segments to the total number;
·
Figure BDA0003376655980000093
respectively the states of a download buffer, a play buffer, a video quality enhancement module and a player in the client;
·
Figure BDA0003376655980000094
to the past k1Average bandwidth of individual video segments as downloaded;
·
Figure BDA0003376655980000095
to the past k2Average elapsed time for video enhancement processing;
·Ri-1the downloading code rate of the last video segment;
·{PSNR(ai) PSNR for all decision options of the current video segment.
Where δ represents the number of video segments pushed to the play buffer after the enhancement process has been completed, k1,k2Is two hyper-parameters, representing the history length of the record.
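Assembling the agent state s_i then amounts to flattening these terms into a single vector. A hedged sketch: the history lengths `K1` and `K2`, the zero-padding at startup, and every sample value are assumptions for illustration only.

```python
# Flatten the agent state s_i described above into one feature vector.
from collections import deque

K1, K2 = 5, 3  # assumed history lengths k1, k2

def build_state(progress, buffers, bandwidth_hist, enhance_time_hist,
                last_rate, psnr_options):
    """Concatenate progress, buffer states, histories, last rate, PSNRs."""
    bw = list(bandwidth_hist)[-K1:]
    et = list(enhance_time_hist)[-K2:]
    # Zero-pad histories shorter than k1/k2 during session startup.
    bw = [0.0] * (K1 - len(bw)) + bw
    et = [0.0] * (K2 - len(et)) + et
    return [progress, *buffers, *bw, *et, last_rate, *psnr_options]

bandwidths = deque([3.1, 2.8, 2.6], maxlen=K1)   # past download bandwidths c
enhance_times = deque([0.4, 0.5], maxlen=K2)     # past enhancement times t
s = build_state(progress=0.25, buffers=[4, 2, 1, 1],
                bandwidth_hist=bandwidths, enhance_time_hist=enhance_times,
                last_rate=2.5, psnr_options=[33.0, 36.0, 38.5, 40.0])
```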
In the embodiment of the present application, referring to fig. 4, a simulation experiment is performed on real network bandwidth traces to demonstrate the effectiveness of the proposed model. The methods compared are:
B-DASH (Bandwidth-based DASH): B-DASH predicts the current network condition as the mean network bandwidth over the last 5 downloaded video segments;
GA (Greedy Algorithm): GA introduces a video enhancement module at the terminal; the module uses a greedy strategy that decides by maximizing the PSNR of the current video segment;
Pensieve: Pensieve is an adaptive video streaming algorithm, without enhancement, that learns a streaming policy based on the A3C network.
By introducing a video enhancement module into the mobile terminal device, a deep reinforcement learning algorithm based on the A3C framework is proposed. As shown in fig. 4, compared with the above three algorithms, the proposed algorithm achieves a 5%-14% improvement in overall performance. The main findings of the experimental results can be summarized as follows:
B-DASH vs. Pensieve: the performance of Pensieve exceeds that of B-DASH by 3%-8%, showing that the adaptive video streaming model based on deep reinforcement learning predicts the dynamic bandwidth more accurately, selects a more appropriate code rate, and provides a better experience for users.
Pensieve vs. ENAVS: the performance of ENAVS exceeds that of Pensieve by 5%-7%; the result shows that the video enhancement module introduced at the mobile terminal can further improve the quality of downloaded video segments.
GA vs. ENAVS: the performance of ENAVS exceeds that of GA by 5%-9%; the result shows that the trained deep reinforcement learning policy can dynamically adjust the download code rate of a video segment and whether it is enhanced, producing higher video quality than the greedy algorithm.
In an embodiment of the present application, a method for adaptive video streaming playing is provided, including: receiving a video playing request carrying a target video identifier; sending a first video segment downloading request corresponding to the target video identifier to a server according to a wireless uplink; receiving a first video segment which is sent by the server according to a wireless downlink and corresponds to the target video identification; according to the playing of the first video segment by a playing module, first feedback information corresponding to the first video segment is obtained; when a second video segment needs to be obtained, calculating a second video segment code rate decision and an enhancement factor decision corresponding to the second video segment according to the average bandwidth, the first feedback information and a deep reinforcement learning algorithm based on the A3C framework; sending a second video segment downloading request corresponding to the second video segment code rate decision to the server according to the second video segment code rate decision; receiving a second video segment sent by the server; performing enhancement processing operation on the second video segment according to the enhancement factor decision to obtain a second video segment to be played; and playing the second video segment to be played according to the playing module.
According to the method and the device, after the first video segment of the target video is obtained and a second video segment needs to be obtained, the client playing feedback information and the average bandwidth over the downloads of the past several video segments are collected; when the next video segment is requested for download, this information and a deep reinforcement learning algorithm based on the A3C framework are used to decide the code rate of the requested video segment and whether to trigger the enhancement factor of the client video enhancement module. This realizes dual adaptation of the download code rate and the terminal video quality enhancement factor and greatly improves the user's video playing experience.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing the relevant hardware through computer readable instructions, which can be stored in a computer readable storage medium; when executed, the instructions can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and whose execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
Example two
With further reference to fig. 5, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an adaptive video streaming playing apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices in particular.
As shown in fig. 5, the adaptive video streaming playing device 100 according to the present embodiment includes: the video playing system comprises a playing request acquisition unit 110, a first request transmitting unit 120, a first video segment receiving unit 130, a first video segment playing unit 140, a decision calculation unit 150, a second request transmitting unit 160, a second video segment receiving unit 170, an enhancement processing unit 180, and a second video segment playing unit 190. Wherein:
a playing request obtaining unit 110, configured to receive a video playing request carrying a target video identifier;
a first request sending unit 120, configured to send a first video segment downloading request corresponding to the target video identifier to a server according to a wireless uplink;
a first video segment receiving unit 130, configured to receive a first video segment corresponding to the target video identifier and sent by the server according to a wireless downlink;
a first video segment playing unit 140, configured to obtain first feedback information corresponding to the first video segment according to the playing of the first video segment by a playing module;
a decision calculating unit 150, configured to calculate, when a second video segment needs to be acquired, a second video segment code rate decision and an enhancement factor decision corresponding to the second video segment according to the average bandwidth, the first feedback information, and a deep reinforcement learning algorithm based on the A3C framework;
a second request sending unit 160, configured to send, to the server according to the second video segment bitrate decision, a second video segment downloading request corresponding to the second video segment bitrate decision;
a second video segment receiving unit 170, configured to receive a second video segment sent by the server;
the enhancement processing unit 180 is configured to perform enhancement processing operation on the second video segment according to the enhancement factor decision to obtain a second video segment to be played;
the second video segment playing unit 190 is configured to play the second video segment to be played according to the playing module.
In the embodiment of the application, the system is modeled mathematically, and the decision task of the central processing unit is realized by an adaptive video streaming policy algorithm based on deep reinforcement learning, so that the decision of the central processing unit achieves dual adaptation of the download code rate and the terminal video quality enhancement factor; the system is named ENAVS.
In the embodiment of the application, two buffers exist in the mobile client of the system; while the player is playing, the client downloads the requested video segments into the download buffer. Meanwhile, the video enhancement module extracts fully downloaded video segments from the download buffer for quality enhancement and pushes them to the play buffer once enhancement is finished. Thus, a video segment passes through three stages from being requested to being finally played on the client player: 1) video segment downloading; 2) video segment processing; 3) video segment playing. For example, for the video segment numbered i, when the previously requested video segment (numbered i-1) has finished downloading and the video segments stored in the client download buffer have not reached the upper limit, the client sends a request signal for downloading segment i to the server and starts downloading it; after segment i is downloaded and the previously requested segment has completed quality enhancement and been pushed to the play buffer, segment i starts quality enhancement processing; when segment i has completed quality enhancement, is stored in the play buffer, and the previous segment has finished playing, segment i starts to play.
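The three-stage pipeline can be modeled as two queues with a capacity check on the download side. A toy sketch under stated assumptions (a download-buffer limit of 3 segments, instantaneous stage transitions), not the patent's actual scheduler.

```python
# Toy walk-through of the stages a segment passes through on the client:
# download buffer -> enhancement -> play buffer -> playback.
from collections import deque

DOWNLOAD_CAP = 3  # assumed upper limit of the download buffer

download_buf, play_buf = deque(), deque()

def try_download(i):
    """Segment i may start downloading only if the buffer has room."""
    if len(download_buf) < DOWNLOAD_CAP:
        download_buf.append(i)
        return True
    return False

def enhance_next():
    """Enhance the oldest fully downloaded segment, push it to play buffer."""
    if download_buf:
        play_buf.append(download_buf.popleft())

def play_next():
    """Play the oldest enhanced segment, if any is ready."""
    return play_buf.popleft() if play_buf else None

for i in range(3):
    try_download(i)
enhance_next()            # segment 0 is enhanced while 1 and 2 wait
played = play_next()      # playback strictly follows enhancement order
assert played == 0
```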
In the present embodiment, we assume that QoE consists of three parts:
1) Average video quality: in an adaptive video streaming system, video quality can be represented in two ways: by the average code rate of the video, or by the average PSNR of the video frames. Since our system introduces a terminal video enhancement module whose algorithm can enhance downloaded video segments, the code rate alone no longer reflects the delivered quality accurately, so we represent the average video quality by the average PSNR.
2) Average quality variation: generally, a user prefers the code rate change between adjacent video segments to be as smooth as possible; a drastic change of code rate may reduce the user's experience of watching the video. Therefore, we take the code rate variation between adjacent video segments into account as a penalty term.
3) Average rebuffering time: when there are no playable video blocks in the play buffer, the player will enter a rebuffering state, and the video playing will be paused, which may lead to a degraded user experience. Therefore, we take the average rebuffering time into account as another penalty term.
Thus, the QoE equation can be expressed as:

$$\mathrm{QoE} = \frac{1}{N}\sum_{i=1}^{N}\left[\alpha_1\,\mathrm{PSNR}(a_i) - \alpha_2\,\lvert R_i - R_{i-1}\rvert - \alpha_3\,T_i^{\mathrm{rebuf}}\right]$$

where N represents the total number of video segments, a_i represents the decision made for the video segment numbered i (including the download code rate and whether to perform video enhancement processing), R_i represents the download code rate of the video segment numbered i, and T_i^rebuf represents the rebuffering time of the video segment numbered i before playing.
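The session-level QoE above can then be computed from per-segment PSNR, code rate, and rebuffering traces. A sketch with assumed weights; treating the first segment as incurring no code-rate-change penalty is our convention, not stated in the text.

```python
def qoe(psnrs, rates, rebufs, a1=1.0, a2=1.0, a3=4.0):
    """QoE = (1/N) * sum_i [a1*PSNR(a_i) - a2*|R_i - R_{i-1}| - a3*T_i^rebuf]."""
    total, prev_rate = 0.0, rates[0]  # first segment: no rate-change penalty
    for psnr, rate, rebuf in zip(psnrs, rates, rebufs):
        total += a1 * psnr - a2 * abs(rate - prev_rate) - a3 * rebuf
        prev_rate = rate
    return total / len(psnrs)

# Three segments: the last one drops code rate and stalls for 0.5 s,
# which lowers the average QoE relative to its raw PSNR.
session_qoe = qoe(psnrs=[36.0, 38.0, 37.0],
                  rates=[2.5, 2.5, 1.2],
                  rebufs=[0.0, 0.0, 0.5])
```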
In an embodiment of the present application, an adaptive video streaming playing apparatus 100 is provided, including: a playing request obtaining unit 110, configured to receive a video playing request carrying a target video identifier; a first request sending unit 120, configured to send a first video segment downloading request corresponding to the target video identifier to a server according to a wireless uplink; a first video segment receiving unit 130, configured to receive a first video segment corresponding to the target video identifier and sent by the server according to a wireless downlink; a first video segment playing unit 140, configured to obtain first feedback information corresponding to the first video segment according to the playing of the first video segment by a playing module; a decision calculating unit 150, configured to calculate, when a second video segment needs to be acquired, a second video segment code rate decision and an enhancement factor decision corresponding to the second video segment according to the average bandwidth, the first feedback information, and a deep reinforcement learning algorithm based on the A3C framework; a second request sending unit 160, configured to send, to the server according to the second video segment code rate decision, a second video segment downloading request corresponding to the second video segment code rate decision; a second video segment receiving unit 170, configured to receive a second video segment sent by the server; the enhancement processing unit 180 is configured to perform enhancement processing operation on the second video segment according to the enhancement factor decision to obtain a second video segment to be played; the second video segment playing unit 190 is configured to play the second video segment to be played according to the playing module.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 6, fig. 6 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 300 includes a memory 310, a processor 320, and a network interface 330 communicatively coupled to each other via a system bus. It is noted that only a computer device 300 having components 310-330 is shown, but it is understood that not all of the shown components are required and that more or fewer components may alternatively be implemented. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 310 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 310 may be an internal storage unit of the computer device 300, such as a hard disk or a memory of the computer device 300. In other embodiments, the memory 310 may also be an external storage device of the computer device 300, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 300. Of course, the memory 310 may also include both internal and external storage devices of the computer device 300. In this embodiment, the memory 310 is generally used for storing an operating system installed on the computer device 300 and various types of application software, such as computer readable instructions of an adaptive video streaming playing method. In addition, the memory 310 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 320 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 320 is generally operative to control overall operation of the computer device 300. In this embodiment, the processor 320 is configured to execute computer readable instructions or process data stored in the memory 310, for example, execute computer readable instructions of the adaptive video streaming playing method.
The network interface 330 may include a wireless network interface or a wired network interface, and the network interface 330 is generally used to establish a communication connection between the computer device 300 and other electronic devices.
According to the computer equipment, after the first video segment of the target video is obtained and a second video segment needs to be obtained, the client playing feedback information and the average bandwidth over the downloads of the past several video segments are collected; when the next video segment is requested for download, this information and a deep reinforcement learning algorithm based on the A3C framework are used to decide the code rate of the requested video segment and whether to trigger the enhancement factor of the client video enhancement module, realizing dual adaptation of the download code rate and the terminal video quality enhancement factor and greatly improving the user's video playing experience.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the adaptive video streaming playing method as described above.
According to the computer-readable storage medium, after the first video segment of the target video is obtained and a second video segment needs to be obtained, the client playing feedback information and the average bandwidth over the downloads of the past several video segments are collected; when the next video segment is requested for download, this information and a deep reinforcement learning algorithm based on the A3C framework are used to decide the code rate of the requested video segment and whether to trigger the enhancement factor of the client video enhancement module, realizing dual adaptation of the download code rate and the terminal video quality enhancement factor and greatly improving the user's video playing experience.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely illustrative of some, but not restrictive, of the broad invention, and that the appended drawings illustrate preferred embodiments of the invention and do not limit the scope of the invention. This application is capable of embodiments in many different forms and is provided for the purpose of enabling a thorough understanding of the disclosure of the application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the present application may be practiced without modification or with equivalents of some of the features described in the foregoing embodiments. All equivalent structures made by using the contents of the specification and the drawings of the present application are directly or indirectly applied to other related technical fields and are within the protection scope of the present application.

Claims (10)

1. An adaptive video streaming playing method, applied to a mobile client, includes the following steps:
receiving a video playing request carrying a target video identifier;
sending a first video segment downloading request corresponding to the target video identifier to a server according to a wireless uplink;
receiving a first video segment which is sent by the server according to a wireless downlink and corresponds to the target video identification;
according to the playing of the first video segment by a playing module, first feedback information corresponding to the first video segment is obtained;
when a second video segment needs to be obtained, calculating a second video segment code rate decision and an enhancement factor decision corresponding to the second video segment according to the average bandwidth, the first feedback information and a deep reinforcement learning algorithm based on an A3C framework;
sending a second video segment downloading request corresponding to the second video segment code rate decision to the server according to the second video segment code rate decision;
receiving a second video segment sent by the server;
performing enhancement processing operation on the second video segment according to the enhancement factor decision to obtain a second video segment to be played;
and playing the second video segment to be played according to the playing module.
2. The adaptive video streaming playing method according to claim 1, wherein the step of calculating a second video segment code rate decision and an enhancement factor decision corresponding to the second video segment according to the average bandwidth, the first feedback information and a deep reinforcement learning algorithm based on an A3C framework specifically comprises:

calculating a first video segment user quality of experience (QoE) according to the first video segment quality PSNR(a_i), the code rate fluctuation |R_i - R_{i-1}|, and the stall (rebuffering) duration T_i^rebuf in the first feedback information, the QoE being expressed as:

$$\mathrm{QoE} = \frac{1}{N}\sum_{i=1}^{N}\left[\alpha_1\,\mathrm{PSNR}(a_i) - \alpha_2\,\lvert R_i - R_{i-1}\rvert - \alpha_3\,T_i^{\mathrm{rebuf}}\right]$$

wherein N represents the total number of video segments, a_i represents the decision made for the video segment numbered i, including the download code rate and whether to perform video enhancement processing, R_i represents the download code rate of the video segment numbered i, and T_i^rebuf represents the rebuffering time of the video segment numbered i before playing;

determining the decision a_i that maximizes the QoE score as the second video segment code rate decision and the enhancement factor decision.
3. The adaptive video streaming playing method according to claim 1, wherein the A3C framework comprises an actor network for outputting an optimal policy and a judger network for outputting a current status value.
4. The adaptive video streaming playing method according to claim 1, wherein the enhancement factor decision includes an enhanced video decision and a non-enhanced video decision, and the step of performing enhancement processing operation on the second video segment according to the enhancement factor decision to obtain a second video segment to be played comprises:
if the enhancement factor decision is an enhancement video decision, calling a video enhancement module to perform video enhancement operation on the second video segment to obtain a second video segment to be played;
and if the enhancement factor decision is a non-enhancement video decision, determining the second video segment as the second video segment to be played.
5. An adaptive video streaming playing device, applied to a mobile client, comprising:
the playing request acquisition unit is used for receiving a video playing request carrying a target video identifier;
a first request sending unit, configured to send a first video segment downloading request corresponding to the target video identifier to a server according to a wireless uplink;
the first video segment receiving unit is used for receiving a first video segment which is transmitted by the server according to a wireless downlink and corresponds to the target video identifier;
the first video segment playing unit is used for playing the first video segment according to a playing module and acquiring first feedback information corresponding to the first video segment;
the decision calculation unit is used for calculating a second video segment code rate decision and an enhancement factor decision corresponding to a second video segment according to the average bandwidth, the first feedback information and a depth reinforcement learning algorithm based on an A3C frame when the second video segment needs to be acquired;
a second request sending unit, configured to send, to the server, a second video segment downloading request corresponding to the second video segment bitrate decision;
a second video segment receiving unit, configured to receive the second video segment sent by the server;
an enhancement processing unit, configured to perform an enhancement processing operation on the second video segment according to the enhancement factor decision to obtain a second video segment to be played;
and a second video segment playing unit, configured to play the second video segment to be played through the playing module.
6. The adaptive video streaming playing device according to claim 5, wherein the decision calculating unit comprises:
a score calculating subunit, configured to calculate a user quality-of-experience score QoE of the first video segment according to the video quality PSNR(a_i), the bitrate perturbation |R_i − R_{i−1}|, and the rebuffering duration T_i^rebuf in the first feedback information, the QoE being expressed as:
QoE = Σ_{i=1}^{N} PSNR(a_i) − λ · Σ_{i=2}^{N} |R_i − R_{i−1}| − μ · Σ_{i=1}^{N} T_i^rebuf,
wherein N represents the total number of video segments, a_i represents the decision made for the video segment numbered i, including the download bitrate and whether to perform video enhancement processing, R_i represents the download bitrate of the video segment numbered i, T_i^rebuf represents the rebuffering time before playing the video segment numbered i, and λ and μ are weighting coefficients;
a decision determining subunit, configured to determine the decision a_i that maximizes the QoE score as the second video segment bitrate decision and the enhancement factor decision.
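The decision logic of claim 6 — scoring candidate decisions by per-segment quality PSNR(a_i) minus penalties for the bitrate perturbation |R_i − R_{i−1}| and the rebuffering time, then taking the highest-scoring decision — can be sketched numerically. The weighting coefficients `lam` and `mu`, the candidate set, and the toy predicted-score function below are illustrative assumptions, not values from the patent.

```python
def qoe_score(psnr, bitrates, rebuf, lam=1.0, mu=4.3):
    """QoE = sum_i PSNR(a_i) - lam * sum_i |R_i - R_{i-1}|
             - mu * sum_i T_i^rebuf  (lam, mu are assumed coefficients)."""
    quality = sum(psnr)
    perturbation = sum(abs(b - a) for a, b in zip(bitrates, bitrates[1:]))
    rebuffering = sum(rebuf)
    return quality - lam * perturbation - mu * rebuffering

def best_decision(candidates, score_fn):
    """Pick the decision a_i that maximizes the predicted QoE score."""
    return max(candidates, key=score_fn)

# Hypothetical candidate decisions: (download bitrate in Mbps, enhance?).
candidates = [(r, e) for r in (1.0, 2.5, 4.0) for e in (False, True)]

# Toy predicted-score model: higher bitrate helps; client-side enhancement
# only pays off for low-bitrate downloads.
predicted = lambda c: 10.0 * c[0] + (3.0 if c[1] and c[0] < 2.0 else 0.0)

print(best_decision(candidates, predicted))  # highest-bitrate candidate wins
```

A real client would replace `predicted` with the score estimated by the learned policy for each candidate (bitrate, enhancement) pair.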
7. The adaptive video streaming playing device according to claim 5, wherein the A3C framework comprises an actor network for outputting the optimal policy and a critic network for outputting the value of the current state.
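The actor/critic split of the A3C framework in claim 7 can be illustrated with a toy pair of linear heads: the actor maps client state features to a probability distribution over candidate (bitrate, enhancement) decisions, while the critic maps the same state to a scalar value estimate. The state features, layer sizes, and untrained random weights are assumptions; a real A3C agent trains these asynchronously with policy-gradient and value-regression losses.

```python
import math
import random

random.seed(0)

STATE_DIM = 4   # e.g. bandwidth, buffer level, last bitrate, last PSNR
N_ACTIONS = 6   # (bitrate level, enhance?) combinations -- illustrative

def linear(x, weights, biases):
    """One fully connected layer: each row of `weights` produces one output."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weights, biases)]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

# Toy random weights standing in for trained parameters.
W_actor = [[random.uniform(-1, 1) for _ in range(STATE_DIM)]
           for _ in range(N_ACTIONS)]
b_actor = [0.0] * N_ACTIONS
W_critic = [[random.uniform(-1, 1) for _ in range(STATE_DIM)]]
b_critic = [0.0]

def actor(state):
    """Actor network: probability distribution over candidate decisions."""
    return softmax(linear(state, W_actor, b_actor))

def critic(state):
    """Critic network: estimated value of the current state."""
    return linear(state, W_critic, b_critic)[0]

state = [0.5, 0.2, 3.0, 40.0]  # toy state features
policy = actor(state)           # probabilities over the 6 candidate decisions
value = critic(state)           # scalar state-value estimate
```

During training, each worker thread would sample a decision from `policy`, observe the resulting QoE reward, and use the critic's value as a baseline for the advantage estimate.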
8. The adaptive video streaming playing device according to claim 7, wherein the enhancement factor decision comprises an enhanced video decision and a non-enhanced video decision, and the enhancement processing unit comprises:
a first enhancement processing subunit, configured to, if the enhancement factor decision is the enhanced video decision, invoke a video enhancement module to perform a video enhancement operation on the second video segment to obtain the second video segment to be played;
and a second enhancement processing subunit, configured to determine the second video segment as the second video segment to be played if the enhancement factor decision is the non-enhanced video decision.
9. A computer device, comprising a memory and a processor, the memory having computer readable instructions stored therein which, when executed by the processor, implement the steps of the adaptive video streaming playing method according to any one of claims 1 to 4.
10. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the adaptive video streaming playing method according to any one of claims 1 to 4.
CN202111421136.6A 2021-11-26 2021-11-26 Self-adaptive video stream transmission playing method, device, equipment and storage medium Active CN114040257B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111421136.6A CN114040257B (en) 2021-11-26 2021-11-26 Self-adaptive video stream transmission playing method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN114040257A true CN114040257A (en) 2022-02-11
CN114040257B CN114040257B (en) 2023-06-13

Family

ID=80138945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111421136.6A Active CN114040257B (en) 2021-11-26 2021-11-26 Self-adaptive video stream transmission playing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114040257B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017031692A1 (en) * 2015-08-25 2017-03-02 华为技术有限公司 Video downloading method, apparatus, and system
CN111901642A (en) * 2020-07-31 2020-11-06 成都云格致力科技有限公司 Real-time video code rate self-adaptive control method and system based on reinforcement learning
CN112291620A (en) * 2020-09-22 2021-01-29 北京邮电大学 Video playing method and device, electronic equipment and storage medium
CN112911408A (en) * 2021-01-25 2021-06-04 电子科技大学 Intelligent video code rate adjustment and bandwidth allocation method based on deep learning
WO2021120892A1 (en) * 2019-12-19 2021-06-24 华为技术有限公司 Method for controlling video playback, terminal device, server, and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Renzhang: "Research on bitrate adaptation algorithms and QoE based on video type", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology Series *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114885208A (en) * 2022-03-21 2022-08-09 中南大学 Dynamic self-adapting method, equipment and medium for scalable streaming media transmission under NDN (named data networking)
CN114885208B (en) * 2022-03-21 2023-08-08 中南大学 Dynamic self-adapting method, equipment and medium for scalable streaming media transmission under NDN (named data networking)
WO2024001266A1 (en) * 2022-06-27 2024-01-04 腾讯科技(深圳)有限公司 Video stream transmission control method and apparatus, device, and medium


Similar Documents

Publication Publication Date Title
US10271112B2 (en) System and method for dynamic adaptive video streaming using model predictive control
WO2021012946A1 (en) Video bit rate determining method and apparatus, electronic device, and storage medium
CN110198495B (en) Method, device, equipment and storage medium for downloading and playing video
CN103108257B (en) A kind of method and system improving streaming media playing quality for built-in terminal
CN114040257B (en) Self-adaptive video stream transmission playing method, device, equipment and storage medium
CN112954385B (en) Self-adaptive shunt decision method based on control theory and data driving
Zahran et al. OSCAR: An optimized stall-cautious adaptive bitrate streaming algorithm for mobile networks
Xiao et al. DeepVR: Deep reinforcement learning for predictive panoramic video streaming
CN115022684B (en) Video stream self-adaptive transmission method based on deep reinforcement learning under QUIC protocol
Sun et al. Optimal strategies for live video streaming in the low-latency regime
Mu et al. AMIS: Edge computing based adaptive mobile video streaming
CN116017003A (en) Self-adaptive VR360 video-on-demand method and system based on multiple artificial intelligence methods
US20220191260A1 (en) Method for playing on a player of a client device a content streamed in a network
Zhang et al. RAM360: Robust Adaptive Multi-Layer 360° Video Streaming With Lyapunov Optimization
US11426655B2 (en) Method for playing on a player of a client device a content streamed in a network
Xie et al. Deep Curriculum Reinforcement Learning for Adaptive 360° Video Streaming With Two-Stage Training
Xiao et al. Deep reinforcement learning-driven intelligent panoramic video bitrate adaptation
Tao et al. Energy efficient video QoE optimization for dynamic adaptive HTTP streaming over wireless networks
CN114025190A (en) Multi-code rate scheduling method and multi-code rate scheduling device
WO2021191389A1 (en) Method for playing on a player of a client device a content streamed in a network
Ma et al. Buffer control in VR video transmission over MMT system
KR101932130B1 (en) Apparatus and method for improving quality of experience of remote display
WO2023181205A1 (en) Video player, video playback method, and program
Luo et al. Adaptive video streaming in software-defined mobile networks: A deep reinforcement learning approach
WO2024120134A1 (en) Video transmission method, apparatus and device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant