CN110049315A

CN110049315A - Method for improving the quality of experience of users of a live video system

Info

Publication number: CN110049315A
Application number: CN201910343561.4A
Authority: CN
Inventors: 张志才 (Zhang Zhicai); 付芳 (Fu Fang)
Original assignee: Shanxi University
Current assignee: Shanxi University
Priority date: 2019-04-26
Filing date: 2019-04-26
Publication date: 2019-07-23
Anticipated expiration: 2039-04-26
Also published as: CN110049315B

Abstract

The invention discloses a method for improving the quality of experience (QoE) of users in a live video system. The invention aims to improve user QoE in live video systems by reducing delay while guaranteeing the quality of the viewed video. Unlike the existing literature, the invention models both the available computing resources and the available radio spectrum resources as random processes, which approximates a real wireless communication environment more closely. The system dynamics are modelled as a Markov decision process; because its action space and state space are both continuous and multidimensional, traditional reinforcement learning algorithms such as the deep Q-network (DQN) and policy gradient are inefficient and hard to apply to such problems. To address this, the invention jointly considers the video transcoding strategy, the user scheduling strategy and the resource allocation method, and proposes an online Actor-Critic reinforcement learning algorithm that introduces eligibility traces into both the Actor part and the Critic part to accelerate learning. Simulations show that its performance is significantly better than that of the deep Q-network and that it converges faster than the policy gradient algorithm.

Description

Method for improving the quality of experience of users of a live video system

Technical field

The present invention relates to the field of fifth-generation (5G) wireless communication technology, and in particular to a method for improving the quality of experience of users of a live video system.

Background art

Live video has great application value in rescue, route guidance, and entertainment. Depending on quality, a video stream can be divided into multiple versions, and which version is transmitted depends on several factors: (1) the bandwidth limits of the wireless channel; (2) differing user preferences; (3) the video formats supported by the mobile device. Video transcoding on the downlink can convert live video into several versions of different definition. However, existing live video systems have several problems.

Problem 1: heavy core-network load and delay. Because video transcoding is computationally intensive, the common practice is to rely on the powerful computing capability of cloud computing systems: according to the requirements of different users, live video data is offloaded to the remote cloud, where each video is first transcoded into multiple formats and quality levels and then delivered to users over the Internet and the core network. This greatly increases the core-network load and introduces serious delay.

Problem 2: inefficient resource optimization. Because of the dynamic nature of wireless networks, making efficient use of all the radio spectrum and computing resources available at the network edge is very difficult.

Problem 3: one-sided study of the user quality of experience (QoE) function. On the one hand, some works consider only video quality; for example, D. Wang et al. proposed an adaptive video transcoding framework in 2018 that jointly adjusts the transcoding strategy and the radio spectrum allocation according to time-varying wireless channel conditions, aiming to maximize user QoE. On the other hand, some works consider only delay; for example, Q. He et al. proposed a fog-computing-based video transcoding framework in 2017 aimed at reducing delay, and Y. Zhu et al. proposed a cloud-edge cooperative system in 2018 that jointly uses cloud resources and the resources of idle viewer terminals to reduce cost and delay. However, both video quality and delay matter greatly to users: (1) if delay is sacrificed to obtain high-quality video, the picture is high definition but playback stalls frequently, which seriously degrades the user experience; (2) if video quality is sacrificed to obtain low delay, playback is smooth but the picture is unclear, which also seriously degrades the user experience.

Problem 4: poor performance and low efficiency of traditional reinforcement learning algorithms. When the live video system is modelled as a Markov decision process (MDP), its state space and action space are both continuous and multidimensional. Traditional value-iteration reinforcement learning algorithms such as Q-learning and SARSA perform poorly on such problems, and traditional policy gradient algorithms learn inefficiently and converge slowly, so training takes a long time. For continuous-space problems, Actor-Critic algorithms outperform both classes of algorithms and have been widely studied: R. Li et al. proposed a single-step-update Actor-Critic algorithm in 2014; Y. Wei et al. proposed an Actor-Critic-based resource allocation method to maximize system energy efficiency in 2018; and H. Yang et al. applied an Actor-Critic algorithm to an Internet of Things system in 2019. However, current Actor-Critic algorithms introduce eligibility traces only in the Critic part, which limits their learning efficiency.

Summary of the invention

To overcome the shortcomings and deficiencies of the prior art, a method for improving the quality of experience of users of a live video system is provided. In a cloud-assisted heterogeneous network, mobile edge computing (MEC) and software-defined networking (SDN) are used to jointly optimize the video transcoding strategy, the user scheduling strategy and resource allocation, and an improved Actor-Critic algorithm performs multistep updates with eligibility traces in both the Actor part and the Critic part, thereby improving learning efficiency.

To achieve the object of the invention, a method for improving the quality of experience of users of a live video system is provided, comprising the following steps:

Step 1: the system dynamics are modelled as a Markov decision process (MDP) with four elements S, A, P and r. The state space S comprises three parts: 1) the computing resources available at the mobile edge computing servers; 2) the available radio spectrum resources; 3) the signal-to-interference-plus-noise ratio (SINR) of the wireless downlinks. The action space A comprises four parts: 1) the user scheduling strategy; 2) the transcoding strategy; 3) the computing resource allocation strategy; 4) the radio spectrum resource allocation strategy. P is the state transition probability matrix, and r is the reward function, obtained through the following steps;
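As an illustrative sketch only, the state and action spaces of Step 1 could be held in simple containers such as the following. The class names, field names and the one-entry-per-user layout are hypothetical choices made here, not structures defined by the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    """Hypothetical container for one observation of the state space S (Step 1)."""
    compute_avail: List[float]   # computing resource available at each MEC server (Hz)
    spectrum_avail: List[float]  # radio spectrum available at each base station (Hz)
    sinr: List[float]            # downlink SINR of each user, linear scale

    def as_vector(self) -> List[float]:
        # Concatenate the three parts of S into one flat feature list.
        return [*self.compute_avail, *self.spectrum_avail, *self.sinr]


@dataclass
class Action:
    """Hypothetical container for one action in A (Step 1), one entry per user."""
    schedule: List[int]           # c_{n,k}: 1 = served by an SBS, 0 = served by the MBS
    version: List[int]            # index of the target transcoding version
    compute_alloc: List[float]    # computing resource allocated to each user (Hz)
    spectrum_alloc: List[float]   # radio spectrum allocated to each user (Hz)
```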

Step 2: the live video stream is divided into segments that are processed and played in turn, each with playback duration $L$. While one segment is being played, the next segment is processed, and the processing time is denoted $T_{n,k}$. Continuous playback requires $T_{n,k} \le L$; otherwise a delay $D_{n,k}$ occurs;

Step 3: the mobile edge computing (MEC) server first transcodes the video stream, converting the original high-quality stream into a lower-quality stream; the time consumed is:

$$T^{\mathrm{tr}}_{n,k} = c_{n,k}\,\frac{C(v_{n,k},\tilde v_{n,k})}{f_{n,k}} + \left(1-c_{n,k}\right)\frac{C(v_{n,k},\tilde v_{n,k})}{f^{\mathrm{M}}_{n,k}}$$

where $c_{n,k}=1$ indicates that the user is served by a small base station and $c_{n,k}=0$ that the user is served by the macro base station; $v_{n,k}$ denotes the original video stream; $\tilde v_{n,k}$ denotes the video stream the user receives after transcoding by the MEC server; $C(v_{n,k},\tilde v_{n,k})$ denotes the computing resource required to transcode the stream from version $v_{n,k}$ to version $\tilde v_{n,k}$; $f_{n,k}$ denotes the computing resource allocated to the $k$-th user by the MEC server of the SBS numbered $n$; and $f^{\mathrm{M}}_{n,k}$ denotes the computing resource allocated to the $k$-th user by the MEC server of the MBS;
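A minimal sketch of the Step 3 transcoding time, assuming (as is usual in MEC models) that the time equals the computing resource required for the version change divided by the computing resource allocated by the serving base station's MEC server. The function and argument names are hypothetical.

```python
def transcoding_time(c: int, cycles: float, f_sbs: float, f_mbs: float) -> float:
    """Transcoding time of one segment for one user (hypothetical helper).

    c      -- 1 if the user is served by a small base station, 0 if by the macro base station
    cycles -- computing resource needed to transcode from the original to the target version
    f_sbs  -- computing resource allocated by the SBS's MEC server (cycles per second)
    f_mbs  -- computing resource allocated by the MBS's MEC server (cycles per second)
    """
    if c not in (0, 1):
        raise ValueError("c must be 0 or 1")
    # Only the MEC server attached to the serving base station does the work,
    # so the other allocation is never used (and may be zero).
    allocated = f_sbs if c == 1 else f_mbs
    return cycles / allocated
```

For example, a version change needing 2 × 10^9 cycles takes 2 s on an SBS allocation of 1 GHz but 0.5 s on an MBS allocation of 4 GHz.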

Step 4: the quality function $Z_{n,k}$ of the video stream transcoded in Step 3 is calculated by the following formula:

where $\tilde q_{n,k}$ and $\tilde b_{n,k}$ denote the resolution and bit rate of the transcoded video stream, $q_{n,k}$ and $b_{n,k}$ denote the resolution and bit rate of the original video stream, $\ln$ is the natural logarithm, and $\zeta$ and $\xi$ are positive numbers;

Step 5: the video transcoded in Step 3 is then transmitted over the downlink to the mobile terminal device; the time consumed, $T^{\mathrm{tx}}_{n,k}$, is:

where $\tilde b_{n,k}$ is the bit rate of the transcoded video stream from Step 4 and $\tilde s_{n,k}$ is the size of the transcoded video stream, which can be obtained directly with media stream splitting software; $B_{n,k}$ denotes the radio spectrum resource allocated to the $k$-th user by the small base station numbered $n$, and $B^{\mathrm{M}}_{n,k}$ denotes the radio spectrum resource allocated to the $k$-th user by the macro base station; $R_{n,k}$ denotes the achievable instantaneous rate, calculated as $R_{n,k} = c_{n,k} B_{n,k} G_{n,k} + (1-c_{n,k}) B^{\mathrm{M}}_{n,k} G^{\mathrm{M}}_{n,k}$; $G_{n,k} = \log_2(1+\rho_{n,k})$ is the spectral efficiency that the small base station numbered $n$ can provide to the $k$-th user, where $\rho_{n,k}$ is the SINR of the downlink from that small base station to the $k$-th user; and $G^{\mathrm{M}}_{n,k} = \log_2(1+\rho^{\mathrm{M}}_{n,k})$ is the spectral efficiency that the macro base station can provide to the $k$-th user, where $\rho^{\mathrm{M}}_{n,k}$ is the SINR of the downlink from the macro base station to the $k$-th user;
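The Step 5 quantities can be sketched as follows, assuming the Shannon form log2(1 + SINR) for spectral efficiency and a transmission time equal to segment size divided by achievable rate; both forms and all function names here are assumptions for illustration.

```python
import math


def spectral_efficiency(sinr: float) -> float:
    """Shannon spectral efficiency in bit/s/Hz for a linear-scale SINR."""
    return math.log2(1.0 + sinr)


def achievable_rate(c: int, b_sbs: float, sinr_sbs: float,
                    b_mbs: float, sinr_mbs: float) -> float:
    """Instantaneous rate (bit/s): bandwidth times spectral efficiency of the serving station."""
    if c == 1:
        return b_sbs * spectral_efficiency(sinr_sbs)
    return b_mbs * spectral_efficiency(sinr_mbs)


def transmission_time(segment_bits: float, rate: float) -> float:
    """Downlink time for one transcoded segment of the given size."""
    return segment_bits / rate
```

For instance, with 1 MHz from a small base station at a linear SINR of 3, the rate is 2 Mbit/s, so a 2 s segment encoded at 1 Mbps (2 × 10^6 bits) takes 1 s to deliver.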

Step 6: the mobile terminal device decodes the received video stream; the decoding time, denoted $T^{\mathrm{dec}}_{n,k}$, is a constant;

Step 7: from the results of Steps 3, 5 and 6, the total time $T_{n,k}$ of Step 2 is $T_{n,k} = T^{\mathrm{tr}}_{n,k} + T^{\mathrm{tx}}_{n,k} + T^{\mathrm{dec}}_{n,k}$, where the three terms are the transcoding time (Step 3), the transmission time (Step 5) and the decoding time (Step 6); the delay is then obtained as $D_{n,k} = T_{n,k} - L$;
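The bookkeeping of Steps 2 and 7 can be sketched as below. Reading Step 2, a delay only arises when the total time exceeds L, so this sketch clips the delay at zero; that clipping is an interpretation, since Step 7 writes the delay as T minus L directly. The helper name is hypothetical, and L = 2 s is the value used in the embodiment.

```python
from typing import Tuple


def segment_delay(t_transcode: float, t_transmit: float, t_decode: float,
                  seg_len: float = 2.0) -> Tuple[float, float]:
    """Return (T, D) for one segment: total processing time and playback delay."""
    total = t_transcode + t_transmit + t_decode  # Step 7: sum of Steps 3, 5 and 6
    # Step 2: playback stays continuous while T <= L, so no delay in that case.
    delay = max(0.0, total - seg_len)
    return total, delay
```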

Step 8: from the results of Steps 4 and 7, the reward function $r$ of the MDP is:

$$r = \sum_{n}\sum_{k}\left(p_{n,k} Z_{n,k} - \upsilon_{n,k} D_{n,k}\right)$$

where $p_{n,k}$ is the price of video stream quality (in dollars), $Z_{n,k}$ is the video-stream quality function of Step 4, $\upsilon_{n,k}$ is the price of delay (in dollars), and $D_{n,k}$ is the delay function of Step 7;
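A sketch of the Step 8 reward, assuming it is the sum over users of priced quality minus priced delay, with delay acting as a penalty as the advantages section describes. The per-user list layout and the function name are hypothetical.

```python
from typing import Sequence


def reward(quality: Sequence[float], delay: Sequence[float],
           quality_price: Sequence[float], delay_price: Sequence[float]) -> float:
    """Assumed MDP reward: sum over users of (quality price * Z - delay price * D)."""
    if not (len(quality) == len(delay) == len(quality_price) == len(delay_price)):
        raise ValueError("every input needs one entry per user")
    # Delay enters with a negative sign, i.e. as a penalty on the reward.
    return sum(p * z - v * d
               for z, d, p, v in zip(quality, delay, quality_price, delay_price))
```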

Step 9: the above MDP is solved with an Actor-Critic algorithm with dual eligibility traces. First, the parameters are initialized: the Actor eligibility-trace decay rate $\lambda_\theta \in [0,1)$ and the Critic eligibility-trace decay rate $\lambda_\omega \in [0,1)$; the Actor policy parameter $\theta$ and the Critic state-value-function parameter $\omega$; the Actor and Critic eligibility-trace vectors, both set to zero vectors; and the Actor learning rate $\alpha_{a,t} > 0$ and the Critic learning rate $\alpha_{c,t} > 0$. A maximum number of iterations is set, and iteration begins at Step 10;

Step 10: in each iteration, an action $a$ is selected according to the action probability density $\pi(a\mid s,\theta)$:

$$\pi(a\mid s,\theta) = \frac{1}{\sigma(s,\theta_\sigma)\sqrt{2\pi}}\exp\left(-\frac{\left(a-\mu(s,\theta_\mu)\right)^2}{2\,\sigma(s,\theta_\sigma)^2}\right)$$

where $\mu(s,\theta_\mu)$ is the mean of the normal distribution, $\sigma(s,\theta_\sigma)$ is its standard deviation, $\theta_\mu$ and $\theta_\sigma$ are the estimator parameters, $\theta = [\theta_\mu, \theta_\sigma]^{T}$, $s$ is the current state and $\theta$ is the current policy. Executing this action in the current state yields its reward; the state moves from the current state to the next state, and the immediate reward $r_{t+1}$ is obtained;
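A sketch of the Step 10 action selection for a single scalar action component. It assumes a linear mean μ = θ_μ·φ(s) and a standard deviation σ = exp(θ_σ·φ(s)), which keeps σ positive as Step 17 requires, and it also returns the gradients of ln π that the Actor trace of Step 15 needs. Treating each component of the multidimensional action independently is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np


def gaussian_policy(phi, theta_mu, theta_sigma, rng):
    """Sample a ~ N(mu, sigma^2) and return (a, grad_mu, grad_sigma) of ln pi(a | s, theta).

    mu = theta_mu . phi(s); sigma = exp(theta_sigma . phi(s)) is always positive.
    """
    mu = float(theta_mu @ phi)
    sigma = float(np.exp(theta_sigma @ phi))
    a = rng.normal(mu, sigma)
    # Closed-form gradients of the Gaussian log-density.
    grad_mu = (a - mu) / sigma**2 * phi
    grad_sigma = ((a - mu) ** 2 / sigma**2 - 1.0) * phi
    return a, grad_mu, grad_sigma
```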

Step 11: the state feature vector $\phi(s)$ is updated, and the state value function $V^{\pi}(s)$ is learned with a linear estimator, $V^{\pi}(s) \approx V(s,\omega) = \omega^{T}\phi(s)$, where $\omega$ is the parameter of the Critic's state value function and $\omega^{T}$ is its transpose. To accelerate learning, both the Actor part and the Critic part perform multistep updates using eligibility traces;

Step 12: the temporal-difference (TD) error $\delta$ is updated as $\delta = r_{t+1} + \gamma_\omega V(s_{t+1},\omega) - V(s_t,\omega)$, where $r_{t+1} + \gamma_\omega V(s_{t+1},\omega)$ is the total reward value of the next state, $\gamma_\omega$ is a discount factor between 0 and 1, and $V(s_t,\omega)$ is the value of the current state;

Step 13: the Critic eligibility-trace vector $z(\omega,t)$ is updated as:

$$z(\omega,t) = \gamma_\omega\lambda_\omega\, z(\omega,t-1) + \nabla_\omega V(s_t,\omega)$$

where $\nabla_\omega V(s_t,\omega)$ is the gradient with respect to the parameter $\omega$, $\lambda_\omega \in [0,1)$ is the decay parameter, and $z(\omega,t-1)$ is the Critic eligibility-trace vector in time slot $t-1$;

Step 14: the state-value-function parameter $\omega(t)$ is updated as $\omega(t+1) = \omega(t) + \alpha_{c,t}\,\delta\, z(\omega,t)$, where $\alpha_{c,t}$ is the Critic learning rate and satisfies $\sum_{t}\alpha_{c,t} = \infty$ and $\sum_{t}\alpha_{c,t}^{2} < \infty$;

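Steps 12 to 14 amount to a TD(λ)-style Critic update. The sketch below assumes the linear value function of Step 11, for which the gradient of V(s, ω) with respect to ω is simply φ(s), and assumes the Critic trace decays by γ_ω λ_ω (mirroring the Actor trace of Step 15). The function name is hypothetical.

```python
import numpy as np


def critic_step(omega, z_omega, phi_s, phi_next, r_next, gamma, lam, alpha_c):
    """One Critic update; returns (new omega, new trace, TD error delta)."""
    # Step 12: delta = r_{t+1} + gamma * V(s_{t+1}) - V(s_t), with V(s) = omega . phi(s).
    delta = r_next + gamma * float(omega @ phi_next) - float(omega @ phi_s)
    # Step 13: decay the old trace and add the gradient of V, which is phi(s_t).
    z_omega = gamma * lam * z_omega + phi_s
    # Step 14: move omega along the trace, scaled by the TD error.
    omega = omega + alpha_c * delta * z_omega
    return omega, z_omega, delta
```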
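Step 15: the Actor eligibility-trace vector $z(\theta,t)$ is updated as:

$$z(\theta,t) = \gamma_\theta\lambda_\theta\, z(\theta,t-1) + \nabla_\theta \ln \pi(a_t\mid s_t,\theta)$$

where $\nabla_\theta \ln \pi(a_t\mid s_t,\theta)$ is the gradient with respect to the parameter $\theta$, $\gamma_\theta\lambda_\theta$ is the decay parameter, and $z(\theta,t-1)$ is the Actor eligibility-trace vector in time slot $t-1$;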

Step 16: the policy parameter for the next time slot, $\theta(t+1)$, is updated as $\theta(t+1) = \theta(t) + \alpha_{a,t}\,\delta\, z(\theta,t)$, where $\alpha_{a,t}$ is the Actor learning rate, a positive number satisfying $\sum_{t}\alpha_{a,t} = \infty$ and $\sum_{t}\alpha_{a,t}^{2} < \infty$;

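The Actor half (Steps 15 and 16) mirrors the Critic update but accumulates the score function ∇_θ ln π instead of ∇_ω V, and reuses the Critic's TD error δ. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def actor_step(theta, z_theta, grad_log_pi, delta, gamma, lam, alpha_a):
    """One Actor update; returns (new theta, new trace)."""
    # Step 15: decay the Actor trace by gamma_theta * lambda_theta and add the score.
    z_theta = gamma * lam * z_theta + grad_log_pi
    # Step 16: policy-gradient step along the trace, scaled by the TD error.
    theta = theta + alpha_a * delta * z_theta
    return theta, z_theta
```

Keeping a separate trace on each side is what distinguishes this from Actor-Critic variants that trace only the Critic.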
Step 17: the mean of the normal distribution in Step 10 is updated as $\mu(s,\theta_\mu) = \theta_\mu^{T}\phi(s)$, and its standard deviation, which must be positive, is updated as $\sigma(s,\theta_\sigma) = \exp\left(\theta_\sigma^{T}\phi(s)\right)$. The algorithm then checks whether the iteration has converged or the maximum number of iterations has been reached: if neither holds, it returns to Step 10 and continues iterating; otherwise, it terminates.
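To show how Steps 9 to 17 fit together, here is a self-contained toy run on a hypothetical one-dimensional task rather than the video system: the state s is drawn uniformly from [0, 1] each slot, φ(s) = [1, s], and the reward is -(a - s)², so a good policy tracks the state. Constant step sizes, a shared discount factor for both parts and the convergence tolerance are simplifying assumptions of this sketch.

```python
import numpy as np


def run_dual_trace_actor_critic(max_iter=2000, tol=1e-8, seed=0):
    """Toy dual-eligibility-trace Actor-Critic (Steps 9-17) on a hypothetical task."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    rng = np.random.default_rng(seed)

    def features(s):
        return np.array([1.0, s])                  # phi(s): bias plus raw state

    gamma, lam_w, lam_th = 0.9, 0.5, 0.5          # shared discount; two trace decays
    alpha_c, alpha_a = 1e-2, 1e-3                 # Critic learns faster than Actor
    # Step 9: parameters and both eligibility traces start at zero.
    omega, z_w = np.zeros(2), np.zeros(2)
    theta_mu, theta_sigma = np.zeros(2), np.zeros(2)
    z_mu, z_sigma = np.zeros(2), np.zeros(2)
    s = rng.uniform()
    for t in range(1, max_iter + 1):
        f = features(s)
        # Steps 10 and 17: Gaussian policy, linear mean, exp-linear std.
        mu, sigma = theta_mu @ f, np.exp(theta_sigma @ f)
        a = rng.normal(mu, sigma)
        r = -(a - s) ** 2                          # hypothetical reward
        s_next = rng.uniform()
        f_next = features(s_next)
        # Steps 11-12: linear critic and TD error.
        delta = r + gamma * (omega @ f_next) - omega @ f
        # Steps 13-14: Critic trace and update.
        z_w = gamma * lam_w * z_w + f
        omega = omega + alpha_c * delta * z_w
        # Steps 15-16: Actor traces (Gaussian score function) and update.
        z_mu = gamma * lam_th * z_mu + (a - mu) / sigma**2 * f
        z_sigma = gamma * lam_th * z_sigma + ((a - mu) ** 2 / sigma**2 - 1.0) * f
        step = alpha_a * delta * np.concatenate([z_mu, z_sigma])
        theta_mu, theta_sigma = theta_mu + step[:2], theta_sigma + step[2:]
        s = s_next
        # Step 17: stop when the policy barely moves or the iteration cap is hit.
        if np.linalg.norm(step) < tol:
            break
    return theta_mu, theta_sigma, omega, t
```

The Critic step size is set larger than the Actor's so that the value estimate tracks the slowly changing policy; this is a common design choice, not one stated in the text.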

The beneficial effects of the present invention are as follows.

Compared with the prior art, the present invention has the following advantages:

1) The invention uses a mobile edge computing system to provide cloud computing services at the network edge close to mobile users, which reduces backhaul transmission of video data through the core network and greatly relieves its data-transmission burden; the computation of video transcoding is offloaded to the edge network near the base stations close to the users, reducing transmission delay;

2) The user QoE function is defined more comprehensively, covering both video quality and delay: video quality is measured by resolution and bit rate, and delay is treated as a penalty factor, which makes the optimization objective explicit, namely to improve video quality while reducing delay;

3) The invention improves the Actor-Critic algorithm by introducing, for the first time, eligibility traces with multistep updates into both the Actor part and the Critic part, which speeds up convergence, improves learning efficiency and thereby reduces delay; on problems with continuous state and action spaces it performs better and converges faster than traditional reinforcement learning algorithms;

4) Unlike the existing literature, the invention models the available computing resources and radio spectrum resources as random processes, which approximates a real wireless communication environment more closely.

The method provided by the invention uses mobile edge computing and SDN in a cloud-assisted heterogeneous network to jointly optimize the video transcoding strategy, the user scheduling strategy and resource allocation, and uses an improved Actor-Critic algorithm that performs multistep eligibility-trace updates in both the Actor part and the Critic part, thereby improving both user QoE and learning efficiency.

Description of the drawings

Specific embodiments of the invention are described in further detail below with reference to the accompanying drawing, in which:

Fig. 1 shows an application scenario of the invention.

Specific embodiment

As shown in Fig. 1, the service area has a single macro base station (Macrocell Base Station, MBS) at its centre and many small base stations (Small Base Station, SBS), which transcode and transmit video streams; each base station is connected by wire to an MEC server that provides computing services. The network is divided into three layers. The top layer is the application layer, in which video is divided into several quality versions according to resolution and bit rate: there are four resolutions, 224p, 360p, 720p and 1080p, with corresponding bit rates of 400 kbps, 1 Mbps, 1.5 Mbps and 2 Mbps. The middle layer is the control layer, which handles user assignment, computing resource allocation, radio spectrum allocation, and resolution and bit-rate selection. The bottom layer is the infrastructure layer, comprising the core network, the SBSs and their MEC servers, the MBS and its MEC server, the source of the original video stream, and the user terminals. All infrastructure-layer facilities connect wirelessly to the SDN controller in the control layer, so the control layer and the infrastructure layer are separated, and all wireless connections between them are configured through the OpenFlow protocol. At the start of each time slot, the infrastructure layer sends facility status information, such as the available radio spectrum and computing resources, to the control layer; based on this information, the SDN controller sends control information to the corresponding facilities.

Specific embodiment: a 1 km × 1 km service area has one MBS at its centre and 10 SBSs randomly distributed elsewhere; each SBS can serve multiple users, and there are 3 users in the service area of each SBS. The MBS and the SBSs each transmit with a single antenna, and the spectrum occupied by the MBS is orthogonal to the spectrum allocated to the SBSs. The wireless channel model follows the 3GPP standard; the downlink transmit power is 50 mW from an SBS to a user and 20 W from the MBS to a user, and the background noise power is -174 dB. In each time slot, the computing resource available to the SBS numbered n is randomly distributed in [0, 5 GHz] with mean 2.5 GHz, and that available to the MBS is randomly distributed in [0, 100 GHz] with mean 50 GHz. The radio spectrum available to the SBS numbered n is randomly distributed in [0, 10 MHz] with mean 5 MHz, and that available to the MBS is randomly distributed in [0, 20 MHz] with mean 10 MHz. The video is divided into segments for playback and processing, each with playback duration L = 2 s.
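The per-slot resource availability of this embodiment can be simulated as below. The text gives each range and its mean; because each mean is the midpoint of its range, this sketch assumes a uniform distribution, which the text does not state explicitly. The constant and function names are hypothetical.

```python
import random

# (low, high) ranges from the embodiment, in Hz; uniform sampling is an assumption.
SBS_COMPUTE_HZ = (0.0, 5e9)      # mean 2.5 GHz
MBS_COMPUTE_HZ = (0.0, 100e9)    # mean 50 GHz
SBS_SPECTRUM_HZ = (0.0, 10e6)    # mean 5 MHz
MBS_SPECTRUM_HZ = (0.0, 20e6)    # mean 10 MHz


def sample_slot_resources(num_sbs=10, rng=None):
    """Draw the resources available in one time slot for num_sbs SBSs and one MBS."""
    if rng is None:
        rng = random.Random()
    return {
        "sbs_compute": [rng.uniform(*SBS_COMPUTE_HZ) for _ in range(num_sbs)],
        "sbs_spectrum": [rng.uniform(*SBS_SPECTRUM_HZ) for _ in range(num_sbs)],
        "mbs_compute": rng.uniform(*MBS_COMPUTE_HZ),
        "mbs_spectrum": rng.uniform(*MBS_SPECTRUM_HZ),
    }
```

Each call corresponds to one time slot, so the state observed by the learner changes randomly from slot to slot, which is the random-process modelling the invention emphasizes.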

First, the network model, video stream model, mobile edge computing model, downlink communication model and utility function model are established, the optimization objective is specified, and the system dynamics are modelled as an MDP; this is implemented through the following steps:

Step 3: the mobile edge computing (MEC) server first transcodes the video stream, converting the original high-quality stream into a lower-quality stream; the time consumed is:

$$T^{\mathrm{tr}}_{n,k} = c_{n,k}\,\frac{C(v_{n,k},\tilde v_{n,k})}{f_{n,k}} + \left(1-c_{n,k}\right)\frac{C(v_{n,k},\tilde v_{n,k})}{f^{\mathrm{M}}_{n,k}}$$

where $c_{n,k}=1$ indicates that the user is served by a small base station and $c_{n,k}=0$ that the user is served by the macro base station; $v_{n,k}$ denotes the original video stream; $\tilde v_{n,k}$ denotes the video stream the user receives after transcoding by the MEC server; $C(v_{n,k},\tilde v_{n,k})$ denotes the computing resource required to transcode the stream from version $v_{n,k}$ to version $\tilde v_{n,k}$; $f_{n,k}$ denotes the computing resource allocated to the $k$-th user by the MEC server of the SBS numbered $n$; and $f^{\mathrm{M}}_{n,k}$ denotes the computing resource allocated to the $k$-th user by the MEC server of the MBS;

Step 17: the mean of the normal distribution in Step 10 is updated as $\mu(s,\theta_\mu) = \theta_\mu^{T}\phi(s)$, and its standard deviation, which must be positive, is updated as $\sigma(s,\theta_\sigma) = \exp\left(\theta_\sigma^{T}\phi(s)\right)$. The algorithm then checks whether the iteration has converged or the maximum number of iterations has been reached: if neither holds, it returns to Step 10 and continues iterating; otherwise, it terminates.


The above embodiments do not restrict the technical solutions to those embodiments themselves; embodiments may be combined with one another to form new embodiments. The above embodiments merely illustrate the technical solutions of the invention and do not limit them; any modification or equivalent replacement that does not depart from the spirit and scope of the invention falls within the scope of the technical solutions of the invention.

Claims

1. A method for improving the quality of experience of users of a live video system, characterized by comprising:

Step 1: the system dynamics are modelled as a Markov decision process (MDP) with four elements S, A, P and r. The state space S comprises three parts: 1) the computing resources available at the mobile edge computing servers; 2) the available radio spectrum resources; 3) the signal-to-interference-plus-noise ratio (SINR) of the wireless downlinks. The action space A comprises four parts: 1) the user scheduling strategy; 2) the transcoding strategy; 3) the computing resource allocation strategy; 4) the radio spectrum resource allocation strategy. P is the state transition probability matrix, and r is the reward function, obtained through the following steps;

Step 2: the live video stream is divided into segments that are processed and played in turn, each with playback duration $L$. While one segment is being played, the next segment is processed, and the processing time is denoted $T_{n,k}$. Continuous playback requires $T_{n,k} \le L$; otherwise a delay $D_{n,k}$ occurs;

Step 3: the mobile edge computing (MEC) server first transcodes the video stream, converting the original high-quality stream into a lower-quality stream; the time consumed is:

$$T^{\mathrm{tr}}_{n,k} = c_{n,k}\,\frac{C(v_{n,k},\tilde v_{n,k})}{f_{n,k}} + \left(1-c_{n,k}\right)\frac{C(v_{n,k},\tilde v_{n,k})}{f^{\mathrm{M}}_{n,k}}$$

where $c_{n,k}=1$ indicates that the user is served by a small base station and $c_{n,k}=0$ that the user is served by the macro base station; $v_{n,k}$ denotes the original video stream; $\tilde v_{n,k}$ denotes the video stream the user receives after transcoding by the MEC server; $C(v_{n,k},\tilde v_{n,k})$ denotes the computing resource required to transcode the stream from version $v_{n,k}$ to version $\tilde v_{n,k}$; $f_{n,k}$ denotes the computing resource allocated to the $k$-th user by the MEC server of the SBS numbered $n$; and $f^{\mathrm{M}}_{n,k}$ denotes the computing resource allocated to the $k$-th user by the MEC server of the MBS;

Step 4: the quality function $Z_{n,k}$ of the video stream transcoded in Step 3 is calculated by the following formula:

where $\tilde q_{n,k}$ and $\tilde b_{n,k}$ denote the resolution and bit rate of the transcoded video stream, $q_{n,k}$ and $b_{n,k}$ denote the resolution and bit rate of the original video stream, $\ln$ is the natural logarithm, and $\zeta$ and $\xi$ are positive numbers;

Step 5: the video transcoded in Step 3 is then transmitted over the downlink to the mobile terminal device; the time consumed, $T^{\mathrm{tx}}_{n,k}$, is:

where $\tilde b_{n,k}$ is the bit rate of the transcoded video stream from Step 4 and $\tilde s_{n,k}$ is the size of the transcoded video stream, which can be obtained directly with media stream splitting software; $B_{n,k}$ denotes the radio spectrum resource allocated to the $k$-th user by the small base station numbered $n$, and $B^{\mathrm{M}}_{n,k}$ denotes the radio spectrum resource allocated to the $k$-th user by the macro base station; $R_{n,k}$ denotes the achievable instantaneous rate, calculated as $R_{n,k} = c_{n,k} B_{n,k} G_{n,k} + (1-c_{n,k}) B^{\mathrm{M}}_{n,k} G^{\mathrm{M}}_{n,k}$; $G_{n,k} = \log_2(1+\rho_{n,k})$ is the spectral efficiency that the small base station numbered $n$ can provide to the $k$-th user, where $\rho_{n,k}$ is the SINR of the downlink from that small base station to the $k$-th user; and $G^{\mathrm{M}}_{n,k} = \log_2(1+\rho^{\mathrm{M}}_{n,k})$ is the spectral efficiency that the macro base station can provide to the $k$-th user, where $\rho^{\mathrm{M}}_{n,k}$ is the SINR of the downlink from the macro base station to the $k$-th user;

Step 6: the mobile terminal device decodes the received video stream; the decoding time, denoted $T^{\mathrm{dec}}_{n,k}$, is a constant;

Step 7: from the results of Steps 3, 5 and 6, the total time $T_{n,k}$ of Step 2 is $T_{n,k} = T^{\mathrm{tr}}_{n,k} + T^{\mathrm{tx}}_{n,k} + T^{\mathrm{dec}}_{n,k}$, where the three terms are the transcoding time (Step 3), the transmission time (Step 5) and the decoding time (Step 6); the delay is then obtained as $D_{n,k} = T_{n,k} - L$;

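Step 8: from the results of Steps 4 and 7, the reward function $r$ of the MDP is:

$$r = \sum_{n}\sum_{k}\left(p_{n,k} Z_{n,k} - \upsilon_{n,k} D_{n,k}\right)$$

where $p_{n,k}$ is the price of video stream quality (in dollars), $Z_{n,k}$ is the video-stream quality function of Step 4, $\upsilon_{n,k}$ is the price of delay (in dollars), and $D_{n,k}$ is the delay function of Step 7;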

Step 9: the above MDP is solved with an Actor-Critic algorithm with dual eligibility traces. First, the parameters are initialized: the Actor eligibility-trace decay rate $\lambda_\theta \in [0,1)$ and the Critic eligibility-trace decay rate $\lambda_\omega \in [0,1)$; the Actor policy parameter $\theta$ and the Critic state-value-function parameter $\omega$; the Actor and Critic eligibility-trace vectors, both set to zero vectors; and the Actor learning rate $\alpha_{a,t} > 0$ and the Critic learning rate $\alpha_{c,t} > 0$. A maximum number of iterations is set, and iteration begins at Step 10;

Step 10: in each iteration, an action $a$ is selected according to the action probability density $\pi(a\mid s,\theta)$:

$$\pi(a\mid s,\theta) = \frac{1}{\sigma(s,\theta_\sigma)\sqrt{2\pi}}\exp\left(-\frac{\left(a-\mu(s,\theta_\mu)\right)^2}{2\,\sigma(s,\theta_\sigma)^2}\right)$$

where $\mu(s,\theta_\mu)$ is the mean of the normal distribution, $\sigma(s,\theta_\sigma)$ is its standard deviation, $\theta_\mu$ and $\theta_\sigma$ are the estimator parameters, $\theta = [\theta_\mu, \theta_\sigma]^{T}$, $s$ is the current state and $\theta$ is the current policy. Executing this action in the current state yields its reward; the state moves from the current state to the next state, and the immediate reward $r_{t+1}$ is obtained;

Step 11: the state feature vector $\phi(s)$ is updated, and the state value function $V^{\pi}(s)$ is learned with a linear estimator, $V^{\pi}(s) \approx V(s,\omega) = \omega^{T}\phi(s)$, where $\omega$ is the parameter of the Critic's state value function and $\omega^{T}$ is its transpose. To accelerate learning, both the Actor part and the Critic part perform multistep updates using eligibility traces;

Step 12: the temporal-difference (TD) error $\delta$ is updated as $\delta = r_{t+1} + \gamma_\omega V(s_{t+1},\omega) - V(s_t,\omega)$, where $r_{t+1} + \gamma_\omega V(s_{t+1},\omega)$ is the total reward value of the next state, $\gamma_\omega$ is a discount factor between 0 and 1, and $V(s_t,\omega)$ is the value of the current state;

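Step 13: the Critic eligibility-trace vector $z(\omega,t)$ is updated as:

$$z(\omega,t) = \gamma_\omega\lambda_\omega\, z(\omega,t-1) + \nabla_\omega V(s_t,\omega)$$

where $\nabla_\omega V(s_t,\omega)$ is the gradient with respect to the parameter $\omega$, $\lambda_\omega \in [0,1)$ is the decay parameter, and $z(\omega,t-1)$ is the Critic eligibility-trace vector in time slot $t-1$;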

Step 14: the state-value-function parameter $\omega(t)$ is updated as $\omega(t+1) = \omega(t) + \alpha_{c,t}\,\delta\, z(\omega,t)$, where $\alpha_{c,t}$ is the Critic learning rate and satisfies $\sum_{t}\alpha_{c,t} = \infty$ and $\sum_{t}\alpha_{c,t}^{2} < \infty$;

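Step 15: the Actor eligibility-trace vector $z(\theta,t)$ is updated as:

$$z(\theta,t) = \gamma_\theta\lambda_\theta\, z(\theta,t-1) + \nabla_\theta \ln \pi(a_t\mid s_t,\theta)$$

where $\nabla_\theta \ln \pi(a_t\mid s_t,\theta)$ is the gradient with respect to the parameter $\theta$, $\gamma_\theta\lambda_\theta$ is the decay parameter, and $z(\theta,t-1)$ is the Actor eligibility-trace vector in time slot $t-1$;

Step 16: the policy parameter for the next time slot, $\theta(t+1)$, is updated as $\theta(t+1) = \theta(t) + \alpha_{a,t}\,\delta\, z(\theta,t)$, where $\alpha_{a,t}$ is the Actor learning rate, a positive number satisfying $\sum_{t}\alpha_{a,t} = \infty$ and $\sum_{t}\alpha_{a,t}^{2} < \infty$;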

Step 17: the mean of the normal distribution in Step 10 is updated as $\mu(s,\theta_\mu) = \theta_\mu^{T}\phi(s)$, and its standard deviation, which must be positive, is updated as $\sigma(s,\theta_\sigma) = \exp\left(\theta_\sigma^{T}\phi(s)\right)$. The algorithm then checks whether the iteration has converged or the maximum number of iterations has been reached: if neither holds, it returns to Step 10 and continues iterating; otherwise, it terminates.