CN110049315A - A method for improving user quality of experience in a live video system

Publication number
CN110049315A
Authority
CN
China
Prior art keywords
user
video
critic
parameter
actor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910343561.4A
Other languages
Chinese (zh)
Other versions
CN110049315B (en)
Inventor
张志才
付芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi University
Original Assignee
Shanxi University
Application filed by Shanxi University
Priority to CN201910343561.4A
Publication of CN110049315A
Application granted
Publication of CN110049315B
Legal status: Expired - Fee Related

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N17/004 Diagnosis, testing or measuring for television systems or their details for digital television systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/647 Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
    • H04N21/64784 Data processing by the network
    • H04N21/64792 Controlling the complexity of the content stream, e.g. by dropping packets

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a method for improving user quality of experience (QoE) in a live video system. It aims to reduce delay while preserving the quality of the video being watched. Unlike the existing literature, the invention models both the available computing resources and the available radio spectrum resources as random processes, which is closer to a real wireless communication environment. The dynamic system is modeled as a Markov decision process. Because its action space and state space are both continuous and multi-dimensional, traditional reinforcement learning algorithms such as deep Q-networks and policy gradient methods are inefficient and hard to apply to this kind of problem. The invention therefore jointly optimizes the video transcoding strategy, the user scheduling strategy and the resource allocation, and proposes an online Actor-Critic reinforcement learning algorithm that introduces eligibility traces in both the Actor part and the Critic part to accelerate learning. Simulations show that it performs significantly better than a deep Q-network and converges faster than a policy gradient algorithm.

Description

A method for improving user quality of experience in a live video system
Technical field
The present invention relates to the field of fifth-generation (5G) wireless communication technology, and in particular to a method for improving user quality of experience in a live video system.
Background art
Live video has great practical value in rescue, route guidance and entertainment. A video stream can be divided into multiple versions according to quality, and which version is transmitted depends on several factors: (1) the bandwidth limit of the wireless channel; (2) differing user preferences; (3) the video formats supported by the mobile device. On the downlink, video transcoding can convert a live video into several versions of different definition. However, existing live video systems have several problems.
Problem 1: heavy core-network load and delay. Because video transcoding is computationally intensive, the common practice is to rely on the computing power of a cloud computing system. According to the requirements of different users, the live video data is offloaded to a remote cloud, where a video is first transcoded into multiple formats and qualities and then delivered to users through the Internet and the core network. This greatly increases the load on the core network and introduces serious delay.
Problem 2: inefficient resource optimization. Because of the dynamic nature of wireless networks, it is difficult to make efficient use of all the radio spectrum and computing resources available at the network edge.
Problem 3: one-sided treatment of the user quality of experience (QoE) function. On the one hand, some works consider only video quality: D. Wang et al. proposed an adaptive video transcoding framework in 2018 that jointly adjusts the transcoding strategy and the radio spectrum allocation according to time-varying wireless channel conditions in order to maximize user QoE. On the other hand, some works consider only delay: Q. He et al. proposed a fog-computing-based video transcoding framework in 2017 to reduce delay, and Y. Zhu et al. proposed a cloud-edge cooperative system in 2018 that combines cloud resources with the resources of idle viewer terminals to reduce cost and delay. However, both video quality and delay matter to users: (1) trading delay for high-quality video yields a high-definition picture that stalls frequently, which seriously degrades QoE; (2) trading video quality for low delay yields smooth playback that is too blurry to see clearly, which also seriously degrades QoE.
Problem 4: poor performance and low efficiency of traditional reinforcement learning algorithms. When the live video system is modeled as a Markov decision process (MDP), its state space and action space are both continuous and multi-dimensional. Traditional value-iteration algorithms such as Q-learning and SARSA perform poorly on such problems, and traditional policy gradient algorithms learn inefficiently and converge slowly, which causes delay. For continuous-space problems, Actor-Critic algorithms outperform both classes and have been widely studied: R. Li et al. proposed a single-step-update Actor-Critic algorithm in 2014, Y. Wei et al. proposed an Actor-Critic-based resource allocation method in 2018 to maximize system energy efficiency, and H. Yang et al. applied an Actor-Critic algorithm to an Internet of Things system in 2019. However, current Actor-Critic algorithms introduce eligibility traces only in the Critic part, so their learning efficiency is low.
Summary of the invention
To overcome the shortcomings of the prior art, a method for improving user quality of experience in a live video system is provided. In a cloud-assisted heterogeneous network, mobile edge computing (MEC) and software-defined networking (SDN) are used to jointly optimize the video transcoding strategy, the user scheduling strategy and the resource allocation. An improved Actor-Critic algorithm is used in which both the Actor part and the Critic part perform multi-step updates using eligibility traces, in order to improve learning efficiency.
To achieve the object of the invention, a method for improving user quality of experience in a live video system is provided, comprising:
Step 1: Model the dynamic system as a Markov decision process (MDP) with four elements S, A, P and r. The state space S has three parts: (1) the computing resources available at the mobile edge computing (MEC) servers, (2) the available radio spectrum resources, and (3) the signal-to-interference-plus-noise ratio (SINR) of the wireless downlink. The action space A has four parts: (1) the user scheduling strategy, (2) the transcoding strategy, (3) the computing resource allocation strategy, and (4) the radio spectrum resource allocation strategy. P is the state transition probability matrix, and r is the reward function, obtained through the following steps.
Step 2: Divide the live video stream into segments that are played and processed in turn, each with playback duration L. While one segment is playing, the next segment is processed; the time required for this processing is denoted T_{n,k}. Continuous playback requires T_{n,k} ≤ L; otherwise a delay D_{n,k} arises.
Step 3: The MEC server first transcodes the video stream from the original high-quality version to a lower-quality version. The time consumed is determined by the following quantities: the indicator c_{n,k}, where c_{n,k} = 1 means that the user is served by a small base station (SBS) and c_{n,k} = 0 means that the user is served by the macro base station (MBS); the original version of the video stream; the version the user receives after transcoding by the MEC server; the computing resource required to transcode from the original version to the received version; f_{n,k}, the computing resource allocated to the k-th user by the MEC server of SBS n; and the computing resource allocated to the k-th user by the MEC server of the MBS.
Step 4: Evaluate the transcoding of Step 3 with the video quality function Z_{n,k}. It is computed from the resolution and bit rate of the transcoded video stream and from q_{n,k} and b_{n,k}, the resolution and bit rate of the original video stream, using the natural logarithm ln and the positive constants ζ and ξ.
Step 5: Transmit the video transcoded in Step 3 over the downlink to the mobile terminal device. The time consumed depends on the bit rate of the video stream after transcoding in Step 4; on the size of the transcoded video stream, which can be obtained directly from media stream splitter software; on B_{n,k}, the radio spectrum resource allocated to the k-th user by SBS n; on the radio spectrum resource allocated to the k-th user by the MBS; and on the achievable instantaneous rate. That rate is computed from G_{n,k}, the spectral efficiency that SBS n can offer the k-th user, which is itself computed from ρ_{n,k}, the downlink SINR from SBS n to the k-th user, and from the spectral efficiency that the MBS can offer the k-th user, which is computed from the downlink SINR from the MBS to the k-th user.
Step 6: The mobile terminal device decodes the received video stream. The time consumed by decoding is a constant.
Step 7: From the results of Steps 3, 5 and 6, obtain the total time T_{n,k} of Step 2 as the sum of the transcoding time, the transmission time and the decoding time, and obtain the delay as D_{n,k} = T_{n,k} - L.
Step 8: From the results of Steps 4 and 7, obtain the reward function r of the MDP. It is built from the price of video quality (in $) applied to the quality function Z_{n,k} of Step 4, and from υ_{n,k}, the price of delay (in $), applied to the delay function D_{n,k} of Step 7.
Step 9: Solve the above MDP with the dual-eligibility-trace Actor-Critic algorithm. First initialize the parameters: the eligibility trace decay rate of the Actor part, λ_θ ∈ [0, 1), and of the Critic part, λ_ω ∈ [0, 1); the policy parameter θ of the Actor part and the state value function parameter ω of the Critic part; the eligibility trace vectors of both parts, which are set to zero vectors; and the learning rates α_{a,t} > 0 of the Actor part and α_{c,t} > 0 of the Critic part. Set a maximum number of iterations, then begin iterating from Step 10.
Step 10: In each iteration, select an action a according to the action probability distribution π(a | s, θ), a normal distribution with mean μ(s, θ_μ) and standard deviation σ(s, θ_σ), where θ_μ and θ_σ are the estimator parameters and θ = [θ_μ, θ_σ]^T. Here s is the current state and θ is the current policy parameter. Executing the action in the current state moves the system from the current state to the next state and yields the immediate reward r_{t+1}.
Step 11: Update the state feature vector φ(s) and learn the state value function V^π(s) with a linear estimator, V^π(s) ≈ V(s, ω) = ω^T φ(s), where ω is the parameter of the Critic part's state value function and ω^T is its transpose. To accelerate learning, both the Actor part and the Critic part perform multi-step updates using eligibility traces.
Step 12: Update the temporal-difference (TD) error δ = r_{t+1} + γ_ω V(s_{t+1}, ω) - V(s_t, ω), where r_{t+1} + γ_ω V(s_{t+1}, ω) is the total reward value of the next state, γ_ω is a discount factor between 0 and 1, and V(s_t, ω) is the value of the current state.
Step 13: Update the eligibility trace vector z(ω, t) of the Critic part from the gradient with respect to the parameter ω, the decay parameter λ_ω ∈ [0, 1), and z(ω, t-1), the eligibility trace vector of the Critic part in time slot t-1.
Step 14: Update the state value function parameter: ω(t+1) = ω(t) + α_{c,t} δ z(ω, t), where α_{c,t} is the learning rate of the Critic part and is subject to a step-size condition.
Step 15: Update the eligibility trace vector z(θ, t) of the Actor part from the gradient with respect to the parameter θ, the decay parameter γ_θ λ_θ, and z(θ, t-1), the eligibility trace vector of the Actor part in time slot t-1.
Step 16: Update the policy parameter for the next time slot: θ(t+1) = θ(t) + α_{a,t} δ z(θ, t), where α_{a,t}, the learning rate of the Actor part, is a positive number subject to a step-size condition.
Step 17: Update the mean μ(s, θ_μ) and the standard deviation σ(s, θ_σ), which is positive, of the normal distribution in Step 10. Then check whether the iteration has converged or has reached the maximum number of iterations. If neither holds, return to Step 10 and continue; otherwise stop iterating.
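Steps 9 to 16 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `DualTraceActorCritic`, the default hyperparameter values, the linear mean μ = θ_μ^T φ(s), the exponential form σ = exp(θ_σ^T φ(s)), and the accumulating-trace rule (decay times the previous trace, plus the current gradient) are all assumptions chosen because they are common in the reinforcement learning literature.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class DualTraceActorCritic:
    """Actor-Critic sketch with eligibility traces on both parts."""

    def __init__(self, n_features, alpha_a=0.01, alpha_c=0.1,
                 gamma_theta=0.9, gamma_omega=0.9,
                 lam_theta=0.5, lam_omega=0.5):
        # Step 9: parameters and traces all start as zero vectors.
        self.theta_mu = [0.0] * n_features      # Actor: mean parameters
        self.theta_sigma = [0.0] * n_features   # Actor: log-std parameters
        self.omega = [0.0] * n_features         # Critic: value parameters
        self.z_mu = [0.0] * n_features          # Actor trace, mean part
        self.z_sigma = [0.0] * n_features       # Actor trace, std part
        self.z_omega = [0.0] * n_features       # Critic trace
        self.alpha_a, self.alpha_c = alpha_a, alpha_c
        self.gamma_theta, self.gamma_omega = gamma_theta, gamma_omega
        self.lam_theta, self.lam_omega = lam_theta, lam_omega

    def policy(self, phi):
        """Mean and standard deviation of the normal policy for features phi."""
        mu = dot(self.theta_mu, phi)
        sigma = math.exp(dot(self.theta_sigma, phi))  # assumed form, keeps sigma > 0
        return mu, sigma

    def act(self, phi, rng=random):
        """Step 10: sample an action from the normal policy."""
        mu, sigma = self.policy(phi)
        return rng.gauss(mu, sigma)

    def update(self, phi, action, reward, phi_next):
        """Steps 12 to 16 for one transition; returns the TD error."""
        # Step 12: TD error with the linear value estimate omega^T phi.
        delta = (reward + self.gamma_omega * dot(self.omega, phi_next)
                 - dot(self.omega, phi))
        # Step 13: Critic trace; the gradient of omega^T phi w.r.t. omega is phi.
        decay_c = self.gamma_omega * self.lam_omega  # assumed decay factor
        self.z_omega = [decay_c * z + p for z, p in zip(self.z_omega, phi)]
        # Step 14: Critic parameter update.
        self.omega = [w + self.alpha_c * delta * z
                      for w, z in zip(self.omega, self.z_omega)]
        # Step 15: Actor trace; gradient of ln pi for the assumed mean/std forms.
        mu, sigma = self.policy(phi)
        g_mu = (action - mu) / sigma ** 2
        g_sigma = (action - mu) ** 2 / sigma ** 2 - 1.0
        decay_a = self.gamma_theta * self.lam_theta
        self.z_mu = [decay_a * z + g_mu * p for z, p in zip(self.z_mu, phi)]
        self.z_sigma = [decay_a * z + g_sigma * p
                        for z, p in zip(self.z_sigma, phi)]
        # Step 16: Actor parameter update.
        self.theta_mu = [t + self.alpha_a * delta * z
                         for t, z in zip(self.theta_mu, self.z_mu)]
        self.theta_sigma = [t + self.alpha_a * delta * z
                            for t, z in zip(self.theta_sigma, self.z_sigma)]
        return delta
```

In this sketch the same TD error δ scales both updates, so one transition adjusts the Critic and the Actor together, and each trace lets that adjustment reach features visited in earlier time slots.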
Compared with the prior art, the invention has the following beneficial effects:
1) The invention uses a mobile edge computing system to provide cloud computing services at the network edge, close to mobile users. This reduces the backhaul of video data through the core network and greatly relieves its data transmission load. Offloading the video transcoding task to the edge network at the base stations near users reduces transmission delay.
2) The user quality of experience function is defined more comprehensively to include both video quality and delay: video quality is measured by resolution and bit rate, and delay enters as a penalty factor. This makes the optimization goal explicit: improve video quality and reduce delay.
3) The invention improves the Actor-Critic algorithm by introducing, for the first time, eligibility traces in both the Actor part and the Critic part for multi-step updates. This speeds up convergence and learning, which helps reduce delay. On problems with continuous state and action spaces, it performs better and converges faster than traditional reinforcement learning algorithms.
4) Unlike the existing literature, the invention models the available computing resources and radio spectrum resources as random processes, which is closer to a real wireless communication environment.
In a cloud-assisted heterogeneous network, the method provided by the invention uses mobile edge computing and SDN to jointly optimize the video transcoding strategy, the user scheduling strategy and the resource allocation, and applies an improved Actor-Critic algorithm in which both the Actor part and the Critic part perform multi-step updates using eligibility traces. This improves both user quality of experience and learning efficiency.
Brief description of the drawings
Specific embodiments of the invention are described in further detail below with reference to the accompanying drawing, in which:
Fig. 1 shows the application scenario of the present invention.
Detailed description of the embodiments
As shown in Fig. 1, the center of the service area has a single macro base station (Macro Base Station, MBS) and many small base stations (Small Base Station, SBS) that transcode and transmit video streams. Each base station is connected by wire to a MEC server that provides computing services. The network structure has three layers. The top layer is the application layer, in which video is divided into several quality versions by resolution and bit rate: the four resolutions 224p, 360p, 720p and 1080p correspond to bit rates of 400 kbps, 1 Mbps, 1.5 Mbps and 2 Mbps respectively. The middle layer is the control layer, which covers user assignment, computing resource allocation, radio spectrum resource allocation, and resolution and bit rate selection. The bottom layer is the infrastructure layer, which comprises the core network, the SBSs and their MEC servers, the MBS and its MEC server, the source of the original video stream, and the user terminals. Every facility in the infrastructure layer is connected wirelessly to the SDN controller in the control layer, so the control layer and the infrastructure layer are decoupled, and all wireless connections between them are configured through the OpenFlow protocol. At the start of each time slot, the infrastructure layer sends the status of its facilities, such as the available radio spectrum resources and computing resources, to the control layer. Based on this status information, the SDN controller sends control information to the corresponding facilities.
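The quality ladder of the application layer can be written down as a small lookup table. The sketch below is illustrative only: the names `LADDER_KBPS`, `transcode_targets` and `segment_size_bits` are hypothetical, 1 kbps is taken as 1000 bit/s, keeping the source version as a valid target (no transcoding) is an assumption, and estimating segment size as bit rate times duration is a simplification, since the description obtains the size from media stream splitter software.

```python
# Quality ladder from the description: resolution label -> bit rate in kbps.
LADDER_KBPS = {"224p": 400, "360p": 1000, "720p": 1500, "1080p": 2000}


def transcode_targets(source):
    """Versions at or below the source quality; transcoding only goes downward."""
    source_kbps = LADDER_KBPS[source]
    return [label for label, kbps in LADDER_KBPS.items() if kbps <= source_kbps]


def segment_size_bits(version, seconds):
    """Rough segment size: bit rate times playback duration (an approximation)."""
    return LADDER_KBPS[version] * 1000 * seconds
```

For example, a 720p source can be delivered as 224p, 360p or 720p, and the estimate for a 1080p segment grows linearly with its playback duration.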
Specific embodiment: In a service area of 1 km × 1 km, one MBS is located at the center and 10 SBSs are randomly distributed elsewhere. Each SBS can serve multiple users, and there are 3 users in the service area of each SBS. The MBS and the SBSs use single-antenna transmission, and the spectrum occupied by the MBS is orthogonal to the spectrum allocated to the SBSs. The wireless channel model follows the 3GPP standard. The downlink transmit power is 50 mW from an SBS to a user and 20 W from the MBS to a user; the background noise power is -174 dB. In each time slot, the computing resource available at SBS n is randomly distributed over [0, 5 GHz] with mean 2.5 GHz, and the computing resource available at the MBS is randomly distributed over [0, 100 GHz] with mean 50 GHz. In each time slot, the radio spectrum resource available at SBS n is randomly distributed over [0, 10 MHz] with mean 5 MHz, and the radio spectrum resource available at the MBS is randomly distributed over [0, 20 MHz] with mean 10 MHz. The video is divided into segments for playback and processing, each with playback duration L = 2 s.
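The per-slot resource availability in this embodiment can be simulated as follows. This is a sketch under one explicit assumption: the description gives only a range and a mean for each resource, and a uniform distribution is used here because its mean is the midpoint of the range, which matches the stated means. The function name `sample_slot_resources` and the dictionary keys are hypothetical.

```python
import random

SBS_COUNT = 10  # number of small base stations in the embodiment


def sample_slot_resources(rng):
    """Draw the resources available in one time slot (uniform draws assumed)."""
    return {
        # Computing resources in Hz (CPU cycles per second).
        "sbs_cpu_hz": [rng.uniform(0.0, 5e9) for _ in range(SBS_COUNT)],
        "mbs_cpu_hz": rng.uniform(0.0, 100e9),
        # Radio spectrum resources in Hz.
        "sbs_bw_hz": [rng.uniform(0.0, 10e6) for _ in range(SBS_COUNT)],
        "mbs_bw_hz": rng.uniform(0.0, 20e6),
    }
```

Passing a seeded generator such as `random.Random(0)` makes a simulation run repeatable, which helps when comparing learning algorithms on the same resource trace.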
First, the network model, video stream model, mobile edge computing model, downlink communication model and utility function model are established and the optimization objective is defined. The dynamic system is then modeled as an MDP, and the method is carried out through the following steps:
Step 1: Model the dynamic system as a Markov decision process (MDP) with four elements S, A, P and r. The state space S has three parts: (1) the computing resources available at the MEC servers, (2) the available radio spectrum resources, and (3) the signal-to-interference-plus-noise ratio (SINR) of the wireless downlink. The action space A has four parts: (1) the user scheduling strategy, (2) the transcoding strategy, (3) the computing resource allocation strategy, and (4) the radio spectrum resource allocation strategy. P is the state transition probability matrix, and r is the reward function, obtained through the following steps.
Step 2: Divide the live video stream into segments that are played and processed in turn, each with playback duration L. While one segment is playing, the next segment is processed; the time required for this processing is denoted T_{n,k}. Continuous playback requires T_{n,k} ≤ L; otherwise a delay D_{n,k} arises.
Step 3: The MEC server first transcodes the video stream from the original high-quality version to a lower-quality version. The time consumed is determined by the following quantities: the indicator c_{n,k}, where c_{n,k} = 1 means that the user is served by an SBS and c_{n,k} = 0 means that the user is served by the MBS; the original version of the video stream; the version the user receives after transcoding by the MEC server; the computing resource required to transcode from the original version to the received version; f_{n,k}, the computing resource allocated to the k-th user by the MEC server of SBS n; and the computing resource allocated to the k-th user by the MEC server of the MBS.
Step 4: Evaluate the transcoding of Step 3 with the video quality function Z_{n,k}. It is computed from the resolution and bit rate of the transcoded video stream and from q_{n,k} and b_{n,k}, the resolution and bit rate of the original video stream, using the natural logarithm ln and the positive constants ζ and ξ.
Step 5: Transmit the video transcoded in Step 3 over the downlink to the mobile terminal device. The time consumed depends on the bit rate of the video stream after transcoding in Step 4; on the size of the transcoded video stream, which can be obtained directly from media stream splitter software; on B_{n,k}, the radio spectrum resource allocated to the k-th user by SBS n; on the radio spectrum resource allocated to the k-th user by the MBS; and on the achievable instantaneous rate. That rate is computed from G_{n,k}, the spectral efficiency that SBS n can offer the k-th user, which is itself computed from ρ_{n,k}, the downlink SINR from SBS n to the k-th user, and from the spectral efficiency that the MBS can offer the k-th user, which is computed from the downlink SINR from the MBS to the k-th user.
Step 6: The mobile terminal device decodes the received video stream. The time consumed by decoding is a constant.
Step 7: From the results of Steps 3, 5 and 6, obtain the total time T_{n,k} of Step 2 as the sum of the transcoding time, the transmission time and the decoding time, and obtain the delay as D_{n,k} = T_{n,k} - L.
Step 8: From the results of Steps 4 and 7, obtain the reward function r of the MDP. It is built from the price of video quality (in $) applied to the quality function Z_{n,k} of Step 4, and from υ_{n,k}, the price of delay (in $), applied to the delay function D_{n,k} of Step 7.
Step 9: Solve the above MDP with the dual-eligibility-trace Actor-Critic algorithm. First initialize the parameters: the eligibility trace decay rate of the Actor part, λ_θ ∈ [0, 1), and of the Critic part, λ_ω ∈ [0, 1); the policy parameter θ of the Actor part and the state value function parameter ω of the Critic part; the eligibility trace vectors of both parts, which are set to zero vectors; and the learning rates α_{a,t} > 0 of the Actor part and α_{c,t} > 0 of the Critic part. Set a maximum number of iterations, then begin iterating from Step 10.
Step 10: In each iteration, select an action a according to the action probability distribution π(a | s, θ), a normal distribution with mean μ(s, θ_μ) and standard deviation σ(s, θ_σ), where θ_μ and θ_σ are the estimator parameters and θ = [θ_μ, θ_σ]^T. Here s is the current state and θ is the current policy parameter. Executing the action in the current state moves the system from the current state to the next state and yields the immediate reward r_{t+1}.
Step 11: Update the state feature vector φ(s) and learn the state value function V^π(s) with a linear estimator, V^π(s) ≈ V(s, ω) = ω^T φ(s), where ω is the parameter of the Critic part's state value function and ω^T is its transpose. To accelerate learning, both the Actor part and the Critic part perform multi-step updates using eligibility traces.
Step 12: Update the temporal-difference (TD) error δ = r_{t+1} + γ_ω V(s_{t+1}, ω) - V(s_t, ω), where r_{t+1} + γ_ω V(s_{t+1}, ω) is the total reward value of the next state, γ_ω is a discount factor between 0 and 1, and V(s_t, ω) is the value of the current state.
Step 13: Update the eligibility trace vector z(ω, t) of the Critic part from the gradient with respect to the parameter ω, the decay parameter λ_ω ∈ [0, 1), and z(ω, t-1), the eligibility trace vector of the Critic part in time slot t-1.
Step 14: Update the state value function parameter: ω(t+1) = ω(t) + α_{c,t} δ z(ω, t), where α_{c,t} is the learning rate of the Critic part and is subject to a step-size condition.
Step 15: Update the eligibility trace vector z(θ, t) of the Actor part from the gradient with respect to the parameter θ, the decay parameter γ_θ λ_θ, and z(θ, t-1), the eligibility trace vector of the Actor part in time slot t-1.
Step 16: Update the policy parameter for the next time slot: θ(t+1) = θ(t) + α_{a,t} δ z(θ, t), where α_{a,t}, the learning rate of the Actor part, is a positive number subject to a step-size condition.
Step 17: Update the mean μ(s, θ_μ) and the standard deviation σ(s, θ_σ), which is positive, of the normal distribution in Step 10. Then check whether the iteration has converged or has reached the maximum number of iterations. If neither holds, return to Step 10 and continue; otherwise stop iterating.
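The timing chain of Steps 2 to 7 can be illustrated for one user and one segment. The sketch below is not the patent's formula set: it assumes that transcoding time is required CPU cycles divided by allocated CPU frequency, that spectral efficiency follows the Shannon form log2(1 + SINR), that transmission time is segment size divided by bandwidth times spectral efficiency, and that the delay is clamped at zero when T_{n,k} ≤ L. The function and parameter names are hypothetical.

```python
import math


def spectral_efficiency(sinr):
    """Spectral efficiency in bit/s/Hz, assuming the Shannon form log2(1 + SINR)."""
    return math.log2(1.0 + sinr)


def segment_delay(cycles_needed, cpu_hz, size_bits, bandwidth_hz, sinr,
                  decode_s, segment_s=2.0):
    """Return (total processing time, delay) for one segment, both in seconds."""
    t_transcode = cycles_needed / cpu_hz                       # Step 3 (assumed form)
    rate_bps = bandwidth_hz * spectral_efficiency(sinr)        # achievable rate
    t_transmit = size_bits / rate_bps                          # Step 5 (assumed form)
    total = t_transcode + t_transmit + decode_s                # Step 7: sum of parts
    delay = max(0.0, total - segment_s)                        # no delay if total <= L
    return total, delay
```

With L = 2 s, a segment that needs 0.5 s to transcode, 2.0 s to transmit and 0.1 s to decode takes 2.6 s in total, so playback stalls for about 0.6 s; allocating more computing and spectrum resources to the same user brings the total under L and removes the stall.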
Compared with the prior art, the present invention has the following advantages:
1) The invention uses a mobile edge computing system to provide cloud computing services at the network edge, close to mobile users. This reduces the backhaul of video data through the core network and greatly relieves the core network's data transmission burden. Offloading the video transcoding task to the edge network at base stations near the user also reduces transmission delay;
2) The user quality-of-experience function is defined more comprehensively, covering both video quality and delay. Video quality is measured by resolution and bit rate, and delay acts as a penalty factor. This definition makes the optimization objective explicit: improve video quality and reduce delay;
3) The invention improves the Actor-Critic algorithm by introducing, for the first time, eligibility traces in both the Actor part and the Critic part to perform multi-step updates. This speeds up convergence and improves learning efficiency, which serves the goal of reducing delay. On problems with continuous state spaces and continuous action spaces, it performs better and converges faster than traditional reinforcement learning algorithms;
4) Unlike the existing literature, the invention models the available computing resources and radio spectrum resources as random processes, which more closely approximates a real wireless communication environment.
The method for improving the user quality of experience of a live video system provided by the invention operates in a cloud-assisted heterogeneous network. It uses mobile edge computing and SDN technology to jointly optimize the video transcoding strategy, the user scheduling strategy, and resource allocation. It employs an improved Actor-Critic algorithm in which both the Actor part and the Critic part use eligibility traces for multi-step updates, which can improve user quality of experience and learning efficiency.
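The multi-step updates of steps 12 to 16 can be collected in one place, written here as a sketch. The temporal-difference error and the two parameter updates follow the text of steps 12, 14 and 16. The two trace recursions and the step-size conditions are not reproduced in this text, so the forms below are assumptions based on the standard accumulating-trace Actor-Critic formulation, with $\alpha_t$ standing for either $\alpha_{a,t}$ or $\alpha_{c,t}$.

```latex
\begin{aligned}
\delta &= r_{t+1} + \gamma_\omega V(s_{t+1},\omega) - V(s_t,\omega) \\
z(\omega,t) &= \gamma_\omega \lambda_\omega\, z(\omega,t-1) + \nabla_\omega V(s_t,\omega)
  && \text{(assumed form)} \\
\omega(t+1) &= \omega(t) + \alpha_{c,t}\,\delta\, z(\omega,t) \\
z(\theta,t) &= \gamma_\theta \lambda_\theta\, z(\theta,t-1) + \nabla_\theta \ln \pi(a_t \mid s_t,\theta)
  && \text{(assumed form)} \\
\theta(t+1) &= \theta(t) + \alpha_{a,t}\,\delta\, z(\theta,t) \\
\sum_t \alpha_t &= \infty, \qquad \sum_t \alpha_t^2 < \infty
  && \text{(assumed step-size conditions)}
\end{aligned}
```

With the linear estimator $V(s,\omega) = \omega^T \phi(s)$ of step 11, the Critic gradient $\nabla_\omega V(s_t,\omega)$ reduces to the feature vector $\phi(s_t)$.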
The above embodiments are not limited to their own technical solutions; embodiments may be combined with one another to form new embodiments. The above embodiments merely illustrate the technical solution of the invention and do not limit it. Any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of its technical solution.

Claims (1)

1. A method for improving the user quality of experience of a live video system, characterized by comprising:
Step 1: Model the dynamic system as a Markov decision process (MDP) with four parameters S, A, P, and r. The state space S has three parts: 1) the computing resources available at the mobile edge computing server, 2) the available radio spectrum resources, and 3) the signal-to-interference-plus-noise ratio (SINR) of the wireless downlink channel. The action space A has four parts: 1) the user scheduling strategy, 2) the transcoding strategy, 3) the computing resource allocation strategy, and 4) the radio spectrum resource allocation strategy. P is the state transition probability matrix, and r is the reward function, obtained through the following steps;
Step 2: Divide the live video stream into segments that are played and processed in turn, each with playback duration L. While one segment is playing, the next segment is processed; the time required for this processing is denoted $T_{n,k}$. Continuous playback requires $T_{n,k} \le L$; otherwise a delay $D_{n,k}$ arises;
Step 3: The mobile edge computing server first transcodes the video stream from the original high-quality stream to a lower-quality stream. The time consumed is given by a formula (not reproduced in this text),
where the parameter $c_{n,k}=1$ indicates that the user is served by a small base station and $c_{n,k}=0$ indicates that the user is served by the macro base station. The remaining quantities, several of whose symbols are not reproduced in this text, are: the original video stream; the video stream the user receives after transcoding by the mobile edge computing server; the computing resources required to transcode the video stream from the original version to the transcoded version; $f_{n,k}$, the computing resources allocated to the k-th user by the mobile edge computing server of small base station n; and the computing resources allocated to the k-th user by the mobile edge computing server of the macro base station;
Step 4: For the video stream transcoded in step 3, calculate the quality function $Z_{n,k}$ of the video stream by a formula (not reproduced in this text),
where two parameters (symbols not reproduced in this text) denote the resolution and bit rate of the transcoded video stream, $q_{n,k}$ and $b_{n,k}$ denote the resolution and bit rate of the original video stream respectively, ln is the natural logarithm operator, and the two coefficients (both rendered as ξ in this text) are positive numbers;
Step 5: Transmit the video transcoded in step 3 to the mobile terminal device over the downlink. The time consumed is given by a formula (not reproduced in this text),
where the quantities, several of whose symbols and formulas are not reproduced in this text, are: the bit rate of the transcoded video stream from step 4; the size of the transcoded video stream, which can be obtained directly from media stream splitter software; $B_{n,k}$, the radio spectrum resources allocated to the k-th user by small base station n; the radio spectrum resources allocated to the k-th user by the macro base station; and the achievable instantaneous rate, calculated by a formula. In that formula, $G_{n,k}$ is the spectral efficiency that small base station n can provide to the k-th user, calculated from $\rho_{n,k}$, the downlink SINR from small base station n to the k-th user. The spectral efficiency that the macro base station can provide to the k-th user is calculated in the same way from the downlink SINR from the macro base station to the k-th user;
Step 6: The mobile terminal device decodes the received video stream. The time consumed by video decoding (symbol not reproduced in this text) is a constant;
Step 7: From the results of steps 3, 5 and 6, obtain the total time $T_{n,k}$ of step 2, expressed by an equation (not reproduced in this text), and at the same time obtain the delay $D_{n,k}$, calculated as $D_{n,k} = T_{n,k} - L$;
Step 8: From the results of steps 4 and 7, obtain the reward function r of the MDP by a formula (not reproduced in this text),
where the first coefficient (symbol not reproduced in this text) is the price of video stream quality, in dollars; $Z_{n,k}$ is the quality function of the video stream from step 4; $\upsilon_{n,k}$ is the price of delay, in dollars; and $D_{n,k}$ is the delay function from step 7;
Step 9: Solve the above MDP with an Actor-Critic algorithm with dual eligibility traces. First initialize the parameters: initialize the eligibility-trace decay rate of the Actor part, $\lambda_\theta \in [0,1)$, and that of the Critic part, $\lambda_\omega \in [0,1)$; initialize the policy parameter $\theta$ of the Actor part and the state-value-function parameter $\omega$ of the Critic part; initialize the eligibility trace vectors of the Actor part and the Critic part to zero vectors; and initialize the learning rate of the Actor part, $\alpha_{a,t} > 0$, and that of the Critic part, $\alpha_{c,t} > 0$. Set a maximum number of iterations and begin iterating from step 10;
Step 10: In each iteration, select an action a according to the action probability distribution function $\pi(a \mid s, \theta)$ (formula not reproduced in this text), where $\mu(s,\theta_\mu)$ is the mean of the normal distribution and $\sigma(s,\theta_\sigma)$ is its standard deviation (their estimator formulas are not reproduced in this text), $\theta_\mu$ and $\theta_\sigma$ are the parameters of the estimators, $\theta = [\theta_\mu, \theta_\sigma]^T$, s is the current state, and $\theta$ is the current policy. Executing this action in the current state yields the reward for the action, moves the state from the current state to the next state, and yields the immediate reward $r_{t+1}$ of the next state;
Step 11: Update the state feature vector $\phi(s)$ and learn the state value function $V^\pi(s)$ with a linear estimator, $V^\pi(s) \approx V(s,\omega) = \omega^T \phi(s)$, where $\omega$ is the parameter of the state value function of the Critic part and $\omega^T$ is its transpose. To speed up learning, both the Actor part and the Critic part use eligibility traces for multi-step updates;
Step 12: Update the temporal-difference (TD) function $\delta$ according to $\delta = r_{t+1} + \gamma_\omega V(s_{t+1},\omega) - V(s_t,\omega)$, where $r_{t+1} + \gamma_\omega V(s_{t+1},\omega)$ is the total reward value of the next state, $\gamma_\omega$ is a discount factor between 0 and 1, and $V(s_t,\omega)$ is the estimated value of the current state;
Step 13: Update the eligibility trace vector $z(\omega,t)$ of the Critic part by a formula (not reproduced in this text),
where the gradient term (symbol not reproduced in this text) is taken with respect to the parameter $\omega$, $\lambda_\omega \in [0,1)$ is the decay parameter, and $z(\omega,t-1)$ is the eligibility trace vector of the Critic part at time slot $t-1$;
Step 14: Update the state-value-function parameter $\omega(t)$ according to $\omega(t+1) = \omega(t) + \alpha_{c,t}\,\delta\,z(\omega,t)$, where $\alpha_{c,t}$ is the learning rate of the Critic part and satisfies a condition (formula not reproduced in this text);
Step 15: Update the eligibility trace vector $z(\theta,t)$ of the Actor part by a formula (not reproduced in this text),
where the gradient term (symbol not reproduced in this text) is taken with respect to the parameter $\theta$, $\gamma_\theta \lambda_\theta$ is the decay parameter, and $z(\theta,t-1)$ is the eligibility trace vector of the Actor part at time slot $t-1$;
Step 16: Update the policy parameter $\theta$ for the next time slot $t+1$ according to $\theta(t+1) = \theta(t) + \alpha_{a,t}\,\delta\,z(\theta,t)$,
where $\alpha_{a,t}$ is the learning rate of the Actor part; it is a positive number and satisfies a condition (formula not reproduced in this text);
Step 17: Update the mean $\mu(s,\theta_\mu)$ of the normal distribution in step 10 (update formula not reproduced in this text). Update the standard deviation $\sigma(s,\theta_\sigma)$ of the normal distribution in step 15, which is a positive number (update formula not reproduced in this text). Then check whether the iteration has converged or the maximum number of iterations has been reached. If the maximum has not been reached and the iteration has not converged, return to step 10 and continue iterating; if the maximum has been reached or the iteration has converged, terminate the iteration.
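Steps 2 to 8 of the claim, which turn per-segment processing times into a reward, can be sketched in Python as follows. The formulas themselves are not reproduced in this text, so every functional form here is an assumption chosen for illustration: transcoding time as required computation divided by allocated computation, spectral efficiency as the Shannon expression $\log_2(1+\text{SINR})$, total time as the sum of the three stage times, delay clipped at zero so that it only arises when $T_{n,k} > L$, and a reward that adds priced quality and subtracts priced delay. All function and parameter names are hypothetical, and the quality value $Z_{n,k}$ is taken as an input rather than computed.

```python
import math

def spectral_efficiency(sinr):
    """Assumed Shannon-type spectral efficiency in bit/s/Hz."""
    return math.log2(1.0 + sinr)

def segment_time(cycles_needed, compute_alloc, segment_bits,
                 bandwidth_hz, sinr, decode_time):
    """Total processing time T for one segment (steps 3, 5, 6 and 7)."""
    # Step 3 (assumed form): transcoding time = required cycles / allocated cycles per second.
    t_transcode = cycles_needed / compute_alloc
    # Step 5 (assumed form): transmission time = segment size / achievable rate.
    rate = bandwidth_hz * spectral_efficiency(sinr)
    t_transmit = segment_bits / rate
    # Step 6: decoding time is a constant supplied by the caller.
    return t_transcode + t_transmit + decode_time

def playback_delay(total_time, segment_length):
    """Delay D = T - L, clipped at zero because no delay arises when T <= L."""
    return max(total_time - segment_length, 0.0)

def reward(quality, delay, quality_price, delay_price):
    """Assumed reward: priced quality minus priced delay (delay as penalty)."""
    return quality_price * quality - delay_price * delay

# Example with made-up numbers: a 2-second segment served by one base station.
t_total = segment_time(cycles_needed=2e9, compute_alloc=4e9, segment_bits=8e6,
                       bandwidth_hz=2e6, sinr=3.0, decode_time=0.1)
d = playback_delay(t_total, segment_length=2.0)
print(t_total, d, reward(quality=1.5, delay=d, quality_price=2.0, delay_price=1.0))
```

In the claimed method the allocated computation and bandwidth come from the action chosen by the Actor, so a function like `reward` would be evaluated once per time slot inside the iteration of steps 10 to 17.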
CN201910343561.4A 2019-04-26 2019-04-26 Method for improving user experience quality of live video system Expired - Fee Related CN110049315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910343561.4A CN110049315B (en) 2019-04-26 2019-04-26 Method for improving user experience quality of live video system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910343561.4A CN110049315B (en) 2019-04-26 2019-04-26 Method for improving user experience quality of live video system

Publications (2)

Publication Number Publication Date
CN110049315A (en) 2019-07-23
CN110049315B CN110049315B (en) 2020-04-24

Family

ID=67279613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910343561.4A Expired - Fee Related CN110049315B (en) 2019-04-26 2019-04-26 Method for improving user experience quality of live video system

Country Status (1)

Country Link
CN (1) CN110049315B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007515866A (en) * 2003-11-13 2007-06-14 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Method and apparatus for smoothing the overall quality of video transmitted over a wireless medium
CN103888849A (en) * 2014-04-11 2014-06-25 北京工业大学 Computing and wireless resource cooperative dispatching method in mobile cloud video transmission
CN108307510A (en) * 2018-02-28 2018-07-20 北京科技大学 A kind of power distribution method in isomery subzone network
CN109068391A (en) * 2018-09-27 2018-12-21 青岛智能产业技术研究院 Car networking communication optimization algorithm based on edge calculations and Actor-Critic algorithm
WO2019002465A1 (en) * 2017-06-28 2019-01-03 Deepmind Technologies Limited Training action selection neural networks using apprenticeship

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111245845A (en) * 2020-01-14 2020-06-05 北京邮电大学 Data processing method based on mobile edge calculation in space-ground heterogeneous network
CN112511197A (en) * 2020-12-01 2021-03-16 南京工业大学 Unmanned aerial vehicle auxiliary elastic video multicast method based on deep reinforcement learning
CN112887314A (en) * 2021-01-27 2021-06-01 重庆邮电大学 Time-delay-sensing cloud and mist cooperative video distribution method
CN113114756A (en) * 2021-04-08 2021-07-13 广西师范大学 Video cache updating method for self-adaptive code rate selection in mobile edge calculation
CN114786137A (en) * 2022-04-21 2022-07-22 重庆邮电大学 Cache-enabled multi-quality video distribution method
CN114786137B (en) * 2022-04-21 2023-06-20 重庆邮电大学 Cache-enabled multi-quality video distribution method

Also Published As

Publication number Publication date
CN110049315B (en) 2020-04-24

Similar Documents

Publication Publication Date Title
CN110049315A (en) A method of improving live video system user Quality of experience
Luo et al. Adaptive video streaming with edge caching and video transcoding over software-defined mobile networks: A deep reinforcement learning approach
CN109857546A (en) The mobile edge calculations discharging method of multiserver and device based on Lyapunov optimization
CN110531617A (en) Multiple no-manned plane 3D hovering position combined optimization method, device and unmanned plane base station
Ayala-Romero et al. vrAIn: Deep learning based orchestration for computing and radio resources in vRANs
CN111918339A (en) AR task unloading and resource allocation method based on reinforcement learning in mobile edge network
CN114595632A (en) Mobile edge cache optimization method based on federal learning
CN109982434B (en) Wireless resource scheduling integrated intelligent control system and method and wireless communication system
Xu et al. Multi-agent reinforcement learning based distributed transmission in collaborative cloud-edge systems
Huang et al. Utility-oriented resource allocation for 360-degree video transmission over heterogeneous networks
CN110049566A (en) A kind of downlink power distributing method based on multiple no-manned plane secondary communication path
Chen et al. Wireless multiplayer interactive virtual reality game systems with edge computing: Modeling and optimization
CN110233755A (en) The computing resource and frequency spectrum resource allocation method that mist calculates in a kind of Internet of Things
CN114116047A (en) V2I unloading method for vehicle-mounted computation-intensive application based on reinforcement learning
Feng et al. Vabis: Video adaptation bitrate system for time-critical live streaming
Yu et al. User-centric heterogeneous-action deep reinforcement learning for virtual reality in the metaverse over wireless networks
CN113395723A (en) 5G NR downlink scheduling delay optimization system based on reinforcement learning
CN114219094B (en) Communication cost and model robustness optimization method based on multi-task federal learning
CN112887314B (en) Time delay perception cloud and mist cooperative video distribution method
CN110190982B (en) Non-orthogonal multiple access edge computation time and energy consumption optimization based on fair time
CN103796293B (en) A kind of power distribution method under high ferro communication construction
CN114340017A (en) Heterogeneous network resource slicing method with eMBB and URLLC mixed service
CN114786137B (en) Cache-enabled multi-quality video distribution method
Tian et al. Cloud game computing offload based on Multi-Agent Reinforcement Learning
CN103313063B (en) A kind of H.264/AVC video dispatching method based on dual decoding simulation

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20200424)