CN110049315A - A method for improving user quality of experience in a live video system - Google Patents
- Publication number: CN110049315A
- Application number: CN201910343561.4A
- Authority: CN (China)
- Prior art keywords: user, video, critic, parameter, actor
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N17/00—Diagnosis, testing or measuring for television systems or their details
- H04N17/004—Diagnosis, testing or measuring for television systems or their details for digital television systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/647—Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
- H04N21/64784—Data processing by the network
- H04N21/64792—Controlling the complexity of the content stream, e.g. by dropping packets
Abstract
The invention discloses a method for improving user quality of experience (QoE) in a live video system. It aims to improve the QoE of users of such a system by reducing delay while guaranteeing the quality of the video being watched. Unlike the existing literature, the invention models both the available computing resources and the available radio spectrum resources as random processes, which more closely approximates a real wireless communication environment. The dynamic system is modeled as a Markov decision process. Because its action space and state space are both continuous and multidimensional, traditional reinforcement learning algorithms such as the deep Q-network and policy gradient methods are inefficient on this class of problem. To address this, the invention jointly optimizes the video transcoding strategy, the user scheduling strategy and the resource allocation method, and proposes an online Actor-Critic reinforcement learning algorithm that introduces eligibility traces in both the Actor part and the Critic part to accelerate learning. Simulations show that its performance is significantly better than that of the deep Q-network and that it converges faster than the policy gradient algorithm.
Description
Technical field
The present invention relates to the field of fifth-generation (5G) wireless communication technology, and in particular to a method for improving user quality of experience in a live video system.
Background art
Live video has great application value in rescue, route guidance, and entertainment. A video stream can be divided into multiple versions of different quality. Which version is transmitted depends on several factors: 1) the bandwidth limits of the wireless channel; 2) differing user preferences; 3) the video formats supported by the mobile device. Video transcoding technology can be used on the downlink to convert a live video into several versions of different definition. Existing live video systems nevertheless have the following problems.
Problem 1: heavy core-network load and delay. Because video transcoding is computationally intensive, the common practice today is to rely on the computing power of a cloud computing system. According to the requirements of different users, the live video data is offloaded to a remote cloud, where a video is first transcoded into multiple formats and qualities and then delivered to users over the Internet and the core network. This greatly increases the load on the core network and causes severe delay.
Problem 2: inefficient resource optimization. Because of the dynamic nature of wireless networks, it is very difficult to make efficient use of all the radio spectrum resources and computing resources available at the network edge.
Problem 3: one-sided study of the user quality of experience (QoE) function. On the one hand, some works consider only video quality: D. Wang et al. proposed an adaptive video transcoding framework in 2018 that jointly adjusts the transcoding strategy and the radio spectrum resource allocation according to time-varying wireless channel conditions, in order to maximize user QoE. On the other hand, some works consider only delay: Q. He et al. proposed a fog-computing-based video transcoding framework in 2017 to reduce delay, and Y. Zhu et al. proposed a cloud-edge cooperative system in 2018 that combines cloud resources with the resources of idle viewers' terminals to reduce cost and delay. However, both video quality and delay matter greatly to users: 1) if delay is sacrificed for high-quality video, the picture is sharp but playback stalls frequently, which seriously degrades QoE; 2) if video quality is sacrificed for low delay, playback is smooth but the picture is unclear, which also seriously degrades QoE.
Problem 4: poor performance and low efficiency of traditional reinforcement learning algorithms. When a live video system is modeled as a Markov decision process (MDP), its state space and action space are both continuous and multidimensional. Traditional value-iteration reinforcement learning algorithms such as Q-learning and SARSA perform poorly, and traditional policy gradient algorithms learn inefficiently and converge slowly, which causes delay. On continuous-space problems, Actor-Critic algorithms outperform both classes and have been widely studied: R. Li et al. proposed a single-step-update Actor-Critic algorithm in 2014, Y. Wei et al. proposed an Actor-Critic-based resource allocation method in 2018 to maximize system energy efficiency, and H. Yang et al. applied an Actor-Critic algorithm to an Internet of Things system in 2019. The problem with current Actor-Critic algorithms, however, is that eligibility traces are introduced only in the Critic part, so learning efficiency is low.
Summary of the invention
To overcome the shortcomings and deficiencies of the prior art, a method for improving user quality of experience in a live video system is provided. In a cloud-assisted heterogeneous network, mobile edge computing (MEC) and software-defined networking (SDN) are used to jointly optimize the video transcoding strategy, the user scheduling strategy and resource allocation. An improved Actor-Critic algorithm is used in which both the Actor part and the Critic part perform multi-step updates with eligibility traces, in order to improve learning efficiency.
To achieve the object of the present invention, a method for improving user quality of experience in a live video system is provided, comprising the following steps:
Step 1: Model the dynamic system as a Markov decision process (MDP) defined by four elements S, A, P and r. The state space S comprises three parts: 1) the computing resources available at the mobile edge computing (MEC) servers, 2) the available radio spectrum resources, and 3) the signal-to-interference-plus-noise ratio (SINR) of the wireless downlink. The action space A comprises four parts: 1) the user scheduling strategy, 2) the transcoding strategy, 3) the computing resource allocation strategy, and 4) the radio spectrum resource allocation strategy. P is the state transition probability matrix, and r is the reward function, which is obtained through the following steps.
Step 2: The live video stream is divided into several segments that are played and processed in sequence. Each segment has a playback duration of $L$. While one segment is being played, the next segment is processed; the time required for this processing is denoted $T_{n,k}$. Continuous playback requires $T_{n,k} \le L$; otherwise a delay $D_{n,k}$ arises.
Step 3: The MEC server first transcodes the video stream from the original high-quality version to a lower-quality version. The time consumed is given by a formula that is not reproduced in this text. In that formula, $c_{n,k} = 1$ indicates that the user is served by a small base station (SBS) and $c_{n,k} = 0$ indicates that the user is served by the macro base station (MBS), and $f_{n,k}$ is the computing resource allocated to the k-th user by the MEC server of the SBS with index $n$. The remaining symbols, likewise not reproduced, denote the original video stream version, the version the user receives after transcoding by the MEC server, the computing resource required to transcode from the former version to the latter, and the computing resource allocated to the k-th user by the MEC server of the MBS.
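As a hedged illustration of Step 3, the sketch below assumes that the transcoding time equals the computing resource required for the version change divided by the computing resource allocated to the user, with the serving MEC server selected by $c_{n,k}$. The function name, the parameter names, and the choice of CPU cycles and cycles per second as units are assumptions made for illustration and are not taken from the patent.

```python
def transcoding_time(c, required_cycles, f_sbs, f_mbs):
    """Assumed model: time = required computation / allocated computation.

    c               -- 1 if the user is served by a small base station, 0 if by the macro base station
    required_cycles -- computation needed to transcode one segment (hypothetical unit: CPU cycles)
    f_sbs, f_mbs    -- computing resource allocated to the user by the SBS / MBS MEC server (cycles/s)
    """
    if c not in (0, 1):
        raise ValueError("c must be 0 or 1")
    # Pick the allocation of whichever base station serves the user.
    allocated = f_sbs if c == 1 else f_mbs
    if allocated <= 0:
        raise ValueError("allocated computing resource must be positive")
    return required_cycles / allocated
```

Under these assumptions, a segment needing 2e9 cycles on an SBS allocation of 2.5 GHz would take about 0.8 s, which fits within a 2 s segment.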
Step 4: Transcoding the video stream in Step 3 uses the quality function $Z_{n,k}$ of the video stream, which is computed by a formula not reproduced in this text. Its parameters are the resolution and bit rate of the transcoded video stream (symbols not reproduced) and the resolution $q_{n,k}$ and bit rate $b_{n,k}$ of the original video stream; $\ln$ is the natural logarithm, and $\zeta$ and $\xi$ are positive numbers.
Step 5: The video transcoded in Step 3 is then transmitted over the downlink to the mobile terminal device. The time consumed is given by a formula not reproduced in this text, whose terms are the bit rate of the transcoded video stream from Step 4, the size of the transcoded video stream (which can be obtained directly from media stream splitter software), and the achievable instantaneous rate. The achievable instantaneous rate is in turn computed from $B_{n,k}$, the radio spectrum resource allocated to the k-th user by the SBS with index $n$; the radio spectrum resource allocated to the k-th user by the MBS; $G_{n,k}$, the spectral efficiency that the SBS with index $n$ can provide to the k-th user; and the spectral efficiency that the MBS can provide to the k-th user. $G_{n,k}$ is computed from $\rho_{n,k}$, the SINR of the downlink from the SBS with index $n$ to the k-th user, and the MBS spectral efficiency is computed from the SINR of the downlink from the MBS to the k-th user. The formulas for the rate and for the two spectral efficiencies, and the symbols for the MBS quantities, are likewise not reproduced.
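The relationships named in Step 5 can be sketched as follows. This is an illustration under stated assumptions, not the patent's formulas: spectral efficiency is taken to be the Shannon expression log2(1 + SINR) with SINR as a linear ratio, the achievable rate is taken to be allocated bandwidth times spectral efficiency, and the transmission time is taken to be segment size divided by rate. All function and parameter names are hypothetical.

```python
import math

def spectral_efficiency(sinr):
    # Assumption: Shannon spectral efficiency in bit/s/Hz, SINR given as a linear ratio.
    return math.log2(1.0 + sinr)

def achievable_rate(c, bw_sbs, sinr_sbs, bw_mbs, sinr_mbs):
    # c = 1: served by the small base station; c = 0: served by the macro base station.
    if c == 1:
        return bw_sbs * spectral_efficiency(sinr_sbs)
    return bw_mbs * spectral_efficiency(sinr_mbs)

def transmission_time(segment_bits, rate_bps):
    # Assumption: time to send one transcoded segment = size / rate.
    return segment_bits / rate_bps
```

For example, with 5 MHz of SBS spectrum and a linear SINR of 3, the assumed rate is 10 Mbit/s, so a 2 Mbit segment would take about 0.2 s to transmit.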
Step 6: The mobile terminal device decodes the received video stream. The time consumed by video decoding is a constant (its symbol is not reproduced in this text).
Step 7: From the results of Steps 3, 5 and 6, obtain the total time $T_{n,k}$ of Step 2 as the sum of the transcoding time, the transmission time and the decoding time (the equation itself is not reproduced in this text), and obtain the delay as $D_{n,k} = T_{n,k} - L$.
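The arithmetic of Steps 2 and 7 can be sketched as below. The total time is assumed to be the plain sum of the three component times, and the delay follows the stated relation D = T - L; a non-positive result corresponds to the continuous-playback condition T <= L of Step 2. The function names are hypothetical.

```python
def total_time(t_transcode, t_transmit, t_decode):
    # Assumption: processing time of a segment is the sum of its three stages.
    return t_transcode + t_transmit + t_decode

def delay(t_total, segment_len):
    # D = T - L as stated in Step 7; a value <= 0 means no stall occurs.
    return t_total - segment_len

def plays_continuously(t_total, segment_len):
    # Continuous-playback condition of Step 2: T <= L.
    return t_total <= segment_len
```

With component times of 0.8 s, 0.2 s and 0.05 s and a segment length of 2 s, the segment is ready in about 1.05 s and playback does not stall.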
Step 8: From the results of Steps 4 and 7, obtain the reward function $r$ of the MDP (formula not reproduced in this text). Its terms are the price of video stream quality, in dollars (symbol not reproduced); $Z_{n,k}$, the quality function of the video stream from Step 4; $\upsilon_{n,k}$, the price of delay, in dollars; and $D_{n,k}$, the delay from Step 7.
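One plausible reading of Step 8, offered only as an assumption, is that the reward is the priced video quality minus the priced delay, summed over all users. The sketch below implements that reading; the summation over users, the function name and the tuple layout are all hypothetical and may differ from the patent's actual formula.

```python
def reward(users):
    """Assumed reward: sum over users of (quality price * quality - delay price * delay).

    users -- iterable of (quality_price, quality, delay_price, delay) tuples, one per user
    """
    return sum(qp * z - dp * d for qp, z, dp, d in users)
```

Under this reading, delay acts as a penalty term, which matches the later statement that delay enters the QoE function as a penalty factor.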
Step 9: Solve the above MDP with an Actor-Critic algorithm with dual eligibility traces. First initialize the parameters: the eligibility-trace decay rate of the Actor part, $\lambda_\theta \in [0,1)$, and of the Critic part, $\lambda_\omega \in [0,1)$; the policy parameter $\theta$ of the Actor part and the state-value-function parameter $\omega$ of the Critic part; the eligibility-trace vectors of the Actor part and the Critic part, both set to the zero vector; and the learning rates of the Actor part, $\alpha_{a,t} > 0$, and of the Critic part, $\alpha_{c,t} > 0$. Set a maximum number of iterations, then begin iterating from Step 10.
Step 10: In each iteration, select an action $a$ according to the action probability distribution $\pi(a \mid s, \theta)$, a normal distribution with density $\pi(a \mid s, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma(s,\theta_\sigma)} \exp\!\left(-\frac{(a - \mu(s,\theta_\mu))^2}{2\,\sigma(s,\theta_\sigma)^2}\right)$, where $\mu(s,\theta_\mu)$ is the mean and $\sigma(s,\theta_\sigma)$ is the standard deviation of the normal distribution. $\theta_\mu$ and $\theta_\sigma$ are the parameters of the estimators of $\mu$ and $\sigma$ (the estimator formulas are not reproduced in this text), and $\theta = [\theta_\mu, \theta_\sigma]^T$. Here $s$ is the current state and $\theta$ is the current policy parameter. Executing the action in the current state yields the reward for that action: the state moves from the current state to the next state, and the immediate reward $r_{t+1}$ of the next state is obtained.
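The action selection of Step 10 can be sketched for a one-dimensional action as follows. The estimator forms are assumptions, since they are not given in this text: the mean is taken to be linear in the state features, and the standard deviation is taken to be the exponential of a linear function so that it stays positive (a common parameterization for Gaussian policies). The function names and the feature vector `phi` are hypothetical.

```python
import math
import random

def policy_params(phi, theta_mu, theta_sigma):
    # Assumed estimators: mu = theta_mu . phi, sigma = exp(theta_sigma . phi) > 0.
    mu = sum(w * x for w, x in zip(theta_mu, phi))
    sigma = math.exp(sum(w * x for w, x in zip(theta_sigma, phi)))
    return mu, sigma

def gaussian_pdf(a, mu, sigma):
    # Density of the normal distribution pi(a | s, theta).
    return math.exp(-((a - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def sample_action(phi, theta_mu, theta_sigma, rng):
    # Draw one action from the Gaussian policy for the current state features.
    mu, sigma = policy_params(phi, theta_mu, theta_sigma)
    return rng.gauss(mu, sigma)
```

A call such as `sample_action([1.0, 2.0], [0.5, 0.25], [0.0, 0.0], random.Random(0))` draws an action from a normal distribution with mean 1 and standard deviation 1.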
Step 11: Update the state feature vector $\phi(s)$ and learn the state value function $V^\pi(s)$ with a linear estimator, $V^\pi(s) \approx V(s,\omega) = \omega^T \phi(s)$, where $\omega$ is the parameter of the state value function of the Critic part and $\omega^T$ is its transpose. To accelerate learning, both the Actor part and the Critic part perform multi-step updates with eligibility traces.
Step 12: Update the temporal-difference error $\delta = r_{t+1} + \gamma_\omega V(s_{t+1},\omega) - V(s_t,\omega)$, where $r_{t+1} + \gamma_\omega V(s_{t+1},\omega)$ is the total reward value of the next state, $\gamma_\omega$ is a discount factor between 0 and 1, and $V(s_t,\omega)$ is the value of the current state.
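Steps 11 and 12 combine into a short computation: a linear value estimate followed by the temporal-difference error. The sketch below follows the two formulas given in the text; only the function names and the list-based feature vectors are illustrative choices.

```python
def value(phi, omega):
    # Linear critic of Step 11: V(s, omega) = omega^T phi(s).
    return sum(w * x for w, x in zip(omega, phi))

def td_error(r_next, phi_s, phi_next, omega, gamma):
    # Step 12: delta = r_{t+1} + gamma * V(s_{t+1}, omega) - V(s_t, omega).
    return r_next + gamma * value(phi_next, omega) - value(phi_s, omega)
```

With `omega = [2.0, 3.0]`, current features `[1, 0]`, next features `[0, 1]`, a reward of 1 and a discount factor of 0.9, the error is 1 + 0.9 × 3 − 2 = 1.7.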
Step 13: Update the eligibility-trace vector $z(\omega,t)$ of the Critic part (update formula not reproduced in this text). Its terms are the gradient with respect to the parameter $\omega$; the decay parameter $\lambda_\omega \in [0,1)$; and $z(\omega,t-1)$, the eligibility-trace vector of the Critic part at time slot $t-1$.
Step 14: Update the state-value-function parameter $\omega(t)$ by $\omega(t+1) = \omega(t) + \alpha_{c,t}\,\delta\,z(\omega,t)$, where $\alpha_{c,t}$ is the learning rate of the Critic part and satisfies a condition whose formula is not reproduced in this text.
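A minimal sketch of the Critic updates in Steps 13 and 14 follows. The parameter update is the one stated in Step 14. The trace update is an assumption, because its formula is not given in this text: it uses the standard accumulating trace, in which the previous trace is decayed by the product of the discount factor and the decay rate and the gradient of the linear value function (the feature vector itself) is added. All names are hypothetical.

```python
def critic_update(omega, z, phi, delta, alpha, gamma, lam):
    # Assumed accumulating trace: z <- gamma * lam * z + grad_omega V, where grad_omega V = phi
    # for the linear critic V(s, omega) = omega^T phi(s).
    z_new = [gamma * lam * zi + xi for zi, xi in zip(z, phi)]
    # Step 14: omega(t+1) = omega(t) + alpha * delta * z(omega, t).
    omega_new = [wi + alpha * delta * zi for wi, zi in zip(omega, z_new)]
    return omega_new, z_new
```

Because the trace carries a decayed memory of earlier feature vectors, a single temporal-difference error adjusts the weights of recently visited states as well as the current one, which is the multi-step effect the method relies on.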
Step 15: Update the eligibility-trace vector $z(\theta,t)$ of the Actor part (update formula not reproduced in this text). Its terms are the gradient with respect to the parameter $\theta$; the decay parameter $\gamma_\theta \lambda_\theta$; and $z(\theta,t-1)$, the eligibility-trace vector of the Actor part at time slot $t-1$.
Step 16: Update the policy parameter for the next time slot by $\theta(t+1) = \theta(t) + \alpha_{a,t}\,\delta\,z(\theta,t)$, where $\alpha_{a,t}$ is the learning rate of the Actor part, a positive number that satisfies a condition whose formula is not reproduced in this text.
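The Actor updates of Steps 15 and 16 can be sketched for a one-dimensional Gaussian policy as follows. The parameter update is the one stated in Step 16 and the decay factor is the product named in Step 15. The gradient term is an assumption: it is the gradient of the log-density of a Gaussian policy whose mean is linear in the features and whose standard deviation is the exponential of a linear function. Function and argument names are hypothetical.

```python
def actor_update(theta_mu, theta_sigma, z_mu, z_sigma, phi, a, mu, sigma,
                 delta, alpha, gamma, lam):
    # Assumed score function of the Gaussian policy:
    #   d ln(pi) / d theta_mu    = (a - mu) / sigma^2 * phi
    #   d ln(pi) / d theta_sigma = ((a - mu)^2 / sigma^2 - 1) * phi
    g_mu = [(a - mu) / sigma ** 2 * x for x in phi]
    g_sigma = [((a - mu) ** 2 / sigma ** 2 - 1.0) * x for x in phi]
    # Step 15 (assumed form): decay the old traces by gamma * lam and add the gradient.
    z_mu = [gamma * lam * z + g for z, g in zip(z_mu, g_mu)]
    z_sigma = [gamma * lam * z + g for z, g in zip(z_sigma, g_sigma)]
    # Step 16: theta(t+1) = theta(t) + alpha * delta * z(theta, t).
    theta_mu = [t + alpha * delta * z for t, z in zip(theta_mu, z_mu)]
    theta_sigma = [t + alpha * delta * z for t, z in zip(theta_sigma, z_sigma)]
    return theta_mu, theta_sigma, z_mu, z_sigma
```

A positive temporal-difference error moves the mean toward the action just taken; a negative one moves it away.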
Step 17: Update the mean $\mu(s,\theta_\mu)$ and the standard deviation $\sigma(s,\theta_\sigma)$ of the normal distribution of Step 10 (the two update formulas are not reproduced in this text); the standard deviation is positive. Then check whether the iteration has converged or the maximum number of iterations has been reached. If neither holds, return to Step 10 and continue iterating; otherwise, terminate the iteration.
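The loop control described in Steps 9 and 17 can be sketched as below. The text does not say how convergence is judged, so the criterion used here (the largest parameter change in one iteration falling below a tolerance) is an assumption, as are the names `run_until_converged`, `step_fn` and `tol`. `step_fn` stands in for one pass through Steps 10 to 17.

```python
def run_until_converged(step_fn, theta, max_iters, tol):
    """Iterate step_fn until the parameters stop changing or max_iters is reached.

    Returns (theta, iterations_run, converged).
    """
    for it in range(1, max_iters + 1):
        new_theta = step_fn(theta)
        # Assumed convergence test: largest absolute parameter change below tol.
        change = max(abs(n - o) for n, o in zip(new_theta, theta))
        theta = new_theta
        if change < tol:
            return theta, it, True
    return theta, max_iters, False
```

For instance, `run_until_converged(lambda th: [x / 2 for x in th], [1.0], 10, 0.1)` stops after four iterations, when the change drops to 0.0625.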
Compared with the prior art, the present invention has the following beneficial effects:
1) The invention uses a mobile edge computing system to provide cloud computing services at the network edge close to mobile users. This reduces the backhaul of video data through the core network and greatly relieves its data transmission load. Offloading the video transcoding task to the edge network at base stations near the user reduces transmission delay.
2) The user QoE function is defined more comprehensively, covering both video quality and delay. Video quality is measured by resolution and bit rate, and delay enters as a penalty factor. This definition makes the optimization objective explicit: improve video quality and reduce delay.
3) The invention improves the Actor-Critic algorithm by introducing, for the first time, eligibility traces in both the Actor part and the Critic part for multi-step updates. This speeds up convergence and learning, which serves the goal of reducing delay. On problems with continuous state and action spaces, it performs better and converges faster than traditional reinforcement learning algorithms.
4) Unlike the existing literature, the invention models the available computing resources and radio spectrum resources as random processes, which more closely approximates a real wireless communication environment.
In the method provided by the invention, mobile edge computing and SDN are used in a cloud-assisted heterogeneous network to jointly optimize the video transcoding strategy, the user scheduling strategy and resource allocation, and an improved Actor-Critic algorithm performs multi-step updates with eligibility traces in both the Actor part and the Critic part. The method can thereby improve user quality of experience and learning efficiency.
Brief description of the drawings
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing, in which:
Fig. 1 shows the application scenario of the present invention.
Detailed description of the embodiments
As shown in Fig. 1, the service area has a single macro base station (Macro Base Station, MBS) at its center and many small base stations (Small Base Station, SBS), which transcode and transmit the video streams. Each base station is connected by wire to an MEC server that provides computing services. The network structure has three layers. The top layer is the application layer, in which video is divided into several quality versions by resolution and bit rate: the four resolutions are 224p, 360p, 720p and 1080p, with corresponding bit rates of 400 kbps, 1 Mbps, 1.5 Mbps and 2 Mbps. The middle layer is the control layer, which handles user assignment, computing resource allocation, radio spectrum resource allocation, and resolution and bit-rate selection. The bottom layer is the infrastructure layer, which comprises the core network, the SBSs and their MEC servers, the MBS and its MEC server, the source of the original video stream, and the user terminals. All facilities in the infrastructure layer are connected wirelessly to the SDN controller in the control layer, so the control layer and the infrastructure layer are decoupled, and all wireless connections between them are configured through the OpenFlow protocol. At the start of each time slot, the infrastructure layer sends facility status information, such as the available radio spectrum resources and computing resources, to the control layer. Based on this information, the SDN controller sends control information to the corresponding facilities.
Specific example: in a service area of 1 km × 1 km, one MBS is located at the center and 10 SBSs are randomly distributed elsewhere. Each SBS can serve multiple users, and there are 3 users in the service area of each SBS. The MBS and the SBSs use single-antenna transmission, and the spectrum occupied by the MBS is orthogonal to the spectrum allocated to the SBSs. The wireless channel model follows the 3GPP standard. The downlink transmit power from an SBS to a user is 50 mW and from the MBS to a user is 20 W; the background noise power is -174 dB. Within one time slot, the computing resource available at the SBS with index n is randomly distributed in [0, 5 GHz] with a mean of 2.5 GHz, and the computing resource available at the MBS is randomly distributed in [0, 100 GHz] with a mean of 50 GHz. Within one time slot, the radio spectrum resource available at the SBS with index n is randomly distributed in [0, 10 MHz] with a mean of 5 MHz, and that available at the MBS is randomly distributed in [0, 20 MHz] with a mean of 10 MHz. The video is divided into segments for playback and processing, each with a playback duration of L = 2 s.
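The random resource model of this example can be sketched as a per-slot sampler. The ranges and the number of SBSs come from the example above; drawing from a uniform distribution is an assumption (it is consistent with each stated mean being the midpoint of its range, but the text only says "randomly distributed"). The constant and function names are hypothetical.

```python
import random

# Ranges from the example above, in Hz. Uniform sampling is an assumption.
SBS_CPU_HZ = (0.0, 5e9)
MBS_CPU_HZ = (0.0, 100e9)
SBS_BW_HZ = (0.0, 10e6)
MBS_BW_HZ = (0.0, 20e6)
NUM_SBS = 10

def sample_slot(rng):
    """Draw the resources available in one time slot for every SBS and the MBS."""
    return {
        "sbs_cpu_hz": [rng.uniform(*SBS_CPU_HZ) for _ in range(NUM_SBS)],
        "mbs_cpu_hz": rng.uniform(*MBS_CPU_HZ),
        "sbs_bw_hz": [rng.uniform(*SBS_BW_HZ) for _ in range(NUM_SBS)],
        "mbs_bw_hz": rng.uniform(*MBS_BW_HZ),
    }
```

Calling `sample_slot(random.Random(0))` once per time slot yields the computing and spectrum parts of the MDP state; the SINR part would come from the channel model, which is not sketched here.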
The network model, video stream model, mobile edge computing model, downlink communication model and utility function model are first established, and the optimization objective is specified. The dynamic system is then modeled as an MDP and solved by Steps 1 to 17 exactly as set out in the Summary of the invention above.
Compared with the prior art, the present invention has the following advantages:
1) The present invention uses a mobile edge computing system to provide cloud computing services at the network edge close to mobile users. This reduces the backhaul transmission of video data in the core network and greatly relieves the core network's data transmission burden. Offloading the video transcoding computation to the edge network at base stations near the user reduces transmission delay;
2) The user quality-of-experience function is defined more comprehensively and covers both video quality and delay: video quality is measured by resolution and bit rate, and delay acts as a penalty factor. This definition makes the optimization objective explicit: improve video quality and reduce delay;
3) The present invention improves the Actor-Critic algorithm by introducing, for the first time, eligibility traces for multi-step updates in both the Actor part and the Critic part. This accelerates convergence, improves learning efficiency, and speeds up the learning process, thereby reducing delay. On problems with continuous state spaces and continuous action spaces, it performs better and converges faster than traditional reinforcement learning algorithms;
4) Unlike the existing literature, the present invention innovatively models the available computing resources and radio spectrum resources as random processes, which more closely approximates a real wireless communication environment.
The method for improving the quality of experience of live video system users provided by the present invention jointly optimizes the video transcoding strategy, the user scheduling strategy, and resource allocation in a cloud-assisted heterogeneous network, using mobile edge computing and SDN technology. It uses an improved Actor-Critic algorithm in which both the Actor part and the Critic part perform multi-step updates with eligibility traces, which improves user quality of experience and learning efficiency.
The above embodiments are not limited to their own technical solutions; embodiments can be combined with one another to form new embodiments. The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it; any modification or equivalent replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of the technical solution of the present invention.
Claims (1)
1. A method for improving the quality of experience of live video system users, characterized in that it comprises:
Step 1: model the dynamic system as a Markov decision process (MDP) with four parameters S, A, P, and r. The state space S comprises three parts: 1) the computing resources available at the mobile edge computing server, 2) the available radio spectrum resources, and 3) the signal-to-interference-plus-noise ratio (SINR) of the wireless downlink channel. The action space A comprises four parts: 1) the user scheduling strategy, 2) the transcoding strategy, 3) the computing resource allocation strategy, and 4) the radio spectrum resource allocation strategy. P is the state transition probability matrix; r is the reward function, obtained through the following steps;
Step 2: divide the live video stream into several segments that are played and processed in turn, each with playback duration L. While one segment is playing, the next segment is processed; the time required for this processing is denoted $T_{n,k}$. Continuous playback requires $T_{n,k} \le L$; otherwise a delay $D_{n,k}$ occurs;
Step 3: the mobile edge computing server first transcodes the video stream from the original high-quality stream to a lower-quality stream; the time consumed is: [formula not reproduced in this text]. Here $c_{n,k}=1$ indicates that the user is served by a small base station, and $c_{n,k}=0$ indicates that the user is served by the macro base station. The remaining symbols of the formula denote, in order: the original video stream; the video stream that the user receives after transcoding by the mobile edge computing server; and the computing resources required to transcode the video stream from the original version to the received version. $f_{n,k}$ denotes the computing resources allocated to the k-th user by the mobile edge computing server of small base station n; the corresponding macro base station term denotes the computing resources allocated to the k-th user by the mobile edge computing server of the macro base station;
Step 4: the quality function $Z_{n,k}$ of the video stream transcoded in step 3 is calculated by the following formula: [formula not reproduced in this text]. Here the two parameters of the transcoded stream denote its resolution and bit rate, $q_{n,k}$ and $b_{n,k}$ denote the resolution and bit rate of the original video stream, ln is the natural logarithm operator, and ξ and ξ are positive numbers;
Step 5: the video transcoded in step 3 is then transmitted over the downlink to the mobile terminal device; the time consumed is: [formula not reproduced in this text]. The symbols of this formula are as follows. The first is the bit rate of the transcoded video stream from step 4. The second denotes the size of the transcoded video stream, which can be obtained directly from media stream splitter software. $B_{n,k}$ denotes the radio spectrum resources allocated to the k-th user by small base station n, and the corresponding macro base station term denotes the radio spectrum resources allocated to the k-th user by the macro base station. The achievable instantaneous rate is calculated by [formula not reproduced in this text]. $G_{n,k}$ denotes the spectral efficiency that small base station n can provide to the k-th user and is calculated by [formula not reproduced in this text], where $\rho_{n,k}$ is the SINR of the downlink from small base station n to the k-th user. The corresponding macro base station term denotes the spectral efficiency that the macro base station can provide to the k-th user and is calculated by [formula not reproduced in this text] from the SINR of the downlink from the macro base station to the k-th user;
Step 6: the mobile terminal device decodes the received video stream; the time consumed by video decoding is denoted by [symbol not reproduced in this text] and is a constant;
Step 7: from the results of steps 3, 5, and 6, obtain the total time $T_{n,k}$ of step 2, expressed by the equation [formula not reproduced in this text]; at the same time obtain the delay $D_{n,k}$, calculated as $D_{n,k} = T_{n,k} - L$;
Step 8: from the results of steps 4 and 7, the reward function r of the MDP is: [formula not reproduced in this text]. Here the first coefficient is the price of video stream quality, in dollars; $Z_{n,k}$ is the video stream quality function of step 4; $\upsilon_{n,k}$ is the price of delay, in dollars; and $D_{n,k}$ is the delay function of step 7;
Step 9: solve the above MDP with the Actor-Critic algorithm with dual eligibility traces. First initialize the parameters: the eligibility trace decay rate $\lambda_{\theta} \in [0,1)$ of the Actor part and the eligibility trace decay rate $\lambda_{\omega} \in [0,1)$ of the Critic part; the policy parameter $\theta$ of the Actor part and the state value function parameter $\omega$ of the Critic part; the eligibility trace vectors of the Actor part and the Critic part, set to zero vectors; and the learning rate $\alpha_{a,t} > 0$ of the Actor part and the learning rate $\alpha_{c,t} > 0$ of the Critic part. Set a maximum number of iterations and begin iterating from step 10;
Step 10: in each iteration, select an action a according to the action probability distribution function $\pi(a \mid s, \theta)$: [formula not reproduced in this text], where $\mu(s,\theta_{\mu})$ is the mean of the normal distribution and $\sigma(s,\theta_{\sigma})$ is its standard deviation, given by [formulas not reproduced in this text]; $\theta_{\mu}$ and $\theta_{\sigma}$ are the parameters of the estimators, $\theta = [\theta_{\mu}, \theta_{\sigma}]^{T}$; s denotes the current state and $\theta$ the current policy. Executing the action in the current state yields the reward for that action, moves the system from the current state to the next state, and yields the immediate reward $r_{t+1}$ of the next state;
Step 11: update the state feature vector $\phi(s)$ and learn the state value function $V^{\pi}(s)$ with a linear estimator; the estimation method is $V^{\pi}(s) \approx V(s,\omega) = \omega^{T}\phi(s)$, where $\omega$ is the parameter vector of the state value function of the Critic part and $\omega^{T}$ is the transpose of $\omega$. To accelerate learning, both the Actor part and the Critic part use eligibility traces to perform multi-step updates;
Step 12: update the temporal-difference error $\delta$ as $\delta = r_{t+1} + \gamma_{\omega} V(s_{t+1},\omega) - V(s_{t},\omega)$, where $r_{t+1} + \gamma_{\omega} V(s_{t+1},\omega)$ is the total reward value of the next state, $\gamma_{\omega}$ is a discount factor with a value between 0 and 1, and $V(s_{t},\omega)$ is the reward value of the current state;
Step 13: update the eligibility trace vector $z(\omega,t)$ of the Critic part; the update method is: [formula not reproduced in this text]. In this formula the gradient is taken with respect to the parameter $\omega$, $\lambda_{\omega} \in [0,1)$ is the decay parameter, and $z(\omega,t-1)$ is the eligibility trace vector of the Critic part in time slot $t-1$;
Step 14: update the parameter $\omega(t)$ of the state value function as $\omega(t+1) = \omega(t) + \alpha_{c,t}\,\delta\,z(\omega,t)$, where $\alpha_{c,t}$ is the learning rate of the Critic part and satisfies [condition not reproduced in this text];
Step 15: update the eligibility trace vector $z(\theta,t)$ of the Actor part; the update method is: [formula not reproduced in this text]. In this formula the gradient is taken with respect to the parameter $\theta$, $\gamma_{\theta}\lambda_{\theta}$ is the decay parameter, and $z(\theta,t-1)$ is the eligibility trace vector of the Actor part in time slot $t-1$;
Step 16: update the policy parameter $\theta_{t+1}$ of the next time slot as $\theta(t+1) = \theta(t) + \alpha_{a,t}\,\delta\,z(\theta,t)$, where $\alpha_{a,t}$ is the learning rate of the Actor part, a positive number that satisfies [condition not reproduced in this text];
Step 17: update the mean $\mu(s,\theta_{\mu})$ of the normal distribution in step 10; the update method is [formula not reproduced in this text]. Update the standard deviation $\sigma(s,\theta_{\sigma})$ of the normal distribution in step 15; this value is positive, and the update method is [formula not reproduced in this text]. Then judge whether the iteration has converged or the maximum number of iterations has been reached: if the maximum has not been reached and the iteration has not converged, return to step 10 and continue iterating; if the maximum has been reached or the iteration has converged, terminate the iteration.
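The timing and reward quantities of steps 2 to 8 can be illustrated with a short Python sketch. Because the formulas themselves are not reproduced in this text, every functional form below is an assumption made for illustration: transcoding time is taken as required computing resources divided by allocated computing resources, spectral efficiency as the Shannon form $\log_2(1+\rho)$, the instantaneous rate as bandwidth times spectral efficiency, the total time as the sum of the three stage times, and the reward as priced quality minus priced delay summed over users. Only $D_{n,k} = T_{n,k} - L$ is taken directly from step 7. All function and parameter names are hypothetical.

```python
import math


def transcoding_time(required_cycles, allocated_cycles_per_s):
    """Step 3 (assumed form): computing demand divided by allocated computing rate."""
    return required_cycles / allocated_cycles_per_s


def spectral_efficiency(sinr):
    """Step 5 (assumed Shannon form): bit/s/Hz for a linear-scale SINR."""
    return math.log2(1.0 + sinr)


def transmission_time(size_bits, bandwidth_hz, sinr):
    """Step 5 (assumed form): stream size divided by the instantaneous rate."""
    rate_bps = bandwidth_hz * spectral_efficiency(sinr)
    return size_bits / rate_bps


def total_time_and_delay(t_transcode, t_transmit, t_decode, segment_len):
    """Step 7: total time T (assumed to be the sum) and delay D = T - L."""
    total = t_transcode + t_transmit + t_decode
    return total, total - segment_len


def reward(users):
    """Step 8 (assumed form): sum of quality_price * Z - delay_price * D per user."""
    return sum(u["quality_price"] * u["quality"] - u["delay_price"] * u["delay"]
               for u in users)
```

For example, a user whose segment needs 2 s to transcode, 4 s to transmit, and 0.5 s to decode against a 6 s segment length incurs a delay of 0.5 s under these assumptions, and the reward decreases in proportion to the delay price.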
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910343561.4A CN110049315B (en) | 2019-04-26 | 2019-04-26 | Method for improving user experience quality of live video system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110049315A (en) | 2019-07-23 |
CN110049315B (en) | 2020-04-24 |
Family
ID=67279613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910343561.4A Expired - Fee Related CN110049315B (en) | 2019-04-26 | 2019-04-26 | Method for improving user experience quality of live video system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110049315B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007515866A (en) * | 2003-11-13 | 2007-06-14 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Method and apparatus for smoothing the overall quality of video transmitted over a wireless medium |
CN103888849A (en) * | 2014-04-11 | 2014-06-25 | 北京工业大学 | Computing and wireless resource cooperative dispatching method in mobile cloud video transmission |
CN108307510A (en) * | 2018-02-28 | 2018-07-20 | 北京科技大学 | A kind of power distribution method in isomery subzone network |
CN109068391A (en) * | 2018-09-27 | 2018-12-21 | 青岛智能产业技术研究院 | Car networking communication optimization algorithm based on edge calculations and Actor-Critic algorithm |
WO2019002465A1 (en) * | 2017-06-28 | 2019-01-03 | Deepmind Technologies Limited | Training action selection neural networks using apprenticeship |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111245845A (en) * | 2020-01-14 | 2020-06-05 | 北京邮电大学 | Data processing method based on mobile edge calculation in space-ground heterogeneous network |
CN112511197A (en) * | 2020-12-01 | 2021-03-16 | 南京工业大学 | Unmanned aerial vehicle auxiliary elastic video multicast method based on deep reinforcement learning |
CN112887314A (en) * | 2021-01-27 | 2021-06-01 | 重庆邮电大学 | Time-delay-sensing cloud and mist cooperative video distribution method |
CN113114756A (en) * | 2021-04-08 | 2021-07-13 | 广西师范大学 | Video cache updating method for self-adaptive code rate selection in mobile edge calculation |
CN114786137A (en) * | 2022-04-21 | 2022-07-22 | 重庆邮电大学 | Cache-enabled multi-quality video distribution method |
CN114786137B (en) * | 2022-04-21 | 2023-06-20 | 重庆邮电大学 | Cache-enabled multi-quality video distribution method |
Also Published As
Publication number | Publication date |
---|---|
CN110049315B (en) | 2020-04-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110049315A (en) | A method of improving live video system user Quality of experience | |
Luo et al. | Adaptive video streaming with edge caching and video transcoding over software-defined mobile networks: A deep reinforcement learning approach | |
CN109857546A (en) | The mobile edge calculations discharging method of multiserver and device based on Lyapunov optimization | |
CN110531617A (en) | Multiple no-manned plane 3D hovering position combined optimization method, device and unmanned plane base station | |
Ayala-Romero et al. | vrAIn: Deep learning based orchestration for computing and radio resources in vRANs | |
CN111918339A (en) | AR task unloading and resource allocation method based on reinforcement learning in mobile edge network | |
CN114595632A (en) | Mobile edge cache optimization method based on federal learning | |
CN109982434B (en) | Wireless resource scheduling integrated intelligent control system and method and wireless communication system | |
Xu et al. | Multi-agent reinforcement learning based distributed transmission in collaborative cloud-edge systems | |
Huang et al. | Utility-oriented resource allocation for 360-degree video transmission over heterogeneous networks | |
CN110049566A (en) | A kind of downlink power distributing method based on multiple no-manned plane secondary communication path | |
Chen et al. | Wireless multiplayer interactive virtual reality game systems with edge computing: Modeling and optimization | |
CN110233755A (en) | The computing resource and frequency spectrum resource allocation method that mist calculates in a kind of Internet of Things | |
CN114116047A (en) | V2I unloading method for vehicle-mounted computation-intensive application based on reinforcement learning | |
Feng et al. | Vabis: Video adaptation bitrate system for time-critical live streaming | |
Yu et al. | User-centric heterogeneous-action deep reinforcement learning for virtual reality in the metaverse over wireless networks | |
CN113395723A (en) | 5G NR downlink scheduling delay optimization system based on reinforcement learning | |
CN114219094B (en) | Communication cost and model robustness optimization method based on multi-task federal learning | |
CN112887314B (en) | Time delay perception cloud and mist cooperative video distribution method | |
CN110190982B (en) | Non-orthogonal multiple access edge computation time and energy consumption optimization based on fair time | |
CN103796293B (en) | A kind of power distribution method under high ferro communication construction | |
CN114340017A (en) | Heterogeneous network resource slicing method with eMBB and URLLC mixed service | |
CN114786137B (en) | Cache-enabled multi-quality video distribution method | |
Tian et al. | Cloud game computing offload based on Multi-Agent Reinforcement Learning | |
CN103313063B (en) | A kind of H.264/AVC video dispatching method based on dual decoding simulation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200424 |