CN116016987A - Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network - Google Patents

Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network

Info

Publication number
CN116016987A
CN116016987A (application CN202211574628.3A)
Authority
CN
China
Prior art keywords
video
code rate
transcoding
model
slice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211574628.3A
Other languages
Chinese (zh)
Inventor
孙彦赞
陈文凯
于军
张舜卿
陈小静
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202211574628.3A priority Critical patent/CN116016987A/en
Publication of CN116016987A publication Critical patent/CN116016987A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 — Reducing energy consumption in communication networks
    • Y02D30/70 — Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A reinforcement-learning-based video code rate adaptation method for edge cellular networks: a server capable of multi-access edge computing (MEC) transcoding and a client with a built-in buffer are constructed as a video streaming session simulation environment; a parallel-collaborative joint multi-video-slice code rate transcoding and transmission ABR model (PCMC) is adopted; the model is trained with asynchronous advantage actor-critic (A3C) reinforcement learning in the simulation environment using a video data set and a wireless bandwidth trace data set; and in the online stage the model adaptively adjusts the video code rate. In an MEC-equipped wireless network scenario, the invention makes full use of RAN-side information and MEC computing and storage resources while executing transmission and transcoding in parallel, thereby reducing the extra computation delay and computation energy consumption introduced by the MEC and improving the client's video streaming QoE indices.

Description

Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network
Technical Field
The invention relates to a technology in the field of video processing, in particular to a reinforcement-learning-based video code rate adaptation method for edge cellular networks.
Background
To meet mobile users' quality of experience (QoE) and provide new high-performance quality of service (QoS), multi-access edge computing (MEC), software-defined mobile network (SDMN), and cloud radio access network (C-RAN) technologies are being introduced into next-generation wireless networks. Cloud computing capabilities are extended to neighboring small base stations (SBSs) in wireless networks, especially ultra-dense networks (UDNs), placing computing and storage resources on the radio access network (RAN) side closer to end users, and high quality of experience is pursued through a variety of adaptive bitrate (ABR) algorithms. However, because the MEC cache space is limited and the content popularity of video streams shifts with time and geographic location, the pre-cache hit rate of video streams is low, so cache update and replacement algorithms refresh the cached content frequently; such frequent cache updates incur additional energy consumption.
Disclosure of Invention
Aiming at the defects of the existing improved ABR techniques — inaccurate estimation of network throughput, lack of consideration of RAN-side information, and insufficient utilization of MEC computing and cache resources — the invention provides a reinforcement-learning-based video code rate adaptation method for edge cellular networks. In an MEC-equipped wireless network scenario, the method makes full use of RAN-side information and MEC computing and storage resources while executing transmission and transcoding in parallel, thereby reducing the extra computation delay and computation energy consumption introduced by the MEC and improving the client's video streaming QoE indices.
The invention is realized by the following technical scheme:
the invention relates to a video code rate self-adaption method based on reinforcement learning for an edge cellular network, which is characterized in that a server capable of multi-address edge computing (MEC) transcoding and a client with built-in cache are constructed to serve as a video stream session simulation environment, an ABR (program control unit) method (PCMC) model of parallel collaboration joint multi-video slice code rate transcoding and transmission is adopted, a video data set and a wireless bandwidth track data set are used for carrying out training based on asynchronous reinforcement learning (A3C) in the video stream session simulation environment, and the video code rate is self-adaption adjusted through the model in an online stage.
The PCMC model has a multi-action-output network structure and comprises: an environment coding module, a policy generation module, and a policy evaluation module, wherein: the environment coding module generates a feature vector from the state information $S_n$ reflecting the environment characteristics, the policy generation module outputs a set $A_n$ of k future actions based on the feature vector, and the policy evaluation module evaluates the current policy and feeds the evaluation back to the policy generation module to adjust the policy model.
The invention further relates to a system for realizing the method, comprising: a video source server, a client, and a code rate selection module, a cache module, and a transcoding module located at the server side, wherein: the client locally maintains a video slice buffer and attaches its local buffer state information when requesting video slices from the server; the code rate selection module runs the PCMC model on the request information to output code rate selections for the next K slices; the cache module checks in turn whether a higher-code-rate version of each of the next K video slices is cached; if so, the transcoding module adds the slice to the transcoding task queue, converts it to the requested version, and transmits it to the client; otherwise, the MEC server requests the high-code-rate version of the slice from the video source server, which stores all video data, and forwards it to the client.
The transmission is preferably performed in synchronization with the transcoding.
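For illustration, the following Python sketch mirrors the server-side request flow just described: model-driven rate selection for the next K slices, a cache check per slice, transcoding on a hit, and an origin fetch on a miss. All identifiers (MecServer, select_rates, fetch_from_origin) are assumed names for this sketch, not from the patent.

```python
# Minimal sketch (not the patent's implementation) of the MEC server request flow.

from collections import deque
from typing import Callable, Dict, List


class MecServer:
    def __init__(self, cache: Dict[int, int],
                 fetch_from_origin: Callable[[int, int], None],
                 select_rates: Callable[[object], List[int]]):
        self.cache = cache                    # slice index -> highest cached rate level
        self.fetch_from_origin = fetch_from_origin
        self.select_rates = select_rates      # PCMC policy: state -> next-K rate levels
        self.transcode_queue = deque()        # FIFO transcoding task queue

    def handle_request(self, state: object, next_index: int) -> None:
        # The PCMC model outputs code-rate selections for the next K slices at once.
        for offset, m in enumerate(self.select_rates(state)):
            idx = next_index + offset
            cached = self.cache.get(idx)
            if cached is not None and cached >= m:
                # Cache hit: transcode the cached high-rate version down to level m;
                # the task runs in parallel with ongoing slice transmissions.
                self.transcode_queue.append((idx, cached, m))
            else:
                # Cache miss: request the slice from the video source server,
                # which stores all video data, and forward it to the client.
                self.fetch_from_origin(idx, m)
```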
Technical effects
Compared with existing ABR algorithms, the invention fully considers RAN-side information through the PCMC model and can therefore predict bandwidth throughput in a wireless network environment more accurately; its flexible multi-action output strategy allows transcoding and transmission to execute in parallel, effectively reducing the computation delay caused by MEC transcoding, improving user QoE, and reducing the total energy consumption of the whole video streaming session.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of a PCMC model network architecture;
FIG. 3 is a graph comparing network convergence curves;
fig. 4 is a graph comparing QoE and the individual playback metrics.
Detailed Description
As shown in fig. 1, this embodiment relates to a video code rate adaptive method based on reinforcement learning for an edge cellular network, which includes the following steps:
step 1, constructing a multiple access edge computing (MEC) server with transcoding and caching capabilities and a client with built-in cache as a video streaming session simulation environment.
The client has a built-in buffer: when a requested video slice reaches the client, the buffer grows by the length of video content contained in the slice, and the buffer is consumed at the client's video playback rate.
The buffer length satisfies:

$$B_{ue}(n) = \max\big(B_{ue}(n-1) - T_{ts}(n,m) - T_{tc}(n,m),\ 0\big) + L$$

wherein: $B_{ue}(n)$ is the buffer length when the slice with index n reaches the client, t is the time at which the video slice reaches the client, and L is the length of video content contained in the slice; when the length $B_{ue}(n)$ exceeds a threshold $B_{thresh}$, the video download pauses and sleeps for an integer number of sleep periods $T_s$ until the buffer satisfies the condition; $T_{ts}(n,m)$ is the transmission time of the level-m code rate slice of the n-th block and $T_{tc}(n,m)$ the corresponding transcoding time. When the user plays the video at the normal playback rate, the buffer length consumed by the time the requested video slice arrives equals the sum of the transmission time and the transcoding time.
The threshold $B_{thresh}$ is preferably 60 seconds.
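A minimal sketch of these buffer dynamics follows, treating transmission plus transcoding time as the download time; the sleep-period length T_S is an illustrative assumption, while the 4-second slice length and 60-second threshold come from the text.

```python
# Client buffer evolution per the formula above: drain during download, grow by
# L seconds per arriving slice, sleep in whole periods while above threshold.

import math

L = 4.0          # seconds of content per slice (HLS slicing used in step 2)
B_THRESH = 60.0  # preferred buffer threshold from the text
T_S = 0.5        # assumed sleep period length in seconds


def step_buffer(b_ue: float, t_ts: float, t_tc: float) -> tuple[float, float]:
    """Advance the buffer across one slice download.

    b_ue: buffer length (s) when the previous slice arrived
    t_ts: transmission time (s) of this slice at its chosen rate level
    t_tc: transcoding time (s) spent on this slice at the MEC server
    Returns (new buffer length, sleep time inserted before the download).
    """
    sleep = 0.0
    if b_ue > B_THRESH:
        # Sleep an integer number of periods until the buffer drops below threshold.
        sleep = math.ceil((b_ue - B_THRESH) / T_S) * T_S
        b_ue = max(b_ue - sleep, 0.0)
    # Drain by the download time (transmission + transcoding), then add the slice.
    return max(b_ue - (t_ts + t_tc), 0.0) + L, sleep
```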
The transcoding means: a transcoding task queue is set at the MEC server; after the PCMC model selects the code rates of the next k video slices, the MEC cache is checked in turn for hits, i.e., whether a higher-code-rate version exists; on a hit, the video slices are transcoded in order through the transcoding task queue.
The duration of each transcoding task in the queue is related to the number of CPU cores and the frequency of the computing device and to the difference between the code rates before and after transcoding; the duration of the transcoding task for the n-th slice in the queue satisfies:

$$T_{tc}(n,m) = \frac{C_m\,\big(q(0) - q(m)\big)}{N_{core}\,f}$$

wherein: the original code rate is q(0), the target code rate is q(m), $C_m$ is the number of CPU cycles required to process a single code-rate difference level in the single-core case, $N_{core}$ is the number of CPU cores, and f is the CPU frequency. When slices are in transmission or the client buffer is sleeping, the transcoding tasks in the task queue can be executed in parallel to reduce the delay caused by transcoding.
The MEC cache refers to: the server uses its cache resources to cache the most frequently accessed video slices and simulates the cache update mechanism with a least-frequently-used (LFU) eviction algorithm, specifically: video slices with high access frequency are cached preferentially, and among different code rates of the same slice only the highest-code-rate version is cached; when the cache reaches its upper limit, low-frequency slices are evicted first, and among slices with the same frequency the one with the earliest access time is evicted first; when the code rate version of a requested video slice is lower than the cached version, the request is a hit, otherwise it is a miss.
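This cache behavior can be sketched as follows; the fixed-entry-count capacity model and the identifier names are simplifying assumptions, not the patent's implementation.

```python
# LFU-style slice cache: keep only the highest-rate version per slice, evict
# the least-frequently-used entry first, break ties by earliest access time.


class LfuSliceCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store: dict[int, tuple[int, int, int]] = {}  # idx -> (rate, freq, last_access)
        self.clock = 0

    def request(self, idx: int, rate: int) -> bool:
        """Record an access; return True on hit (cached rate >= requested rate)."""
        self.clock += 1
        if idx in self.store:
            cached_rate, freq, _ = self.store[idx]
            # Only the highest code-rate version of a slice is retained.
            self.store[idx] = (max(cached_rate, rate), freq + 1, self.clock)
            return cached_rate >= rate
        if len(self.store) >= self.capacity:
            # Evict lowest frequency; on ties, the earliest-accessed entry.
            victim = min(self.store, key=lambda i: (self.store[i][1], self.store[i][2]))
            del self.store[victim]
        self.store[idx] = (rate, 1, self.clock)
        return False
```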
The transcoding task queue is updated as follows: let T denote the time available for the transcoding task queue to consume. When the n-th slice reaches the client and the client buffer length $B_{ue}(n)$ exceeds the threshold $B_{thresh}$, T of the task queue is updated to

$$T = T_{ts}(n,m) + \Big\lceil \frac{B_{ue}(n) - B_{thresh}}{T_s} \Big\rceil\, T_s$$

otherwise T is updated to $T_{ts}(n,m)$. The j-th task of the transcoding task queue, with remaining transcoding time $B_{mec}(n+j)$, is then fetched in first-in-first-out order and updated to $\max\big(0,\ B_{mec}(n+j) - T\big)$, and T is updated to $T - B_{mec}(n+j)$; these steps loop until T is less than zero or j exceeds the queue length.
Step 2, constructing a video data set and a wireless bandwidth trace data set.
The video data set is constructed as follows: a movie video with 4K resolution and a length of about 120 minutes is collected as the original data source, encoded with H.264/AVC, and sliced into 4-second video slices in the HLS protocol format; video slices are then selected at random and noise is added to form new pseudo video slice files, generating 100 pseudo video sources with random durations of 5-100 minutes as the video data set.
The wireless bandwidth trace data set is constructed as follows: traces of different network bandwidths are obtained by controlling the number of RBs allocated in each radio frame; a random function is applied to the RB allocation rule, with the mean determining the average network bandwidth and the variance determining its fluctuation amplitude, so that different network conditions are simulated by varying the mean and variance of the random function. A total of 100 network trace records of 2000 s each are generated as the wireless bandwidth trace data set.
The bandwidth is simulated using OFDM modulation, specifically: the subcarrier spacing is set to 15 kHz, and the fading model of the wireless channel is set to

$$h = \sqrt{h_l}\,\alpha$$

wherein: $h_l$ is the large-scale fading and the coefficient α is the small-scale fading model. The large-scale fading uses the following path loss model:

$$h_l = G_A \left(\frac{c}{4\pi f_c}\right)^2 d^{-d_e}$$

wherein: $G_A$ is the antenna gain coefficient, d is the distance between the base station and the user, $f_c$ is the subcarrier frequency, and $d_e$ is a constant coefficient.

The small-scale fading uses a Rayleigh fading model with probability density function:

$$f(r) = \frac{r}{\delta^2}\exp\!\left(-\frac{r^2}{2\delta^2}\right),\quad r \ge 0$$

wherein: r is a real number greater than or equal to 0 and δ is the standard deviation of the random process.
Preferably, Rayleigh fading is simulated in the wireless bandwidth trace data set by drawing the real and imaginary parts from the standard normal distribution. Each time the simulation environment loads a network trace, it randomly selects a record and a starting time point to ensure the randomness of training; when a trace runs to its end, it restarts from the beginning until the video streaming session closes.
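A hedged sketch of the trace generation: per radio frame an RB count is drawn from a random function whose mean and variance set the average bandwidth and its fluctuation, and Rayleigh fading is sampled as the magnitude of a standard complex Gaussian. The frame rate, per-RB capacity, and trace layout below are assumptions.

```python
# Generate per-second throughput traces with random RB allocation and
# Rayleigh small-scale fading (|N(0,1) + j*N(0,1)|).

import numpy as np


def generate_trace(mean_rbs: float, std_rbs: float, seconds: int = 2000,
                   frames_per_s: int = 100, bits_per_rb: float = 1.5e3,
                   seed: int | None = None) -> np.ndarray:
    """Return per-second throughput samples (bit/s) for one trace."""
    rng = np.random.default_rng(seed)
    n_frames = seconds * frames_per_s
    # RBs allocated in each radio frame (clipped to be non-negative).
    rbs = np.clip(rng.normal(mean_rbs, std_rbs, n_frames), 0, None)
    # Rayleigh fading: magnitude of a standard complex Gaussian.
    fading = np.abs(rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames))
    per_frame_bits = rbs * bits_per_rb * fading
    return per_frame_bits.reshape(seconds, frames_per_s).sum(axis=1)


# 100 traces with varying means/variances to cover different network conditions.
traces = [generate_trace(np.random.uniform(20, 80), np.random.uniform(2, 15))
          for _ in range(100)]
```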
Step 3, constructing the parallel-collaborative joint multi-video-slice code rate transcoding and transmission ABR (PCMC) model in the simulation environment of step 1; after the simulation environment of step 1 loads the data set constructed in step 2, the PCMC model is trained through continuous interaction with the simulation environment of step 1.
As shown in fig. 2, the PCMC model in this embodiment specifically comprises: an environment coding module, a policy generation module, and a policy evaluation module, wherein: the environment coding module outputs a feature vector from the environment state information

$$S_n = \big(v(n),\ B_{ue}(t),\ b(n),\ Z(n,m),\ d(n-1),\ C(n),\ l(n)\big)$$

reflecting the environment characteristics; the policy generation module, as the decision model $\pi_\theta$, outputs from that feature vector the code rate selection for the next k video slices, i.e., the probability distribution $\pi_\theta(A_n \mid S_n)$ over actions $A_n = (v_n, v_{n+1}, \ldots, v_{n+k})$. While the slice with index n is being transmitted, the transcoding processes for the slice code rates of n+1 to n+k are executed in parallel to reduce the delay caused by transcoding. Here v(n) is the code rate selected for the video slice with request index n, $B_{ue}(t)$ is the client buffer length at time t, b(n) is the average network throughput of the video slice with index n, Z(n,m) is the byte size of the index-n video slice at code rate m, d(n−1) is the client playback stall duration caused by the video transmission, C(n) is the highest code rate version of the index-n video slice in the cache, and l(n) is the number of remaining video slices; the policy evaluation module fits a state-value (V) function and, from the feature vector of the environment state information, outputs the V value to the policy generation module for gradient updating.
Preferably, an action mask is further set in the policy generation module to filter out actions that cannot occur, for example actions whose index exceeds the total number of video slices.
Preferably, a built-in storage unit of the environment coding module stores the historical state information of the last u steps.
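A minimal actor-critic sketch of this three-module structure (environment encoder, multi-action policy head with action mask, value head), written in PyTorch; the layer sizes and flat state layout are illustrative assumptions.

```python
# Three-module PCMC sketch: encoder -> k softmax policy heads + value head.

import torch
import torch.nn as nn


class PCMCNet(nn.Module):
    def __init__(self, state_dim: int, n_rates: int, k: int, hidden: int = 128):
        super().__init__()
        self.k, self.n_rates = k, n_rates
        # Environment coding module: state -> feature vector.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Policy generation module: one softmax head per future slice
        # (the multi-action output of the PCMC structure).
        self.policy_heads = nn.ModuleList(
            [nn.Linear(hidden, n_rates) for _ in range(k)])
        # Policy evaluation module: fits the state-value V function.
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, state: torch.Tensor, mask: torch.Tensor | None = None):
        feat = self.encoder(state)
        logits = torch.stack([head(feat) for head in self.policy_heads], dim=1)
        if mask is not None:
            # Action mask: give impossible actions (e.g. slices past the end
            # of the video) effectively zero probability.
            logits = logits.masked_fill(~mask, float("-inf"))
        probs = torch.softmax(logits, dim=-1)      # (batch, k, n_rates)
        value = self.value_head(feat).squeeze(-1)  # (batch,)
        return probs, value
```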
The PCMC model is asynchronously trained with the A3C method, with the objective of maximizing the expected return value $J(\pi_\theta)$: several sub-threads are deployed on top of the actor-critic (AC) network architecture and trained simultaneously, and each sub-thread synchronizes its parameters to the main thread after training. During training, the parameters of the policy generation module and the policy evaluation module are updated respectively as:

$$\theta \leftarrow \theta + \alpha \sum_n \nabla_\theta \log \pi_\theta(S_n, A_n)\, A^{\pi_\theta}(S_n, A_n)$$

$$\theta_v \leftarrow \theta_v - \alpha' \sum_n \nabla_{\theta_v}\big(r_n + \gamma V^{\pi_\theta}(S_{n+1};\theta_v) - V^{\pi_\theta}(S_n;\theta_v)\big)^2$$

wherein: the advantage $A^{\pi_\theta}(S_n, A_n) = Q^{\pi_\theta}(S_n, A_n) - V^{\pi_\theta}(S_n)$ is the difference from the average of the expected return value obtainable by taking action $A_n$ from state $S_n$ under policy π, and the Bellman equation of the V function is $V^{\pi}(S_n) = \mathbb{E}\big[r_n + \gamma V^{\pi}(S_{n+1})\big]$. The reward obtained from the environment after the agent acts is

$$r_n = \omega\,\frac{q(v_n)}{q(0)} - \mu\, d(n) - \delta\,\frac{\lvert q(v_n) - q(v_{n-1})\rvert}{q(0)} - \varphi\, E(n)$$

wherein ω, μ, δ, and φ are constant weights of the sub-items, and q(0) is the highest code rate, which makes normalization convenient. Meanwhile, so that the model can trade off the importance of near-term and long-term rewards, a time-decaying discounted reward is used to let the policy model account for long-term return values:

$$R_\tau = \sum_{t=0}^{\infty} \gamma^{t}\, r_{n+t}$$
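A sketch of the per-slice reward and the discounted return; the weighting scheme and the inclusion of an energy term follow the QoE and energy factors named in the text, but their exact form is an assumption.

```python
# Per-slice reward (quality - stall - smoothness - energy) and discounted return.


def reward(q_n: float, q_prev: float, q_max: float, stall_s: float,
           energy_j: float, w: float = 1.0, mu: float = 4.3,
           delta: float = 1.0, phi: float = 0.1) -> float:
    quality = w * q_n / q_max                   # normalized play quality
    rebuffer = mu * stall_s                     # playback stall penalty
    smooth = delta * abs(q_n - q_prev) / q_max  # code-rate fluctuation penalty
    energy = phi * energy_j                     # session energy penalty
    return quality - rebuffer - smooth - energy


def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    """R_tau = sum_t gamma^t * r_{n+t}, computed back-to-front."""
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R
```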
the strategy generation module and the environment coding module of the PCMC model continuously generate rewards until reaching a termination state, and all sets of state information, actions and rewards of the process, namely a track tau, and the occurrence probability P (tau) of the track tau; the channel fading conditions follow the k-state markov model with confidence space vector +.>
Figure BDA00039888100000000510
For being in state information->
Figure BDA00039888100000000511
Under observe +.>
Figure BDA00039888100000000512
Probability distribution of (2); in a Partially Observable Markov Decision Process (POMDP), the return value is r' n I.e. r can be obtained under confidence space vector n Is a desired value of (2); will r' n Replacement discount rewards R τ Middle r n Obtaining a new discount report R' τ . Since both the environmental state transitions and the policies are stochastic, the same policy model acts on the same environment as the initial state, possibly creating distinct trajectories, so the optimization objective of the reinforcement learning model should be to maximize the observation +.>
Figure BDA00039888100000000515
The desired return value->
Figure BDA00039888100000000514
θ represents all parameter sets in the reinforcement learning model. Video streaming session total energy loss e=e c +E om +E tc Wherein: energy consumption E brought by MEC server-side when executing cache task c =w cm * Z (n, m), when the cache misses, the transmission delay T of the data of the request source server om =Z(n,m)/W om Transmission energy consumption E om =e om *Z(n,m)*T om The method comprises the steps of carrying out a first treatment on the surface of the When the code rate version exists in the cache and is higher than the request, the MEC executes the calculation energy consumption E of the transcoding task tc =ρ 0 *c tm *(q ext -q tar )*T tc (n,m);w cm Buffer energy consumption unit of MEC, w om For MEC to source server bandwidth, e om For the transmission energy consumption unit of MEC to the source server, ρ 0 Power consumption per cycle for CPU operation, c tm The number of cycles required to process each bit of transcoding task for the CPU.
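The session energy model transcribes directly into code; the constants below are placeholders to be calibrated, not values from the patent.

```python
# Per-slice energy per the model above: caching + (origin fetch | transcoding).


def session_energy(z_bytes: float, cache_hit: bool, q_ext: float, q_tar: float,
                   t_tc: float, w_cm: float = 1e-9, w_om: float = 50e6,
                   e_om: float = 1e-9, rho0: float = 1e-10,
                   c_tm: float = 10.0) -> float:
    """Energy (J) attributable to one slice."""
    e_c = w_cm * z_bytes                   # E_c: MEC caching energy
    if not cache_hit:
        t_om = z_bytes / w_om              # T_om: origin transmission delay
        return e_c + e_om * z_bytes * t_om  # E_om: origin transmission energy
    # E_tc: transcoding energy, scaling with the code-rate gap and task time.
    return e_c + rho0 * c_tm * (q_ext - q_tar) * t_tc
```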
The user QoE indices comprise: the client's average playback quality, the smoothness of the playback code rate, and the playback stall time.
In specific practical experiments under the environment settings of table 1, 8 sub-threads were deployed for training; after all sub-threads had trained for 500 epochs the network converged, and the best model during training was recorded. In testing, the average QoE of the best model on the test set reached 289.28.
Table 1. Experimental environment settings.
As shown in fig. 3, the method greatly reduces the delay caused by MEC transcoding while taking energy efficiency into account and improving user QoE. The comparison method (Baseline 1) considers the code rate selection of only one future video slice at a time; it uses the same network architecture and is trained and tested on the same data set and test set. Because the network environment of each epoch is random and the average bandwidths during training are not consistent, the performance of the models on the same test set reflects the merits of the methods. As seen from the figure, the invention has a more flexible code rate selection strategy, and the parallel execution of transmission and transcoding reduces delay, so its best result exceeds that of the comparison method.
As shown in fig. 4, taking the MPC method as Baseline 2, the three methods are placed under different network scenarios and their average QoE, average code rate, average stall time, and average code rate fluctuation are counted. The average QoE of this method is higher than that of the other two methods; its average code rate is slightly lower than that of Baseline 2, but it better accounts for video code rate fluctuation and avoids the degradation of playback quality of experience caused by excessive code rate switching.
Compared with the prior art, the method considers RAN-side information more comprehensively and, through a more flexible code rate selection policy model (PCMC), dynamically selects the code rates of several future video slices using RAN-side and client-side information in a wireless communication environment equipped with MEC. Video slices present in the edge cache usually require code rate transcoding before transmission to the client; because the model can select the code rates of multiple video slices flexibly, executing the transmission and transcoding tasks of video slices in parallel markedly reduces the computation delay caused by the MEC. Meanwhile, the invention comprehensively considers computation and transmission energy factors, improving user QoE while reducing the energy consumption of the video streaming session as much as possible.
The foregoing embodiments may be partially modified in numerous ways by those skilled in the art without departing from the principles and spirit of the invention, the scope of which is defined in the claims and not by the foregoing embodiments, and all such implementations are within the scope of the invention.

Claims (10)

1. A reinforcement-learning-based video code rate self-adaptation method for an edge cellular network, characterized in that a server capable of multi-access edge computing (MEC) transcoding and a client with a built-in buffer are constructed as a video streaming session simulation environment; a parallel-collaborative joint multi-video-slice code rate transcoding and transmission ABR model (PCMC) is adopted; the model is trained with asynchronous advantage actor-critic (A3C) reinforcement learning in the video streaming session simulation environment using a video data set and a wireless bandwidth trace data set; and the video code rate is adaptively adjusted by the model in the online stage;

the PCMC model has a multi-action-output network structure and comprises: an environment coding module, a policy generation module, and a policy evaluation module, wherein: the environment coding module generates a feature vector from the state information $S_n$ reflecting the environment characteristics, the policy generation module outputs a set $A_n$ of k future actions based on the feature vector, and the policy evaluation module evaluates the current policy and feeds the evaluation back to the policy generation module to adjust the policy model.
2. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the client has a built-in buffer: when a requested video slice reaches the client, the buffer grows by the length of video content contained in the slice, and the buffer is consumed at the client's video playback rate;

the buffer length satisfies:

$$B_{ue}(n) = \max\big(B_{ue}(n-1) - T_{ts}(n,m) - T_{tc}(n,m),\ 0\big) + L$$

wherein: $B_{ue}(n)$ is the buffer length when the slice with index n reaches the client, t is the time at which the video slice reaches the client, and L is the length of video content contained in the slice; when the length $B_{ue}(n)$ exceeds a threshold $B_{thresh}$, the video download pauses and sleeps for an integer number of sleep periods $T_s$ until the buffer satisfies the condition; $T_{ts}(n,m)$ is the transmission time of the level-m code rate slice of the n-th block and $T_{tc}(n,m)$ the corresponding transcoding time; when the user plays the video at the normal playback rate, the buffer length consumed by the time the requested video slice arrives equals the sum of the transmission time and the transcoding time.
3. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the transcoding means: a transcoding task queue is set at the server; after the PCMC model selects the code rates of the next k video slices, the cache is checked in turn for hits, i.e., whether a higher-code-rate version exists; on a hit, the video slices are transcoded in order through the transcoding task queue;

the duration of each transcoding task in the queue is related to the number of CPU cores and the frequency of the computing device and to the difference between the code rates before and after transcoding; the duration of the transcoding task for the n-th slice in the queue satisfies:

$$T_{tc}(n,m) = \frac{C_m\,\big(q(0) - q(m)\big)}{N_{core}\,f}$$

wherein: the original code rate is q(0), the target code rate is q(m), $C_m$ is the number of CPU cycles required to process a single code-rate difference level in the single-core case, $N_{core}$ is the number of CPU cores, and f is the CPU frequency; when slices are in transmission or the client buffer is sleeping, the transcoding tasks in the task queue can be executed in parallel to reduce the delay caused by transcoding.
4. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the MEC cache refers to: the server uses its cache resources to cache the most frequently accessed video slices and simulates the cache update mechanism with a least-frequently-used (LFU) eviction algorithm, specifically: video slices with high access frequency are cached preferentially, and among different code rates of the same slice only the highest-code-rate version is cached; when the cache reaches its upper limit, low-frequency slices are evicted first, and among slices with the same frequency the one with the earliest access time is evicted first; when the code rate version of a requested video slice is lower than the cached version, the request is a hit, otherwise it is a miss;

the transcoding task queue is updated as follows: let T denote the time available for the transcoding task queue to consume; when the n-th slice reaches the client and the client buffer length $B_{ue}(n)$ exceeds the threshold $B_{thresh}$, T of the task queue is updated to

$$T = T_{ts}(n,m) + \Big\lceil \frac{B_{ue}(n) - B_{thresh}}{T_s} \Big\rceil\, T_s$$

otherwise T is updated to $T_{ts}(n,m)$; the j-th task of the transcoding task queue, with remaining transcoding time $B_{mec}(n+j)$, is then fetched in first-in-first-out order and updated to $\max\big(0,\ B_{mec}(n+j) - T\big)$, and T is updated to $T - B_{mec}(n+j)$; these steps loop until T is less than zero or j exceeds the queue length.
5. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the video data set is constructed as follows: 10 movie videos with 4K resolution and lengths of about 120 minutes are collected as the original data source, encoded with H.264/AVC, and sliced into 4-second video slices in the HLS protocol format; video slices are then selected at random and noise is added to form new pseudo video slice files, generating 100 pseudo video sources with random durations of 5-100 minutes as the video data set.
6. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the wireless bandwidth trace data set is constructed as follows: traces of different network bandwidths are obtained by controlling the number of RBs allocated in each radio frame; a random function is applied to the RB allocation rule, with the mean determining the average network bandwidth and the variance determining its fluctuation amplitude, so that different network conditions are simulated by varying the mean and variance of the random function; a total of 100 network trace records of 2000 s each are generated as the wireless bandwidth trace data set;

the bandwidth is simulated using OFDM modulation, specifically: the subcarrier spacing is set to 15 kHz, and the fading model of the wireless channel is set to

$$h = \sqrt{h_l}\,\alpha$$

wherein: $h_l$ is the large-scale fading and the coefficient α is the small-scale fading model;

the large-scale fading uses the following path loss model:

$$h_l = G_A \left(\frac{c}{4\pi f_c}\right)^2 d^{-d_e}$$

wherein: $G_A$ is the antenna gain coefficient, d is the distance between the base station and the user, $f_c$ is the subcarrier frequency, and $d_e$ is a constant coefficient;

the small-scale fading uses a Rayleigh fading model with probability density function:

$$f(r) = \frac{r}{\delta^2}\exp\!\left(-\frac{r^2}{2\delta^2}\right),\quad r \ge 0$$

wherein: r is a real number greater than or equal to 0 and δ is the standard deviation of the random process.
7. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that Rayleigh fading is simulated in the wireless bandwidth trace data set by drawing the real and imaginary parts from the standard normal distribution; each time the simulation environment loads a network trace it randomly selects a record and a starting time point to ensure the randomness of training, and when it runs to the end of the trace it restarts from the beginning until the video streaming session closes.
8. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the PCMC model specifically comprises: an environment coding module, a policy generation module, and a policy evaluation module, wherein: the environment coding module outputs a feature vector from the environment state information

$$S_n = \big(v(n),\ B_{ue}(t),\ b(n),\ Z(n,m),\ d(n-1),\ C(n),\ l(n)\big)$$

reflecting the environment characteristics; the policy generation module, as the decision model $\pi_\theta$, outputs from that feature vector the code rate selection for the next k video slices, i.e., the probability distribution $\pi_\theta(A_n \mid S_n)$ over actions $A_n = (v_n, v_{n+1}, \ldots, v_{n+k})$; while the slice with index n is being transmitted, the transcoding processes for the slice code rates of n+1 to n+k are executed in parallel to reduce the delay caused by transcoding; v(n) is the code rate selected for the video slice with request index n, $B_{ue}(t)$ is the client buffer length at time t, b(n) is the average network throughput of the video slice with index n, Z(n,m) is the byte size of the index-n video slice at code rate m, d(n−1) is the client playback stall duration caused by the video transmission, C(n) is the highest code rate version of the index-n video slice in the cache, and l(n) is the number of remaining video slices; the policy evaluation module fits a state-value (V) function and, from the feature vector of the environment state information, outputs the V value to the policy generation module for gradient updating.
9. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the PCMC model is asynchronously trained with the A3C method with the objective of maximizing the expected return value $J(\pi_\theta)$, i.e., several sub-threads are deployed on top of the AC network architecture and trained simultaneously, and each sub-thread synchronizes its parameters to the main thread after training; during training, the parameters of the policy generation module and the policy evaluation module are updated respectively as:

$$\theta \leftarrow \theta + \alpha \sum_n \nabla_\theta \log \pi_\theta(S_n, A_n)\, A^{\pi_\theta}(S_n, A_n)$$

$$\theta_v \leftarrow \theta_v - \alpha' \sum_n \nabla_{\theta_v}\big(r_n + \gamma V^{\pi_\theta}(S_{n+1};\theta_v) - V^{\pi_\theta}(S_n;\theta_v)\big)^2$$

wherein: the advantage $A^{\pi_\theta}(S_n, A_n) = Q^{\pi_\theta}(S_n, A_n) - V^{\pi_\theta}(S_n)$ is the difference from the average of the expected return value obtainable by taking action $A_n$ from state $S_n$ under policy π, and the Bellman equation of the V function is $V^{\pi}(S_n) = \mathbb{E}\big[r_n + \gamma V^{\pi}(S_{n+1})\big]$; the optimization objective of the reinforcement learning model should be to maximize, under the observations $O_n$, the expected return value $J(\pi_\theta) = \mathbb{E}_{\tau\sim\pi_\theta}[R'_\tau]$, where θ denotes the set of all parameters of the reinforcement learning model and the belief space vector gives the probability distribution of obtaining the observation $O_n$ under state information $S_n$; the reward obtained from the environment after the agent acts is

$$r_n = \omega\,\frac{q(v_n)}{q(0)} - \mu\, d(n) - \delta\,\frac{\lvert q(v_n) - q(v_{n-1})\rvert}{q(0)} - \varphi\, E(n)$$

wherein ω, μ, δ, and φ are constant weights of the sub-items, and q(0) is the highest code rate, which makes normalization convenient; meanwhile, so that the model can trade off the importance of near-term and long-term rewards, a time-decaying discounted reward $R_\tau = \sum_{t=0}^{\infty}\gamma^t r_{n+t}$ is used to let the policy model account for long-term return values;

the policy generation module and the environment coding module of the PCMC model keep generating rewards until a termination state is reached; the set of all state information, actions, and rewards of this process is a trajectory τ with occurrence probability P(τ); in the partially observable Markov decision process (POMDP), the return value is $r'_n$, i.e., the expected value of $r_n$ under the belief space vector; substituting $r'_n$ for $r_n$ in the discounted reward $R_\tau$ yields the new discounted return $R'_\tau$; since both the environment state transitions and the policy are stochastic, the same policy model acting on the same environment and initial state may produce distinct trajectories, so the optimization objective of the reinforcement learning model should be to maximize the expected return under the observations; the total energy loss of a video streaming session is $E = E_c + E_{om} + E_{tc}$, wherein: the energy consumption of the MEC server when executing a caching task is $E_c = w_{cm}\cdot Z(n,m)$; on a cache miss, the transmission delay of requesting data from the source server is $T_{om} = Z(n,m)/W_{om}$ and the transmission energy consumption is $E_{om} = e_{om}\cdot Z(n,m)\cdot T_{om}$; when a cached code rate version higher than the request exists, the computation energy consumption of the MEC executing the transcoding task is $E_{tc} = \rho_0\cdot c_{tm}\cdot(q_{ext} - q_{tar})\cdot T_{tc}(n,m)$; $w_{cm}$ is the caching energy consumption unit of the MEC, $W_{om}$ is the bandwidth from the MEC to the source server, $e_{om}$ is the transmission energy consumption unit from the MEC to the source server, $\rho_0$ is the power consumption per CPU operation cycle, and $c_{tm}$ is the number of cycles the CPU requires per bit of transcoding task.
10. A system for implementing the reinforcement-learning-based video code rate self-adaptation method for an edge cellular network of any one of claims 1-9, comprising: a video source server, a client, and a code rate selection module, a cache module, and a transcoding module located at the server side, wherein: the client locally maintains a video slice buffer and attaches its local buffer state information when requesting video slices from the server; the code rate selection module runs the PCMC model on the request information to output code rate selections for the next K slices; the cache module checks in turn whether a higher-code-rate version of each of the next K video slices is cached; if so, the transcoding module adds the slice to the transcoding task queue, converts it to the requested version, and transmits it to the client; otherwise, the MEC server requests the high-code-rate version of the slice from the video source server, which stores all video data, and forwards it to the client;

the transmission is performed in synchronization with the transcoding.
CN202211574628.3A 2022-12-08 2022-12-08 Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network Pending CN116016987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211574628.3A CN116016987A (en) 2022-12-08 2022-12-08 Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network

Publications (1)

Publication Number Publication Date
CN116016987A (en) 2023-04-25

Family

ID=86028904


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200162535A1 (en) * 2018-11-19 2020-05-21 Zhan Ma Methods and Apparatus for Learning Based Adaptive Real-time Streaming
CN109525861A (en) * 2018-12-05 2019-03-26 北京邮电大学 A kind of method and device of video needed for determining user
CN110913373A (en) * 2019-09-17 2020-03-24 上海大学 In-vehicle wireless communication platform based on joint time-frequency priority strategy and anti-interference method thereof
CN111431941A (en) * 2020-05-13 2020-07-17 南京工业大学 Real-time video code rate self-adaption method based on mobile edge calculation
CN113114756A (en) * 2021-04-08 2021-07-13 广西师范大学 Video cache updating method for self-adaptive code rate selection in mobile edge calculation
CN114501468A (en) * 2022-02-22 2022-05-13 上海大学 Method for allocating joint uplink and downlink slice resources in TDD network
CN114640870A (en) * 2022-03-21 2022-06-17 陕西师范大学 QoE-driven wireless VR video self-adaptive transmission optimization method and system
CN114867030A (en) * 2022-06-09 2022-08-05 东南大学 Double-time-scale intelligent wireless access network slicing method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J. Luo et al.: "Adaptive Video Streaming With Edge Caching and Video Transcoding Over Software-Defined Mobile Networks: A Deep Reinforcement Learning Approach", IEEE Transactions on Wireless Communications, vol. 19, no. 3, 3 December 2019, pages 1577-1592, XP011777875, DOI: 10.1109/TWC.2019.2955129 *
Cao Xingjian et al.: "Image Processing and Edge Computing for Intelligent Transportation", Journal of Image and Graphics, vol. 27, no. 6, 16 June 2022, pages 1743-1767 *
Wang Ying: "Research on Adaptive Transmission and Caching Mechanisms for Mobile Edge Video", China Master's Theses Full-text Database, Information Science and Technology, no. 2022, 15 April 2022 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116805923A (en) * 2023-08-25 2023-09-26 淳安华数数字电视有限公司 Broadband communication method based on edge calculation
CN116805923B (en) * 2023-08-25 2023-11-10 淳安华数数字电视有限公司 Broadband communication method based on edge calculation

Similar Documents

Publication Publication Date Title
Pedersen et al. Enhancing mobile video capacity and quality using rate adaptation, RAN caching and processing
Ahlehagh et al. Video-aware scheduling and caching in the radio access network
CN112953922B (en) Self-adaptive streaming media control method, system, computer equipment and application
Khan et al. A survey on mobile edge computing for video streaming: Opportunities and challenges
Chen et al. Artificial intelligence aided joint bit rate selection and radio resource allocation for adaptive video streaming over F-RANs
Guo et al. Buffer-aware streaming in small-scale wireless networks: A deep reinforcement learning approach
Tan et al. Radio network-aware edge caching for video delivery in MEC-enabled cellular networks
Chiang et al. Collaborative social-aware and QoE-driven video caching and adaptation in edge network
Baccour et al. Proactive video chunks caching and processing for latency and cost minimization in edge networks
Hong et al. Continuous bitrate & latency control with deep reinforcement learning for live video streaming
Chua et al. Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach
Mu et al. AMIS: Edge computing based adaptive mobile video streaming
CN116016987A (en) Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network
Zhao et al. Popularity-based and version-aware caching scheme at edge servers for multi-version VoD systems
Tian et al. Deeplive: QoE optimization for live video streaming through deep reinforcement learning
KR101966588B1 (en) Method and apparatus for receiving video contents
Li et al. User dynamics-aware edge caching and computing for mobile virtual reality
Cai et al. Mec-based qoe optimization for adaptive video streaming via satellite backhaul
Kim et al. eff-HAS: Achieve higher efficiency in data and energy usage on dynamic adaptive streaming
Chen et al. Cooperative caching for scalable video coding using value-decomposed dimensional networks
Lin et al. Knn-q learning algorithm of bitrate adaptation for video streaming over http
Zhang et al. Cache-enabled adaptive bit rate streaming via deep self-transfer reinforcement learning
CN115720237A (en) Caching and resource scheduling method for edge network self-adaptive bit rate video
Chou et al. Pricing-based deep reinforcement learning for live video streaming with joint user association and resource management in mobile edge computing
Mu et al. AMIS-MU: edge computing based adaptive video streaming for multiple mobile users

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination