CN116016987A - Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network - Google Patents
- Publication number
- CN116016987A (application CN202211574628.3A)
- Authority
- CN
- China
- Prior art keywords
- video
- code rate
- transcoding
- model
- slice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
A video code rate self-adaption method based on reinforcement learning for an edge cellular network: a server capable of multi-access edge computing (MEC) transcoding and a client with a built-in cache are constructed as a video streaming session simulation environment; an ABR model (PCMC) that jointly selects code rates for multiple video slices and executes transcoding and transmission in parallel collaboration is adopted; the model is trained with asynchronous reinforcement learning (A3C) in the simulation environment using a video data set and a wireless bandwidth trace data set; and in the online stage the video code rate is adaptively adjusted through the model. Under a wireless network scenario configured with MEC, the invention makes full use of RAN-side information and of MEC computing and storage resources while executing transmission and transcoding in parallel, thereby reducing the extra computation delay and computation energy consumption caused by the MEC and improving the QoE indexes of the client's video stream.
Description
Technical Field
The invention relates to a technology in the field of video processing, in particular to a video code rate self-adaption method based on reinforcement learning for an edge cellular network.
Background
To meet mobile users' quality of experience (QoE) and provide new high-performance quality of service (QoS), multi-access edge computing (MEC), software-defined mobile network (SDMN), and cloud radio access network (C-RAN) technologies are introduced in next-generation wireless networks. Cloud computing capabilities are extended to nearby small base stations (SBSs) in wireless networks, especially ultra-dense networks (UDNs), placing computing and storage resources on the radio access network (RAN) side closer to end users, and high quality of experience is achieved through a variety of adaptive bitrate (ABR) algorithms. However, because the MEC's cache space is limited and the content popularity of video streams shifts with time and geographic location, the pre-cache hit rate of video streams is low, so the cache update and replacement algorithm frequently refreshes the cached content; such frequent cache updates incur additional energy consumption.
Disclosure of Invention
Aiming at the defects of the existing improved ABR techniques, namely inaccurate estimation of network throughput, lack of consideration of RAN-side information, and insufficient utilization of MEC computing and cache resources, the invention provides a video code rate self-adaption method based on reinforcement learning for an edge cellular network. Under a wireless network scenario configured with MEC, the method makes full use of RAN-side information and of MEC computing and storage resources while executing transmission and transcoding in parallel, thereby reducing the extra computation delay and computation energy consumption caused by the MEC and improving the client's video stream QoE indexes.
The invention is realized by the following technical scheme:
the invention relates to a video code rate self-adaption method based on reinforcement learning for an edge cellular network, which is characterized in that a server capable of multi-address edge computing (MEC) transcoding and a client with built-in cache are constructed to serve as a video stream session simulation environment, an ABR (program control unit) method (PCMC) model of parallel collaboration joint multi-video slice code rate transcoding and transmission is adopted, a video data set and a wireless bandwidth track data set are used for carrying out training based on asynchronous reinforcement learning (A3C) in the video stream session simulation environment, and the video code rate is self-adaption adjusted through the model in an online stage.
The PCMC model has a network structure with multi-action output and comprises an environment coding module, a policy generation module, and a policy evaluation module, wherein: the environment coding module generates feature vectors from state information reflecting the characteristics of the Environment; the policy generation module outputs a set A_n of k future actions based on the feature vectors; and the policy evaluation module evaluates the current policy and feeds the evaluation back to the policy generation module to adjust the policy model.
The invention further relates to a system implementing the method, comprising: a video source server, a client, and a code rate selection module, a cache module, and a transcoding module located at the server side, wherein: the client locally maintains a video slice buffer and attaches its local buffer state information when requesting a video slice from the server; the code rate selection module runs the PCMC model on the request information to output the code rate selections of the next K slices; the cache module checks in turn whether a high-code-rate version of each of the next K video slices exists; if so, the transcoding module adds the video slice of the high-code-rate version to the transcoding task queue, converts it into the corresponding version, and transmits it to the client; otherwise, the MEC server requests the high-code-rate version of the video slice from the video source server, which stores all video data, and then transmits it to the client.
Preferably, transmission is performed in parallel with transcoding.
Technical effects
Compared with the existing ABR algorithm, the invention fully considers the information of the RAN side through the PCMC model, can more accurately predict the bandwidth throughput in the wireless network environment, has a flexible multi-action output strategy, and can enable the transcoding and the transmission to be executed in parallel, thereby effectively reducing the calculation delay caused by MEC transcoding, improving the QoE of users and reducing the total energy consumption of the whole video streaming session.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of a PCMC model network architecture;
FIG. 3 is a graph comparing network convergence curves;
fig. 4 is a graph comparing QoE with each playing index.
Detailed Description
As shown in fig. 1, this embodiment relates to a video code rate adaptive method based on reinforcement learning for an edge cellular network, which includes the following steps:
Step 1: a video streaming session simulation environment is constructed. The client has a built-in buffer: when a requested video slice reaches the client, the buffer grows by the length of video content contained in the slice, and the buffer is consumed at the rate at which the client plays the video.
The buffer length satisfies: B_ue(n) = max(B_ue(n−1) − t, 0) + L, wherein: B_ue(n) is the buffer length when the slice with index n reaches the client, t is the time taken for the video slice to reach the client, and L is the length of video content contained in the slice. When B_ue(n) exceeds a threshold B_thresh, downloading stops and the client sleeps for an integer number of sleep periods T_s until the buffer satisfies the condition. T_ts(n, m) is the transmission time of the level-m code rate slice of the nth block. When the user plays the video at the normal playback rate, the buffer length consumed by the time the requested video slice arrives equals the sum of the transmission time and the transcoding time.
The threshold B_thresh is preferably 60 seconds.
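As an illustrative sketch of the buffer dynamics just described: the threshold B_thresh (60 s) and slice length L (4 s) follow the text, while the sleep period value and function names are assumptions.

```python
B_THRESH = 60.0  # buffer threshold in seconds (preferred value from the text)
T_S = 0.5        # one sleep period in seconds (illustrative value)
L = 4.0          # seconds of content per video slice (HLS slicing above)

def buffer_after_arrival(b_prev: float, t_arrive: float) -> float:
    """Buffer length when the next slice arrives: the buffer drains at the
    playback rate during the t_arrive seconds the slice takes to arrive
    (transmission plus transcoding), then grows by the L seconds of
    content the slice contains."""
    return max(0.0, b_prev - t_arrive) + L

def sleep_periods_needed(b: float) -> int:
    """Integer number of sleep periods T_s until the buffer is back
    under the threshold (downloading pauses while above it)."""
    n = 0
    while b > B_THRESH:
        b -= T_S
        n += 1
    return n
```

A buffer of 10 s that waits 2 s for a 4-second slice ends at 12 s; a buffer already over the 60-second threshold forces whole sleep periods before the next request.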
The transcoding means: a transcoding task queue is set up at the MEC server; after the PCMC model selects the code rates of the next k video slices, the MEC cache checks each slice in turn for a hit, i.e., whether a version with a higher code rate exists; upon a hit, the video slice is placed in the transcoding task queue and transcoded in order.
The duration of each transcoding task in the transcoding task queue is related to the number of CPU cores of the computing device, the CPU frequency, and the difference between the code rates before and after transcoding; the transcoding duration T_tc(n, m) of the nth slice is determined by the original code rate q(0), the target code rate q(m), and C_m, the number of CPU cycles required to process a single code-rate-difference level in the single-core case. While a slice is in transmission or the client buffer is sleeping, the transcoding tasks in the task queue can be executed in parallel to reduce the delay caused by transcoding.
The MEC cache means: the server uses its cache resources to cache the most frequently accessed video slices and simulates the cache update mechanism with a least-frequently-used (LFU) eviction algorithm, specifically: video slices with high access frequency are cached preferentially, and among the different code rates of the same slice only the highest-code-rate version is cached; when the cache reaches its upper limit, slices with low access frequency are evicted first, and among slices with equal frequency the one with the earliest access time is evicted first; when the code rate version of a requested video slice is lower than the version in the cache, the request is a hit, otherwise it is a miss.
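A minimal sketch of the LFU-style cache behavior described above. The class name and logical clock are illustrative, and counting an equal cached version as a hit (not only a strictly higher one) is an assumption.

```python
class MECCache:
    """LFU-style MEC slice cache: keeps only the highest-code-rate
    version per slice; evicts by lowest access frequency, with ties
    broken by earliest last access (a logical clock stands in for
    wall-clock time)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # slice_id -> [cached_level, access_frequency, last_access_tick]
        self.entries = {}
        self.clock = 0

    def request(self, slice_id: str, level: int) -> bool:
        """Return True on a hit (a version at or above `level` is cached);
        update the frequency, keep only the highest version, and evict
        when the cache is full."""
        self.clock += 1
        hit = slice_id in self.entries and self.entries[slice_id][0] >= level
        if slice_id in self.entries:
            lvl, freq, _ = self.entries[slice_id]
            self.entries[slice_id] = [max(lvl, level), freq + 1, self.clock]
        else:
            if len(self.entries) >= self.capacity:
                self._evict()
            self.entries[slice_id] = [level, 1, self.clock]
        return hit

    def _evict(self):
        # lowest frequency first; earliest last access breaks ties
        victim = min(self.entries,
                     key=lambda k: (self.entries[k][1], self.entries[k][2]))
        del self.entries[victim]
```

With capacity 2, a slice requested twice outlives a slice requested once when a third slice arrives, matching the frequency-first eviction rule.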
The transcoding task queue is updated as follows: let T denote the consumable duration of the transcoding task queue. When the nth slice reaches the client and the client buffer length B_mec(n) is greater than the cache threshold B_thresh, T of the task queue is updated accordingly; otherwise, T is updated to T_ts(n, m). The jth task of the transcoding task queue, i.e. B_mec(n+j), is then fetched in turn on a first-in-first-out basis and updated to max(0, B_mec(n+j) − T), after which T is updated to T − B_mec(n+j); these steps loop until T is less than zero or j exceeds the queue length.
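The first-in-first-out consumption step above can be sketched as follows, with the queue holding each task's remaining duration and T the consumable time (names are illustrative):

```python
def update_transcoding_queue(queue, T):
    """Consume T seconds of parallel transcoding time across the FIFO
    queue of remaining task durations, front task first: each task is
    reduced by the time still available, and the loop stops once the
    consumable time T drops below zero or the queue is exhausted."""
    for j in range(len(queue)):
        if T < 0:
            break
        remaining = queue[j]
        queue[j] = max(0.0, remaining - T)
        T -= remaining
    return queue
```

For example, 4 s of consumable time finishes a 2-second front task and takes 2 s off the next one; later tasks are untouched.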
Step 2: a video data set and a wireless bandwidth trace data set are constructed.
The video data set is constructed as follows: movie videos with 4K resolution and lengths of about 120 minutes are collected as the original data source, encoded with H.264/AVC, and sliced into 4-second video slices according to the HLS protocol format. Video slices are then selected at random and noise is added to form new pseudo video slice files, generating 100 pseudo video sources with random durations of 5 to 100 minutes as the video data set.
The wireless bandwidth trace data set is constructed as follows: traces of different network bandwidths are obtained by controlling the number of RBs allocated in each radio frame; a random function is applied to the RB allocation rule, whose mean determines the average network bandwidth and whose variance determines the fluctuation amplitude, so different network conditions can be simulated by changing the mean and variance of the random function. A total of 100 network trace records of 2000 s each are generated as the wireless bandwidth trace data set.
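A sketch of how such a trace could be generated, with the per-frame RB count drawn from a Gaussian random function whose mean and variance control the average bandwidth and its fluctuation; the 100-RB cap, trace length, and all numeric values are assumptions.

```python
import random

def make_trace(mean_rbs, std_rbs, n_frames, max_rbs=100, seed=0):
    """Trace of RBs allocated per radio frame: the mean of the random
    function sets the average bandwidth and the standard deviation sets
    the fluctuation amplitude; values are clipped to the valid RB range."""
    rng = random.Random(seed)
    return [min(max_rbs, max(0, round(rng.gauss(mean_rbs, std_rbs))))
            for _ in range(n_frames)]
```

Changing `mean_rbs` and `std_rbs` yields traces for different simulated network conditions, as the text describes.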
The bandwidth is simulated using OFDM modulation, specifically: the subcarrier spacing is set to 15 kHz, and the fading model of the wireless channel combines a small-scale fading coefficient α with a large-scale fading model. The large-scale fading model uses a path loss model, wherein: G_A is the antenna gain coefficient, d is the distance between the base station and the user, f_c is the subcarrier frequency, and d_e is a constant coefficient.
The small-scale fading uses a Rayleigh fading model with probability density function f(r) = (r/δ²)·exp(−r²/(2δ²)), wherein: r is a real number greater than or equal to 0, and δ is the standard deviation of the random process.
Preferably, Rayleigh fading is simulated in the wireless bandwidth trace data set by drawing both the real part and the imaginary part from a standard normal distribution. When loading a network trace, the simulation environment randomly selects a record and a starting time point each time, which ensures the randomness of training; when it runs to the end of the trace it restarts from the beginning, until the video streaming session is closed.
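The Rayleigh simulation via standard-normal real and imaginary parts can be sketched as follows (function name is illustrative): the magnitude of a complex Gaussian with i.i.d. N(0, σ²) components is Rayleigh-distributed with scale σ.

```python
import random
import math

def rayleigh_gain(rng, sigma=1.0):
    """Small-scale fading magnitude: draw real and imaginary parts
    i.i.d. from N(0, sigma^2); the magnitude then follows a Rayleigh
    distribution with scale sigma (mean sigma * sqrt(pi/2))."""
    re = rng.gauss(0.0, sigma)
    im = rng.gauss(0.0, sigma)
    return math.hypot(re, im)
```

The sample mean over many draws approaches σ·√(π/2) ≈ 1.2533·σ, which is a quick sanity check on the construction.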
Step 3: the PCMC model, a parallel-collaboration ABR model that jointly handles multi-video-slice code rate transcoding and transmission, is constructed in the simulation environment of Step 1; after the simulation environment of Step 1 loads the data sets constructed in Step 2, the PCMC model is trained through continuous interaction with the simulation environment.
As shown in fig. 2, the PCMC model in this embodiment specifically comprises an environment coding module, a policy generation module, and a policy evaluation module, wherein: the environment coding module outputs a feature vector from the environment state information (v(n), B_ue(t), b(n), Z(n, m), d(n−1), C(n), l(n)) reflecting the characteristics of the Environment; the policy generation module is a decision model that, from the feature vector generated by the environment coding module, outputs a probability distribution over the code rate selections of the next k video slices, i.e. the action A_n = (v_n, v_{n+1}, ..., v_{n+k}); while the slice with index n is being transmitted, the transcoding of the slices n+1 to n+k proceeds in parallel to reduce the delay caused by transcoding. Here v(n) is the code rate selected for the video slice with request index n, B_ue(t) is the length of the client buffer at time t, b(n) is the average network throughput of the video slice with index n, Z(n, m) is the byte size of the video slice with index n and code rate m, d(n−1) is the playback stall duration at the client caused by the transmission of the video with index n, C(n) is the highest code rate version of the video slice with index n in the cache, and l(n) is the number of remaining video slices. The policy evaluation module fits a state-value (V) function and, from the feature vector of the environment state information, outputs the V value to the policy generation module for gradient updates.
Preferably, an Action Mask is further set in the policy generation module to filter out actions that cannot occur, for example actions whose index exceeds the total number of video slices.
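One plausible reading of the Action Mask, sketched below: logits for the k future slices are set to −∞ when the corresponding slice index would exceed the total number of slices, so the softmax assigns them zero probability. The indexing scheme is an assumption.

```python
def mask_logits(logits, total_slices, current_index):
    """Action mask over per-offset logits: offset a corresponds to slice
    current_index + a, and any offset pointing past the end of the video
    is masked to -inf before the softmax."""
    masked = list(logits)
    for a in range(len(masked)):
        if current_index + a >= total_slices:
            masked[a] = float("-inf")
    return masked
```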
Preferably, the built-in storage unit of the environment coding module stores the historical information of the most recent u steps.
The PCMC model takes maximizing the expected return value J(π_θ) as its objective and is trained asynchronously with the A3C method: on top of an actor-critic (AC) network architecture, several worker threads are deployed to train simultaneously, and each worker synchronizes its parameters to the main thread after training. During training, the parameters of the policy generation module and of the policy evaluation module are updated respectively as follows:
wherein: status->Take action A n Difference from average->From state->And perform action A n Is a desired return value obtainable under policy pi>Belman's equation for V function is
The reward r_n the agent obtains from the environment after taking an action combines the playback quality with penalty terms, where ω, μ, δ and the remaining coefficients are constant weights for the sub-terms, and q(0) is the highest code rate, which makes normalization convenient. Meanwhile, a discount reward R_τ that decays over time, R_τ = Σ_t γ^t·r_{n+t}, is used so that the model trades off the importance of near-term and long-term rewards and the policy model takes long-term return values into account. The policy generation module and the environment coding module of the PCMC model keep generating rewards until a termination state is reached; the set of all state information, actions, and rewards of this process is a trajectory τ, with occurrence probability P(τ). The channel fading conditions follow a k-state Markov model; the confidence space vector gives the probability distribution of the observation under the state information. In a Partially Observable Markov Decision Process (POMDP), the return value is r′_n, i.e. the expected value of r_n under the confidence space vector; replacing r_n in the discount reward R_τ with r′_n yields the new discounted return R′_τ. Since both the environment state transitions and the policy are stochastic, the same policy model acting on the same environment and initial state may produce distinct trajectories, so the optimization objective of the reinforcement learning model is to maximize the expected return value over the observations, where θ denotes the set of all parameters of the reinforcement learning model.
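The time-decaying discount reward R_τ used above is the standard discounted return; a minimal sketch:

```python
def discounted_returns(rewards, gamma=0.99):
    """Discounted return at each step, computed backwards:
    R_t = r_t + gamma * R_{t+1}, so near-term rewards weigh more than
    long-term ones while the long term is still taken into account."""
    R = 0.0
    out = []
    for r in reversed(rewards):
        R = r + gamma * R
        out.append(R)
    return list(reversed(out))
```

With γ = 0.5 and three unit rewards, the returns are 1 + 0.5 + 0.25, 1 + 0.5, and 1 from first step to last.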
The total energy loss of a video streaming session is E = E_c + E_om + E_tc, wherein: E_c = w_cm·Z(n, m) is the energy consumed by the MEC server when executing a cache task; when the cache misses, the transmission delay of requesting data from the source server is T_om = Z(n, m)/W_om and the transmission energy is E_om = e_om·Z(n, m)·T_om; when a code rate version higher than the request exists in the cache, the MEC executes the transcoding task with computation energy E_tc = ρ_0·c_tm·(q_ext − q_tar)·T_tc(n, m). Here w_cm is the MEC caching energy unit, W_om is the MEC-to-source-server bandwidth, e_om is the MEC-to-source-server transmission energy unit, ρ_0 is the power consumed per CPU cycle, and c_tm is the number of cycles the CPU needs per bit of transcoding task.
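The per-slice energy decomposition E = E_c + E_om + E_tc can be sketched as follows; the argument names mirror the symbols in the text, and the branch structure (miss vs. hit with a higher cached version) is an assumption about how the three cases combine.

```python
def session_energy(Z, cache_hit, higher_version_cached,
                   w_cm, W_om, e_om, rho0, c_tm, q_ext, q_tar, T_tc):
    """Per-slice energy E = E_c + E_om + E_tc: caching energy always
    accrues; a miss adds source-server transmission energy; a hit on a
    higher-code-rate cached version adds transcoding energy instead."""
    E_c = w_cm * Z                       # caching energy at the MEC
    if not cache_hit:
        T_om = Z / W_om                  # delay fetching from the source server
        E_om = e_om * Z * T_om
        E_tc = 0.0
    elif higher_version_cached:
        E_om = 0.0
        E_tc = rho0 * c_tm * (q_ext - q_tar) * T_tc  # transcode down
    else:
        E_om = E_tc = 0.0                # exact version cached: serve directly
    return E_c + E_om + E_tc
```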
The user QoE indexes comprise: the average playback quality at the client, the smoothness of the playback code rate, and the playback stall time.
In practical experiments under the environment settings of table 1, 8 worker threads were deployed for training; after all workers had trained for 500 epochs, the network converged, and the best model during training was recorded. In testing, the average QoE of this best model on the test set reaches 289.28.
As shown in fig. 3, the method greatly reduces the delay caused by MEC transcoding while balancing energy efficiency and improving user QoE. The comparison method (Baseline 1) considers only the code rate selection of a single future video slice at a time; it uses the same network architecture and is trained and tested on the same data set and test set. Because the network environment of each epoch is random and the average bandwidths during training are not identical, observing the models' performance on the same test set reflects the relative merits of the methods. As can be seen from the figure, the invention has a more flexible code rate selection strategy, and the parallel execution of transmission and transcoding reduces delay, so its best result outperforms the comparison method.
As shown in fig. 4, with the MPC method taken as Baseline 2, the three methods are placed in different network scenarios and their average QoE, average code rate, average stall time, and average code rate fluctuation are measured. The average QoE of the proposed method is higher than that of the other two methods; its average code rate is slightly lower than that of Baseline 2, but it better restrains video code rate fluctuation and avoids the degradation of playback experience caused by excessive code rate switching.
Compared with the prior art, the method considers RAN-side information more comprehensively: through a more flexible code rate selection strategy model (PCMC), it uses RAN-side and client-side information to dynamically select the code rates of several future video slices in a wireless communication environment configured with MEC. Video slices present in the edge cache usually need to be transcoded before being transmitted to the client; because the model can select the code rates of several video slices at once, executing the transmission and transcoding tasks of video slices in parallel significantly reduces the computation delay caused by the MEC. Meanwhile, the invention comprehensively accounts for the energy consumption of computation and transmission, improving user QoE while reducing the energy consumption of the video streaming session as much as possible.
The foregoing embodiments may be partially modified in numerous ways by those skilled in the art without departing from the principles and spirit of the invention, the scope of which is defined in the claims and not by the foregoing embodiments, and all such implementations are within the scope of the invention.
Claims (10)
1. A video code rate self-adaption method based on reinforcement learning for an edge cellular network, characterized in that a server capable of multi-access edge computing (MEC) transcoding and a client with a built-in cache are constructed as a video streaming session simulation environment; an ABR model (PCMC) that jointly selects code rates for multiple video slices and executes transcoding and transmission in parallel collaboration is adopted; the model is trained with asynchronous reinforcement learning (A3C) in the video streaming session simulation environment using a video data set and a wireless bandwidth trace data set; and in the online stage the video code rate is adaptively adjusted through the model;
the PCMC model is provided with a network structure of multi-action output, and comprises the following components: the system comprises an environment coding module, a strategy generating module and a strategy evaluating module, wherein: the Environment coding module is used for coding the Environment according to the state information reflecting the Environment (Environment) characteristicsGenerating feature vectors, the policy generation module outputs a set A of k actions in the future based on the feature vectors n And the strategy evaluation module evaluates and evaluates the current strategy and feeds back the current strategy to the strategy generation module to evaluate and adjust the strategy model.
2. The video code rate self-adaption method based on reinforcement learning for an edge cellular network according to claim 1, characterized in that the client has a built-in buffer; when a requested video slice reaches the client, the buffer grows by the length of video content contained in the slice, and the buffer is consumed at the rate at which the client plays the video;
the buffer length satisfies: B_ue(n) = max(B_ue(n−1) − t, 0) + L, wherein: B_ue(n) is the buffer length when the slice with index n reaches the client, t is the time taken for the video slice to reach the client, and L is the length of video content contained in the slice; when B_ue(n) exceeds a threshold B_thresh, downloading stops and the client sleeps for an integer number of sleep periods T_s until the buffer satisfies the condition; T_ts(n, m) is the transmission time of the level-m code rate slice of the nth block; when the user plays the video at the normal playback rate, the buffer length consumed by the time the requested video slice arrives equals the sum of the transmission time and the transcoding time.
3. The video code rate self-adaption method based on reinforcement learning for an edge cellular network according to claim 1, characterized in that the transcoding means: a transcoding task queue is set up at the server; after the PCMC model selects the code rates of the next k video slices, the cache checks each slice in turn for a hit, i.e., whether a version with a higher code rate exists; upon a hit, the video slice is placed in the transcoding task queue and transcoded in order;
the duration of each transcoding task in the transcoding task queue is related to the number of CPU cores of the computing device, the CPU frequency, and the difference between the code rates before and after transcoding; the transcoding duration T_tc(n, m) of the nth slice is determined by the original code rate q(0), the target code rate q(m), and C_m, the number of CPU cycles required to process a single code-rate-difference level in the single-core case; while a slice is in transmission or the client buffer is sleeping, the transcoding tasks in the task queue can be executed in parallel to reduce the delay caused by transcoding.
4. The video code rate self-adaption method based on reinforcement learning for an edge cellular network according to claim 1, characterized in that the MEC cache means: the server uses its cache resources to cache the most frequently accessed video slices and simulates the cache update mechanism with a least-frequently-used (LFU) eviction algorithm, specifically: video slices with high access frequency are cached preferentially, and among the different code rates of the same slice only the highest-code-rate version is cached; when the cache reaches its upper limit, slices with low access frequency are evicted first, and among slices with equal frequency the one with the earliest access time is evicted first; when the code rate version of a requested video slice is lower than the version in the cache, the request is a hit, otherwise it is a miss;
the transcoding task queue is updated as follows: let T denote the consumable duration of the transcoding task queue; when the nth slice reaches the client and the client buffer length B_mec(n) is greater than the cache threshold B_thresh, T of the task queue is updated accordingly; otherwise, T is updated to T_ts(n, m); the jth task of the transcoding task queue, i.e. B_mec(n+j), is fetched in turn on a first-in-first-out basis and updated to max(0, B_mec(n+j) − T), after which T is updated to T − B_mec(n+j); these steps loop until T is less than zero or j exceeds the queue length.
5. The video code rate self-adaption method based on reinforcement learning for an edge cellular network according to claim 1, characterized in that the video data set is constructed as follows: 10 movie videos with 4K resolution and lengths of about 120 minutes are collected as the original data source, encoded with H.264/AVC, and sliced into 4-second video slices according to the HLS protocol format; video slices are then selected at random and noise is added to form new pseudo video slice files, generating 100 pseudo video sources with random durations of 5 to 100 minutes as the video data set.
6. The video code rate self-adaption method based on reinforcement learning for the edge-oriented cellular network according to claim 1, wherein the wireless bandwidth trace data set is constructed as follows: traces of different network bandwidths are obtained by controlling the number of RBs allocated in each radio frame; a random function is added to the RB allocation rule, whose mean determines the average network bandwidth and whose variance determines the fluctuation amplitude of the network bandwidth, so that different network conditions are simulated by changing the mean and variance of the random function; 100 network trace records of 2000 s each are generated as the wireless bandwidth trace data set;
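The per-frame RB allocation can be sketched as below, assuming a Gaussian random function clipped to a valid RB range (the distribution choice and the 100-RB cap are assumptions for illustration):

```python
import random

def generate_rb_trace(mean_rbs, var_rbs, frames, max_rbs=100, rng=None):
    """Draw the number of allocated RBs per radio frame from a random
    function: the mean sets the average bandwidth, the variance sets
    the fluctuation amplitude."""
    rng = rng or random.Random(42)
    sigma = var_rbs ** 0.5
    return [min(max_rbs, max(0, round(rng.gauss(mean_rbs, sigma))))
            for _ in range(frames)]
```

Sweeping `mean_rbs` and `var_rbs` over different values reproduces the claim's family of network conditions.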
the bandwidth is simulated using OFDM modulation, specifically: the subcarrier spacing is set to 15 kHz, and the fading model of the wireless channel is set to h = √β · α, wherein β is the large-scale fading coefficient and the coefficient α is the small-scale fading model;
the large-scale fading model uses a path loss model whose parameters are: G_A, the antenna gain coefficient; d, the distance between the base station and the user; f_c, the subcarrier frequency; and d_e, a constant coefficient;
7. The video code rate self-adaption method based on reinforcement learning for the edge cellular network according to claim 1, wherein Rayleigh fading is simulated by drawing both the real and imaginary parts of the channel coefficient in the wireless bandwidth trace data set from a standard normal distribution; each time the simulation environment loads a network trace it randomly selects a record and a starting time point, ensuring the randomness of training, and when the trace runs to its end the loading process is repeated until the video streaming session is closed.
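Both mechanisms of this claim can be sketched in a few lines: the Rayleigh magnitude from two standard normal draws, and the random trace-plus-offset loading (function names are illustrative):

```python
import math
import random

def rayleigh_gain(rng=None):
    """Real and imaginary parts of the small-scale coefficient are
    standard normal draws, so the magnitude |h| is Rayleigh-distributed."""
    rng = rng or random.Random(7)
    re, im = rng.gauss(0, 1), rng.gauss(0, 1)
    return math.hypot(re, im)   # |re + j*im|

def load_trace_segment(traces, rng=None):
    """Randomly pick a trace record and a starting index, as the
    simulation environment does on each load; wrapping around at the
    end of the trace is left to the caller."""
    rng = rng or random.Random(7)
    trace = rng.choice(traces)
    start = rng.randrange(len(trace))
    return trace, start
```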
8. The video code rate self-adaption method based on reinforcement learning for the edge-oriented cellular network according to claim 1, wherein the PCMC model specifically comprises an environment coding module, a policy generation module and a policy evaluation module, wherein: the environment coding module outputs a feature vector according to the environment state information (v(n), B(t), b(n), Z(n, m), d(n−1), C(n), l(n)) reflecting the characteristics of the environment; the policy generation module, i.e. the decision model π_θ, outputs according to the feature vector generated by the environment coding module the probability distribution of the code rate selection for the next k video slices, i.e. the action A_n = (v_n, v_{n+1}, ..., v_{n+k}); while the slice of index n is being transmitted, the transcoding of the code rate of slice n+1 is performed in parallel to reduce the delay caused by transcoding; v(n) is the code rate selected for the video slice of request index n, B(t) is the length of the client buffer at time t, b(n) is the average network throughput of the video slice of index n, Z(n, m) is the byte size of the video slice of index n at code rate m, d(n−1) is the playback stall duration at the client caused by the transmission of the video of index n−1, C(n) is the highest code rate version of the video slice of index n in the cache, and l(n) is the number of remaining video slices; the policy evaluation module fits the state value function (V) and, according to the feature vector of the environment state information, outputs the V value to the policy generation module for gradient updating.
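A minimal sketch of this three-module structure: a single-layer encoder produces the feature vector, a policy head emits a softmax over bitrates for each of the next k slices, and a value head emits V. The layer sizes and single-layer encoder are illustrative assumptions, not the patent's architecture:

```python
import math
import random

def pcmc_forward(state, params, k=4, n_rates=6):
    """Forward pass: state -> feature vector -> (policy distribution
    over k future slices, scalar value estimate V)."""
    W_enc, W_pi, W_v = params

    def matvec(W, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    feat = [math.tanh(z) for z in matvec(W_enc, state)]   # environment coding module
    logits = matvec(W_pi, feat)                           # policy generation module
    probs = []
    for i in range(k):                                    # softmax per future slice
        row = logits[i * n_rates:(i + 1) * n_rates]
        m = max(row)
        e = [math.exp(z - m) for z in row]
        s = sum(e)
        probs.append([v / s for v in e])
    value = sum(w * f for w, f in zip(W_v, feat))         # policy evaluation module (V)
    return probs, value

rng = random.Random(0)
state = [rng.gauss(0, 1) for _ in range(7)]   # (v(n), B(t), b(n), Z(n,m), d(n-1), C(n), l(n))
params = ([[rng.gauss(0, 1) for _ in range(7)] for _ in range(16)],
          [[rng.gauss(0, 1) for _ in range(16)] for _ in range(24)],
          [rng.gauss(0, 1) for _ in range(16)])
probs, value = pcmc_forward(state, params)
```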
9. The video code rate self-adaption method based on reinforcement learning for the edge-oriented cellular network according to claim 1, wherein the PCMC model takes maximizing the expected return value J(π_θ) as its target and is trained asynchronously with the A3C method, namely several worker threads are deployed on top of the AC network architecture to train simultaneously, and each worker synchronizes its parameters to the main thread after training; during training the parameters of the policy generation module and the policy evaluation module are updated respectively, wherein: the advantage is the difference between the return obtained by taking action A_n in state s_n and the average, namely the expected return value obtainable from state s_n after performing action A_n under policy π; the Bellman equation of the V function is V(s_n) = E[r_n + γ V(s_{n+1})]; θ denotes the set of all parameters of the reinforcement learning model, and the belief space vector is the probability distribution of the observations under the state information; the reward r_n the agent obtains from the environment after acting has constant weight coefficients ω, μ, δ, φ for its sub-items; q(0) is the highest code rate, which facilitates normalization; meanwhile the model uses a discount reward R_τ that decays over time to trade off the importance of near-term and long-term rewards, so that the policy model takes long-term return values into account: R_τ = Σ_t γ^t r_{n+t}; the policy generation module and the environment coding module of the PCMC model keep generating rewards until a termination state is reached, and the set of all state information, actions and rewards of this process is the trajectory τ, with occurrence probability P(τ); in a Partially Observable Markov Decision Process (POMDP), the return value is r'_n, i.e. the expected value of r_n under the belief space vector; replacing r_n in the discount reward R_τ with r'_n yields a new discounted return R'_τ; since both the environment state transitions and the policy are stochastic, the same policy model acting on the same environment with the same initial state may produce distinct trajectories, so the optimization objective of the reinforcement learning model should be to maximize the expected return over the observations; the total energy loss of a video streaming session is E = E_c + E_om + E_tc, wherein: the energy consumption brought by the MEC server side when executing a caching task is E_c = w_cm · Z(n, m); when the cache misses, the transmission delay of requesting the data from the source server is T_om = Z(n, m)/W_om and the transmission energy consumption is E_om = e_om · Z(n, m) · T_om; when a code rate version higher than the request exists in the cache, the computational energy consumption of the MEC executing the transcoding task is E_tc = ρ_0 · c_tm · (q_ext − q_tar) · T_tc(n, m); w_cm is the caching energy consumption unit of the MEC, W_om is the bandwidth from the MEC to the source server, e_om is the transmission energy consumption unit from the MEC to the source server, ρ_0 is the power consumption per cycle of CPU operation, and c_tm is the number of cycles the CPU needs to process each bit of the transcoding task.
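The discounted return and the A3C-style advantage used in the updates above can be sketched as follows (a minimal illustration; the patent's exact gradient expressions are not reproduced):

```python
def discounted_returns(rewards, gamma=0.99):
    """Discount reward R_tau: later rewards decay geometrically so the
    policy trades off near-term against long-term return."""
    R, out = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R     # R_t = r_t + gamma * R_{t+1}
        out.append(R)
    return out[::-1]

def advantages(returns, values):
    """Advantage: difference between the observed discounted return and
    the critic's value estimate V(s_n)."""
    return [R - V for R, V in zip(returns, values)]
```

Substituting the belief-averaged rewards r'_n for r_n in `discounted_returns` yields the POMDP return R'_τ of the claim.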
10. A system for implementing the reinforcement learning-based video code rate self-adaption method for an edge-oriented cellular network of any one of claims 1-9, comprising: a video source server side, a client, and a code rate selection module, a cache module and a transcoding module located on the server side, wherein: the client locally maintains a video slice buffer and attaches its local buffer state information when requesting a video slice from the server side; the code rate selection module runs the PCMC model on the request information to output the code rate selection for the next K slices; the cache module checks in turn whether a high code rate version of each of the next K video slices exists in the cache; if so, the transcoding module adds the video slice of the high code rate version to the transcoding task queue, converts it to the corresponding version and then transmits it to the client; otherwise, the MEC server requests the high code rate version of the video slice from the video source server side, which stores all video data, and forwards it to the client;
the transmission is performed in synchronization with the transcoding.
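The server-side decision of claim 10 can be sketched as a single dispatch function, assuming the cache maps slice ids to their highest stored bitrate and the origin holds all versions (all names are illustrative):

```python
def handle_request(slice_id, req_rate, cache, origin_rates, transcode_queue):
    """Serve from the MEC cache when a high-enough version is cached,
    queueing a down-transcode so it runs in parallel with transmission;
    otherwise fetch the slice from the video source server."""
    cached_rate = cache.get(slice_id)
    if cached_rate is not None and cached_rate >= req_rate:
        if cached_rate > req_rate:
            # transcode down to the requested version, in parallel
            transcode_queue.append((slice_id, cached_rate, req_rate))
        return "mec"     # served (possibly transcoded) at the edge
    # cache miss: the MEC requests the slice from the origin and forwards it
    assert req_rate in origin_rates[slice_id]
    return "origin"
```

Example: a request for bitrate 2 against a cached bitrate-4 copy is served at the edge with a transcode job queued, while an uncached slice falls through to the origin server.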
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211574628.3A CN116016987A (en) | 2022-12-08 | 2022-12-08 | Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211574628.3A CN116016987A (en) | 2022-12-08 | 2022-12-08 | Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116016987A true CN116016987A (en) | 2023-04-25 |
Family
ID=86028904
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211574628.3A Pending CN116016987A (en) | 2022-12-08 | 2022-12-08 | Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116016987A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116805923A (en) * | 2023-08-25 | 2023-09-26 | 淳安华数数字电视有限公司 | Broadband communication method based on edge calculation |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109525861A (en) * | 2018-12-05 | 2019-03-26 | 北京邮电大学 | A kind of method and device of video needed for determining user |
CN110913373A (en) * | 2019-09-17 | 2020-03-24 | 上海大学 | In-vehicle wireless communication platform based on joint time-frequency priority strategy and anti-interference method thereof |
US20200162535A1 (en) * | 2018-11-19 | 2020-05-21 | Zhan Ma | Methods and Apparatus for Learning Based Adaptive Real-time Streaming |
CN111431941A (en) * | 2020-05-13 | 2020-07-17 | 南京工业大学 | Real-time video code rate self-adaption method based on mobile edge calculation |
CN113114756A (en) * | 2021-04-08 | 2021-07-13 | 广西师范大学 | Video cache updating method for self-adaptive code rate selection in mobile edge calculation |
CN114501468A (en) * | 2022-02-22 | 2022-05-13 | 上海大学 | Method for allocating joint uplink and downlink slice resources in TDD network |
CN114640870A (en) * | 2022-03-21 | 2022-06-17 | 陕西师范大学 | QoE-driven wireless VR video self-adaptive transmission optimization method and system |
CN114867030A (en) * | 2022-06-09 | 2022-08-05 | 东南大学 | Double-time-scale intelligent wireless access network slicing method |
- 2022-12-08 CN CN202211574628.3A patent/CN116016987A/en active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200162535A1 (en) * | 2018-11-19 | 2020-05-21 | Zhan Ma | Methods and Apparatus for Learning Based Adaptive Real-time Streaming |
CN109525861A (en) * | 2018-12-05 | 2019-03-26 | 北京邮电大学 | A kind of method and device of video needed for determining user |
CN110913373A (en) * | 2019-09-17 | 2020-03-24 | 上海大学 | In-vehicle wireless communication platform based on joint time-frequency priority strategy and anti-interference method thereof |
CN111431941A (en) * | 2020-05-13 | 2020-07-17 | 南京工业大学 | Real-time video code rate self-adaption method based on mobile edge calculation |
CN113114756A (en) * | 2021-04-08 | 2021-07-13 | 广西师范大学 | Video cache updating method for self-adaptive code rate selection in mobile edge calculation |
CN114501468A (en) * | 2022-02-22 | 2022-05-13 | 上海大学 | Method for allocating joint uplink and downlink slice resources in TDD network |
CN114640870A (en) * | 2022-03-21 | 2022-06-17 | 陕西师范大学 | QoE-driven wireless VR video self-adaptive transmission optimization method and system |
CN114867030A (en) * | 2022-06-09 | 2022-08-05 | 东南大学 | Double-time-scale intelligent wireless access network slicing method |
Non-Patent Citations (3)
Title |
---|
J. LUO 等: "Adaptive Video Streaming With Edge Caching and Video Transcoding Over Software-Defined Mobile Networks: A Deep Reinforcement Learning Approach", IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, vol. 19, no. 03, 3 December 2019 (2019-12-03), pages 1577 - 1592, XP011777875, DOI: 10.1109/TWC.2019.2955129 * |
CAO Xingjian et al.: "Image Processing and Edge Computing for Intelligent Transportation", Journal of Image and Graphics, vol. 27, no. 06, 16 June 2022 (2022-06-16), pages 1743 - 1767 *
WANG Ying: "Research on Adaptive Transmission and Caching Mechanisms for Mobile Edge Video", China Master's Theses Full-text Database, Information Science and Technology, no. 2022, 15 April 2022 (2022-04-15) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116805923A (en) * | 2023-08-25 | 2023-09-26 | 淳安华数数字电视有限公司 | Broadband communication method based on edge calculation |
CN116805923B (en) * | 2023-08-25 | 2023-11-10 | 淳安华数数字电视有限公司 | Broadband communication method based on edge calculation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Pedersen et al. | Enhancing mobile video capacity and quality using rate adaptation, RAN caching and processing | |
Ahlehagh et al. | Video-aware scheduling and caching in the radio access network | |
CN112953922B (en) | Self-adaptive streaming media control method, system, computer equipment and application | |
Khan et al. | A survey on mobile edge computing for video streaming: Opportunities and challenges | |
Chen et al. | Artificial intelligence aided joint bit rate selection and radio resource allocation for adaptive video streaming over F-RANs | |
Guo et al. | Buffer-aware streaming in small-scale wireless networks: A deep reinforcement learning approach | |
Tan et al. | Radio network-aware edge caching for video delivery in MEC-enabled cellular networks | |
Chiang et al. | Collaborative social-aware and QoE-driven video caching and adaptation in edge network | |
Baccour et al. | Proactive video chunks caching and processing for latency and cost minimization in edge networks | |
Hong et al. | Continuous bitrate & latency control with deep reinforcement learning for live video streaming | |
Chua et al. | Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach | |
Mu et al. | AMIS: Edge computing based adaptive mobile video streaming | |
CN116016987A (en) | Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network | |
Zhao et al. | Popularity-based and version-aware caching scheme at edge servers for multi-version VoD systems | |
Tian et al. | Deeplive: QoE optimization for live video streaming through deep reinforcement learning | |
KR101966588B1 (en) | Method and apparatus for receiving video contents | |
Li et al. | User dynamics-aware edge caching and computing for mobile virtual reality | |
Cai et al. | Mec-based qoe optimization for adaptive video streaming via satellite backhaul | |
Kim et al. | eff-HAS: Achieve higher efficiency in data and energy usage on dynamic adaptive streaming | |
Chen et al. | Cooperative caching for scalable video coding using value-decomposed dimensional networks | |
Lin et al. | Knn-q learning algorithm of bitrate adaptation for video streaming over http | |
Zhang et al. | Cache-enabled adaptive bit rate streaming via deep self-transfer reinforcement learning | |
CN115720237A (en) | Caching and resource scheduling method for edge network self-adaptive bit rate video | |
Chou et al. | Pricing-based deep reinforcement learning for live video streaming with joint user association and resource management in mobile edge computing | |
Mu et al. | AMIS-MU: edge computing based adaptive video streaming for multiple mobile users |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||