CN116016987A - Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network - Google Patents

Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network

Info

Publication number
CN116016987A
CN116016987A (application CN202211574628.3A)
Authority
CN
China
Prior art keywords
video
code rate
transcoding
model
slice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211574628.3A
Other languages
Chinese (zh)
Inventor
孙彦赞
陈文凯
于军
张舜卿
陈小静
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202211574628.3A priority Critical patent/CN116016987A/en
Publication of CN116016987A publication Critical patent/CN116016987A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 — Reducing energy consumption in communication networks
    • Y02D30/70 — Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A reinforcement-learning-based video code rate adaptation method for edge cellular networks: a server capable of multi-access edge computing (MEC) transcoding and a client with a built-in buffer are constructed as a video streaming session simulation environment; a parallel-collaborative joint multi-video-slice code rate transcoding and transmission ABR model (PCMC) is adopted; the model is trained with asynchronous advantage actor-critic (A3C) reinforcement learning in the simulation environment using a video data set and a wireless bandwidth trace data set; and in the online stage the model adaptively adjusts the video code rate. In an MEC-equipped wireless network scenario, the invention makes full use of RAN-side information and MEC computing and storage resources while executing transmission and transcoding in parallel, thereby reducing the extra computation delay and computation energy consumption introduced by the MEC and improving the client's video streaming QoE indices.

Description

Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network
Technical Field
The invention relates to a technology in the field of video processing, in particular to a reinforcement-learning-based video code rate adaptation method for edge cellular networks.
Background
To meet mobile users' quality of experience (QoE) and provide new high-performance quality of service (QoS), multi-access edge computing (MEC), software-defined mobile network (SDMN), and cloud radio access network (C-RAN) technologies are being introduced into next-generation wireless networks. Cloud computing capabilities are extended to neighboring small base stations (SBSs) in wireless networks, especially ultra-dense networks (UDNs), placing computing and storage resources on the radio access network (RAN) side closer to end users, and high quality of experience is pursued through a variety of adaptive bitrate (ABR) algorithms. However, because the MEC cache space is limited and the content popularity of video streams shifts with time and geographic location, the pre-cache hit rate of video streams is low, so cache update and replacement algorithms refresh the cached content frequently; such frequent cache updates incur additional energy consumption.
Disclosure of Invention
Aiming at the defects of the existing improved ABR techniques — inaccurate estimation of network throughput, lack of consideration of RAN-side information, and insufficient utilization of MEC computing and cache resources — the invention provides a reinforcement-learning-based video code rate adaptation method for edge cellular networks. In an MEC-equipped wireless network scenario, the method makes full use of RAN-side information and MEC computing and storage resources while executing transmission and transcoding in parallel, thereby reducing the extra computation delay and computation energy consumption introduced by the MEC and improving the client's video streaming QoE indices.
The invention is realized by the following technical scheme:
the invention relates to a video code rate self-adaption method based on reinforcement learning for an edge cellular network, which is characterized in that a server capable of multi-address edge computing (MEC) transcoding and a client with built-in cache are constructed to serve as a video stream session simulation environment, an ABR (program control unit) method (PCMC) model of parallel collaboration joint multi-video slice code rate transcoding and transmission is adopted, a video data set and a wireless bandwidth track data set are used for carrying out training based on asynchronous reinforcement learning (A3C) in the video stream session simulation environment, and the video code rate is self-adaption adjusted through the model in an online stage.
The PCMC model has a multi-action-output network structure and comprises: an environment coding module, a policy generation module, and a policy evaluation module, wherein: the environment coding module generates a feature vector from the state information $S_n$ reflecting the environment characteristics, the policy generation module outputs a set $A_n$ of k future actions based on the feature vector, and the policy evaluation module evaluates the current policy and feeds the evaluation back to the policy generation module to adjust the policy model.
The invention further relates to a system for realizing the method, comprising: a video source server, a client, and a code rate selection module, a cache module, and a transcoding module located at the server side, wherein: the client locally maintains a video slice buffer and attaches its local buffer state information when requesting video slices from the server; the code rate selection module runs the PCMC model on the request information to output code rate selections for the next K slices; the cache module checks in turn whether a higher-code-rate version of each of the next K video slices is cached; if so, the transcoding module adds the slice to the transcoding task queue, converts it to the requested version, and transmits it to the client; otherwise, the MEC server requests the high-code-rate version of the slice from the video source server, which stores all video data, and forwards it to the client.
The transmission is preferably performed in synchronization with the transcoding.
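For illustration, the following Python sketch mirrors the server-side request flow just described: model-driven rate selection for the next K slices, a cache check per slice, transcoding on a hit, and an origin fetch on a miss. All identifiers (MecServer, select_rates, fetch_from_origin) are assumed names for this sketch, not from the patent.

```python
# Minimal sketch (not the patent's implementation) of the MEC server request flow.

from collections import deque
from typing import Callable, Dict, List


class MecServer:
    def __init__(self, cache: Dict[int, int],
                 fetch_from_origin: Callable[[int, int], None],
                 select_rates: Callable[[object], List[int]]):
        self.cache = cache                    # slice index -> highest cached rate level
        self.fetch_from_origin = fetch_from_origin
        self.select_rates = select_rates      # PCMC policy: state -> next-K rate levels
        self.transcode_queue = deque()        # FIFO transcoding task queue

    def handle_request(self, state: object, next_index: int) -> None:
        # The PCMC model outputs code-rate selections for the next K slices at once.
        for offset, m in enumerate(self.select_rates(state)):
            idx = next_index + offset
            cached = self.cache.get(idx)
            if cached is not None and cached >= m:
                # Cache hit: transcode the cached high-rate version down to level m;
                # the task runs in parallel with ongoing slice transmissions.
                self.transcode_queue.append((idx, cached, m))
            else:
                # Cache miss: request the slice from the video source server,
                # which stores all video data, and forward it to the client.
                self.fetch_from_origin(idx, m)
```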
Technical effects
Compared with existing ABR algorithms, the invention fully considers RAN-side information through the PCMC model and can therefore predict bandwidth throughput in a wireless network environment more accurately; its flexible multi-action output strategy allows transcoding and transmission to execute in parallel, effectively reducing the computation delay caused by MEC transcoding, improving user QoE, and reducing the total energy consumption of the whole video streaming session.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of a PCMC model network architecture;
FIG. 3 is a graph comparing network convergence curves;
fig. 4 is a graph comparing QoE and the individual playback metrics.
Detailed Description
As shown in fig. 1, this embodiment relates to a video code rate adaptive method based on reinforcement learning for an edge cellular network, which includes the following steps:
step 1, constructing a multiple access edge computing (MEC) server with transcoding and caching capabilities and a client with built-in cache as a video streaming session simulation environment.
The client has a built-in buffer: when a requested video slice reaches the client, the buffer grows by the length of video content contained in the slice, and the buffer is consumed at the client's video playback rate.
The buffer length satisfies:

$$B_{ue}(n) = \max\big(B_{ue}(n-1) - T_{ts}(n,m) - T_{tc}(n,m),\ 0\big) + L$$

wherein: $B_{ue}(n)$ is the buffer length when the slice with index n reaches the client, t is the time at which the video slice reaches the client, and L is the length of video content contained in the slice; when the length $B_{ue}(n)$ exceeds a threshold $B_{thresh}$, the video download pauses and sleeps for an integer number of sleep periods $T_s$ until the buffer satisfies the condition; $T_{ts}(n,m)$ is the transmission time of the level-m code rate slice of the n-th block and $T_{tc}(n,m)$ the corresponding transcoding time. When the user plays the video at the normal playback rate, the buffer length consumed by the time the requested video slice arrives equals the sum of the transmission time and the transcoding time.
The threshold $B_{thresh}$ is preferably 60 seconds.
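A minimal sketch of these buffer dynamics follows, treating transmission plus transcoding time as the download time; the sleep-period length T_S is an illustrative assumption, while the 4-second slice length and 60-second threshold come from the text.

```python
# Client buffer evolution per the formula above: drain during download, grow by
# L seconds per arriving slice, sleep in whole periods while above threshold.

import math

L = 4.0          # seconds of content per slice (HLS slicing used in step 2)
B_THRESH = 60.0  # preferred buffer threshold from the text
T_S = 0.5        # assumed sleep period length in seconds


def step_buffer(b_ue: float, t_ts: float, t_tc: float) -> tuple[float, float]:
    """Advance the buffer across one slice download.

    b_ue: buffer length (s) when the previous slice arrived
    t_ts: transmission time (s) of this slice at its chosen rate level
    t_tc: transcoding time (s) spent on this slice at the MEC server
    Returns (new buffer length, sleep time inserted before the download).
    """
    sleep = 0.0
    if b_ue > B_THRESH:
        # Sleep an integer number of periods until the buffer drops below threshold.
        sleep = math.ceil((b_ue - B_THRESH) / T_S) * T_S
        b_ue = max(b_ue - sleep, 0.0)
    # Drain by the download time (transmission + transcoding), then add the slice.
    return max(b_ue - (t_ts + t_tc), 0.0) + L, sleep
```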
The transcoding means: a transcoding task queue is set at the MEC server; after the PCMC model selects the code rates of the next k video slices, the MEC cache is checked in turn for hits, i.e., whether a higher-code-rate version exists; on a hit, the video slices are transcoded in order through the transcoding task queue.
The duration of each transcoding task in the queue is related to the number of CPU cores and the frequency of the computing device and to the difference between the code rates before and after transcoding; the duration of the transcoding task for the n-th slice in the queue satisfies:

$$T_{tc}(n,m) = \frac{C_m\,\big(q(0) - q(m)\big)}{N_{core}\,f}$$

wherein: the original code rate is q(0), the target code rate is q(m), $C_m$ is the number of CPU cycles required to process a single code-rate difference level in the single-core case, $N_{core}$ is the number of CPU cores, and f is the CPU frequency. When slices are in transmission or the client buffer is sleeping, the transcoding tasks in the task queue can be executed in parallel to reduce the delay caused by transcoding.
The MEC cache refers to: the server uses its cache resources to cache the most frequently accessed video slices and simulates the cache update mechanism with a least-frequently-used (LFU) eviction algorithm, specifically: video slices with high access frequency are cached preferentially, and among different code rates of the same slice only the highest-code-rate version is cached; when the cache reaches its upper limit, low-frequency slices are evicted first, and among slices with the same frequency the one with the earliest access time is evicted first; when the code rate version of a requested video slice is lower than the cached version, the request is a hit, otherwise it is a miss.
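This cache behavior can be sketched as follows; the fixed-entry-count capacity model and the identifier names are simplifying assumptions, not the patent's implementation.

```python
# LFU-style slice cache: keep only the highest-rate version per slice, evict
# the least-frequently-used entry first, break ties by earliest access time.


class LfuSliceCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store: dict[int, tuple[int, int, int]] = {}  # idx -> (rate, freq, last_access)
        self.clock = 0

    def request(self, idx: int, rate: int) -> bool:
        """Record an access; return True on hit (cached rate >= requested rate)."""
        self.clock += 1
        if idx in self.store:
            cached_rate, freq, _ = self.store[idx]
            # Only the highest code-rate version of a slice is retained.
            self.store[idx] = (max(cached_rate, rate), freq + 1, self.clock)
            return cached_rate >= rate
        if len(self.store) >= self.capacity:
            # Evict lowest frequency; on ties, the earliest-accessed entry.
            victim = min(self.store, key=lambda i: (self.store[i][1], self.store[i][2]))
            del self.store[victim]
        self.store[idx] = (rate, 1, self.clock)
        return False
```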
The transcoding task queue is updated as follows: let T denote the time available for the transcoding task queue to consume. When the n-th slice reaches the client and the client buffer length $B_{ue}(n)$ exceeds the threshold $B_{thresh}$, T of the task queue is updated to

$$T = T_{ts}(n,m) + \Big\lceil \frac{B_{ue}(n) - B_{thresh}}{T_s} \Big\rceil\, T_s$$

otherwise T is updated to $T_{ts}(n,m)$. The j-th task of the transcoding task queue, with remaining transcoding time $B_{mec}(n+j)$, is then fetched in first-in-first-out order and updated to $\max\big(0,\ B_{mec}(n+j) - T\big)$, and T is updated to $T - B_{mec}(n+j)$; these steps loop until T is less than zero or j exceeds the queue length.
Step 2, constructing a video data set and a wireless bandwidth trace data set.
The video data set is constructed as follows: a movie video with 4K resolution and a length of about 120 minutes is collected as the original data source, encoded with H.264/AVC, and sliced into 4-second video slices in the HLS protocol format; video slices are then selected at random and noise is added to form new pseudo video slice files, generating 100 pseudo video sources with random durations of 5-100 minutes as the video data set.
The wireless bandwidth trace data set is constructed as follows: traces of different network bandwidths are obtained by controlling the number of RBs allocated in each radio frame; a random function is applied to the RB allocation rule, with the mean determining the average network bandwidth and the variance determining its fluctuation amplitude, so that different network conditions are simulated by varying the mean and variance of the random function. A total of 100 network trace records of 2000 s each are generated as the wireless bandwidth trace data set.
The bandwidth is simulated using OFDM modulation, specifically: the subcarrier spacing is set to 15 kHz, and the fading model of the wireless channel is set to

$$h = \sqrt{h_l}\,\alpha$$

wherein: $h_l$ is the large-scale fading and the coefficient α is the small-scale fading model. The large-scale fading uses the following path loss model:

$$h_l = G_A \left(\frac{c}{4\pi f_c}\right)^2 d^{-d_e}$$

wherein: $G_A$ is the antenna gain coefficient, d is the distance between the base station and the user, $f_c$ is the subcarrier frequency, and $d_e$ is a constant coefficient.

The small-scale fading uses a Rayleigh fading model with probability density function:

$$f(r) = \frac{r}{\delta^2}\exp\!\left(-\frac{r^2}{2\delta^2}\right),\quad r \ge 0$$

wherein: r is a real number greater than or equal to 0 and δ is the standard deviation of the random process.
Preferably, Rayleigh fading is simulated in the wireless bandwidth trace data set by drawing the real and imaginary parts from the standard normal distribution. Each time the simulation environment loads a network trace, it randomly selects a record and a starting time point to ensure the randomness of training; when a trace runs to its end, it restarts from the beginning until the video streaming session closes.
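A hedged sketch of the trace generation: per radio frame an RB count is drawn from a random function whose mean and variance set the average bandwidth and its fluctuation, and Rayleigh fading is sampled as the magnitude of a standard complex Gaussian. The frame rate, per-RB capacity, and trace layout below are assumptions.

```python
# Generate per-second throughput traces with random RB allocation and
# Rayleigh small-scale fading (|N(0,1) + j*N(0,1)|).

import numpy as np


def generate_trace(mean_rbs: float, std_rbs: float, seconds: int = 2000,
                   frames_per_s: int = 100, bits_per_rb: float = 1.5e3,
                   seed: int | None = None) -> np.ndarray:
    """Return per-second throughput samples (bit/s) for one trace."""
    rng = np.random.default_rng(seed)
    n_frames = seconds * frames_per_s
    # RBs allocated in each radio frame (clipped to be non-negative).
    rbs = np.clip(rng.normal(mean_rbs, std_rbs, n_frames), 0, None)
    # Rayleigh fading: magnitude of a standard complex Gaussian.
    fading = np.abs(rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames))
    per_frame_bits = rbs * bits_per_rb * fading
    return per_frame_bits.reshape(seconds, frames_per_s).sum(axis=1)


# 100 traces with varying means/variances to cover different network conditions.
traces = [generate_trace(np.random.uniform(20, 80), np.random.uniform(2, 15))
          for _ in range(100)]
```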
Step 3, constructing the parallel-collaborative joint multi-video-slice code rate transcoding and transmission ABR (PCMC) model in the simulation environment of step 1; after the simulation environment of step 1 loads the data set constructed in step 2, the PCMC model is trained through continuous interaction with the simulation environment of step 1.
As shown in fig. 2, the PCMC model in this embodiment specifically comprises: an environment coding module, a policy generation module, and a policy evaluation module, wherein: the environment coding module outputs a feature vector from the environment state information

$$S_n = \big(v(n),\ B_{ue}(t),\ b(n),\ Z(n,m),\ d(n-1),\ C(n),\ l(n)\big)$$

reflecting the environment characteristics; the policy generation module, as the decision model $\pi_\theta$, outputs from that feature vector the code rate selection for the next k video slices, i.e., the probability distribution $\pi_\theta(A_n \mid S_n)$ over actions $A_n = (v_n, v_{n+1}, \ldots, v_{n+k})$. While the slice with index n is being transmitted, the transcoding processes for the slice code rates of n+1 to n+k are executed in parallel to reduce the delay caused by transcoding. Here v(n) is the code rate selected for the video slice with request index n, $B_{ue}(t)$ is the client buffer length at time t, b(n) is the average network throughput of the video slice with index n, Z(n,m) is the byte size of the index-n video slice at code rate m, d(n−1) is the client playback stall duration caused by the video transmission, C(n) is the highest code rate version of the index-n video slice in the cache, and l(n) is the number of remaining video slices; the policy evaluation module fits a state-value (V) function and, from the feature vector of the environment state information, outputs the V value to the policy generation module for gradient updating.
Preferably, an action mask is further set in the policy generation module to filter out actions that cannot occur, for example actions whose index exceeds the total number of video slices.
Preferably, a built-in storage unit of the environment coding module stores the historical state information of the last u steps.
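A minimal actor-critic sketch of this three-module structure (environment encoder, multi-action policy head with action mask, value head), written in PyTorch; the layer sizes and flat state layout are illustrative assumptions.

```python
# Three-module PCMC sketch: encoder -> k softmax policy heads + value head.

import torch
import torch.nn as nn


class PCMCNet(nn.Module):
    def __init__(self, state_dim: int, n_rates: int, k: int, hidden: int = 128):
        super().__init__()
        self.k, self.n_rates = k, n_rates
        # Environment coding module: state -> feature vector.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Policy generation module: one softmax head per future slice
        # (the multi-action output of the PCMC structure).
        self.policy_heads = nn.ModuleList(
            [nn.Linear(hidden, n_rates) for _ in range(k)])
        # Policy evaluation module: fits the state-value V function.
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, state: torch.Tensor, mask: torch.Tensor | None = None):
        feat = self.encoder(state)
        logits = torch.stack([head(feat) for head in self.policy_heads], dim=1)
        if mask is not None:
            # Action mask: give impossible actions (e.g. slices past the end
            # of the video) effectively zero probability.
            logits = logits.masked_fill(~mask, float("-inf"))
        probs = torch.softmax(logits, dim=-1)      # (batch, k, n_rates)
        value = self.value_head(feat).squeeze(-1)  # (batch,)
        return probs, value
```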
The PCMC model is asynchronously trained with the A3C method, with the objective of maximizing the expected return value $J(\pi_\theta)$: several sub-threads are deployed on top of the actor-critic (AC) network architecture and trained simultaneously, and each sub-thread synchronizes its parameters to the main thread after training. During training, the parameters of the policy generation module and the policy evaluation module are updated respectively as:

$$\theta \leftarrow \theta + \alpha \sum_n \nabla_\theta \log \pi_\theta(S_n, A_n)\, A^{\pi_\theta}(S_n, A_n)$$

$$\theta_v \leftarrow \theta_v - \alpha' \sum_n \nabla_{\theta_v}\big(r_n + \gamma V^{\pi_\theta}(S_{n+1};\theta_v) - V^{\pi_\theta}(S_n;\theta_v)\big)^2$$

wherein: the advantage $A^{\pi_\theta}(S_n, A_n) = Q^{\pi_\theta}(S_n, A_n) - V^{\pi_\theta}(S_n)$ is the difference from the average of the expected return value obtainable by taking action $A_n$ from state $S_n$ under policy π, and the Bellman equation of the V function is $V^{\pi}(S_n) = \mathbb{E}\big[r_n + \gamma V^{\pi}(S_{n+1})\big]$. The reward obtained from the environment after the agent acts is

$$r_n = \omega\,\frac{q(v_n)}{q(0)} - \mu\, d(n) - \delta\,\frac{\lvert q(v_n) - q(v_{n-1})\rvert}{q(0)} - \varphi\, E(n)$$

wherein ω, μ, δ, and φ are constant weights of the sub-items, and q(0) is the highest code rate, which makes normalization convenient. Meanwhile, so that the model can trade off the importance of near-term and long-term rewards, a time-decaying discounted reward is used to let the policy model account for long-term return values:

$$R_\tau = \sum_{t=0}^{\infty} \gamma^{t}\, r_{n+t}$$
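A sketch of the per-slice reward and the discounted return; the weighting scheme and the inclusion of an energy term follow the QoE and energy factors named in the text, but their exact form is an assumption.

```python
# Per-slice reward (quality - stall - smoothness - energy) and discounted return.


def reward(q_n: float, q_prev: float, q_max: float, stall_s: float,
           energy_j: float, w: float = 1.0, mu: float = 4.3,
           delta: float = 1.0, phi: float = 0.1) -> float:
    quality = w * q_n / q_max                   # normalized play quality
    rebuffer = mu * stall_s                     # playback stall penalty
    smooth = delta * abs(q_n - q_prev) / q_max  # code-rate fluctuation penalty
    energy = phi * energy_j                     # session energy penalty
    return quality - rebuffer - smooth - energy


def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    """R_tau = sum_t gamma^t * r_{n+t}, computed back-to-front."""
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R
```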
the strategy generation module and the environment coding module of the PCMC model continuously generate rewards until reaching a termination state, and all sets of state information, actions and rewards of the process, namely a track tau, and the occurrence probability P (tau) of the track tau; the channel fading conditions follow the k-state markov model with confidence space vector +.>
Figure BDA00039888100000000510
For being in state information->
Figure BDA00039888100000000511
Under observe +.>
Figure BDA00039888100000000512
Probability distribution of (2); in a Partially Observable Markov Decision Process (POMDP), the return value is r' n I.e. r can be obtained under confidence space vector n Is a desired value of (2); will r' n Replacement discount rewards R τ Middle r n Obtaining a new discount report R' τ . Since both the environmental state transitions and the policies are stochastic, the same policy model acts on the same environment as the initial state, possibly creating distinct trajectories, so the optimization objective of the reinforcement learning model should be to maximize the observation +.>
Figure BDA00039888100000000515
The desired return value->
Figure BDA00039888100000000514
θ represents all parameter sets in the reinforcement learning model. Video streaming session total energy loss e=e c +E om +E tc Wherein: energy consumption E brought by MEC server-side when executing cache task c =w cm * Z (n, m), when the cache misses, the transmission delay T of the data of the request source server om =Z(n,m)/W om Transmission energy consumption E om =e om *Z(n,m)*T om The method comprises the steps of carrying out a first treatment on the surface of the When the code rate version exists in the cache and is higher than the request, the MEC executes the calculation energy consumption E of the transcoding task tc =ρ 0 *c tm *(q ext -q tar )*T tc (n,m);w cm Buffer energy consumption unit of MEC, w om For MEC to source server bandwidth, e om For the transmission energy consumption unit of MEC to the source server, ρ 0 Power consumption per cycle for CPU operation, c tm The number of cycles required to process each bit of transcoding task for the CPU.
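The session energy model transcribes directly into code; the constants below are placeholders to be calibrated, not values from the patent.

```python
# Per-slice energy per the model above: caching + (origin fetch | transcoding).


def session_energy(z_bytes: float, cache_hit: bool, q_ext: float, q_tar: float,
                   t_tc: float, w_cm: float = 1e-9, w_om: float = 50e6,
                   e_om: float = 1e-9, rho0: float = 1e-10,
                   c_tm: float = 10.0) -> float:
    """Energy (J) attributable to one slice."""
    e_c = w_cm * z_bytes                   # E_c: MEC caching energy
    if not cache_hit:
        t_om = z_bytes / w_om              # T_om: origin transmission delay
        return e_c + e_om * z_bytes * t_om  # E_om: origin transmission energy
    # E_tc: transcoding energy, scaling with the code-rate gap and task time.
    return e_c + rho0 * c_tm * (q_ext - q_tar) * t_tc
```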
The user QoE indices comprise: the client's average playback quality, the smoothness of the playback code rate, and the playback stall time.
In specific practical experiments under the environment settings of table 1, 8 sub-threads were deployed for training; after all sub-threads had trained for 500 epochs the network converged, and the best model during training was recorded. In testing, the average QoE of the best model on the test set reached 289.28.
Table 1. Experimental environment settings.
As shown in fig. 3, the method greatly reduces the delay caused by MEC transcoding while taking energy efficiency into account and improving user QoE. The comparison method (Baseline 1) considers the code rate selection of only one future video slice at a time; it uses the same network architecture and is trained and tested on the same data set and test set. Because the network environment of each epoch is random and the average bandwidths during training are not consistent, the performance of the models on the same test set reflects the merits of the methods. As seen from the figure, the invention has a more flexible code rate selection strategy, and the parallel execution of transmission and transcoding reduces delay, so its best result exceeds that of the comparison method.
As shown in fig. 4, taking the MPC method as Baseline 2, the three methods are placed under different network scenarios and their average QoE, average code rate, average stall time, and average code rate fluctuation are counted. The average QoE of this method is higher than that of the other two methods; its average code rate is slightly lower than that of Baseline 2, but it better accounts for video code rate fluctuation and avoids the degradation of playback quality of experience caused by excessive code rate switching.
Compared with the prior art, the method considers RAN-side information more comprehensively and, through a more flexible code rate selection policy model (PCMC), dynamically selects the code rates of several future video slices using RAN-side and client-side information in a wireless communication environment equipped with MEC. Video slices present in the edge cache usually require code rate transcoding before transmission to the client; because the model can select the code rates of multiple video slices flexibly, executing the transmission and transcoding tasks of video slices in parallel markedly reduces the computation delay caused by the MEC. Meanwhile, the invention comprehensively considers computation and transmission energy factors, improving user QoE while reducing the energy consumption of the video streaming session as much as possible.
The foregoing embodiments may be partially modified in numerous ways by those skilled in the art without departing from the principles and spirit of the invention, the scope of which is defined in the claims and not by the foregoing embodiments, and all such implementations are within the scope of the invention.

Claims (10)

1. A reinforcement-learning-based video code rate self-adaptation method for an edge cellular network, characterized in that a server capable of multi-access edge computing (MEC) transcoding and a client with a built-in buffer are constructed as a video streaming session simulation environment; a parallel-collaborative joint multi-video-slice code rate transcoding and transmission ABR model (PCMC) is adopted; the model is trained with asynchronous advantage actor-critic (A3C) reinforcement learning in the video streaming session simulation environment using a video data set and a wireless bandwidth trace data set; and the video code rate is adaptively adjusted by the model in the online stage;

the PCMC model has a multi-action-output network structure and comprises: an environment coding module, a policy generation module, and a policy evaluation module, wherein: the environment coding module generates a feature vector from the state information $S_n$ reflecting the environment characteristics, the policy generation module outputs a set $A_n$ of k future actions based on the feature vector, and the policy evaluation module evaluates the current policy and feeds the evaluation back to the policy generation module to adjust the policy model.
2. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the client has a built-in buffer: when a requested video slice reaches the client, the buffer grows by the length of video content contained in the slice, and the buffer is consumed at the client's video playback rate;

the buffer length satisfies:

$$B_{ue}(n) = \max\big(B_{ue}(n-1) - T_{ts}(n,m) - T_{tc}(n,m),\ 0\big) + L$$

wherein: $B_{ue}(n)$ is the buffer length when the slice with index n reaches the client, t is the time at which the video slice reaches the client, and L is the length of video content contained in the slice; when the length $B_{ue}(n)$ exceeds a threshold $B_{thresh}$, the video download pauses and sleeps for an integer number of sleep periods $T_s$ until the buffer satisfies the condition; $T_{ts}(n,m)$ is the transmission time of the level-m code rate slice of the n-th block and $T_{tc}(n,m)$ the corresponding transcoding time; when the user plays the video at the normal playback rate, the buffer length consumed by the time the requested video slice arrives equals the sum of the transmission time and the transcoding time.
3. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the transcoding means: a transcoding task queue is set at the server; after the PCMC model selects the code rates of the next k video slices, the cache is checked in turn for hits, i.e., whether a higher-code-rate version exists; on a hit, the video slices are transcoded in order through the transcoding task queue;

the duration of each transcoding task in the queue is related to the number of CPU cores and the frequency of the computing device and to the difference between the code rates before and after transcoding; the duration of the transcoding task for the n-th slice in the queue satisfies:

$$T_{tc}(n,m) = \frac{C_m\,\big(q(0) - q(m)\big)}{N_{core}\,f}$$

wherein: the original code rate is q(0), the target code rate is q(m), $C_m$ is the number of CPU cycles required to process a single code-rate difference level in the single-core case, $N_{core}$ is the number of CPU cores, and f is the CPU frequency; when slices are in transmission or the client buffer is sleeping, the transcoding tasks in the task queue can be executed in parallel to reduce the delay caused by transcoding.
4. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the MEC cache refers to: the server uses its cache resources to cache the most frequently accessed video slices and simulates the cache update mechanism with a least-frequently-used (LFU) eviction algorithm, specifically: video slices with high access frequency are cached preferentially, and among different code rates of the same slice only the highest-code-rate version is cached; when the cache reaches its upper limit, low-frequency slices are evicted first, and among slices with the same frequency the one with the earliest access time is evicted first; when the code rate version of a requested video slice is lower than the cached version, the request is a hit, otherwise it is a miss;

the transcoding task queue is updated as follows: let T denote the time available for the transcoding task queue to consume; when the n-th slice reaches the client and the client buffer length $B_{ue}(n)$ exceeds the threshold $B_{thresh}$, T of the task queue is updated to

$$T = T_{ts}(n,m) + \Big\lceil \frac{B_{ue}(n) - B_{thresh}}{T_s} \Big\rceil\, T_s$$

otherwise T is updated to $T_{ts}(n,m)$; the j-th task of the transcoding task queue, with remaining transcoding time $B_{mec}(n+j)$, is then fetched in first-in-first-out order and updated to $\max\big(0,\ B_{mec}(n+j) - T\big)$, and T is updated to $T - B_{mec}(n+j)$; these steps loop until T is less than zero or j exceeds the queue length.
5. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the video data set is constructed as follows: 10 movie videos with 4K resolution and lengths of about 120 minutes are collected as the original data source, encoded with H.264/AVC, and sliced into 4-second video slices in the HLS protocol format; video slices are then selected at random and noise is added to form new pseudo video slice files, generating 100 pseudo video sources with random durations of 5-100 minutes as the video data set.
6. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the wireless bandwidth trace data set is constructed as follows: traces of different network bandwidths are obtained by controlling the number of RBs allocated in each radio frame; a random function is applied to the RB allocation rule, with the mean determining the average network bandwidth and the variance determining its fluctuation amplitude, so that different network conditions are simulated by varying the mean and variance of the random function; a total of 100 network trace records of 2000 s each are generated as the wireless bandwidth trace data set;

the bandwidth is simulated using OFDM modulation, specifically: the subcarrier spacing is set to 15 kHz, and the fading model of the wireless channel is set to

$$h = \sqrt{h_l}\,\alpha$$

wherein: $h_l$ is the large-scale fading and the coefficient α is the small-scale fading model;

the large-scale fading uses the following path loss model:

$$h_l = G_A \left(\frac{c}{4\pi f_c}\right)^2 d^{-d_e}$$

wherein: $G_A$ is the antenna gain coefficient, d is the distance between the base station and the user, $f_c$ is the subcarrier frequency, and $d_e$ is a constant coefficient;

the small-scale fading uses a Rayleigh fading model with probability density function:

$$f(r) = \frac{r}{\delta^2}\exp\!\left(-\frac{r^2}{2\delta^2}\right),\quad r \ge 0$$

wherein: r is a real number greater than or equal to 0 and δ is the standard deviation of the random process.
7. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that Rayleigh fading is simulated in the wireless bandwidth trace data set by drawing the real and imaginary parts from the standard normal distribution; each time the simulation environment loads a network trace it randomly selects a record and a starting time point to ensure the randomness of training, and when it runs to the end of the trace it restarts from the beginning until the video streaming session closes.
8. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the PCMC model specifically comprises: an environment coding module, a policy generation module, and a policy evaluation module, wherein: the environment coding module outputs a feature vector from the environment state information

$$S_n = \big(v(n),\ B_{ue}(t),\ b(n),\ Z(n,m),\ d(n-1),\ C(n),\ l(n)\big)$$

reflecting the environment characteristics; the policy generation module, as the decision model $\pi_\theta$, outputs from that feature vector the code rate selection for the next k video slices, i.e., the probability distribution $\pi_\theta(A_n \mid S_n)$ over actions $A_n = (v_n, v_{n+1}, \ldots, v_{n+k})$; while the slice with index n is being transmitted, the transcoding processes for the slice code rates of n+1 to n+k are executed in parallel to reduce the delay caused by transcoding; v(n) is the code rate selected for the video slice with request index n, $B_{ue}(t)$ is the client buffer length at time t, b(n) is the average network throughput of the video slice with index n, Z(n,m) is the byte size of the index-n video slice at code rate m, d(n−1) is the client playback stall duration caused by the video transmission, C(n) is the highest code rate version of the index-n video slice in the cache, and l(n) is the number of remaining video slices; the policy evaluation module fits a state-value (V) function and, from the feature vector of the environment state information, outputs the V value to the policy generation module for gradient updating.
9. The reinforcement-learning-based video code rate self-adaptation method for an edge cellular network according to claim 1, characterized in that the PCMC model is asynchronously trained with the A3C method with the objective of maximizing the expected return value $J(\pi_\theta)$, i.e., several sub-threads are deployed on top of the AC network architecture and trained simultaneously, and each sub-thread synchronizes its parameters to the main thread after training; during training, the parameters of the policy generation module and the policy evaluation module are updated respectively as:

$$\theta \leftarrow \theta + \alpha \sum_n \nabla_\theta \log \pi_\theta(S_n, A_n)\, A^{\pi_\theta}(S_n, A_n)$$

$$\theta_v \leftarrow \theta_v - \alpha' \sum_n \nabla_{\theta_v}\big(r_n + \gamma V^{\pi_\theta}(S_{n+1};\theta_v) - V^{\pi_\theta}(S_n;\theta_v)\big)^2$$

wherein: the advantage $A^{\pi_\theta}(S_n, A_n) = Q^{\pi_\theta}(S_n, A_n) - V^{\pi_\theta}(S_n)$ is the difference from the average of the expected return value obtainable by taking action $A_n$ from state $S_n$ under policy π, and the Bellman equation of the V function is $V^{\pi}(S_n) = \mathbb{E}\big[r_n + \gamma V^{\pi}(S_{n+1})\big]$; the optimization objective of the reinforcement learning model should be to maximize, under the observations $O_n$, the expected return value $J(\pi_\theta) = \mathbb{E}_{\tau\sim\pi_\theta}[R'_\tau]$, where θ denotes the set of all parameters of the reinforcement learning model and the belief space vector gives the probability distribution of obtaining the observation $O_n$ under state information $S_n$; the reward obtained from the environment after the agent acts is

$$r_n = \omega\,\frac{q(v_n)}{q(0)} - \mu\, d(n) - \delta\,\frac{\lvert q(v_n) - q(v_{n-1})\rvert}{q(0)} - \varphi\, E(n)$$

wherein ω, μ, δ, and φ are constant weights of the sub-items, and q(0) is the highest code rate, which makes normalization convenient; meanwhile, so that the model can trade off the importance of near-term and long-term rewards, a time-decaying discounted reward $R_\tau = \sum_{t=0}^{\infty}\gamma^t r_{n+t}$ is used to let the policy model account for long-term return values;

the policy generation module and the environment coding module of the PCMC model keep generating rewards until a termination state is reached; the set of all state information, actions, and rewards of this process is a trajectory τ with occurrence probability P(τ); in the partially observable Markov decision process (POMDP), the return value is $r'_n$, i.e., the expected value of $r_n$ under the belief space vector; substituting $r'_n$ for $r_n$ in the discounted reward $R_\tau$ yields the new discounted return $R'_\tau$; since both the environment state transitions and the policy are stochastic, the same policy model acting on the same environment and initial state may produce distinct trajectories, so the optimization objective of the reinforcement learning model should be to maximize the expected return under the observations; the total energy loss of a video streaming session is $E = E_c + E_{om} + E_{tc}$, wherein: the energy consumption of the MEC server when executing a caching task is $E_c = w_{cm}\cdot Z(n,m)$; on a cache miss, the transmission delay of requesting data from the source server is $T_{om} = Z(n,m)/W_{om}$ and the transmission energy consumption is $E_{om} = e_{om}\cdot Z(n,m)\cdot T_{om}$; when a cached code rate version higher than the request exists, the computation energy consumption of the MEC executing the transcoding task is $E_{tc} = \rho_0\cdot c_{tm}\cdot(q_{ext} - q_{tar})\cdot T_{tc}(n,m)$; $w_{cm}$ is the caching energy consumption unit of the MEC, $W_{om}$ is the bandwidth from the MEC to the source server, $e_{om}$ is the transmission energy consumption unit from the MEC to the source server, $\rho_0$ is the power consumption per CPU operation cycle, and $c_{tm}$ is the number of cycles the CPU requires per bit of transcoding task.
10. A system for implementing the reinforcement-learning-based video code rate self-adaptation method for an edge cellular network of any one of claims 1-9, comprising: a video source server, a client, and a code rate selection module, a cache module, and a transcoding module located at the server side, wherein: the client locally maintains a video slice buffer and attaches its local buffer state information when requesting video slices from the server; the code rate selection module runs the PCMC model on the request information to output code rate selections for the next K slices; the cache module checks in turn whether a higher-code-rate version of each of the next K video slices is cached; if so, the transcoding module adds the slice to the transcoding task queue, converts it to the requested version, and transmits it to the client; otherwise, the MEC server requests the high-code-rate version of the slice from the video source server, which stores all video data, and forwards it to the client;

the transmission is performed in synchronization with the transcoding.
CN202211574628.3A 2022-12-08 2022-12-08 Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network Pending CN116016987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211574628.3A CN116016987A (en) 2022-12-08 2022-12-08 Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network

Publications (1)

Publication Number Publication Date
CN116016987A (en) 2023-04-25

Family

ID=86028904


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200162535A1 (en) * 2018-11-19 2020-05-21 Zhan Ma Methods and Apparatus for Learning Based Adaptive Real-time Streaming
CN109525861A (en) * 2018-12-05 2019-03-26 北京邮电大学 A kind of method and device of video needed for determining user
CN110913373A (en) * 2019-09-17 2020-03-24 上海大学 In-vehicle wireless communication platform based on joint time-frequency priority strategy and anti-interference method thereof
CN111431941A (en) * 2020-05-13 2020-07-17 南京工业大学 Real-time video code rate self-adaption method based on mobile edge calculation
CN113114756A (en) * 2021-04-08 2021-07-13 广西师范大学 Video cache updating method for self-adaptive code rate selection in mobile edge calculation
CN114501468A (en) * 2022-02-22 2022-05-13 上海大学 Method for allocating joint uplink and downlink slice resources in TDD network
CN114640870A (en) * 2022-03-21 2022-06-17 陕西师范大学 QoE-driven wireless VR video self-adaptive transmission optimization method and system
CN114867030A (en) * 2022-06-09 2022-08-05 东南大学 Double-time-scale intelligent wireless access network slicing method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J. Luo et al.: "Adaptive Video Streaming With Edge Caching and Video Transcoding Over Software-Defined Mobile Networks: A Deep Reinforcement Learning Approach", IEEE Transactions on Wireless Communications, vol. 19, no. 3, 3 December 2019, pages 1577-1592, XP011777875, DOI: 10.1109/TWC.2019.2955129 *
Cao Xingjian et al.: "Image Processing and Edge Computing for Intelligent Transportation", Journal of Image and Graphics, vol. 27, no. 6, 16 June 2022, pages 1743-1767 *
Wang Ying: "Research on Adaptive Transmission and Caching Mechanisms for Mobile Edge Video", China Master's Theses Full-text Database, Information Science and Technology, no. 2022, 15 April 2022 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116805923A (en) * 2023-08-25 2023-09-26 淳安华数数字电视有限公司 Broadband communication method based on edge calculation
CN116805923B (en) * 2023-08-25 2023-11-10 淳安华数数字电视有限公司 Broadband communication method based on edge calculation

Similar Documents

Publication Publication Date Title
Pedersen et al. Enhancing mobile video capacity and quality using rate adaptation, RAN caching and processing
Ahlehagh et al. Video-aware scheduling and caching in the radio access network
CN112953922B (en) Self-adaptive streaming media control method, system, computer equipment and application
Khan et al. A survey on mobile edge computing for video streaming: Opportunities and challenges
Chen et al. Artificial intelligence aided joint bit rate selection and radio resource allocation for adaptive video streaming over F-RANs
Guo et al. Buffer-aware streaming in small-scale wireless networks: A deep reinforcement learning approach
Tan et al. Radio network-aware edge caching for video delivery in MEC-enabled cellular networks
Chiang et al. Collaborative social-aware and QoE-driven video caching and adaptation in edge network
Baccour et al. Proactive video chunks caching and processing for latency and cost minimization in edge networks
Hong et al. Continuous bitrate & latency control with deep reinforcement learning for live video streaming
Chua et al. Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach
Mu et al. AMIS: Edge computing based adaptive mobile video streaming
CN116016987A (en) Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network
Zhao et al. Popularity-based and version-aware caching scheme at edge servers for multi-version VoD systems
Tian et al. Deeplive: QoE optimization for live video streaming through deep reinforcement learning
KR101966588B1 (en) Method and apparatus for receiving video contents
Li et al. User dynamics-aware edge caching and computing for mobile virtual reality
Cai et al. Mec-based qoe optimization for adaptive video streaming via satellite backhaul
Kim et al. eff-HAS: Achieve higher efficiency in data and energy usage on dynamic adaptive streaming
Chen et al. Cooperative caching for scalable video coding using value-decomposed dimensional networks
Lin et al. Knn-q learning algorithm of bitrate adaptation for video streaming over http
Zhang et al. Cache-enabled adaptive bit rate streaming via deep self-transfer reinforcement learning
CN115720237A (en) Caching and resource scheduling method for edge network self-adaptive bit rate video
Chou et al. Pricing-based deep reinforcement learning for live video streaming with joint user association and resource management in mobile edge computing
Mu et al. AMIS-MU: edge computing based adaptive video streaming for multiple mobile users

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination