CN116389795A - Video transmission method and system based on scalable video super-resolution model - Google Patents
Video transmission method and system based on scalable video super-resolution model
- Publication number: CN116389795A
- Application number: CN202310348232.5A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N21/234363 — processing of video elementary streams by altering the spatial resolution, e.g. for clients with a lower screen resolution
- H04N21/234381 — processing of video elementary streams by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
- H04N21/2662 — controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
- H04N21/440263 — reformatting operations on client video streams by altering the spatial resolution, e.g. for displaying on a connected PDA
- H04N21/440281 — reformatting operations on client video streams by altering the temporal resolution, e.g. by frame skipping
- H04N21/4621 — controlling the complexity of the content stream, e.g. lowering the resolution or bit-rate of the video stream for a mobile client with a small screen
- Y02D10/00 — energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a video transmission method and system based on a scalable video super-resolution model. The method comprises the following steps: acquiring network state information, and determining the model size and reconstruction resolution of a scalable video super-resolution model according to the network state information; after receiving a request signal, feeding back a corresponding low-resolution video block according to the request signal; reconstructing the low-resolution video block by using the scalable video super-resolution model with the determined model size and reconstruction resolution to obtain a high-resolution video block; and after the high-resolution video block is transmitted to a play buffer, playing the high-resolution video block.
Description
Technical Field
The invention relates to video super-resolution transmission, in particular to a video transmission method and system based on a scalable video super-resolution model.
Background
Providing high-definition/ultra-high-definition video services over a backhaul or core network with poor bandwidth conditions or large fluctuations is critical, but challenging. Traditional dynamic adaptive streaming over HTTP (DASH) is highly dependent on network conditions, providing poor video quality when network conditions are poor;
in order to improve video quality, the prior art adopts the following methods:
1. reconstructing a low-resolution video into a high-resolution video near the terminal using video super-resolution (VSR), which breaks the dependence on the network; however, VSR requires strong processing capacity, and most existing VSR models cannot achieve line-rate VSR processing on mobile devices;
2. reconstructing the low-resolution video at the edge node, and saving the bandwidth from the server to the edge node; the time complexity of the VSR model and the available computing power of the edge node should form a dynamic matching relationship, but the computing power of the edge node is time-varying because the edge node is shared by different application programs; meanwhile, the bit rate of the high-resolution video obtained by reconstructing the low-resolution video is matched with the dynamic access network condition, otherwise, extra processing delay or transmission delay is introduced;
3. DASH-based video transport frameworks that use VSR on the client to improve video quality, but require the client to have powerful computing power, which is not practical for mobile devices;
to break the impact of network bandwidth on video quality, the prior art proposes an edge-assisted adaptive video streaming solution that uses VSR at the edge to enhance buffered low-quality video blocks downloaded from remote servers. Although this approach can provide high-quality video even when backhaul network resources are scarce, it does not take into account the dynamic state of the computing power available on the edge nodes; when the available computing power cannot meet the VSR requirements, additional super-resolution processing delay is introduced, which leads to more rebuffering and reduces the QoE of the user.
Disclosure of Invention
The invention aims to provide a video transmission method and a system based on a scalable video super-resolution model, which solve the problem of poor video quality.
The invention is realized by the following technical scheme:
a first aspect provides a video transmission method based on a scalable video super-resolution model, comprising the steps of:
acquiring network state information, and determining the model size and reconstruction resolution of a scalable video super-resolution model according to the network state information and available computing resources on edge nodes;
after receiving the request signal, the server feeds back a corresponding low-resolution video block according to the request signal;
the server transmits the corresponding low-resolution video block to the edge node;
reconstructing the low-resolution video block on the edge node by using the scalable video super-resolution model with the determined model size and reconstruction resolution to obtain a high-resolution video block;
and after the high-resolution video block is transmitted to a playing buffer area, playing the high-resolution video block.
Because the performance of each computing device differs, the real-time computing capacity of the edge node varies. For each decision on reconstructing a low-resolution video block, the model size and reconstruction resolution of the scalable video super-resolution model are determined according to the network state information, and the low-resolution video block is then reconstructed into a high-resolution video block on the edge node using the determined model. High-quality video can thus be provided to the user even when network resources are scarce. Meanwhile, to account for the time-varying computing capacity of the edge node, the model size of the scalable video super-resolution model is designed to be scalable, which improves the match between the reconstructed high-resolution video blocks and the dynamic access network, reduces processing delay and transmission delay, and improves video quality.
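The per-block decision described above can be sketched as a simple control loop. The candidate sets, the one-second time budget, and the toy bitrate and latency estimates below are illustrative assumptions, not values from the patent; in the actual scheme the decision is made by a DRL agent:

```python
RESOLUTIONS = [720, 1080, 2160]   # candidate reconstruction resolutions (assumed)
MODEL_SIZES = [2, 4, 8]           # candidate residual-block counts (assumed)

def choose_config(bandwidth_mbps, edge_compute, base_time_per_block=0.3):
    """Pick the largest (resolution, residual blocks) pair whose estimated
    reconstruction time fits in one block interval and whose estimated
    bitrate fits the measured bandwidth (toy cost models)."""
    best = (RESOLUTIONS[0], MODEL_SIZES[0])      # safe fallback
    for res in RESOLUTIONS:                      # ascending quality
        for blocks in MODEL_SIZES:
            est_time = base_time_per_block * blocks / edge_compute
            est_bitrate = res / 270.0            # toy bitrate model (Mbps)
            if est_time <= 1.0 and est_bitrate <= bandwidth_mbps:
                best = (res, blocks)             # keep the last feasible pair
    return best

print(choose_config(bandwidth_mbps=6.0, edge_compute=2.0))   # (1080, 4)
print(choose_config(bandwidth_mbps=1.0, edge_compute=0.1))   # (720, 2)
```

When the edge node is busy (low `edge_compute`) or the access link is narrow, the loop degrades gracefully to a smaller model and lower resolution, mirroring the matching behavior the paragraph describes.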
Further, the specific steps of determining the model size and reconstruction resolution of the scalable video super-resolution model according to the network state information are as follows:
a group of neural network models is pre-configured, and a neural network model matching the available computing capacity is selected from the group according to the number of residual convolution blocks;
and determining an output path exiting the scalable video super-resolution model according to the output characteristics of the residual convolution blocks with different numbers.
Different numbers of residual convolution blocks yield different model levels: the more residual convolution blocks, the higher the model level, the better the reconstructed video quality, and the longer the corresponding reconstruction time. A set of neural network models is pre-configured for each possible reconstruction resolution, so that the neural network model best matching the computing device can be selected at the edge node;
in order to further adapt to the time-varying computing capability of the edge node, the output path exiting the scalable video super-resolution model is determined dynamically from the output features of different numbers of residual convolution blocks. The more residual convolution blocks included in the output path, the better the reconstructed video quality; even when the number of residual convolution blocks in the output path is small, the quality of user experience is still greatly improved compared with the low-resolution video block.
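The early-exit idea described above can be sketched as follows. The block count, the placeholder residual operation, and the nearest-neighbour upsampler are illustrative stand-ins for the real network, not the patent's architecture:

```python
import math

class ScalableSRModel:
    """Toy sketch of a scalable super-resolution model with early exits.

    Each "residual block" below is a stand-in for a real residual
    convolution block; exit_depth selects the output path, trading
    reconstruction quality for processing time."""

    def __init__(self, num_blocks=8, scale=2):
        self.num_blocks = num_blocks   # total residual blocks in the full path
        self.scale = scale             # spatial upscaling factor

    def _residual_block(self, frame):
        # Placeholder refinement: identity plus a small nonlinear residual.
        return [[x + 0.1 * math.tanh(x) for x in row] for row in frame]

    def _upsample(self, frame):
        # Nearest-neighbour upsampling stands in for the learned upscaler.
        wide = [[x for x in row for _ in range(self.scale)] for row in frame]
        return [row for row in wide for _ in range(self.scale)]

    def reconstruct(self, lr_frame, exit_depth):
        """Run exit_depth residual blocks, then exit through the final layer."""
        assert 1 <= exit_depth <= self.num_blocks
        feat = lr_frame
        for _ in range(exit_depth):            # more blocks: better quality,
            feat = self._residual_block(feat)  # longer reconstruction time
        return self._upsample(feat)

model = ScalableSRModel(num_blocks=8, scale=2)
lr = [[0.5, 0.2], [0.1, 0.9]]
hr_fast = model.reconstruct(lr, exit_depth=2)  # early exit: cheap, lower quality
hr_full = model.reconstruct(lr, exit_depth=8)  # full path: best quality
print(len(hr_fast), len(hr_fast[0]))           # 4 4
```

The design choice is that every exit shares the same upscaling tail, so one trained model serves every compute budget rather than swapping whole models per block.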
Further, the low-resolution video block is reconstructed by using the scalable video super-resolution model with the determined model size and reconstruction resolution, which specifically comprises the following steps:
caching the low-resolution video blocks into a low-resolution video block buffer area, and sequentially reconstructing the low-resolution video blocks in the low-resolution video block buffer area according to a first-in first-out principle to obtain high-resolution video blocks;
caching the high-resolution video block into a high-resolution video block buffer; for the i-th low-resolution video block in the low-resolution video block buffer, the deep reinforcement learning (DRL) agent should accordingly determine the video reconstruction resolution level $(r_i, r_i')$, $r_i, r_i' \in R$;

let $D$ denote the set of all reconstruction pairs, let $L^{\mathrm{hr}}$ denote the maximum length of the super-resolution (high-resolution video block) buffer, and let $t^{\mathrm{sr}}_i$ denote the start time for reconstructing the i-th low-resolution video block;

calculating the reconstruction start time of the i-th low-resolution video block according to formula (1),

$t^{\mathrm{sr}}_i = \max\!\left(t^{\mathrm{sr}}_{i-1} + \phi_{i-1}(r_{i-1}, r_{i-1}'),\; t^{\mathrm{buf}}_{i-L^{\mathrm{hr}}}\right)$  formula (1)

wherein $t^{\mathrm{sr}}_i$ represents the reconstruction start time of the i-th low-resolution video block; $t^{\mathrm{sr}}_{i-1}$ represents the reconstruction start time of the (i-1)-th low-resolution video block; $\phi_{i-1}(r_{i-1}, r_{i-1}')$ represents the reconstruction processing time for reconstructing the (i-1)-th low-resolution video block into a high-resolution video block; $t^{\mathrm{buf}}_{i-L^{\mathrm{hr}}}$ represents the start time at which the $(i-L^{\mathrm{hr}})$-th high-resolution video block is buffered into the play buffer; and $L^{\mathrm{hr}}$ represents the maximum length of the high-resolution video block buffer;
the reconstruction processing time of the i-th low-resolution video block is calculated according to formula (2),

$\phi_i(r_i, r_i') = \dfrac{\hat{\phi}(r_i, r_i')}{u_i}$  formula (2)

wherein $\phi_i(r_i, r_i')$ represents the reconstruction processing time of the i-th low-resolution video block; $u_i$ represents the computing power available at the edge node to reconstruct the i-th low-resolution video block; and $\hat{\phi}(r_i, r_i')$ represents the baseline reconstruction time.
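Formulas (1) and (2) can be expressed directly as code. The max() form of formula (1) and the baseline-time over available-compute ratio of formula (2) follow the two conditions stated in the surrounding text; the numeric example is hypothetical:

```python
def reconstruction_time(baseline_time, u_i):
    """Formula (2): processing time scales inversely with the computing
    power u_i available at the edge node."""
    return baseline_time / u_i

def reconstruction_start_time(t_sr_prev, phi_prev, t_buf_lagged):
    """Formula (1): reconstruction of block i can start once the previous
    reconstruction has finished AND the super-resolution buffer has room,
    i.e. once the (i - L_hr)-th block began buffering into the play buffer."""
    return max(t_sr_prev + phi_prev, t_buf_lagged)

# Hypothetical example: the previous block started SR at t=10.0 s and took
# 1.5 s; the buffer slot frees at t=11.0 s, so block i starts at t=11.5 s.
phi = reconstruction_time(baseline_time=3.0, u_i=2.0)
print(phi, reconstruction_start_time(10.0, phi, 11.0))  # 1.5 11.5
```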
Further, the buffering start time of the high-resolution video block to the play buffer is: the reconstruction of the ith low-resolution video block is completed, and after the (i-1) th high-resolution video block is transmitted to the playing buffer zone, the ith high-resolution video block starts to be transmitted to the playing buffer zone;
the buffering start time of the i-th high-resolution video block into the play buffer is calculated according to formula (3),

$t^{\mathrm{buf}}_i = \max\!\left(t^{\mathrm{sr}}_i + \phi_i(r_i, r_i'),\; t^{\mathrm{buf}}_{i-1} + d_{i-1},\; t_{i-L^{\mathrm{p}}}\right)$  formula (3)

wherein $t^{\mathrm{buf}}_i$ represents the start time at which the i-th high-resolution video block is buffered into the play buffer; $t^{\mathrm{buf}}_{i-1}$ represents the buffering start time of the (i-1)-th high-resolution video block into the play buffer; $d_{i-1}$ represents the download time of the (i-1)-th high-resolution video block; $t^{\mathrm{sr}}_i$ represents the reconstruction start time of the i-th low-resolution video block; $\phi_i(r_i, r_i')$ represents the reconstruction processing time for reconstructing the i-th low-resolution video block into a high-resolution video block; $t_{i-L^{\mathrm{p}}}$ represents the play start time of the $(i-L^{\mathrm{p}})$-th high-resolution video block; and $L^{\mathrm{p}}$ represents the maximum length of the play buffer.
Further, the playing time of the high-resolution video block is as follows: the playing of the (i-1) th high-resolution video block is completed, and the i-th high-resolution video block is cached in a playing buffer area;
the play start time of the i-th high-resolution video block is calculated according to formula (4),

$t_i = \max\!\left(t_{i-1} + T,\; t^{\mathrm{buf}}_i + d_i\right)$  formula (4)

wherein $t_i$ represents the play start time of the i-th high-resolution video block; $t_{i-1}$ represents the play start time of the (i-1)-th high-resolution video block; $T$ represents the time length of a high-resolution or low-resolution video block; $t^{\mathrm{buf}}_i$ represents the start time at which the i-th high-resolution video block is buffered into the play buffer; and $d_i$ represents the download time of the i-th high-resolution video block.
Further, when the high-resolution video blocks are being played, if the play buffer is exhausted, re-buffering is required;

if re-buffering is required, the re-buffering time of the high-resolution video block is calculated according to formula (5),

$\tau_i = t_i - (t_{i-1} + T)$  formula (5)

wherein $\tau_i$ represents the re-buffering time of the i-th high-resolution video block; $t_i$ represents the play start time of the i-th high-resolution video block; $t_{i-1}$ represents the play start time of the (i-1)-th high-resolution video block; and $T$ represents the time length of a high-resolution or low-resolution video block.
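Formulas (3)–(5) can be checked with a small numeric example in the same style; the symbol names and the sample timings are illustrative:

```python
def buffer_start_time(t_sr_i, phi_i, t_buf_prev, d_prev, t_play_lagged):
    """Formula (3): block i starts moving to the play buffer once its
    reconstruction is done, block (i-1) has been transmitted, and the
    play buffer has a free slot."""
    return max(t_sr_i + phi_i, t_buf_prev + d_prev, t_play_lagged)

def play_start_time(t_prev, T, t_buf_i, d_i):
    """Formula (4): playback of block i starts when block (i-1) has
    finished playing and block i has fully arrived in the play buffer."""
    return max(t_prev + T, t_buf_i + d_i)

def rebuffer_time(t_i, t_prev, T):
    """Formula (5): the stall is how long playback waited beyond the
    natural end of the previous block."""
    return t_i - (t_prev + T)

T = 4.0                                                    # block duration (s)
t_buf = buffer_start_time(11.5, 1.5, 12.0, 0.8, 9.0)       # -> 13.0
t_i = play_start_time(t_prev=10.0, T=T, t_buf_i=t_buf, d_i=1.5)  # -> 14.5
print(t_buf, t_i, rebuffer_time(t_i, 10.0, T))             # 13.0 14.5 0.5
```

In the example, block i arrives 0.5 s after block (i-1) finishes playing, so formula (5) reports a 0.5 s stall, which is exactly the quantity the reward function later penalizes.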
A second aspect provides a video transmission system of a scalable video super-resolution model, where the video transmission system is configured to implement the video transmission method based on the scalable video super-resolution model, and the video transmission system includes:
a client for transmitting a request signal and receiving a high resolution video block;
a server for receiving the request signal and transmitting the low resolution video block;
an edge node, wherein:
the edge node is in communication connection with the client, and is used for receiving a request signal sent by the client and sending the request signal to the server;
the edge node is in communication connection with the server and is used for receiving the low-resolution video block sent by the server, reconstructing the low-resolution video block to obtain a high-resolution video block, and sending the high-resolution video block to the client.
The edge node can collect network state information, has the capability of supporting scalable video super-resolution processing, and can adaptively adjust the model size and reconstruction resolution of the scalable video super-resolution model; it integrates edge computing and video super-resolution, and improves video quality.
Further, the server is configured to store a low resolution video, divide the low resolution video into N low resolution video blocks having the same time length T, and obtain a low resolution video block group composed of N low resolution video blocks.
Further, the edge node includes a monitor, a DRL agent, and a scalable video super resolution (Scalable Video Super Resolution, SVSR) processor;
the monitor is in communication connection with the client, the DRL agent, and the server, and is used for monitoring the network states of the server, the edge node, and the client;
the DRL agent is used for receiving network state information, and determining the model size and reconstruction resolution of a scalable video super-resolution model according to the network state information;
the SVSR processor is in communication connection with the DRL agent, and is used for reconstructing the low-resolution video block by using the scalable video super-resolution model with the determined model size and reconstruction resolution to obtain a high-resolution video block.
Further, the SVSR processor includes a low resolution video block buffer, a high resolution video block buffer, and a processing region;
the low-resolution video block buffer is used for buffering the low-resolution video blocks sent to the SVSR processor by the server;
reconstructing the low-resolution video blocks in the low-resolution video block buffer area in sequence in the processing area according to a first-in first-out principle to obtain high-resolution video blocks;
and caching the high-resolution video block into the high-resolution video block buffer area.
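The FIFO buffer discipline of the SVSR processor described above can be sketched with two bounded queues; the buffer sizes and block names are illustrative, not from the patent:

```python
from collections import deque

LR_BUFFER_MAX = 4   # capacity of the low-resolution video block buffer (assumed)
HR_BUFFER_MAX = 4   # capacity of the high-resolution video block buffer (assumed)

lr_buffer = deque()  # blocks downloaded from the server
hr_buffer = deque()  # reconstructed blocks awaiting transmission to the client

def enqueue_lr(block):
    """Admit a downloaded low-resolution block if the buffer has room."""
    if len(lr_buffer) < LR_BUFFER_MAX:
        lr_buffer.append(block)
        return True
    return False  # buffer full: the download must wait

def process_one(reconstruct):
    """Pop the oldest LR block (first-in first-out), reconstruct it, and
    buffer the HR result, provided the HR buffer has a free slot."""
    if lr_buffer and len(hr_buffer) < HR_BUFFER_MAX:
        hr_buffer.append(reconstruct(lr_buffer.popleft()))
        return True
    return False

for i in range(3):
    enqueue_lr(f"lr-block-{i}")
while process_one(lambda b: b.replace("lr", "hr")):
    pass
print(list(hr_buffer))  # ['hr-block-0', 'hr-block-1', 'hr-block-2']
```

Bounding both queues is what makes the timing constraints of formulas (1) and (3) bite: a full high-resolution buffer stalls reconstruction, and a full play buffer stalls transmission.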
Compared with the prior art, the invention has the following advantages and beneficial effects:
Because the performance of each computing device differs, the real-time computing capacity of the edge node varies. For each decision on reconstructing a low-resolution video block, the model size and reconstruction resolution of the scalable video super-resolution model are determined according to the network state information, and the low-resolution video block is then reconstructed into a high-resolution video block on the edge node using the determined model. High-quality video can thus be provided to the user even when network resources are scarce. Meanwhile, to account for the time-varying computing capacity of the edge node, the model size of the scalable video super-resolution model is designed to be scalable, which improves the match between the reconstructed high-resolution video blocks and the dynamic access network, reduces processing delay and transmission delay, and improves video quality. The scheme also establishes a joint optimization for ultra-high-definition video transmission that integrates edge computing and video super-resolution. The performance of the system is mainly reflected in the super-resolution process on the edge node and the transmission of the reconstructed video from the edge node to the client; therefore, to guarantee a smooth video streaming service, the super-resolution buffer and the play buffer should satisfy a series of conditions to avoid video interruption. On the premise that the fluency of the video streaming service is guaranteed, an optimization model that improves video transmission quality, reduces rebuffering time, and reduces quality jitter is established, with the reconstructed video resolution level and the configuration of the SVSR model as decision variables. A reinforcement learning method is adopted to solve the joint optimization problem.
The intelligent edge node supporting reinforcement learning is trained until the algorithm converges. The trained reinforcement learning algorithm then decides the reconstructed video resolution level and the configuration of the SVSR model, improving the QoE of the user and saving network bandwidth.
Drawings
In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, the drawings that are needed in the examples will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and that other related drawings may be obtained from these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a flow chart provided in example 1;
fig. 2 is a scalable video super-resolution model structure provided in embodiment 1;
fig. 3 is a block diagram of a system provided in embodiment 2.
In the drawings, the reference numerals and corresponding part names:
100-client, 110-play buffer, 200-edge node, 210-monitor, 220-DRL agent, 230-SVSR processor, 231-low resolution video block buffer, 232-processing region, 233-high resolution video block buffer, 300-server.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.
Example 1
As shown in fig. 1, this embodiment 1 provides a video transmission method based on a scalable video super-resolution model, including the steps of:
s1, acquiring network state information, and determining the model size and reconstruction resolution of a scalable video super-resolution model according to the network state information;
s2, after receiving the request signal, feeding back a corresponding low-resolution video block according to the request signal;
s3, reconstructing the low-resolution video block by using the scalable video super-resolution model with the determined model size and the reconstructed resolution to obtain a high-resolution video block;
and S4, after the high-resolution video block is transmitted to a playing buffer area, playing the high-resolution video block.
Because the performance of each computing device differs, the real-time computing capacity of the edge node varies. For each decision on reconstructing a low-resolution video block, the model size and reconstruction resolution of the scalable video super-resolution model are determined according to the network state information, and the low-resolution video block is then reconstructed into a high-resolution video block on the edge node using the determined model. High-quality video can thus be provided to the user even when network resources are scarce. Meanwhile, to account for the time-varying computing capacity of the edge node, the model size of the scalable video super-resolution model is designed to be scalable, which improves the match between the reconstructed high-resolution video blocks and the dynamic access network, reduces processing delay and transmission delay, and improves video quality.
In a specific embodiment, as shown in fig. 2, the specific steps of determining the model size and the reconstruction resolution of the scalable video super-resolution model according to the above network status information are as follows:
for each possible reconstruction resolution, a group of neural network models differing in the number of residual convolution blocks is pre-configured, and the neural network model matching the computing capability of the device is selected from the group;
and determining an output path exiting the scalable video super-resolution model according to the output characteristics of the residual convolution blocks with different numbers.
Different numbers of residual convolution blocks yield different model levels: the more residual convolution blocks, the higher the model level, the better the reconstructed video quality, and the longer the corresponding reconstruction time. A set of neural network models is pre-configured for each possible reconstruction resolution, so that the neural network model best matching the computing device can be selected at the edge node.
To further adapt to the time-varying computing power of the edge nodes, each video super-resolution neural network model needs to provide multiple output paths, i.e. direct connections from intermediate blocks to the final layer (an early-exit mechanism) can be added to the scalable super-resolution model. The more residual convolution blocks an output path passes through, the better the video quality after SVSR reconstruction, but the longer the corresponding reconstruction time. Even when the number of residual convolution blocks in the output path is small, the quality of user experience is still greatly improved compared with directly viewing the low-resolution video.
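As a toy illustration of the early-exit idea (not the patent's actual network; the `residual_block` stand-in, nearest-neighbour upsampling, and all names here are assumptions), a model of a given level can expose one exit per residual block, trading reconstruction time for quality:

```python
import random

def residual_block(x, seed):
    # Stand-in for one residual convolution block: a small deterministic
    # perturbation added on top of the identity shortcut.
    rng = random.Random(seed)
    return [[v + 0.01 * rng.uniform(-1, 1) * v for v in row] for row in x]

def upsample(x, scale):
    # Shared final layer of every exit: nearest-neighbour upsampling.
    return [[v for v in row for _ in range(scale)]
            for row in x for _ in range(scale)]

def scalable_sr(lr_frame, model_level, exit_path, scale=2):
    # Run only the first `exit_path` of the model's `model_level`
    # residual blocks (the early exit), then upsample to the target size.
    x = lr_frame
    for k in range(min(exit_path, model_level)):
        x = residual_block(x, seed=k)
    return upsample(x, scale)

lr = [[1.0] * 4 for _ in range(4)]
hr_fast = scalable_sr(lr, model_level=8, exit_path=2)  # fast, lower quality
hr_full = scalable_sr(lr, model_level=8, exit_path=8)  # slow, higher quality
```

Both calls return a frame of the same enlarged size; only the amount of computation differs, which is exactly the knob matched against the edge node's available computing power.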
In a specific embodiment, the SR process proceeds as follows. At the intelligent edge node, the low-resolution video blocks downloaded from the server side are first cached in a download buffer; the SVSR processor then reconstructs the low-resolution video blocks in the download buffer according to the first-in first-out (FIFO) principle, and outputs the high-resolution video blocks into a super-resolution buffer. For the i-th low-resolution video block in the download buffer, the DRL agent accordingly determines the video reconstruction resolution level (r_i, r'_i), with r_i, r'_i ∈ R.
Let D denote the set of all reconstruction pairs, let B̂_s denote the maximum length of the super-resolution buffer, and let t_i^r denote the start time of reconstructing the i-th low-resolution video block.

The start time of reconstructing the i-th low-resolution video block may be defined as

t_i^r = max( t_{i-1}^r + φ_{i-1}(r_{i-1}, r'_{i-1}), t_{i-B̂_s}^b )    formula (1)

where t_i^r denotes the reconstruction start time of the i-th low-resolution video block; t_{i-1}^r denotes the reconstruction start time of the (i-1)-th low-resolution video block; φ_{i-1}(r_{i-1}, r'_{i-1}) denotes the processing time of reconstructing the (i-1)-th low-resolution video block into a high-resolution video block; t_{i-B̂_s}^b denotes the start time at which the (i-B̂_s)-th high-resolution video block is buffered into the play buffer; D denotes the set of all reconstruction pairs; and B̂_s denotes the maximum length of the high-resolution video block buffer.
the computing power available at the edge node to reconstruct the delta low resolution video block is calculated according to equation (6),
wherein u is δ Representing the computational power available at the edge node to reconstruct the delta low resolution video block; phi (phi) δ (r δ ,r' δ ) Representing the reconstruction processing time of the delta low resolution video block;representing a baseline reconstruction time for estimating a time required for reconstructing a low resolution video block of time length T into a high resolution video block;
The computing power u_i available for reconstructing the i-th low-resolution video block is estimated from the recently observed values u_{i-1}, u_{i-2}, …, u_{i-k}.
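A minimal sketch of this estimation, assuming formula (6) takes the ratio form u_δ = φ̂ / φ_δ(r_δ, r'_δ) and using a plain moving average over the last k observations (the patent does not fix the estimator):

```python
def observed_power(baseline_time, measured_time):
    # Formula (6): the power available while reconstructing a block is the
    # baseline reconstruction time over the time the block actually took.
    return baseline_time / measured_time

def estimate_power(recent_powers):
    # Estimate u_i for the next block from the last k observations.
    return sum(recent_powers) / len(recent_powers)

# Three past blocks, each measured against a 1.0 s baseline time.
u_hist = [observed_power(1.0, t) for t in (1.25, 1.0, 0.8)]
u_i = estimate_power(u_hist)   # estimate for the i-th block
phi_i = 1.0 / u_i              # formula (2): predicted processing time
```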
The reconstruction processing time of the i-th low-resolution video block is calculated according to formula (2):

φ_i(r_i, r'_i) = φ̂ / u_i    formula (2)

where φ_i(r_i, r'_i) denotes the reconstruction processing time of the i-th low-resolution video block; u_i denotes the computing power available at the edge node 200 for reconstructing the i-th low-resolution video block; and φ̂ denotes the baseline reconstruction time.
In a specific embodiment, the start time of buffering the high-resolution video block into the play buffer is determined as follows: once the reconstruction of the i-th low-resolution video block is completed and the (i-1)-th high-resolution video block has been transmitted to the play buffer, the i-th high-resolution video block starts to be transmitted to the play buffer.
The buffer start time of the i-th high-resolution video block into the play buffer is calculated according to formula (3):

t_i^b = max( t_{i-1}^b + d_{i-1}, t_i^r + φ_i(r_i, r'_i), t_{i-B̂_p} )    formula (3)

where t_i^b denotes the start time of buffering the i-th high-resolution video block into the play buffer; t_{i-1}^b denotes the start time of buffering the (i-1)-th high-resolution video block into the play buffer; d_{i-1} denotes the download time of the (i-1)-th high-resolution video block; t_i^r denotes the reconstruction start time of the i-th low-resolution video block; φ_i(r_i, r'_i) denotes the processing time of reconstructing the i-th low-resolution video block into a high-resolution video block; t_{i-B̂_p} denotes the play start time of the (i-B̂_p)-th high-resolution video block; and B̂_p denotes the maximum length of the play buffer.
In a specific embodiment, the play start time of the i-th high-resolution video block is determined as follows: the (i-1)-th high-resolution video block has finished playing, and the i-th high-resolution video block has been buffered into the play buffer.
The play start time of the i-th high-resolution video block is calculated according to formula (4):

t_i = max( t_{i-1} + T, t_i^b + d_i )    formula (4)

where t_i denotes the play start time of the i-th high-resolution video block; t_{i-1} denotes the play start time of the (i-1)-th high-resolution video block; T denotes the time length of a high-resolution or low-resolution video block; t_i^b denotes the start time of buffering the i-th high-resolution video block into the play buffer; and d_i denotes the download time of the i-th high-resolution video block.
In a specific embodiment, when playing high-resolution video blocks, if the play buffer 110 is exhausted, the high-resolution video blocks need to be re-buffered.

If re-buffering is needed, the re-buffering time of the i-th high-resolution video block is calculated according to formula (5):

τ_i = t_i − (t_{i-1} + T)    formula (5)

where τ_i denotes the re-buffering time of the i-th high-resolution video block; t_i denotes the play start time of the i-th high-resolution video block; t_{i-1} denotes the play start time of the (i-1)-th high-resolution video block; and T denotes the time length of a high-resolution or low-resolution video block.
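The timing recursions of formulas (1), (3), (4) and (5) can be exercised with a short simulation. This sketch assumes large buffers (so buffer back-pressure never binds) and one plausible reading of the recursions: block i's reconstruction starts when block i-1's finishes, buffering starts when both the reconstruction is done and the previous transfer has finished, and playback starts when the block has arrived and the previous block has played out:

```python
def simulate(phi, down, T):
    # phi[i]: reconstruction time of block i; down[i]: download time of
    # block i to the client; T: play length of one block (seconds).
    n = len(phi)
    t_r, t_b, t_play, rebuf = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        if i > 0:
            t_r[i] = t_r[i - 1] + phi[i - 1]            # reconstruction start
        prev_b = t_b[i - 1] + down[i - 1] if i > 0 else 0.0
        t_b[i] = max(prev_b, t_r[i] + phi[i])           # buffer start
        prev_play = t_play[i - 1] + T if i > 0 else 0.0
        t_play[i] = max(prev_play, t_b[i] + down[i])    # play start
        if i > 0:
            rebuf[i] = t_play[i] - (t_play[i - 1] + T)  # formula (5)
    return t_play, rebuf

# Block 2 takes 2.0 s to reconstruct and stalls playback by 0.5 s.
t_play, rebuf = simulate(phi=[0.5, 0.5, 2.0], down=[0.1, 0.1, 0.1], T=1.0)
```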
In a specific embodiment, the states at time t of the low-resolution and high-resolution video block buffers, and of the play buffer, are given by formulas (7) and (8), where B_S(t) denotes the state of the low-resolution and high-resolution video block buffers at time t; B_p(t) denotes the state of the play buffer at time t; t_k^r denotes the reconstruction start time of the k-th low-resolution video block; φ_k(r_k, r'_k) denotes the processing time of reconstructing the k-th low-resolution video block into a high-resolution video block; and t_k^b denotes the start time of buffering the k-th high-resolution video block into the play buffer.
In a specific embodiment, once the high-resolution video block buffer and the play buffer satisfy the above conditions, the smoothness of the video streaming service is ensured and video interruption is avoided. On this premise, to adapt to the dynamic network conditions and the time-varying computing capability of the edge node, the DRL agent needs to decide the reconstructed video resolution level and the configuration of the SVSR model (including the model level and the output path). The optimization target is therefore to maximize the quality of experience (QoE) of the video viewer. Since QoE is affected by video quality, rebuffering time, and quality jitter between video blocks, it can be defined with these three quantifiable indexes, as follows:
(1) Setting an objective function. An optimization model is built to improve user video quality, reduce rebuffering time (rebuffering occurs when there is no unplayed video block in the play buffer) and reduce quality jitter (video quality jitter is a key indicator of QoE, as it may cause physiological symptoms such as dizziness and headache for video viewers); the decision variables are the video reconstruction resolution level and the SVSR model configuration. Specifically, the objective function comprises the following three parts:
Q_1: the video quality can be defined as

Q_1 = Σ_{i=1}^{N_c} F(B'_i)    formula (9)

where B'_i denotes the bit rate of the i-th high-resolution video block after reconstruction (a dynamic adaptive streaming (DASH) system maps each resolution to a corresponding bit rate); F(B'_i) maps the bit rate B'_i to the quality perceived by the user; and N_c denotes the total number of video blocks received by the client during the operation of the whole system.
Q_2: the rebuffering time can be defined as

Q_2 = Σ_{i=1}^{N_r} τ_i    formula (10)

where τ_i denotes the re-buffering time of the i-th high-resolution video block, and N_r denotes the total number of rebuffering events during the operation of the whole system.
Q_3: the quality jitter between two adjacent video blocks can be defined as

Q_3 = Σ_{i=2}^{N_c} |B'_i − B'_{i-1}|    formula (11)

where B'_i and B'_{i-1} denote the bit rates of the i-th and (i-1)-th high-resolution video blocks after reconstruction, and N_c denotes the total number of video blocks received by the client during the operation of the whole system.
Rebuffering and jitter incur penalty values, and the longer the delay, the larger the penalty. The quality of experience (QoE) for a single user can be expressed as:
QoE = Q_1 − μQ_2 − λQ_3    formula (12)
where μ and λ are weight factors.
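Assuming Q_1 sums the perceived quality of the received blocks, Q_2 sums the rebuffering times, and Q_3 sums the bit-rate jumps between adjacent blocks (forms consistent with the per-block reward R_i used later), formula (12) can be computed as:

```python
def qoe(bitrates, rebuffer_times, mu=1.0, lam=1.0, perceived=lambda b: b):
    # QoE = Q1 - mu*Q2 - lam*Q3 for one user; `perceived` stands in for
    # the bit-rate-to-quality mapping F(.), identity by default.
    q1 = sum(perceived(b) for b in bitrates)                       # quality
    q2 = sum(rebuffer_times)                                       # rebuffering
    q3 = sum(abs(a - b) for a, b in zip(bitrates[1:], bitrates))   # jitter
    return q1 - mu * q2 - lam * q3

# A quality drop on block 3 plus a 0.4 s stall lowers the session QoE.
score = qoe([3.0, 3.0, 1.5, 3.0], [0.0, 0.0, 0.4, 0.0], mu=2.0, lam=0.5)
```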
To further prevent playback interruption at the terminal, an underflow probability constraint can be imposed on the buffer length of the play buffer, i.e.

P( B_i^p < B_bound ) ≤ ε_2    formula (13)

where B_i^p denotes the buffer length of the play buffer at the beginning of the reconstruction of the i-th video block; B_bound is the lowest threshold of the buffer length of the play buffer, below which video interruption may occur; and ε_2 is the permitted probability of violating the constraint.
In summary, the optimization problem can be expressed as:

max E{QoE} = E{Q_1 − μQ_2 − λQ_3}
s.t. P(u_δ < u'_δ) ≤ ε_1
     P(B_i^p < B_bound) ≤ ε_2

where Q_1 is the video quality; Q_2 is the rebuffering time; Q_3 is the quality jitter caused by resolution switching between adjacent video blocks; μ and λ are weight factors; u'_δ is the computing power the edge node requires to reconstruct the video block; ε_1 is the probability threshold of the computing power constraint; QoE is the quality of experience of a single user; B_bound is the lowest threshold of the video play duration in the play buffer queue; and ε_2 is the permitted probability of violating the buffer constraint.
(2) The optimization problem is solved with reinforcement learning: an intelligent edge node supporting reinforcement learning is trained until the algorithm converges, and the trained reinforcement learning algorithm is then used to decide the reconstructed video resolution level and the configuration of the SVSR model. The specific steps are as follows:

(1) The high-resolution video block playing process is a Markov decision problem. Reinforcement learning does not require any data to be given in advance; instead, it obtains learning information and updates model parameters by receiving rewards (feedback) for its actions from the environment. The optimization problem can therefore be solved with reinforcement learning.
(2) State space. The state of the SVSR at the beginning of reconstructing the i-th video block comprises: B_i^p and B_i^s, the lengths of the play buffer and of the high-resolution video block buffer when reconstruction of the i-th video block begins; the average download bandwidth of the past k_1 video blocks (bandwidth is the main limitation on the decision); the total latency of the past k_2 video blocks from being requested to being downloaded; the implicit available computing power of the past k_3 video blocks; r'_{i-1}, the resolution of the (i-1)-th video block; and N − i, the number of video blocks remaining.
(3) Action space. For each video block, the DRL agent must select the reconstruction target video resolution based on the current state and select the most suitable SVSR model based on the currently available computing resources. Both the size of the SVSR model and the enhancement target video resolution affect the consumption of computing resources on the edge node. The decisions the DRL agent needs to make in the SVSR therefore include 1) the video reconstruction resolution level (r_i, r'_i) and 2) the configuration of the SVSR model (including the model level m_i and the output path l_i). The action vector of the SVSR at the beginning of reconstructing the i-th video block can be expressed as a_i = (r_i, r'_i, m_i, l_i).
(4) Reward function. The DRL agent makes adaptive decisions according to the collected real-time system state; after the system executes an action, it feeds back an instant reward to the DRL agent. To give the user a better video service experience and achieve the goal of maximizing user QoE, the reward function is defined as

R_i = B'_i − μτ_i − λ|B'_i − B'_{i-1}|

where B'_i denotes the resolution (bit rate) of the i-th video block after reconstruction; τ_i denotes the rebuffering time incurred while processing the i-th video block; |B'_i − B'_{i-1}| denotes the quality jitter between the i-th and (i-1)-th video blocks; and μ and λ are weight factors.
(5) Deep neural network models and model training.
The Bellman equation of deep reinforcement learning can accordingly be written as

Q(s_i, a_i) = R_i + γ max_{a_{i+1}} Q(s_{i+1}, a_{i+1})

For a Markov decision process (MDP) with a large number of states and actions, a deep neural network Q(s_i, a_i; θ) is used to approximate Q(s_i, a_i), where θ denotes the network parameters.
The deep neural network Q(s_i, a_i; θ) is trained following the DQN (Deep Q-Network) algorithm.
The DQN comprises two neural networks: an evaluation network (eval_net) and a target network (target_net). The evaluation network evaluates the Q value of each action a_i in the current state and is used to select the action a_i with the largest Q value; after a_i is executed, the environment feeds back the reward R_i and the next state s_{i+1}, and the transition (s_i, a_i, R_i, s_{i+1}) is stored in the replay memory. The target network takes s_{i+1}, i.e. the next environment state, as input to obtain the Q value of each action a_{i+1}; the reward R_i obtained by taking action a_i in state s_i and the Bellman equation are then used to compute the target value Q(s_i, a_i).
The DQN algorithm combines a neural network with Q-learning and approximates the action-value function Q(s_i, a_i): the input is the state s_i, the output is the Q value of each action a_i, and the action to execute in the corresponding state is selected according to these Q values, thereby completing the control.
The state s_i is input to obtain the Q values Q(s_i, a_i); the action a'_i corresponding to the maximum Q value is selected and executed; the environment then changes, and the environment reward R_i is obtained; the reward R_i is used to update Q(s_i, a'_i), and the new Q(s_i, a'_i) is used to update the network parameters.
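A tabular stand-in for this loop (a sketch only: a real DQN replaces the Q-tables with the eval_net and target_net neural networks, and the toy environment here is an assumption):

```python
import random

def train(transitions, n_states, n_actions, gamma=0.9, lr=0.1,
          steps=200, sync_every=20):
    # q_eval plays the role of eval_net, q_target of target_net.
    q_eval = [[0.0] * n_actions for _ in range(n_states)]
    q_target = [row[:] for row in q_eval]
    for step in range(steps):
        s, a, r, s_next = random.choice(transitions)   # replay-memory sample
        target = r + gamma * max(q_target[s_next])     # Bellman target
        q_eval[s][a] += lr * (target - q_eval[s][a])   # update eval_net
        if step % sync_every == 0:
            q_target = [row[:] for row in q_eval]      # sync target_net
    return q_eval

random.seed(0)
# Toy MDP with two states and two actions: in state 0 only action 1
# is rewarded, so its Q value should end up the larger of the two.
q = train([(0, 1, 1.0, 1), (0, 0, 0.0, 1), (1, 0, 0.0, 0)],
          n_states=2, n_actions=2)
```

After training, greedy selection in state 0 picks the rewarded action, mirroring how the trained agent picks the action vector (r_i, r'_i, m_i, l_i) with the largest Q value.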
(6) After training, the model collects real-time state information as input and outputs an optimal control strategy, comprising 1) the video reconstruction resolution level (r_i, r'_i) and 2) the configuration of the SVSR model (including the model level m_i and the output path l_i).
Example 2
As shown in fig. 3, embodiment 2 provides a video transmission system of a scalable video super-resolution model, for implementing the video transmission method based on the scalable video super-resolution model, the video transmission system including:
the client 100 is configured to send a request signal; when connecting to the edge node 200 for the first time, the client sends its device type and highest display resolution to the edge node 200 along with the request signal, and the client 100 periodically reports its play status to the edge node 200; the client is also configured to receive and play high-resolution video blocks;
a server 300 for receiving a request signal; and also for transmitting the low resolution video block;
the edge node 200 is communicatively connected to the client 100, and is configured to receive the request signal sent by the client 100 and send the request signal to the server 300;
the edge node 200 is communicatively connected to the server 300, and is configured to receive the low-resolution video block sent by the server 300, reconstruct the low-resolution video block to obtain a high-resolution video block, and send the high-resolution video block to the client 100.
The edge node 200 can collect network state information, has the capability of supporting scalable video super-resolution processing, can adaptively adjust the model size and reconstruction resolution of a scalable video super-resolution model, integrates edge calculation and video super-resolution, and improves video quality.
In a specific embodiment, the client 100 sends a request signal to the edge node 200, and the edge node 200 forwards the request signal to the server 300. After receiving the request signal, the server 300 provides the edge node 200 with the low-resolution video block corresponding to the request. The edge node 200 receives the low-resolution video block and determines, through DRL processing, the model size and reconstruction resolution of the scalable video super-resolution model; the SVSR processor 230 then reconstructs the low-resolution video block using the determined model to obtain a high-resolution video block, buffers it in the high-resolution video block buffer 233, and finally sends the high-resolution video block to the client 100, completing the video playing.
In a specific embodiment, the server 300 is configured to store low-resolution video, divide the low-resolution video into N low-resolution video blocks of the same time length T according to the DASH standard, and encode each block at a group of resolutions, thereby obtaining a group of low-resolution video blocks consisting of N low-resolution video blocks.
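A sketch of that server-side preparation (the manifest layout and resolution names are assumptions; DASH itself defines a much richer MPD format):

```python
import math

def make_manifest(total_seconds, T, resolutions):
    # Cut the video into N = ceil(total/T) blocks of length T (the last
    # block may be shorter) and list each block at every ladder resolution.
    n = math.ceil(total_seconds / T)
    return [{"index": i,
             "start": i * T,
             "length": min(T, total_seconds - i * T),
             "resolutions": list(resolutions)}
            for i in range(n)]

manifest = make_manifest(10, 4, ("360p", "480p", "720p"))
```

The edge node can then request any block of the manifest at the lowest stored resolution and upgrade it locally through SVSR reconstruction.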
In a specific embodiment, the edge node 200 includes a monitor 210, a DRL agent 220, and a SVSR processor 230;
the monitor 210 is communicatively connected to the client 100 and the DRL agent 220, and the monitor 210 is configured to monitor the network status of the server 300, the edge node 200, and the client 100;
the DRL agent 220 is communicatively connected to the server 300, and the DRL agent 220 is configured to receive network status information, and determine a model size and a reconstruction resolution of a scalable video super-resolution model according to the network status information;
the SVSR processor 230 is communicatively connected to the DRL agent 220, the server 300, and the client 100, and the SVSR processor 230 can support a deep learning-based SVSR algorithm, and is configured to reconstruct a low-resolution video block using the scalable video super-resolution model after determining a model size and reconstructing a resolution, thereby obtaining a high-resolution video block.
In a specific embodiment, the SVSR processor 230 includes a low-resolution video block buffer 231, a high-resolution video block buffer 233, and a processing area 232;
the low resolution video block buffer 231 is used for buffering the low resolution video blocks sent from the server 300 to the SVSR processor 230;
the processing area 232 sequentially reconstructs the low resolution video blocks in the low resolution video block buffer area 231 according to a first-in first-out principle to obtain high resolution video blocks;
the high resolution video block is buffered to the high resolution video block buffer 233.
The foregoing embodiments are described for the purpose of illustrating the general principles of the present invention and are not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present invention are intended to be included within the scope of the invention.
Claims (10)
1. The video transmission method based on the scalable video super-resolution model is characterized by comprising the following steps of:
acquiring network state information, and determining the model size and reconstruction resolution of a scalable video super-resolution model according to the network state information and available computing resources on edge nodes;
after receiving the request signal, the server feeds back a corresponding low-resolution video block according to the request signal;
the server transmits the corresponding low-resolution video block to the edge node;
reconstructing the low-resolution video block on the edge node by using the scalable video super-resolution model with the determined model size and the reconstructed resolution to obtain a high-resolution video block;
and after the high-resolution video block is transmitted to a playing buffer area, playing the high-resolution video block.
2. The video transmission method based on the scalable video super-resolution model according to claim 1, wherein the specific steps of determining the model size and the reconstruction resolution of the scalable video super-resolution model according to the network state information are as follows:
for each possible reconstruction resolution, a group of neural network models differing in the number of residual convolution blocks is pre-configured, and the neural network model matching the computing capability of the device is selected from the group;
and determining an output path exiting the scalable video super-resolution model according to the output characteristics of the residual convolution blocks with different numbers.
3. The video transmission method based on the scalable video super-resolution model according to claim 1, wherein the low-resolution video block is reconstructed by using the scalable video super-resolution model after determining the model size and reconstructing the resolution, specifically comprising the following steps:
caching the low-resolution video blocks into a low-resolution video block buffer area, and sequentially reconstructing the low-resolution video blocks in the low-resolution video block buffer area according to a first-in first-out principle to obtain high-resolution video blocks;
caching the high-resolution video block into a high-resolution video block buffer area;
a reconstruction start time for an i-th low resolution video block is calculated.
4. The scalable video super-resolution model-based video transmission method according to claim 3, wherein the buffering of the high-resolution video block to the buffering start time of the play buffer: and after the ith low-resolution video block is reconstructed and the (i-1) th high-resolution video block is transmitted to the playing buffer zone, the ith high-resolution video block starts to be transmitted to the playing buffer zone, and the buffer start time from the buffer of the ith high-resolution video block to the playing buffer zone is calculated.
5. The scalable video super-resolution model-based video transmission method according to claim 4, wherein the playing time of the high-resolution video block is: and (3) finishing playing the (i-1) th high-resolution video block, caching the i-th high-resolution video block into a playing buffer area, and calculating the playing start time of the i-th high-resolution video block.
6. The method for video transmission based on the scalable video super-resolution model according to claim 5, wherein when the high-resolution video block is played, if the high-resolution video block is exhausted in a play buffer, the high-resolution video block needs to be re-cached, and if the high-resolution video block needs to be re-cached, the time for re-caching the high-resolution video block is calculated.
7. A video transmission system of a scalable video super-resolution model, wherein the video transmission system is configured to implement the video transmission method based on the scalable video super-resolution model according to any one of claims 1 to 6, and the video transmission system comprises:
a client (100) for transmitting a request signal and receiving a high resolution video block;
a server (300) for receiving the request signal and transmitting the low resolution video block;
an edge node (200),
the edge node (200) is in communication connection with the client (100) and is used for receiving a request signal sent by the client (100) and sending the request signal to the server (300);
the edge node (200) is in communication connection with the server (300) and is used for receiving the low-resolution video block sent by the server (300), reconstructing the low-resolution video block to obtain a high-resolution video block, and sending the high-resolution video block to the client (100).
8. The video transmission system of the scalable video super-resolution model according to claim 7, wherein the server (300) is configured to store the low-resolution video, and divide the low-resolution video into N low-resolution video blocks having the same time length T, to obtain a low-resolution video block group consisting of N low-resolution video blocks.
9. The video transmission system of the scalable video super-resolution model of claim 7, wherein the edge node (200) comprises a monitor (210), a DRL agent (220), and a SVSR processor (230);
the monitor (210) is in communication connection with the client (100) and the DRL agent (220) and the server (300), and the monitor (210) is used for monitoring the network states of the server (300), the edge node (200) and the client (100);
the DRL agent (220) is used for receiving network state information, and determining the model size and reconstruction resolution of a scalable video super-resolution model according to the network state information;
the SVSR processor (230) is in communication connection with the DRL agent (220), and the SVSR processor (230) is used for reconstructing a low-resolution video block by utilizing the scalable video super-resolution model after determining the model size and reconstructing the resolution to obtain a high-resolution video block.
10. The video transmission system of the scalable video super-resolution model of claim 9, wherein the SVSR processor (230) comprises a low-resolution video block buffer (231), a high-resolution video block buffer (233), and a processing region (232);
the low resolution video block buffer (231) is configured to buffer low resolution video blocks sent by the server (300) to the SVSR processor (230);
sequentially reconstructing the low-resolution video blocks in the low-resolution video block buffer area (231) in the processing area (232) according to a first-in first-out principle to obtain high-resolution video blocks;
the high resolution video block is buffered to the high resolution video block buffer (233).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310348232.5A CN116389795A (en) | 2023-04-03 | 2023-04-03 | Video transmission method and system based on scalable video super-resolution model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116389795A true CN116389795A (en) | 2023-07-04 |
Family
ID=86972703
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||