CN111031387B - Method for controlling video coding flow rate of monitoring video sending end - Google Patents

Method for controlling video coding flow rate of monitoring video sending end

Info

Publication number
CN111031387B
CN111031387B (application CN201911145837.4A)
Authority
CN
China
Prior art keywords
video
rate
encoder
real
training
Prior art date
Legal status
Active
Application number
CN201911145837.4A
Other languages
Chinese (zh)
Other versions
CN111031387A (en)
Inventor
张旭
赵阳超
马展
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201911145837.4A
Publication of CN111031387A
Application granted
Publication of CN111031387B

Classifications

    • H04N 21/44004 — Processing of video elementary streams involving video buffer management, e.g. video decoder buffer or video display buffer
    • H04N 21/440218 — Processing of video elementary streams involving reformatting operations by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/44227 — Monitoring of local network, e.g. connection or bandwidth variations; detecting new devices in the local network
    • H04N 21/4621 — Controlling the complexity of the content stream or additional data, e.g. lowering the resolution or bit-rate of the video stream for a mobile client with a small screen
    • G06N 3/045 — Neural network architectures; combinations of networks
    • G06N 3/08 — Neural network learning methods


Abstract

The invention discloses a method for controlling the video coding flow rate at a monitoring video sending end, which mainly comprises the following steps: (1) collecting a real-time available-bandwidth data set of the video transmission network; (2) using the real bandwidth data to construct a simulated training environment of the monitoring video sending end, in which the environment determines in real time, from the real bandwidth data, the highest available bandwidth of the sending end and uses it as the video sending rate, and adjusts the encoder's coding rate according to the code rate selected by the deep reinforcement learning model; (3) constructing a trust-region-based deep reinforcement learning model with continuous action output and training it in the simulated environment; (4) integrating the trained model into a real monitoring-video environment and performing online training optimization; (5) integrating the optimized model into the monitoring video sending end to make the encoder's coding-rate decisions. The invention thus applies deep reinforcement learning to solve the problem of controlling the coding flow rate at the monitoring video sending end.

Description

Method for controlling video coding flow rate of monitoring video sending end
Technical Field
The invention relates to the field of real-time video transmission, in particular to a method for controlling the video coding flow rate at a monitoring video sending end.
Background
Surveillance video generally has high requirements for real-time performance, fluency and picture quality. In an actual monitoring environment, however, video transmission from the acquisition end to the receiving end (e.g. a monitoring room) often traverses a complex network, whose limited bandwidth and fluctuating delay degrade the real-time performance, smoothness and clarity seen at the playing (receiving) end. To guarantee the transmission quality of the surveillance video and improve the viewing experience, every link in the transmission chain needs targeted optimization, in particular the coding flow rate control at the video sending end.
The monitoring video sending end needs accurate coding flow rate control for two main reasons. On the one hand, the sending rate of the surveillance video is determined by a complex network environment and therefore changes rapidly and is hard to predict. On the other hand, the rate at the coding stage after video acquisition can be controlled by adjusting the encoder's coding parameters. Between the encoder and the network, the code stream passes through a video sending buffer that keeps sending smooth: the buffer is drained at a speed determined by the real-time available bandwidth of the network, i.e. the actual sending rate, and filled at a speed determined by the encoder's coding rate. A mismatch between the video coding rate and the video sending rate can therefore arise.
If the coding rate at the sending end and the sending rate of the video do not match, the video sending buffer either overflows or 'starves'. Overflow means the number of frames in the buffer has reached the buffer's capacity: to store a newly encoded frame, the earliest frame in the buffer must be discarded, causing frame loss during transmission. Starvation means the buffer is empty because the video coding rate has long been lower than the real-time available bandwidth: bandwidth utilization is too low, a large amount of available bandwidth is wasted, and the video clarity at the monitoring video receiving end could still be further improved.
Therefore, the main aim of video coding flow rate control at the monitoring video sending end is to match the encoder's coding rate to the video sending rate: when the real-time available bandwidth is large, the sending rate is large and the coding rate can be raised appropriately; when the available bandwidth drops, the sending rate drops and the coding rate should be reduced in time, avoiding the frame loss caused by overflow of the video sending buffer.
The most intuitive way to achieve rate matching is to predict the real-time available bandwidth at the next moment in advance and then adjust the encoder's coding rate accordingly. In practice, however, the available bandwidth usually changes irregularly and is quite difficult to estimate, so the current transmission environment can only be inferred roughly from measurable characteristic parameters observed during video transmission, with the next coding rate chosen from those observations. The difficulty of rate matching through measurable parameters is then to judge accurately, from the measured values, the characteristics of the current transmission environment, in particular the real-time available bandwidth of the current network.
Disclosure of Invention
Aiming at the problem of code rate control during monitoring video sending, the invention provides a deep-reinforcement-learning-based method for controlling the video coding flow rate at the monitoring video sending end.
The technical scheme adopted by the invention is as follows:
a method for controlling video coding flow rate at a monitoring video sending end comprises the following steps:
step 1, collecting real bandwidth change data of an actual transmission environment by using an equal-interval sampling mode, and making a video transmission scene network real-time available bandwidth data set for training;
step 2, constructing a simulation training environment of the monitoring video sending end using the real bandwidth data collected in step 1; the training environment determines in real time, from the real bandwidth data, the highest available bandwidth for sending the monitoring video and uses it as the video sending rate, and receives the code rate selected by the deep reinforcement learning model, setting it as the encoder's coding rate for the next time period;
step 3, constructing a trust-region-based deep reinforcement learning model with continuous action output, designing the target reward function required for model training, and training the model with the simulation training environment of step 2; the model takes the various data output by the simulation training environment of step 2 as input and selects the coding rate of the monitoring video sending end for the next moment, and the goal of training the model is set to maximizing the target reward function;
step 4, integrating the model trained in the step 3 into a real environment for interaction, and performing on-line training optimization;
and 5, integrating the optimized deep reinforcement learning model to a monitoring video sending end to select a sending code rate.
Further, in step 1, the real bandwidth change data includes real-time available bandwidth change data recorded while the surveillance video is transmitted as well as existing public bandwidth change data sets. The recorded data is the network available bandwidth of the video transmission, sampled at different time intervals.
Further, in step 2, a simulation training environment of the monitoring video sending end is constructed, and the specific process is as follows:
step 21, constructing a video encoder simulation module, wherein the input of the video encoder simulation module is some fixed encoding parameters of the monitoring video, including the frame rate of the video, the size of the video image group and the selected video encoding code rate; the output of the video encoder simulation module is the data size of one video frame; according to the input fixed coding parameters, the data size of a video frame is determined by using a uniform distribution:
FS = sample(U(a, b))  (figure GDA0002736431050000031)
wherein sample () operation represents sampling from a probability distribution, U (a, b) represents a uniform distribution over the interval [ a, b ]; the video encoder simulation module adds video frames with the size of FS to a buffer area in the video sending buffer area simulation module at regular time according to frame intervals determined by the frame rate of the video;
step 22, constructing a video transmission buffer simulation module, the main body of which is a simulated video transmission buffer, and the maximum frame number which can be accommodated by the buffer needs to be specified, when the buffer is full, if the simulation module of the encoder has a new incoming video frame, the existing earliest incoming video frame in the buffer needs to be cleared, and the new incoming video frame is added into the buffer;
step 23, constructing a video network transmission simulation module, wherein the input of the video network transmission simulation module is the real bandwidth change data of the actual transmission environment obtained in the step 1, and the available bandwidth is used as the video transmission rate to consume the video frame from the video transmission buffer area in the video transmission buffer area simulation module; if the available bandwidth is maintained at BW for the Δ t time interval, the total amount of data D transmitted over the network during the Δ t time interval is:
D=Δt*BW
the total number of data amounts of frames in the buffer that should be cleared out of the zone is of size D.
Further, in step 3, the trust-region-based continuous action output deep reinforcement learning model is constructed as follows:
step 31, processing the output of the simulation training environment of step 2 into the input of the deep reinforcement learning model: first, normalize each parameter of the historical k time nodes — the encoder coding rate, the video sending buffer length, the change of the video sending buffer, and the historical average sending rate of the video; then store the normalized values in an input state matrix;
step 32, building the neural network part of the trust-region-based continuous action output deep reinforcement learning model, comprising an actor deep neural network and a critic deep neural network, and building the training optimization targets of the two networks, i.e. their respective loss functions;
step 33, designing the reward function for training the trust-region-based continuous action deep reinforcement learning model: the reward function gives a higher reward to encoder code rate selections that keep the video sending buffer at a normal level and that keep the encoder code rate stable, and a lower reward to actions that drive the sending buffer length away from the normal level;
step 34, inputting the state matrix of step 31 into the actor and critic networks of step 32, performing a forward pass to obtain their outputs, obtaining the video encoder coding rate for the next moment from the network outputs, computing the reward function constructed in step 33, then computing the corresponding training optimization targets from the reward value and the outputs of the two networks, back-propagating to update the network parameters, and setting the coding rate obtained from the network output as the new encoder coding rate, which will affect the state matrix at the next moment;
step 35, repeat step 34 until the resulting reward function no longer rises.
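The patent specifies trust-region-based training with separate actor and critic loss functions (step 32) but does not give their exact form. As an illustration only, a common trust-region-style choice is the PPO clipped surrogate for the actor and a squared error for the critic; the function names, clipping constant, and loss shapes below are assumptions, not the patent's definitions:

```python
import math

def ppo_actor_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate: an approximation of a trust-region
    constraint on the policy update.  Returns the loss to MINIMIZE."""
    ratio = math.exp(logp_new - logp_old)            # pi_new(a|s) / pi_old(a|s)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return -min(ratio * advantage, clipped * advantage)

def critic_loss(value_pred, value_target):
    """Squared error between the critic's value estimate and its target."""
    return (value_pred - value_target) ** 2
```

The clipping keeps each policy update inside a small neighborhood of the previous policy, which is the same motivation as the trust-region constraint named in the claims.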
Further, in step 5, the deep reinforcement learning model optimized in step 4 is integrated into the monitoring video sending end; only the actor network of the model needs to be deployed at the sending end, as follows:
step 51, deploying a lightweight operating environment of the selected deep learning framework at a monitoring video sending end;
step 52, converting the actor network of the deep reinforcement learning model optimized in step 4 into a mobile lightweight model;
and step 53, calling the mobile lightweight model generated in step 52 through the running environment configured in step 51 to perform a forward pass and obtain the code rate to select, setting it as the encoder's coding rate; then, following the manner of step 4, collecting the characteristic parameters directly from the system and computing the state matrix, using the new state matrix as the input of the lightweight model to compute the encoder coding rate for the next moment, and repeating this interaction process.
The invention applies deep reinforcement learning, supported by a large amount of bandwidth change data from actual transmission environments, to solve the problem of controlling the coding flow rate at the monitoring video sending end. To achieve the best control effect, on the one hand the invention selects an advanced trust-region-based reinforcement learning method; on the other hand, to guarantee both the response speed of the coding flow rate control and a continuous control range, it selects a reinforcement learning model with continuous action output, which directly outputs the selected code rate value instead of a preset code rate level. Furthermore, after the deep reinforcement learning model is trained in the simulation environment on a large amount of data, it is deployed to the actual system for online optimization training, improving its performance in the specific real scene while preserving its generalization ability.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a video sender simulation environment;
FIG. 3 is the trust-region-based continuous action output deep reinforcement learning model.
Detailed Description
The invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, a method for controlling video encoding flow rate at a monitoring video sending end in this embodiment specifically includes the following steps:
step 1, collecting real-time available bandwidth change data during monitoring video transmission and collecting an existing public bandwidth change data set by using an equal-interval sampling mode, and making a video transmission scene network real-time available bandwidth data set for training.
The sampling interval t_sample is set to 150 ms; if the sampling interval in a public data set is not 150 ms, it can be uniformly resampled to 150 ms. The network bandwidth data is stored in a number of text files. Each line of a text file contains two values separated by a tab: the 1st value is the current timestamp, starting from 0 and incrementing in steps of t_sample; the 2nd value is the available bandwidth at that time. The network available bandwidth of the video transmission is sampled and collected at different time intervals in order to simulate network bandwidths with various change speeds and increase data diversity.
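Under the file format described above (one `timestamp<TAB>bandwidth` pair per line, timestamps starting at 0 in t_sample steps), a trace file could be parsed with a sketch like the following; the function name and the float conversions are illustrative assumptions:

```python
def load_bandwidth_trace(path):
    """Parse one bandwidth trace file.  Each non-empty line holds
    '<timestamp>\t<available_bandwidth>'; timestamps start at 0 and
    increment in t_sample (150 ms) steps."""
    trace = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue                     # skip blank lines
            ts, bw = line.split("\t")
            trace.append((float(ts), float(bw)))
    return trace
```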
Step 2, constructing a simulation training environment of the monitoring video sending end by using the real bandwidth data collected in the step 1:
and step 21, constructing a video encoder simulation module. The simulation module of the video encoder inputs some fixed encoding parameters of the monitoring video, the Frame Rate (FR) of the video is 25FPS, the size (GOP) of a video image group is 3s and is correspondingly 75 frames, and the selected video encoding rate (BR), wherein the video encoding rate parameters are obtained by calculating the output of the operator network constructed in the step 32 and are direct control quantity of video encoding flow rate control. The output of the video encoder emulation module is primarily the data size (FS) of one video frame. The uniform distribution is used to determine, in this embodiment:
FS = sample(U(a, b))  (figure GDA0002736431050000051)
where the sample() operation denotes sampling from a probability distribution, U(a, b) denotes a uniform distribution over the interval [a, b], and FS is in MB when the selected video coding rate is in Mbps.
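The exact bounds of the uniform distribution are given in the figure; as a rough illustrative sketch only, one can center the distribution on the mean frame size BR/(8·FR) MB (BR Mbps spread over FR frames per second, divided by 8 to convert Mbit to MB) with an assumed ±20% jitter — the jitter width is an assumption, not the patent's value:

```python
import random

def sample_frame_size(br_mbps, fr_fps=25, jitter=0.2, rng=random):
    """Sample one frame's size in MB.

    Mean size = BR / (8 * FR).  The uniform bounds (+/- jitter around
    the mean) are an illustrative assumption; the patent gives the
    actual bounds in a figure."""
    mean = br_mbps / (8.0 * fr_fps)
    return rng.uniform((1.0 - jitter) * mean, (1.0 + jitter) * mean)
```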
Step 22: construct the video sending buffer simulation module. Its main function is to maintain a first-in-first-out queue of video frames, each carrying its frame data size (FS). The maximum number of frames the buffer can hold is specified as 125, corresponding to 5 s of video. When the queue is full and a new frame arrives from the encoder simulation module, the earliest frame in the queue is cleared and the new frame is appended.
Step 23: construct the video network transmission simulation module. Its input is the timestamp and bandwidth data read from the text files of network real-time available bandwidth obtained in step 1; the current bandwidth value is used as the sending rate from the current timestamp to the next to consume video frames from the sending buffer in the video sending buffer simulation module. If the available bandwidth stays at BW for the interval Δt, which is 150 ms in this embodiment, the total amount of data D transmitted over the network during the interval is:
D=0.15*BW
Video frames are consumed sequentially from the head of the video sending buffer (oldest first) until the sum of the consumed frames' data sizes reaches D.
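The interaction of steps 22 and 23 — a FIFO frame queue capped at 125 frames that drops the oldest frame on overflow, drained each 150 ms interval by the available bandwidth — might be sketched as follows. The class and method names are assumptions, as is the Mbit-to-MB conversion (/8) needed to make the bandwidth (Mbps) and frame sizes (MB) unit-consistent:

```python
from collections import deque

class SendBuffer:
    """FIFO video send buffer holding per-frame sizes in MB."""
    def __init__(self, max_frames=125):
        self.frames = deque()
        self.max_frames = max_frames
        self.dropped = 0                     # overflow frame-loss counter

    def push(self, frame_size):
        if len(self.frames) >= self.max_frames:
            self.frames.popleft()            # overflow: drop earliest frame
            self.dropped += 1
        self.frames.append(frame_size)

    def drain(self, bw_mbps, dt_s=0.15):
        """Consume whole frames, oldest first, with the available bandwidth
        over one interval.  Budget D = dt * BW / 8 (Mbit -> MB)."""
        budget = dt_s * bw_mbps / 8.0
        sent = 0.0
        while self.frames and sent + self.frames[0] <= budget:
            sent += self.frames.popleft()
        return sent
```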
Step 3, constructing a continuous action output deep reinforcement learning model based on a trust domain, and training the model by utilizing a simulation training environment:
Step 31: process the output of the simulation training environment of step 2 into the input of the deep reinforcement learning model. First, normalize each parameter over the historical 2 time nodes: the encoder coding rate BR (unit: Mbps), the video sending buffer length BL (unit: frames), the change of the video sending buffer ΔB (unit: frames), and the historical average sending rate TH (unit: Mbps) — 4 parameters, hence 8 values in total. The normalized values are then stored in the input state matrix: each of its 4 columns is a vector of length 2 holding one of the four characteristic parameters, so the state matrix has dimension 2 × 4.
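The construction of the 2 × 4 state matrix could be sketched as below; the per-feature normalization constants are illustrative assumptions, since the embodiment only states that the parameters are normalized:

```python
def build_state_matrix(history, norms):
    """Build the k x 4 input state matrix from the last k observations.

    history: list of k dicts with keys 'BR' (Mbps), 'BL' (frames),
             'dB' (frames), 'TH' (Mbps).
    norms:   per-feature normalization constants (assumed values; the
             patent only says the parameters are normalized)."""
    keys = ("BR", "BL", "dB", "TH")
    return [[obs[k] / norms[k] for k in keys] for obs in history]
```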
Step 32: build the neural network part of the trust-region-based continuous action output deep reinforcement learning model with the popular deep learning framework TensorFlow, constructing an actor network and a critic network along with their training optimization targets, i.e. their respective loss functions.
Step 33: design the reward function for training the trust-region-based continuous action output deep reinforcement learning model. The reward function mainly considers whether the model's choice keeps the video sending buffer in a proper range and whether it stays as consistent as possible with the previous choice; its specific form is given in figure GDA0002736431050000061, where BR and lastBR are the code rates selected by the current and previous decisions respectively, and BL is the video sending buffer length observed at the next decision after the current one, converted from a frame count to the corresponding duration.
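The exact reward formula is given in the figure; a sketch consistent with the stated design intent — reward higher bitrate, penalize deviation of the buffer from a target occupancy, and penalize abrupt bitrate changes — might look as follows, with all weights and the target level being assumptions:

```python
def reward(br, last_br, bl_s, target_bl_s=2.5,
           w_rate=0.1, w_dev=1.0, w_smooth=0.5):
    """Illustrative reward (all constants are assumed, not the patent's):
    favor high bitrate br (Mbps), penalize the send-buffer occupancy bl_s
    (seconds) deviating from a target level, and penalize bitrate jumps."""
    return (w_rate * br
            - w_dev * abs(bl_s - target_bl_s)
            - w_smooth * abs(br - last_br))
```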
Step 34: feed the state matrix of step 31 into the actor and critic networks of step 32 and perform a forward pass to obtain their outputs. Then sample from the normal distribution constructed from the mean and variance output by the actor network to obtain the video coding rate for the next moment, compute the reward function constructed in step 33, and from the reward value and the outputs of the two networks compute the corresponding training optimization targets; back-propagate to update the network parameters. The coding rate obtained from the network output is taken as the new encoder coding rate, which will influence the state matrix at the next moment.
Step 35: repeat step 34, with a decision interval of 1 s, until the obtained reward no longer rises.
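The action-selection part of step 34 — sampling the next bitrate from the normal distribution parameterized by the actor's mean and variance outputs — could be sketched as below; the clipping range of valid encoder bitrates is an assumption:

```python
import random

def select_bitrate(mu, sigma, br_min=0.2, br_max=8.0, rng=random):
    """Sample the next coding bitrate (Mbps) from the actor's
    Normal(mu, sigma) output and clip it to the encoder's valid range
    (range bounds are illustrative assumptions)."""
    br = rng.gauss(mu, sigma)
    return min(max(br, br_min), br_max)
```

Clipping keeps the continuous action inside the range the encoder actually accepts, while still allowing any value in between rather than a preset code rate level.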
Step 4: integrate the model trained in step 3 into a real system for interaction. The overall process is consistent with step 3, except that the construction of steps 32 and 33 is no longer needed — the deep reinforcement learning model and reward function of step 3 are used directly — and that the four characteristic parameters of step 31 are collected directly from the real system every 1 s to form the state matrix used as the training input of the neural networks. After a newly selected coding rate is obtained, it is directly set as the coding rate of the system encoder.
Step 5, integrating the optimized deep reinforcement learning model to a monitoring video sending end, and selecting a sending code rate:
Step 51: deploy the lightweight running environment of the selected deep learning framework at the monitoring video sending end. In this embodiment TensorFlow is selected as the framework; the TensorFlow Lite static library for mobile deployment is compiled and deployed at the monitoring video sending end so that it can be called directly.
And step 52, converting the actor network in the deep reinforcement learning model optimized in the step 4 into a tensoflow-lite model.
And step 53, calling the tensoflow-lite model generated in the step 52 by using the tensoflow-lite static library configured in the step 51 to perform forward calculation to obtain a code rate to be selected, setting the coding code rate of the encoder, directly collecting characteristic parameters from the system according to the mode of the step 4 to calculate a state matrix, calculating a new state matrix every 1s as the input of the lightweight model to calculate the code rate of the encoder at the next moment, and repeating the interaction process.
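The step-53 decision loop can be sketched as follows. This is a hypothetical illustration: the history length `K`, the normalization constants, and the helper names are assumptions, and `run_actor` stands in for the TensorFlow Lite forward pass (`Interpreter.set_tensor` / `invoke` / `get_tensor` in a real deployment).

```python
from collections import deque

K = 8  # assumed number of historical time nodes kept in the state matrix

def normalize(rate_mbps, buf_len, buf_delta, avg_send_mbps,
              rate_max=8.0, buf_max=60.0):
    """Scale the four per-step features (encoder rate, buffer length,
    buffer change, historical average sending rate) to comparable ranges.
    The divisors are illustrative."""
    return (rate_mbps / rate_max,
            buf_len / buf_max,
            buf_delta / buf_max,
            avg_send_mbps / rate_max)

def decision_loop(collect_features, run_actor, set_encoder_rate, steps):
    """Every decision step (1 s in the patent): append the newest
    normalized features, run the actor once the K x 4 state matrix is
    full, and apply the chosen coding rate to the encoder."""
    history = deque(maxlen=K)
    for _ in range(steps):
        history.append(normalize(*collect_features()))
        if len(history) == K:                       # state matrix is full
            state = [list(row) for row in history]  # K x 4 state matrix
            set_encoder_rate(run_actor(state))
```

The callables `collect_features`, `run_actor`, and `set_encoder_rate` are the integration points with the real sending-end system.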

Claims (4)

1. A method for controlling video coding flow rate at a monitoring video sending end is characterized by comprising the following steps:
step 1, collecting real bandwidth change data of the actual transmission environment by equal-interval sampling, and building a data set of the real-time available network bandwidth in the video transmission scenario for training;
step 2, constructing a simulation training environment of the monitoring video sending end with the real bandwidth data collected in step 1; the training environment determines, in real time from the real bandwidth data, the highest available bandwidth for sending the monitoring video and uses it as the video sending rate, and it receives the code rate selected by the deep reinforcement learning model and sets it as the encoder's coding rate for the next time period;
the specific process of constructing the simulation training environment of the monitoring video sending end is as follows:
step 21, constructing a video encoder simulation module, the input of which is the fixed encoding parameters of the monitoring video, including the video frame rate, the size of the group of pictures, and the selected video coding rate, and the output of which is the data size of one video frame; given the input fixed coding parameters, the data size of a video frame is drawn from a uniform distribution:
FS = sample(U(a, b))
wherein sample () operation represents sampling from a probability distribution, U (a, b) represents a uniform distribution over the interval [ a, b ]; the video encoder simulation module adds video frames with the size of FS to a buffer area in the video sending buffer area simulation module at regular time according to frame intervals determined by the frame rate of the video;
step 22, constructing a video sending buffer simulation module, the main body of which is a simulated video sending buffer; the maximum number of frames the buffer can hold must be specified, and when the buffer is full and the encoder simulation module delivers a new video frame, the earliest frame currently in the buffer is cleared and the new frame is added to the buffer;
step 23, constructing a video network transmission simulation module, the input of which is the real bandwidth change data of the actual transmission environment obtained in step 1; the available bandwidth is used as the video sending rate to consume video frames from the buffer in the video sending buffer simulation module; if the available bandwidth stays at BW over a time interval Δt, the total amount of data D transmitted over the network in Δt is:
D=Δt*BW
and the frames cleared from the buffer total D in data amount;
step 3, constructing a trust-region-based continuous-action deep reinforcement learning model, designing the target reward function required for model training, and training the model with the simulation training environment of step 2; the model takes the data output by the simulation training environment of step 2 as input and selects the coding rate of the monitoring video sending end at the next moment, and the training goal is to maximize the target reward function;
the specific implementation process for constructing the trust-region-based continuous-action deep reinforcement learning model comprises the following steps:
step 31, processing the output of the simulation training environment of step 2 into the input of the deep reinforcement learning model, as follows: first, normalize each parameter of the historical k time nodes, the parameters comprising the encoder coding rate, the video sending buffer length, the change in the video sending buffer length, and the historical average video sending rate; then store the normalized parameter values in an input state matrix;
step 32, building the neural network part of the trust-region-based continuous-action deep reinforcement learning model, comprising an actor deep neural network and a critic deep neural network, and constructing the training optimization targets of the two networks, namely their respective loss functions;
step 33, designing the reward function for training the trust-region-based continuous-action deep reinforcement learning model, wherein the reward function gives a higher reward to encoder-code-rate choices that keep the video sending buffer at a normal level and keep the encoder coding rate stable, and a lower reward to actions that cause the video sending buffer length to deviate from the normal level;
step 34, inputting the state matrix of step 31 into the actor network and the critic network of step 32, performing a forward pass of the neural networks to obtain their outputs, then obtaining the video encoder's coding rate at the next moment from the network output, calculating the reward function constructed in step 33, then calculating the corresponding training optimization targets from the value of the reward function and the outputs of the two networks, performing back propagation to update the network parameters, and setting the coding rate produced by the network output as the new encoder coding rate, which will affect the state matrix at the next moment;
step 35, repeating step 34 until the obtained reward function does not rise any more;
step 4, integrating the model trained in the step 3 into a real environment for interaction, and performing on-line training optimization;
and 5, integrating the optimized deep reinforcement learning model to a monitoring video sending end to select a sending code rate.
2. The method for controlling video coding flow rate at a sending end of surveillance video according to claim 1, wherein in step 1, the real bandwidth change data includes real-time available bandwidth change data at the sending end of the surveillance video and an existing public bandwidth change data set.
3. The method according to claim 2, wherein in step 1, the real-time available bandwidth data during the transmission of the surveillance video is the available network bandwidth for video transmission, collected by sampling at different time intervals.
4. The method for controlling the video coding flow rate at the transmitting end of the surveillance video according to claim 1, wherein in step 5, the deep reinforcement learning model optimized in step 4 is integrated into the transmitting end of the surveillance video, and only the actor network in the model needs to be deployed to the transmitting end, the specific process being as follows:
step 51, deploying a lightweight operating environment of the selected deep learning framework at a monitoring video sending end;
step 52, converting the actor network in the deep reinforcement learning model optimized in step 4 into a mobile lightweight model;
and step 53, calling the mobile lightweight model generated in step 52 through the runtime configured in step 51 to perform a forward pass and obtain the code rate to be selected, setting it as the encoder coding rate, collecting characteristic parameters directly from the system as in step 4 to compute the state matrix, continuously computing the encoder coding rate at the next moment with the new state matrix as the input of the lightweight model, and repeating this interaction process.
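The simulation environment of steps 21–23 in claim 1 can be sketched as follows. The buffer capacity and the bounds of the uniform distribution are illustrative assumptions; the patent leaves a and b of U(a, b) as parameters.

```python
import random
from collections import deque

def frame_size(rate_mbps, fps, lo=0.8, hi=1.2):
    """Step 21: draw one frame's size in bits as FS = sample(U(a, b)),
    with bounds here assumed proportional to the average bits per frame."""
    avg = rate_mbps * 1e6 / fps
    return random.uniform(lo * avg, hi * avg)

class SendBuffer:
    """Step 22: fixed-capacity frame buffer; when full, the earliest
    frame is evicted to admit a newly encoded one."""
    def __init__(self, max_frames=60):
        self.frames = deque(maxlen=max_frames)  # deque drops the oldest

    def push(self, fs):
        self.frames.append(fs)

    def drain(self, dt, bw_bps):
        """Step 23: over interval dt the network consumes D = dt * BW
        bits, clearing whole frames from the head of the buffer."""
        budget = dt * bw_bps
        while self.frames and self.frames[0] <= budget:
            budget -= self.frames.popleft()
```

Pushing frames at the frame interval and draining at the bandwidth trace's rate reproduces the buffer dynamics the reward function observes.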
CN201911145837.4A 2019-11-21 2019-11-21 Method for controlling video coding flow rate of monitoring video sending end Active CN111031387B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911145837.4A CN111031387B (en) 2019-11-21 2019-11-21 Method for controlling video coding flow rate of monitoring video sending end

Publications (2)

Publication Number Publication Date
CN111031387A CN111031387A (en) 2020-04-17
CN111031387B true CN111031387B (en) 2020-12-04


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113132765A (en) * 2020-01-16 2021-07-16 北京达佳互联信息技术有限公司 Code rate decision model training method and device, electronic equipment and storage medium
CN112954401A (en) * 2020-08-19 2021-06-11 赵蒙 Model determination method based on video interaction service and big data platform
CN112468808B (en) * 2020-11-26 2022-08-12 深圳大学 I frame target bandwidth allocation method and device based on reinforcement learning
CN112911408B (en) * 2021-01-25 2022-03-25 电子科技大学 Intelligent video code rate adjustment and bandwidth allocation method based on deep learning
CN114039870B (en) * 2021-09-27 2022-12-09 河海大学 Deep learning-based real-time bandwidth prediction method for video stream application in cellular network
CN115086667B (en) * 2022-07-26 2022-11-18 香港中文大学(深圳) Real-time video transmission method based on adaptive learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108063961A (en) * 2017-12-22 2018-05-22 北京联合网视文化传播有限公司 A kind of self-adaption code rate video transmission method and system based on intensified learning
CN109982118A (en) * 2019-03-27 2019-07-05 北京奇艺世纪科技有限公司 A kind of video code rate self-adapting regulation method, device and electronic equipment
CN110351555A (en) * 2018-04-03 2019-10-18 朱政 Multipass based on intensified learning goes through video frequency coding rate distribution and control optimization method
CN110430398A (en) * 2019-08-06 2019-11-08 杭州微帧信息科技有限公司 A kind of Video coding distributed method based on intensified learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9264664B2 (en) * 2010-12-03 2016-02-16 Intouch Technologies, Inc. Systems and methods for dynamic bandwidth allocation
CN106331717B (en) * 2015-06-30 2019-05-07 成都鼎桥通信技术有限公司 Video code rate self-adapting regulation method and sending ending equipment
EP3360085B1 (en) * 2015-11-12 2021-05-19 Deepmind Technologies Limited Asynchronous deep reinforcement learning
CN108494772B (en) * 2018-03-25 2021-08-17 上饶市中科院云计算中心大数据研究院 Model optimization, network intrusion detection method and device and computer storage medium
CN110351561B (en) * 2018-04-03 2021-05-07 杭州微帧信息科技有限公司 Efficient reinforcement learning training method for video coding optimization
CN109947567B (en) * 2019-03-14 2021-07-20 深圳先进技术研究院 Multi-agent reinforcement learning scheduling method and system and electronic equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Experience-driven Networking: A Deep Reinforcement Learning based Approach; Zhiyuan Xu; 《IEEE》; 20181011; entire document *
Improving Cloud Gaming Experience through Mobile Edge Computing; Xu Zhang; 《IEEE》; 20190411; entire document *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant