CN109802964A - DQN-based HTTP adaptive flow control energy consumption optimization method - Google Patents

DQN-based HTTP adaptive flow control energy consumption optimization method

Info

Publication number
CN109802964A
CN109802964A (application CN201910060941.7A)
Authority
CN
China
Prior art keywords
state
energy consumption
value
network
dqn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910060941.7A
Other languages
Chinese (zh)
Other versions
CN109802964B (en)
Inventor
高岭
赵子鑫
袁璐
刘艺
秦晨光
任杰
王海
郑杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University filed Critical Northwest University
Priority to CN201910060941.7A priority Critical patent/CN109802964B/en
Publication of CN109802964A publication Critical patent/CN109802964A/en
Application granted granted Critical
Publication of CN109802964B publication Critical patent/CN109802964B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks

Landscapes

  • Information Transfer Between Computers (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A DQN-based HTTP adaptive flow control energy consumption optimization method. The method considers different network states, the load condition of the buffer area and the remaining battery of the client device, and simulates usage behaviour on top of this environment. During the interaction between client and server, a DQN learning system switches the quality of the streamed multimedia file and switches between high-frequency and low-frequency kernels to achieve the goal of energy optimization.

Description

DQN-based HTTP adaptive flow control energy consumption optimization method
Technical field
The invention belongs to the technical field of computer network communication, and in particular relates to a DQN-based HTTP adaptive flow control energy consumption optimization method.
Background technique
In recent years the multimedia field has developed rapidly, and the transmission of multimedia content has attracted growing attention. Since the internet became widespread, HTTP video protocols have been a mainstream way of watching video online. HTTP transmission of multimedia files is broadly divided into two stages. The first stage is progressive download, which generally means that the user can start playback during the download instead of waiting for the whole file to finish; this is not real streaming, however, and is no different from an ordinary file download. The second stage is HTTP streaming, in which the media file is divided into small slices on the server side, and the server answers each request with an HTTP response carrying the requested slice of the media file. During the interaction between server and client, the client adjusts the slice bitrate in real time according to the network state, using a high bitrate when the network state is good and automatically switching to a low bitrate when the network is busy. The main realization is that every manifest file offered by the server carries the available bitrates, so that the client player can adjust automatically according to playback progress and download speed, improving the user experience as far as possible while guaranteeing continuous and fluent playback.
What we want to do, with all of this guaranteed, is to optimize the energy consumption of the client device at a deeper level. While the client plays online video, the network state, the cache state and the remaining phone battery are aspects that people often ignore. HTTP adaptive streaming also has low flexibility in bitrate selection and cannot cope well with complex network conditions, and frequently switching the bitrate of the video stream not only gives the viewer an unpleasant experience but also ignores the energy overhead brought by switching. We therefore propose an energy optimization model of deep Q-learning based on reinforcement learning and neural networks.
Q-learning is a classical method of reinforcement learning. The core idea of reinforcement learning is that an agent, through continuous interaction with the environment, takes suitable actions, is rewarded and enters the next state. The core of Q-learning is the Q-table, whose rows and columns represent states and actions respectively; the Q value in the Q-table measures how good it is to take action a in state s. The neural network here can be treated as a black box: its input is a state value and its output is the value of the actions in that state. The training data are generated while the whole system runs; during the computation of the returns these data are corrected, the corrected values are used as input to the neural network for a second round of training, and the process finally converges and the optimal policy is selected.
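As a concrete illustration of this update, the following minimal sketch (in Python, with illustrative parameter values that are not prescribed by the invention) fills a Q-table from observed transitions:

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

q_table = defaultdict(float)  # maps (state, action) -> Q value

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                         - q_table[(state, action)])
```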
Summary of the invention
In order to overcome the above-mentioned deficiencies of the prior art, the object of the present invention is to provide a DQN-based HTTP adaptive flow control energy consumption optimization method, which uses Q-learning combined with a BP (back propagation) neural network, a kind of reinforcement learning, to interact with the environment. While the user watches video online the environment keeps changing: the network varies and the battery drains. Under this changing environment the system dynamically matches and switches the video quality in the video player and dynamically schedules the different CPU kernels, obtains the most suitable media quality level and the most suitable CPU core, and finally achieves the goal of reducing energy consumption.
To achieve the above goals, the technical solution adopted by the present invention is as follows:
A DQN-based HTTP adaptive flow control energy consumption optimization method, comprising the following steps:
1) environment acquisition and modeling: use Dummynet to simulate the networks used in daily life, use the client under 3G, 4G and WiFi network environments, and collect the current context information. The context consists of three states: the client data cache state B, i.e. the fragment length currently in the buffer area; the network state N; and the battery level E, forming the set S = (B, N, E). Time is divided into multiple time points, the states are put in one-to-one correspondence with them, and the data are saved;
2) definition of the client action set and the reward function: the environmental data collected in step 1) serve as the state set establishing the state space of Q-learning, and the action set of the model is established. The system selects a suitable action to enter the next state according to the network state, the buffer state and the battery level. The action set established for the model is mainly composed of two action states: switching the video quality, and switching between the high-frequency core and the low-frequency core. For the switching of video fragment quality, the sum of the energy consumption level and the switching overhead is defined as the reward function. The reward function is composed of the following two parts. The first is the energy consumption level value: the energy consumption level, the different network grades, the different video qualities and the different CPU cores in use form a mapping relation, and the energy consumption level value here is looked up in this mapping table. The second value is the overhead brought by video switching and big/little core switching; this value is a negative feedback. The reward function expression is therefore R = C1·R_energy + C2·R_switch, where C1 and C2 are the weights of the two return values; they are set according to what the user preference emphasizes, and a weight value may be 1;
3) algorithm realization: the Deep Q-Learning algorithm is used, i.e. a Q-learning algorithm combined with a BP neural network, which chooses the best action through continuous interaction with the environment. The main function of the neural network is to convert the high-dimensional state into a low-dimensional output: the environment state s is input, the network turns it into a low-dimensional state value, and the output is the Q value corresponding to each action, in the form of a vector. An ε-greedy algorithm is used: in each state, with small probability ε a random action is selected, and with probability 1-ε the optimal action is selected according to the BP neural network. The randomly selected actions and the actions selected according to the neural network are then added to the replay_buffer experience pool of our neural network for secondary training. The action is taken and the next state is reached; the neural network training optimizes the input state, the output value follows the optimal solution strategy, and the optimal solution is output (a code sketch of steps 1) to 3) is given after this list);
4) in the practical problem, the device obtains the environment state value through the system, and through the DQN it selects the best-matching video quality and the kernel that saves the most power without affecting the user experience.
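To make steps 1) to 3) concrete, the sketch below shows how the state vector S = (B, N, E), the reward R = C1·R_energy + C2·R_switch, ε-greedy selection and the replay_buffer could fit together. It is a minimal illustration, not the patented implementation: the feature scaling, energy values, weights, buffer size and ε are assumptions, and a small linear Q approximator stands in for the BP neural network.

```python
import random
from collections import deque

import numpy as np

# Illustrative action set from step 2): video quality x core choice.
QUALITIES = ["lossless", "hd", "sd"]      # assumed quality labels
CORES = ["A15_big", "A7_little"]          # high- / low-frequency cores
ACTIONS = [(q, c) for q in QUALITIES for c in CORES]

C1, C2 = 0.5, 0.5                         # assumed reward weights (user-preference dependent)

def make_state(buffer_len_s, net_level, battery_pct):
    """State S = (B, N, E): buffer occupancy, network grade, remaining battery."""
    return np.array([buffer_len_s / 60.0, net_level / 6.0, battery_pct / 100.0])

def reward(energy_level, switched_quality, switched_core):
    """R = C1*R_energy + C2*R_switch; switching cost enters as negative feedback."""
    r_energy = -energy_level                      # lower consumption -> higher reward
    r_switch = -(int(switched_quality) + int(switched_core))
    return C1 * r_energy + C2 * r_switch

class LinearQ:
    """Tiny Q(s, .) approximator: one weight row per action (stand-in for the BP network)."""
    def __init__(self, n_features, n_actions, lr=0.01):
        self.w = np.zeros((n_actions, n_features))
        self.lr = lr

    def q_values(self, s):
        return self.w @ s

    def update(self, s, a, target):
        td_error = target - self.w[a] @ s
        self.w[a] += self.lr * td_error * s

replay_buffer = deque(maxlen=10_000)
qnet, epsilon, gamma = LinearQ(3, len(ACTIONS)), 0.8, 0.9

def select_action(s):
    """epsilon-greedy: random action with probability epsilon, else argmax of Q."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(qnet.q_values(s)))

def train_step(batch_size=32):
    """Experience replay: re-train on transitions sampled from the pool."""
    if len(replay_buffer) < batch_size:
        return
    for s, a, r, s_next in random.sample(replay_buffer, batch_size):
        target = r + gamma * np.max(qnet.q_values(s_next))
        qnet.update(s, a, target)

# Typical interaction step (illustrative):
#   s = make_state(buffer_len_s=12, net_level=5, battery_pct=80)
#   a = select_action(s)
#   ... apply ACTIONS[a], observe the energy level and the next state s_next ...
#   replay_buffer.append((s, a, reward(2, True, False), s_next))
#   train_step()
```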
The above-mentioned environmental information in the defined state set S contains the network grade, which is divided into six grades from low to high, although measurement shows that under grades 1 and 2 or under 3G even the lowest-quality test video cannot be loaded normally; the remaining phone battery value; and the cached fragment length, for which a script that reads the cache information is written to select the buffer state, i.e. the fragment length, at each unit time point.
By interacting with the changing state of the environment, the system in the present invention allocates a reasonable streaming media quality and a reasonable CPU core to each fragment. The experimental results show that, without affecting the user experience, this optimization method can effectively reduce the energy consumption that mobile streaming media causes on the device; the energy consumption of the loading phase is reduced by 21 percent.
Detailed description of the invention
Fig. 1 is the system flow chart of the present invention.
Fig. 2 is the DQN learning process diagram of the present invention.
Fig. 3 is the application scenario diagram of the present invention.
Specific embodiment
The present invention is further described below with reference to embodiments, but the present invention is not limited to the following embodiments.
A DQN-based HTTP adaptive flow control energy consumption optimization method. As shown in Figures 1 and 3, the basic working pattern of HTTP adaptive streaming is that the streaming media file is divided into small fragments one by one, which are requested and transmitted over HTTP. What the client receives first is therefore a slice of the streaming media file; the system collects the network environment and the current battery situation and processes the data, as sketched below.
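As a hedged illustration of this slice-by-slice transfer, the sketch below fetches a single fragment over HTTP and measures the download rate; the URL pattern is hypothetical and the Python requests library is an assumed dependency, neither being specified by the invention.

```python
import time

import requests  # third-party HTTP client, assumed available

# Hypothetical URL pattern; a real deployment would read it from a manifest.
BASE_URL = "http://example.com/media/segment_{idx}_{quality}.ts"

def fetch_segment(idx: int, quality: str) -> tuple[bytes, float]:
    """Request one slice at the given quality; return its bytes and download rate (bytes/s)."""
    start = time.monotonic()
    resp = requests.get(BASE_URL.format(idx=idx, quality=quality), timeout=10)
    resp.raise_for_status()
    elapsed = max(time.monotonic() - start, 1e-6)
    return resp.content, len(resp.content) / elapsed
```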
The specific process is as follows. Define the state set S: the network grade is divided into six grades from low to high, but measurement shows that under grades 1 and 2 or under 3G even the lowest-quality test video cannot be loaded normally, so the return value is calculated as 0; the remaining phone battery value; and the cached fragment length, for which a script that reads the cache information is written to select the buffer state, i.e. the fragment length, at each unit time point.
Define the action set. The development board used here is the Odroid XU3, whose main cores are the high-frequency Cortex-A15 kernel and the low-frequency Cortex-A7 kernel. The actions here mainly adjust, according to changes in the environment, which core works and which core sleeps; the main actions are assigning the task to the A15 and assigning the task to the A7. The streaming media quality is divided into lossless, high definition and low definition, limited here to the video set used for testing. A sketch of the core switching action follows.
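A minimal sketch of this core switching action on Linux is given below, using the standard os.sched_setaffinity call; the CPU indices assigned to the A7 and A15 clusters are assumptions for the Odroid XU3 and may differ with kernel configuration.

```python
import os

# pid=0 pins the calling process. Cluster numbering below is an assumption
# for the Odroid XU3 (Exynos 5422) and should be verified on the target kernel.
A7_CPUS = {0, 1, 2, 3}    # low-frequency Cortex-A7 cluster (assumed indices)
A15_CPUS = {4, 5, 6, 7}   # high-frequency Cortex-A15 cluster (assumed indices)

def apply_core_action(use_big_core: bool, pid: int = 0) -> None:
    """Realize 'task -> A15' or 'task -> A7' by restricting CPU affinity."""
    os.sched_setaffinity(pid, A15_CPUS if use_big_core else A7_CPUS)
```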
Construct the reward function and select the model. First the neural network is initialized; the main function of the BP neural network here is to estimate the value of each action in each state and to reduce the dimension of the vector. Values are assigned to the learning rate α and the discount factor γ in the Q value iteration formula and to the exploration probability ε used in action selection. Each iteration period then carries out the following process, as shown in Fig. 2. After initialization is completed, the state of the system is input, and the output is the value produced by the current action; we use this estimated output to replace the previous output and optimize step by step to find the optimal solution. After obtaining the value of each action we use the ε-greedy strategy to find the optimal solution. A threshold is initialized here with an initial value of 0.8; that is to say, when selecting an action, 80 percent of the time an action is selected at random, and 20 percent of the time the action income is computed by the neural network and the most suitable action is chosen. With continuous learning the initialized value becomes lower and lower, until actions are no longer selected at random (see the sketch below).
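The decaying exploration threshold described above can be sketched as follows; the multiplicative decay rate is an assumption, since the text states only that the value keeps dropping until actions are no longer chosen at random.

```python
import random

epsilon = 0.8      # initial threshold: 80% random actions, 20% greedy
EPS_DECAY = 0.995  # assumed annealing rate

def choose(q_values):
    """Pick a random action index with probability epsilon, else the greedy one."""
    global epsilon
    n = len(q_values)
    if random.random() < epsilon:
        idx = random.randrange(n)
    else:
        idx = max(range(n), key=q_values.__getitem__)
    epsilon *= EPS_DECAY  # anneal toward pure exploitation after every selection
    return idx
```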

Claims (2)

1. A DQN-based HTTP adaptive flow control energy consumption optimization method, characterized by comprising the following steps:
1) environment acquisition and modeling: use Dummynet to simulate the networks used in daily life, use the client under 3G, 4G and WiFi network environments, and collect the current context information; the context consists of three states: the client data cache state B, i.e. the fragment length currently in the buffer area, the network state N, and the battery level E, forming the set S = (B, N, E); time is divided into multiple time points, the states are put in one-to-one correspondence with them, and the data are saved;
2) definition of the client action set and the reward function: the environmental data collected in step 1) serve as the state set establishing the state space of Q-learning, and the action set of the model is established; the system selects a suitable action to enter the next state according to the network state, the buffer state and the battery level; the action set is mainly composed of two action states: switching the video quality, and switching between the high-frequency core and the low-frequency core; for the switching of video fragment quality, the sum of the energy consumption level and the switching overhead is defined as the reward function; the reward function is composed of two parts: the first is the energy consumption level value, where the energy consumption level, the different network grades, the different video qualities and the different CPU cores in use form a mapping relation and the energy consumption level value is looked up in this mapping table; the second value is the overhead brought by video switching and big/little core switching, which is a negative feedback; the reward function expression is therefore R = C1·R_energy + C2·R_switch, where C1 and C2 are the weights of the two return values, set according to what the user preference emphasizes, and a weight value may be 1;
3) algorithm realization: the Deep Q-Learning algorithm is used, i.e. a Q-learning algorithm combined with a BP neural network, which chooses the best action through continuous interaction with the environment; the main function of the neural network is to convert the high-dimensional state into a low-dimensional output: the environment state s is input, the network turns it into a low-dimensional state value, and the output is the Q value corresponding to each action, in the form of a vector; an ε-greedy algorithm is used: in each state, with small probability ε a random action is selected, and with probability 1-ε the optimal action is selected according to the BP neural network; the randomly selected actions and the actions selected according to the neural network are then added to the replay_buffer experience pool of our neural network for secondary training; the action is taken and the next state is reached; the neural network training optimizes the input state, the output value follows the optimal solution strategy, and the optimal solution is output;
4) in the practical problem, the device obtains the environment state value through the system, and through the DQN it selects the best-matching video quality and the kernel that saves the most power without affecting the user experience.
2. The DQN-based HTTP adaptive flow control energy consumption optimization method according to claim 1, characterized in that the environmental information in the defined state set S contains the network grade, which is divided into six grades from low to high, although measurement shows that under grades 1 and 2 or under 3G even the lowest-quality test video cannot be loaded normally; the remaining phone battery value; and the cached fragment length, for which a script that reads the cache information is written to select the buffer state, i.e. the fragment length, at each unit time point.
CN201910060941.7A 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method Active CN109802964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910060941.7A CN109802964B (en) 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method


Publications (2)

Publication Number Publication Date
CN109802964A (en) 2019-05-24
CN109802964B CN109802964B (en) 2021-09-28

Family

ID=66560085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910060941.7A Active CN109802964B (en) 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method

Country Status (1)

Country Link
CN (1) CN109802964B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2017268276A1 (en) * 2016-05-16 2018-12-06 Wi-Tronix, Llc Video content analysis system and method for transportation system
US20180129974A1 (en) * 2016-11-04 2018-05-10 United Technologies Corporation Control systems using deep reinforcement learning
CN108063961A (en) * 2017-12-22 2018-05-22 北京联合网视文化传播有限公司 A kind of self-adaption code rate video transmission method and system based on intensified learning
CN108737382A (en) * 2018-04-23 2018-11-02 浙江工业大学 SVC based on Q-Learning encodes HTTP streaming media self-adapting methods
CN108966330A (en) * 2018-09-21 2018-12-07 西北大学 A kind of mobile terminal music player dynamic regulation energy consumption optimization method based on Q-learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGFENG XU: "Live Streaming with Content Centric Networking", 2012 Third International Conference on Networking and Distributed Computing *
VIRGINIA MARTIN: "Evaluation of Q-Learning approach for HTTP Adaptive Streaming", 2016 IEEE International Conference on Consumer Electronics *
熊丽荣: "Research on Q-learning-based rate control methods for HTTP adaptive streaming", Journal on Communications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414725A (en) * 2019-07-11 2019-11-05 山东大学 The integrated wind power plant energy-storage system dispatching method of forecast and decision and device
CN114885208A (en) * 2022-03-21 2022-08-09 中南大学 Dynamic self-adapting method, equipment and medium for scalable streaming media transmission under NDN (named data networking)
CN114885208B (en) * 2022-03-21 2023-08-08 中南大学 Dynamic self-adapting method, equipment and medium for scalable streaming media transmission under NDN (network discovery network)

Also Published As

Publication number Publication date
CN109802964B (en) 2021-09-28

Similar Documents

Publication Publication Date Title
CN109639760B (en) It is a kind of based on deeply study D2D network in cache policy method
CN111835827B (en) Internet of things edge computing task unloading method and system
CN113434212B (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN107690176B (en) Network selection method based on Q learning algorithm
CN110062357B (en) D2D auxiliary equipment caching system and caching method based on reinforcement learning
Gaing Constrained dynamic economic dispatch solution using particle swarm optimization
CN112218337B (en) Cache strategy decision method in mobile edge calculation
CN109814951A (en) The combined optimization method of task unloading and resource allocation in mobile edge calculations network
CN110351754A (en) Industry internet machinery equipment user data based on Q-learning calculates unloading decision-making technique
CN113114756A (en) Video cache updating method for self-adaptive code rate selection in mobile edge calculation
CN110312277B (en) Mobile network edge cooperative cache model construction method based on machine learning
Zhu et al. Computation offloading for workflow in mobile edge computing based on deep Q-learning
CN109802964A (en) A kind of HTTP self adaptation stream control energy consumption optimization method based on DQN
Yan et al. Distributed edge caching with content recommendation in fog-rans via deep reinforcement learning
CN114205791A (en) Depth Q learning-based social perception D2D collaborative caching method
CN107949007A (en) A kind of resource allocation algorithm based on Game Theory in wireless caching system
CN116321307A (en) Bidirectional cache placement method based on deep reinforcement learning in non-cellular network
Lin et al. Vehicle-to-cloudlet: Game-based computation demand response for mobile edge computing through vehicles
CN111314960A (en) Social awareness-based collaborative caching method in fog wireless access network
US11570063B2 (en) Quality of experience optimization system and method
CN111643901A (en) Method and device for intelligently rendering cloud game interface
CN112822727B (en) Self-adaptive edge content caching method based on mobility and popularity perception
Mowafi et al. Energy efficient fuzzy-based DASH adaptation algorithm
CN113672372B (en) Multi-edge collaborative load balancing task scheduling method based on reinforcement learning
Lin et al. Knn-q learning algorithm of bitrate adaptation for video streaming over http

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant