CN109802964A - DQN-based HTTP adaptive flow control energy consumption optimization method - Google Patents

DQN-based HTTP adaptive flow control energy consumption optimization method

Info

Publication number
CN109802964A
CN109802964A (application CN201910060941.7A)
Authority
CN
China
Prior art keywords
state
energy consumption
value
network
dqn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910060941.7A
Other languages
Chinese (zh)
Other versions
CN109802964B (en)
Inventor
高岭
赵子鑫
袁璐
刘艺
秦晨光
任杰
王海
郑杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University filed Critical Northwest University
Priority to CN201910060941.7A priority Critical patent/CN109802964B/en
Publication of CN109802964A publication Critical patent/CN109802964A/en
Application granted granted Critical
Publication of CN109802964B publication Critical patent/CN109802964B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks

Landscapes

  • Information Transfer Between Computers (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A DQN-based HTTP adaptive flow control energy consumption optimization method. The method considers different network states, the load condition of the buffer area and the remaining battery of the client device, and simulates usage behaviour on top of this environment. During the interaction between client and server, a DQN learning system switches the quality of the streamed multimedia file and switches between high-frequency and low-frequency kernels to achieve the goal of energy optimization.

Description

DQN-based HTTP adaptive flow control energy consumption optimization method
Technical field
The invention belongs to the technical field of computer network communication, and in particular relates to a DQN-based HTTP adaptive flow control energy consumption optimization method.
Background technique
In recent years the multimedia field has developed rapidly, and the transmission of multimedia content has attracted growing attention. Since the internet became widespread, HTTP video protocols have been a mainstream way of watching video online. HTTP transmission of multimedia files is broadly divided into two stages. The first stage is progressive download, which generally means that the user can start playback during the download instead of waiting for the whole file to finish; this is not real streaming, however, and is no different from an ordinary file download. The second stage is HTTP streaming, in which the media file is divided into small slices on the server side, and the server answers each request with an HTTP response carrying the requested slice of the media file. During the interaction between server and client, the client adjusts the slice bitrate in real time according to the network state, using a high bitrate when the network state is good and automatically switching to a low bitrate when the network is busy. The main realization is that every manifest file offered by the server carries the available bitrates, so that the client player can adjust automatically according to playback progress and download speed, improving the user experience as far as possible while guaranteeing continuous and fluent playback.
What we want to do, with all of this guaranteed, is to optimize the energy consumption of the client device at a deeper level. While the client plays online video, the network state, the cache state and the remaining phone battery are aspects that people often ignore. HTTP adaptive streaming also has low flexibility in bitrate selection and cannot cope well with complex network conditions, and frequently switching the bitrate of the video stream not only gives the viewer an unpleasant experience but also ignores the energy overhead brought by switching. We therefore propose an energy optimization model of deep Q-learning based on reinforcement learning and neural networks.
Q-learning is a classical method of reinforcement learning. The core idea of reinforcement learning is that an agent, through continuous interaction with the environment, takes suitable actions, is rewarded and enters the next state. The core of Q-learning is the Q-table, whose rows and columns represent states and actions respectively; the Q value in the Q-table measures how good it is to take action a in state s. The neural network here can be treated as a black box: its input is a state value and its output is the value of the actions in that state. The training data are generated while the whole system runs; during the computation of the returns these data are corrected, the corrected values are used as input to the neural network for a second round of training, and the process finally converges and the optimal policy is selected.
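As a concrete illustration of this update, the following minimal sketch (in Python, with illustrative parameter values that are not prescribed by the invention) fills a Q-table from observed transitions:

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

q_table = defaultdict(float)  # maps (state, action) -> Q value

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                         - q_table[(state, action)])
```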
Summary of the invention
In order to overcome the above-mentioned deficiencies of the prior art, the object of the present invention is to provide a DQN-based HTTP adaptive flow control energy consumption optimization method, which uses Q-learning combined with a BP (back propagation) neural network, a kind of reinforcement learning, to interact with the environment. While the user watches video online the environment keeps changing: the network varies and the battery drains. Under this changing environment the system dynamically matches and switches the video quality in the video player and dynamically schedules the different CPU kernels, obtains the most suitable media quality level and the most suitable CPU core, and finally achieves the goal of reducing energy consumption.
To achieve the above goals, the technical solution adopted by the present invention is as follows:
A DQN-based HTTP adaptive flow control energy consumption optimization method, comprising the following steps:
1) environment acquisition and modeling: use Dummynet to simulate the networks used in daily life, use the client under 3G, 4G and WiFi network environments, and collect the current context information. The context consists of three states: the client data cache state B, i.e. the fragment length currently in the buffer area; the network state N; and the battery level E, forming the set S = (B, N, E). Time is divided into multiple time points, the states are put in one-to-one correspondence with them, and the data are saved;
2) definition of the client action set and the reward function: the environmental data collected in step 1) serve as the state set establishing the state space of Q-learning, and the action set of the model is established. The system selects a suitable action to enter the next state according to the network state, the buffer state and the battery level. The action set established for the model is mainly composed of two action states: switching the video quality, and switching between the high-frequency core and the low-frequency core. For the switching of video fragment quality, the sum of the energy consumption level and the switching overhead is defined as the reward function. The reward function is composed of the following two parts. The first is the energy consumption level value: the energy consumption level, the different network grades, the different video qualities and the different CPU cores in use form a mapping relation, and the energy consumption level value here is looked up in this mapping table. The second value is the overhead brought by video switching and big/little core switching; this value is a negative feedback. The reward function expression is therefore R = C1·R_energy + C2·R_switch, where C1 and C2 are the weights of the two return values; they are set according to what the user preference emphasizes, and a weight value may be 1;
3) algorithm realization: the Deep Q-Learning algorithm is used, i.e. a Q-learning algorithm combined with a BP neural network, which chooses the best action through continuous interaction with the environment. The main function of the neural network is to convert the high-dimensional state into a low-dimensional output: the environment state s is input, the network turns it into a low-dimensional state value, and the output is the Q value corresponding to each action, in the form of a vector. An ε-greedy algorithm is used: in each state, with small probability ε a random action is selected, and with probability 1-ε the optimal action is selected according to the BP neural network. The randomly selected actions and the actions selected according to the neural network are then added to the replay_buffer experience pool of our neural network for secondary training. The action is taken and the next state is reached; the neural network training optimizes the input state, the output value follows the optimal solution strategy, and the optimal solution is output (a code sketch of steps 1) to 3) is given after this list);
4) in the practical problem, the device obtains the environment state value through the system, and through the DQN it selects the best-matching video quality and the kernel that saves the most power without affecting the user experience.
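To make steps 1) to 3) concrete, the sketch below shows how the state vector S = (B, N, E), the reward R = C1·R_energy + C2·R_switch, ε-greedy selection and the replay_buffer could fit together. It is a minimal illustration, not the patented implementation: the feature scaling, energy values, weights, buffer size and ε are assumptions, and a small linear Q approximator stands in for the BP neural network.

```python
import random
from collections import deque

import numpy as np

# Illustrative action set from step 2): video quality x core choice.
QUALITIES = ["lossless", "hd", "sd"]      # assumed quality labels
CORES = ["A15_big", "A7_little"]          # high- / low-frequency cores
ACTIONS = [(q, c) for q in QUALITIES for c in CORES]

C1, C2 = 0.5, 0.5                         # assumed reward weights (user-preference dependent)

def make_state(buffer_len_s, net_level, battery_pct):
    """State S = (B, N, E): buffer occupancy, network grade, remaining battery."""
    return np.array([buffer_len_s / 60.0, net_level / 6.0, battery_pct / 100.0])

def reward(energy_level, switched_quality, switched_core):
    """R = C1*R_energy + C2*R_switch; switching cost enters as negative feedback."""
    r_energy = -energy_level                      # lower consumption -> higher reward
    r_switch = -(int(switched_quality) + int(switched_core))
    return C1 * r_energy + C2 * r_switch

class LinearQ:
    """Tiny Q(s, .) approximator: one weight row per action (stand-in for the BP network)."""
    def __init__(self, n_features, n_actions, lr=0.01):
        self.w = np.zeros((n_actions, n_features))
        self.lr = lr

    def q_values(self, s):
        return self.w @ s

    def update(self, s, a, target):
        td_error = target - self.w[a] @ s
        self.w[a] += self.lr * td_error * s

replay_buffer = deque(maxlen=10_000)
qnet, epsilon, gamma = LinearQ(3, len(ACTIONS)), 0.8, 0.9

def select_action(s):
    """epsilon-greedy: random action with probability epsilon, else argmax of Q."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(qnet.q_values(s)))

def train_step(batch_size=32):
    """Experience replay: re-train on transitions sampled from the pool."""
    if len(replay_buffer) < batch_size:
        return
    for s, a, r, s_next in random.sample(replay_buffer, batch_size):
        target = r + gamma * np.max(qnet.q_values(s_next))
        qnet.update(s, a, target)

# Typical interaction step (illustrative):
#   s = make_state(buffer_len_s=12, net_level=5, battery_pct=80)
#   a = select_action(s)
#   ... apply ACTIONS[a], observe the energy level and the next state s_next ...
#   replay_buffer.append((s, a, reward(2, True, False), s_next))
#   train_step()
```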
The above-mentioned environmental information in the defined state set S contains the network grade, which is divided into six grades from low to high, although measurement shows that under grades 1 and 2 or under 3G even the lowest-quality test video cannot be loaded normally; the remaining phone battery value; and the cached fragment length, for which a script that reads the cache information is written to select the buffer state, i.e. the fragment length, at each unit time point.
By interacting with the changing state of the environment, the system in the present invention allocates a reasonable streaming media quality and a reasonable CPU core to each fragment. The experimental results show that, without affecting the user experience, this optimization method can effectively reduce the energy consumption that mobile streaming media causes on the device; the energy consumption of the loading phase is reduced by 21 percent.
Detailed description of the invention
Fig. 1 is the system flow chart of the present invention.
Fig. 2 is the DQN learning process diagram of the present invention.
Fig. 3 is the application scenario diagram of the present invention.
Specific embodiment
The present invention is further described below with reference to embodiments, but the present invention is not limited to the following embodiments.
A DQN-based HTTP adaptive flow control energy consumption optimization method. As shown in Figures 1 and 3, the basic working pattern of HTTP adaptive streaming is that the streaming media file is divided into small fragments one by one, which are requested and transmitted over HTTP. What the client receives first is therefore a slice of the streaming media file; the system collects the network environment and the current battery situation and processes the data, as sketched below.
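As a hedged illustration of this slice-by-slice transfer, the sketch below fetches a single fragment over HTTP and measures the download rate; the URL pattern is hypothetical and the Python requests library is an assumed dependency, neither being specified by the invention.

```python
import time

import requests  # third-party HTTP client, assumed available

# Hypothetical URL pattern; a real deployment would read it from a manifest.
BASE_URL = "http://example.com/media/segment_{idx}_{quality}.ts"

def fetch_segment(idx: int, quality: str) -> tuple[bytes, float]:
    """Request one slice at the given quality; return its bytes and download rate (bytes/s)."""
    start = time.monotonic()
    resp = requests.get(BASE_URL.format(idx=idx, quality=quality), timeout=10)
    resp.raise_for_status()
    elapsed = max(time.monotonic() - start, 1e-6)
    return resp.content, len(resp.content) / elapsed
```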
The specific process is as follows. Define the state set S: the network grade is divided into six grades from low to high, but measurement shows that under grades 1 and 2 or under 3G even the lowest-quality test video cannot be loaded normally, so the return value is calculated as 0; the remaining phone battery value; and the cached fragment length, for which a script that reads the cache information is written to select the buffer state, i.e. the fragment length, at each unit time point.
Define the action set. The development board used here is the Odroid XU3, whose main cores are the high-frequency Cortex-A15 kernel and the low-frequency Cortex-A7 kernel. The actions here mainly adjust, according to changes in the environment, which core works and which core sleeps; the main actions are assigning the task to the A15 and assigning the task to the A7. The streaming media quality is divided into lossless, high definition and low definition, limited here to the video set used for testing. A sketch of the core switching action follows.
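A minimal sketch of this core switching action on Linux is given below, using the standard os.sched_setaffinity call; the CPU indices assigned to the A7 and A15 clusters are assumptions for the Odroid XU3 and may differ with kernel configuration.

```python
import os

# pid=0 pins the calling process. Cluster numbering below is an assumption
# for the Odroid XU3 (Exynos 5422) and should be verified on the target kernel.
A7_CPUS = {0, 1, 2, 3}    # low-frequency Cortex-A7 cluster (assumed indices)
A15_CPUS = {4, 5, 6, 7}   # high-frequency Cortex-A15 cluster (assumed indices)

def apply_core_action(use_big_core: bool, pid: int = 0) -> None:
    """Realize 'task -> A15' or 'task -> A7' by restricting CPU affinity."""
    os.sched_setaffinity(pid, A15_CPUS if use_big_core else A7_CPUS)
```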
Construct the reward function and select the model. First the neural network is initialized; the main function of the BP neural network here is to estimate the value of each action in each state and to reduce the dimension of the vector. Values are assigned to the learning rate α and the discount factor γ in the Q value iteration formula and to the exploration probability ε used in action selection. Each iteration period then carries out the following process, as shown in Fig. 2. After initialization is completed, the state of the system is input, and the output is the value produced by the current action; we use this estimated output to replace the previous output and optimize step by step to find the optimal solution. After obtaining the value of each action we use the ε-greedy strategy to find the optimal solution. A threshold is initialized here with an initial value of 0.8; that is to say, when selecting an action, 80 percent of the time an action is selected at random, and 20 percent of the time the action income is computed by the neural network and the most suitable action is chosen. With continuous learning the initialized value becomes lower and lower, until actions are no longer selected at random (see the sketch below).
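The decaying exploration threshold described above can be sketched as follows; the multiplicative decay rate is an assumption, since the text states only that the value keeps dropping until actions are no longer chosen at random.

```python
import random

epsilon = 0.8      # initial threshold: 80% random actions, 20% greedy
EPS_DECAY = 0.995  # assumed annealing rate

def choose(q_values):
    """Pick a random action index with probability epsilon, else the greedy one."""
    global epsilon
    n = len(q_values)
    if random.random() < epsilon:
        idx = random.randrange(n)
    else:
        idx = max(range(n), key=q_values.__getitem__)
    epsilon *= EPS_DECAY  # anneal toward pure exploitation after every selection
    return idx
```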

Claims (2)

1. A DQN-based HTTP adaptive flow control energy consumption optimization method, characterized by comprising the following steps:
1) environment acquisition and modeling: use Dummynet to simulate the networks used in daily life, use the client under 3G, 4G and WiFi network environments, and collect the current context information; the context consists of three states: the client data cache state B, i.e. the fragment length currently in the buffer area, the network state N, and the battery level E, forming the set S = (B, N, E); time is divided into multiple time points, the states are put in one-to-one correspondence with them, and the data are saved;
2) definition of the client action set and the reward function: the environmental data collected in step 1) serve as the state set establishing the state space of Q-learning, and the action set of the model is established; the system selects a suitable action to enter the next state according to the network state, the buffer state and the battery level; the action set is mainly composed of two action states: switching the video quality, and switching between the high-frequency core and the low-frequency core; for the switching of video fragment quality, the sum of the energy consumption level and the switching overhead is defined as the reward function; the reward function is composed of two parts: the first is the energy consumption level value, where the energy consumption level, the different network grades, the different video qualities and the different CPU cores in use form a mapping relation and the energy consumption level value is looked up in this mapping table; the second value is the overhead brought by video switching and big/little core switching, which is a negative feedback; the reward function expression is therefore R = C1·R_energy + C2·R_switch, where C1 and C2 are the weights of the two return values, set according to what the user preference emphasizes, and a weight value may be 1;
3) algorithm realization: the Deep Q-Learning algorithm is used, i.e. a Q-learning algorithm combined with a BP neural network, which chooses the best action through continuous interaction with the environment; the main function of the neural network is to convert the high-dimensional state into a low-dimensional output: the environment state s is input, the network turns it into a low-dimensional state value, and the output is the Q value corresponding to each action, in the form of a vector; an ε-greedy algorithm is used: in each state, with small probability ε a random action is selected, and with probability 1-ε the optimal action is selected according to the BP neural network; the randomly selected actions and the actions selected according to the neural network are then added to the replay_buffer experience pool of our neural network for secondary training; the action is taken and the next state is reached; the neural network training optimizes the input state, the output value follows the optimal solution strategy, and the optimal solution is output;
4) in the practical problem, the device obtains the environment state value through the system, and through the DQN it selects the best-matching video quality and the kernel that saves the most power without affecting the user experience.
2. The DQN-based HTTP adaptive flow control energy consumption optimization method according to claim 1, characterized in that the environmental information in the defined state set S contains the network grade, which is divided into six grades from low to high, although measurement shows that under grades 1 and 2 or under 3G even the lowest-quality test video cannot be loaded normally; the remaining phone battery value; and the cached fragment length, for which a script that reads the cache information is written to select the buffer state, i.e. the fragment length, at each unit time point.
CN201910060941.7A 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method Active CN109802964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910060941.7A CN109802964B (en) 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method


Publications (2)

Publication Number Publication Date
CN109802964A (en) 2019-05-24
CN109802964B CN109802964B (en) 2021-09-28

Family

ID=66560085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910060941.7A Active CN109802964B (en) 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method

Country Status (1)

Country Link
CN (1) CN109802964B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2017268276A1 (en) * 2016-05-16 2018-12-06 Wi-Tronix, Llc Video content analysis system and method for transportation system
US20180129974A1 (en) * 2016-11-04 2018-05-10 United Technologies Corporation Control systems using deep reinforcement learning
CN108063961A (en) * 2017-12-22 2018-05-22 北京联合网视文化传播有限公司 A kind of self-adaption code rate video transmission method and system based on intensified learning
CN108737382A (en) * 2018-04-23 2018-11-02 浙江工业大学 SVC based on Q-Learning encodes HTTP streaming media self-adapting methods
CN108966330A (en) * 2018-09-21 2018-12-07 西北大学 A kind of mobile terminal music player dynamic regulation energy consumption optimization method based on Q-learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGFENG XU: "Live Streaming with Content Centric Networking", 2012 Third International Conference on Networking and Distributed Computing *
VIRGINIA MARTIN: "Evaluation of Q-Learning approach for HTTP Adaptive Streaming", 2016 IEEE International Conference on Consumer Electronics *
熊丽荣: "Research on Q-learning-based rate control methods for HTTP adaptive streaming", Journal on Communications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414725A (en) * 2019-07-11 2019-11-05 山东大学 The integrated wind power plant energy-storage system dispatching method of forecast and decision and device
CN114885208A (en) * 2022-03-21 2022-08-09 中南大学 Dynamic self-adapting method, equipment and medium for scalable streaming media transmission under NDN (named data networking)
CN114885208B (en) * 2022-03-21 2023-08-08 中南大学 Dynamic self-adapting method, equipment and medium for scalable streaming media transmission under NDN (network discovery network)

Also Published As

Publication number Publication date
CN109802964B (en) 2021-09-28

Similar Documents

Publication Publication Date Title
CN109639760B (en) It is a kind of based on deeply study D2D network in cache policy method
CN111835827B (en) Internet of things edge computing task unloading method and system
CN113434212B (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN107690176B (en) Network selection method based on Q learning algorithm
CN110062357B (en) D2D auxiliary equipment caching system and caching method based on reinforcement learning
Gaing Constrained dynamic economic dispatch solution using particle swarm optimization
CN112218337B (en) Cache strategy decision method in mobile edge calculation
CN109814951A (en) The combined optimization method of task unloading and resource allocation in mobile edge calculations network
CN110351754A (en) Industry internet machinery equipment user data based on Q-learning calculates unloading decision-making technique
CN113114756A (en) Video cache updating method for self-adaptive code rate selection in mobile edge calculation
CN110312277B (en) Mobile network edge cooperative cache model construction method based on machine learning
Zhu et al. Computation offloading for workflow in mobile edge computing based on deep Q-learning
CN109802964A (en) A kind of HTTP self adaptation stream control energy consumption optimization method based on DQN
Yan et al. Distributed edge caching with content recommendation in fog-rans via deep reinforcement learning
CN114205791A (en) Depth Q learning-based social perception D2D collaborative caching method
CN107949007A (en) A kind of resource allocation algorithm based on Game Theory in wireless caching system
CN116321307A (en) Bidirectional cache placement method based on deep reinforcement learning in non-cellular network
Lin et al. Vehicle-to-cloudlet: Game-based computation demand response for mobile edge computing through vehicles
CN111314960A (en) Social awareness-based collaborative caching method in fog wireless access network
US11570063B2 (en) Quality of experience optimization system and method
CN111643901A (en) Method and device for intelligently rendering cloud game interface
CN112822727B (en) Self-adaptive edge content caching method based on mobility and popularity perception
Mowafi et al. Energy efficient fuzzy-based DASH adaptation algorithm
CN113672372B (en) Multi-edge collaborative load balancing task scheduling method based on reinforcement learning
Lin et al. Knn-q learning algorithm of bitrate adaptation for video streaming over http

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant