CN113596138A - Heterogeneous information center network cache allocation method based on deep reinforcement learning - Google Patents
Heterogeneous information center network cache allocation method based on deep reinforcement learning Download PDFInfo
- Publication number
- CN113596138A CN113596138A CN202110843043.6A CN202110843043A CN113596138A CN 113596138 A CN113596138 A CN 113596138A CN 202110843043 A CN202110843043 A CN 202110843043A CN 113596138 A CN113596138 A CN 113596138A
- Authority
- CN
- China
- Prior art keywords
- content
- network
- heterogeneous
- cache
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a heterogeneous information-centric network cache allocation method based on deep reinforcement learning, and relates to the technical field of network cache space allocation. The method specifically comprises the following steps: abstracting a heterogeneous ICN into a topology model; defining the dynamically changing content requests in the heterogeneous ICN; converting the cache space allocation problem of the heterogeneous ICN into a network performance optimization problem and constructing a network performance optimization model, which comprises an optimization objective function and corresponding constraints; applying a Q-learning algorithm to each content request to obtain the cache allocation scheme with optimal network performance for the content request at each moment; and combining a deep neural network with the Q-learning algorithm, training an optimal cache allocation scheme adapted to the dynamically changing content requests of the heterogeneous ICN, using the per-moment optimal cache allocation schemes solved by the Q-learning algorithm. The method can adaptively solve the cache allocation scheme with optimal network performance and better adapts to dynamically changing network requests.
Description
Technical Field
The invention relates to the technical field of heterogeneous information-centric networks, in particular to a cache allocation method for heterogeneous information-centric networks based on deep reinforcement learning.
Background
With the development of internet technology, the number of network users and their requests for network content keep growing. Information-Centric Networking (ICN) is a new type of network architecture that caches content provided by servers on routers in order to serve users. The outstanding advantage of ICN over traditional network architectures is in-network caching: each router can store content. Because the content routers in an ICN cache different contents from the servers, content requested by a user is answered by a router storing that content, which avoids the overhead of long-distance transmission from client to server and greatly improves response speed. For in-network caching in ICNs, cache allocation (the allocation of cache capacity to each content router) is the basis for caching content. In heterogeneous ICNs, each content router may be allocated a different cache capacity, which makes allocation more complex than in homogeneous ICNs. In addition, since configuring cache space on a content router is expensive and consumes energy, allocating too much cache space to a content router causes unnecessary waste, while allocating too little fails to meet users' request demands and degrades user experience and network performance. Therefore, allocating the appropriate cache space to each content router is important for optimizing heterogeneous ICN performance.
Cache allocation for heterogeneous ICNs mainly considers two aspects: first, the centrality of a router in the network topology, since the higher the centrality, the more important the node is in the topology and the more cache capacity it needs to be allocated; second, the request frequency of a node, since more frequently requested nodes need more cache space. Existing cache allocation methods for heterogeneous ICNs fall into two types: one performs cache allocation based on the importance of nodes in the network topology; the other converts the cache allocation problem into a network performance optimization problem and obtains the optimal cache allocation scheme by solving for the solution that optimizes network performance. However, these methods all target static networks; in reality network requests change dynamically, and the existing methods cannot meet this dynamic demand.
Disclosure of Invention
In order to solve the above problems, the present invention provides a heterogeneous information-centric network cache allocation method based on deep reinforcement learning, which aims to allocate a suitable cache space to each routing node in accordance with the dynamics of network requests.
The technical scheme of the invention is as follows:
a cache allocation method of a heterogeneous information center network based on deep reinforcement learning comprises the following steps:
Step 1: abstracting a heterogeneous ICN into a topology model;
Step 2: defining the dynamically changing content requests in the heterogeneous ICN;
Step 3: converting the cache space allocation problem of the heterogeneous ICN into a network performance optimization problem of the heterogeneous ICN, and constructing a network performance optimization model, which comprises an optimization objective function and corresponding constraints;
Step 4: applying a Q-learning algorithm to each content request in the heterogeneous ICN to obtain the cache allocation scheme with optimal network performance for the content request at each moment;
Step 5: combining a deep neural network with the Q-learning algorithm, and training an optimal cache allocation scheme adapted to the dynamically changing content requests of the heterogeneous ICN, using the per-moment optimal cache allocation schemes solved by the Q-learning algorithm in Step 4.
Further, according to the heterogeneous information-centric network cache allocation method based on deep reinforcement learning, the heterogeneous ICN with n content routers is abstracted into a topology model G(V, E, C, Long, Lati):
wherein V represents the set of the n content routers; E represents the set of edges between content routers; C represents the set of cache capacities allocated to the content routers; Long represents the longitudes of the content routers' locations in the topology model G; Lati represents the latitudes of the content routers' locations in the topology model G; CR_i represents the i-th content router; e_ij represents the path between content router CR_i and the j-th content router CR_j; c_i represents the cache capacity allocated to CR_i; Long_i represents the longitude of CR_i's location in the topology model G; and Lati_i represents the latitude of CR_i's location in the topology model G. CR_i and e_ij can be further expressed as follows:
wherein CR_i^(c_i) denotes the i-th content router with allocated cache capacity c_i, e_ij^(c_i,c_j) denotes the path between router CR_i^(c_i) and the j-th content router CR_j^(c_j) with allocated cache capacity c_j, and C_max denotes the maximum cache capacity that a content router can be allocated.
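The topology model above can be sketched in code. The following is a minimal illustration; the class and attribute names (ContentRouter, Topology) are assumptions chosen for clarity, not the patent's reference implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ContentRouter:
    index: int        # i in CR_i
    capacity: int     # allocated cache capacity c_i
    longitude: float  # Long_i
    latitude: float   # Lati_i

@dataclass
class Topology:
    C_max: int                                   # maximum per-router cache capacity
    routers: list = field(default_factory=list)  # the set V
    edges: set = field(default_factory=set)      # the set E, as (i, j) pairs

    def add_router(self, capacity, longitude, latitude):
        # enforce the per-router constraint 0 <= c_i <= C_max
        assert 0 <= capacity <= self.C_max
        router = ContentRouter(len(self.routers), capacity, longitude, latitude)
        self.routers.append(router)
        return router

    def add_edge(self, i, j):
        # e_ij is an undirected path between CR_i and CR_j
        self.edges.add((min(i, j), max(i, j)))
```

A topology is then built by adding routers (each carrying its capacity and coordinates) and edges between their indices.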
Further, according to the heterogeneous information-centric network cache allocation method based on deep reinforcement learning, the hit rate and energy consumption of content requests are used as evaluation indexes of heterogeneous ICN performance, and the optimization objective function shown in formula (12) is established:
wherein NetP_total is the overall network performance of the heterogeneous ICN; the hit count of CR_i denotes the number of requests that successfully hit its cache, and the request count of CR_i denotes the total number of requests it receives; hr_i denotes the request hit rate of content router CR_i; ec_i denotes the energy consumption of routing node CR_i; P_i is the fixed energy consumption of CR_i's router hardware when caching content; the per-byte term is the energy consumption corresponding to transmitting a unit byte of content through CR_i; tra_i is the size of the data stream passing through CR_i; distance_{i,j} denotes the distance between content-requesting node CR_j and service node CR_i; and ω and μ are the weight values of the network performance corresponding, respectively, to the request hit rate and the energy consumption of content router CR_i caching a unit size of content.
Further, according to the method for allocating cache of a heterogeneous information-centric network based on deep reinforcement learning, the constraints include a cache space constraint of each content router shown in formula (13) and a cache space constraint in the overall network topology:
wherein C_max represents the maximum cache capacity that a content router in the heterogeneous ICN can be allocated, and C_total represents the maximum total cache space of all content routers in the heterogeneous ICN.
Further, according to the heterogeneous information-centric network cache allocation method based on deep reinforcement learning, applying the Q-learning algorithm to each content request in the heterogeneous ICN comprises the following: the content request at each moment is expressed as a Q-learning state, Status = {s_1, s_2, …, s_t}, where s_t is the Q-learning state corresponding to content request q_t at time t; the topology model G(V, E, C, Long, Lati) of the heterogeneous information-centric network is expressed as the Q-learning environment, Environment = {e_1, e_2, …, e_t}, where e_t is the Q-learning environment corresponding to content request q_t at time t; the cache allocation scheme for the content routers is expressed as the Q-learning action, Action = {a_1, a_2, …, a_t}, where a_t is the Q-learning action corresponding to content request q_t at time t; and executing a cache allocation scheme for a network content request returns a network performance value, expressed as the Q-learning reward, Reward = {r_1, r_2, …, r_t}, where r_t is the Q-learning reward value corresponding to content request q_t at time t. During Q-learning, the action with the maximum reward value in each state is selected and executed; after the Q-learning process finishes, the obtained Q-learning policy selects and executes, for each input state, the action with the largest reward value.
Further, according to the heterogeneous information center network cache allocation method based on deep reinforcement learning, the deep neural network is a BP neural network.
Further, according to the heterogeneous information center network cache allocation method based on deep reinforcement learning, the step 5 includes the following specific steps:
step 5.1: randomly initializing a weight theta of the BP neural network;
Step 5.2: within a period T, take the Q-learning state and action (s_t, a_t) at time t as the input value of the neural network, and take the maximum reward value R(s_t, a_t; θ) obtained by the Q-learning algorithm, together with the corresponding action a_t, as the output value y_output of the deep neural network;
Step 5.3: calculating an estimated value of an output value of the BP neural network according to a Bellman equation;
step 5.4: calculating a corresponding loss value according to the output value of the BP neural network and the estimated value of the output value;
step 5.5: updating the weight of the BP neural network by adopting a gradient descent method according to the loss value;
Step 5.6: repeat steps 5.2 to 5.5, iteratively updating θ until the iteration-stopping condition T is met, to obtain the final neural network weights θ, which serve as the optimal cache allocation scheme adapted to the dynamically changing content requests of the period T.
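Steps 5.1 to 5.6 can be sketched as a small training loop. Here a simple weight table over discrete states stands in for the BP neural network, and all names, hyperparameters, and the transition format are illustrative assumptions:

```python
import numpy as np

def train_deep_q(transitions, n_states, n_actions,
                 gamma=0.9, lr=0.1, iters=300, seed=0):
    """transitions: list of (s, a, r, s_next) tuples gathered from Q-learning."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=(n_states, n_actions))  # step 5.1: random init
    for _ in range(iters):                                      # step 5.6: iterate to the stop condition
        for s, a, r, s_next in transitions:                     # step 5.2: (s_t, a_t) as input
            target = r + gamma * np.max(theta[s_next])          # step 5.3: Bellman estimate of the output
            err = theta[s, a] - target                          # step 5.4: loss = err**2 / 2
            theta[s, a] -= lr * err                             # step 5.5: gradient-descent update
    return theta
```

On a two-state toy problem where matching the action to the state yields reward 1, the learned values approach the Bellman fixed point and the greedy action per state recovers the obvious allocation.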
Compared with the prior art, the heterogeneous information-centric network cache allocation method based on deep reinforcement learning has the following beneficial effects: after modeling the heterogeneous information-centric network, the dynamics of network requests are analyzed, so the resulting dynamic network model matches reality better than existing topology models of heterogeneous information-centric networks. Deep learning and Q-learning are combined and applied to the cache allocation problem of the dynamic heterogeneous information-centric network; compared with existing cache allocation methods, the proposed method adaptively solves the cache allocation scheme with optimal network performance and better adapts to dynamically changing network requests.
Drawings
FIG. 1 is a schematic diagram of an information-centric network architecture;
fig. 2 is a schematic flow chart of a cache allocation method of a heterogeneous information center network based on deep reinforcement learning according to the embodiment;
fig. 3 is a schematic structural diagram of the deep Q learning algorithm of the present embodiment;
fig. 4 is a schematic flowchart of solving the network cache allocation scheme by deep learning according to this embodiment.
Detailed Description
To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present application are given in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
The invention provides a dynamic cache space allocation scheme for the node cache allocation problem in heterogeneous information-centric networks; in particular, it provides a cache allocation strategy for network nodes that adapts to the dynamics of the network. When modeling the cache allocation problem, the network request hit rate and energy consumption are taken as performance evaluation indexes and combined into a comprehensive performance evaluation of cache allocation schemes, and the cache allocation problem is modeled as a network performance maximization problem. To obtain the optimal cache allocation for each content request, a reinforcement learning method is applied with cache allocations as the actions selected by the agent, yielding the cache allocation scheme corresponding to the optimal performance of each request. To adapt to the dynamics of network requests, the existing content requests are used as input, the cache allocation schemes obtained by reinforcement learning as output, and the optimal cache allocation scheme adapting to the dynamic requests at different moments is obtained through training.
Fig. 1 is a schematic diagram of an information-centric network architecture, which consists of nodes and the paths between them; the nodes include request nodes, routing nodes, and service nodes. A request node is responsible for receiving a user's content request and passing it to a routing node; a routing node is responsible for transmitting requests or content and can cache content; a service node stores content and is responsible for returning requested content to the user. The paths between nodes carry requests and content. When a user sends a content request to a request node, the request node transmits the request along a path to a routing node, which judges whether it has cached the requested content: if so, it returns the content to the request node; if not, it forwards the request to the next routing node or to a service node according to its forwarding information base. Eventually the request reaches a routing node or service node that holds the requested content, and that node returns the content along the request path to the request node, completing the request. The efficiency of completing a request is proportional to network performance and is related to the cache space and cached contents of each routing node. With proper cache allocation, frequently requested content is cached at the nodes that frequently request it, which improves network performance and content request efficiency. The invention provides a heterogeneous information-centric network cache allocation method based on deep reinforcement learning, aiming to allocate a suitable cache space to each routing node.
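The request flow described above can be sketched as a toy function; the node ids, the cache map, and the return convention are all illustrative assumptions:

```python
def serve_request(path, caches, content, server_node):
    """Walk the request path toward the server; return (serving node, hop count).

    path: routing-node ids from the request node toward the server.
    caches: node id -> set of cached content names."""
    for hops, node in enumerate(path, start=1):
        if content in caches.get(node, set()):
            return node, hops              # cache hit at a routing node
    return server_node, len(path) + 1      # miss everywhere: the server answers
```

The hop count returned makes the benefit of caching visible: a hit at an early routing node completes the request in fewer hops than falling through to the server.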
Fig. 2 is a schematic flow chart of a deep reinforcement learning-based heterogeneous information centric network cache allocation method provided by the present invention, where the deep reinforcement learning-based heterogeneous information centric network cache allocation method includes the following steps:
Step 1: abstracting a heterogeneous information-centric network into a topology model;
in this embodiment, a heterogeneous information center network with n content routers is abstracted into a topology model G (V, E, C, Long, Lati), where V represents a content router set composed of the n content routers; e represents a set of edges between content routers; c represents a set of cache capacities allocated to the content routers; long represents the longitude of the location of the content router in the topology model; lati represents the latitude of the position of the content router in the topological model; each component of the heterogeneous information center network topology model is specifically expressed as follows:
wherein CR_i represents the i-th content router; e_ij represents the path between content router CR_i and the j-th content router CR_j; c_i represents the cache capacity allocated to CR_i; Long_i represents the longitude of CR_i's location in the topology model; and Lati_i represents the latitude of CR_i's location in the topology model. CR_i and e_ij can be further expressed as follows:
wherein CR_i^(c_i) denotes the i-th content router with allocated cache capacity c_i, e_ij^(c_i,c_j) denotes the path between router CR_i^(c_i) and the j-th content router CR_j^(c_j) with allocated cache capacity c_j, and C_max denotes the maximum cache capacity that a content router can be allocated.
Step 2: on the basis of a topological model of a heterogeneous information center network, defining a dynamically changing content request;
the content request at each moment is dynamically changed, and the content request Qr in the period T is defined as:
Qr = {q_t | 1 ≤ t ≤ T}    (3)
wherein q_t refers to the content requests occurring in the network at time t, each including: the content-requesting node, the longitude and latitude of its location in the topology model, the requested content, the content server node providing the requested content, and the request time.
To elaborate the dynamically changing network requests at different moments, q_t can be further expressed as:
wherein the components respectively represent, for the k-th content request in q_t, the content-requesting node, the requested content, the longitude of the requesting node's location in the network topology model, the latitude of that location, the content server node providing the requested content, and the request time.
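One possible encoding of the request tuple q_t is sketched below; all field names are assumptions chosen for illustration, not the patent's notation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentRequest:
    requester: int    # content-requesting node (index of a content router)
    content: str      # the requested content item
    longitude: float  # position of the requesting node in the topology model
    latitude: float
    server: int       # content server node providing the requested content
    time: int         # request time t

def requests_in_period(all_requests, T):
    """Qr = {q_t | 1 <= t <= T}: the dynamic requests inside one period (Eq. 3)."""
    return [q for q in all_requests if 1 <= q.time <= T]
```

Filtering by the request time selects the requests that make up one training period T.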
On the basis of a static network topology model, the dynamic analysis of the network request is added, and the requirements of the dynamically changing network request on different cache spaces can be met.
Step 3: converting the cache allocation problem into a network performance optimization problem and constructing a network performance optimization model, which comprises an optimization objective function and corresponding constraints;
In this embodiment, the cache allocation problem is converted into a network performance optimization problem, and the hit rate and energy consumption of content requests are taken as the evaluation indexes of network performance. Let E_total and H_total respectively denote the energy consumption and hit rate of the entire network, and let ec_i and hr_i respectively denote the unit energy consumption and unit hit rate of each content router CR_i; the network totals are, respectively, the sum of the energy consumption of each router and the sum of the hit rates, specifically expressed as follows:
wherein c_i ∈ {0, 1, 2, …, C_max}, with C_max the maximum cache capacity allocatable to each router; c_i = 0 means CR_i is allocated no cache, c_i = 1 means CR_i is allocated a cache of 1 preset unit, c_i = 2 means CR_i is allocated a cache of 2 preset units, and so on. hr_i denotes the request hit rate of content router CR_i and, as shown in formula (6), is calculated as the ratio of the number of requests CR_i receives and successfully hits to all requests it receives, where the number of requests received and successfully hit is the number of requests actually occurring at CR_i whose requested content is cached at CR_i, and the number of all requests received is the number of requests actually occurring at CR_i. ec_i denotes the energy consumption of routing node CR_i and is calculated according to formula (7); it comprises caching energy consumption and transmission energy consumption, reflecting the cost of ICN content caching. The caching energy consumption is the energy the router consumes to cache content and is related to the router's caching performance and the size of the cached content. The transmission energy consumption is the energy the router consumes to transmit requests and is related to the size of the transmitted content and the time consumed by transmission;
wherein the numerator in formula (6) is the number of requests that successfully hit CR_i's cache and the denominator is the total number of requests CR_i receives.
wherein P_i is the fixed energy consumption of CR_i's router hardware when caching content; the per-byte term is the energy consumption corresponding to transmitting a unit byte of content through CR_i; t_i is the running time of CR_i; and tra_i is the size of the data stream passing through CR_i.
The running time includes the time for the node to process the cache request and the transmission time for returning the requested content to the requesting node. Assuming the processing time is negligible and letting CR_j be the content-requesting node, t_i is calculated according to formula (8).
wherein distance_{i,j} denotes the distance between content-requesting node CR_j and service node CR_i, calculated from the node positions in the heterogeneous information-centric network topology model, with reference to formula (9):
NetP_i denotes the network performance corresponding to content router CR_i caching a unit size of content; it is proportional to the hit rate and inversely related to the energy consumption, and is calculated with reference to formula (10):
NetP_i = ω·hr_i + μ·ec_i    (10)
wherein ω and μ are the weight values of the network performance corresponding, respectively, to the hit rate and the energy consumption of content router CR_i caching a unit size of content.
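The per-router metrics can be sketched as follows. Since the exact formulas (6)-(9) appear as images in the source, the concrete forms below (a planar node distance, a fixed caching cost P_i·c_i plus distance-weighted transmission cost, and a negative μ so that energy lowers performance) are assumptions:

```python
import math

def hit_rate(hits, total):
    """hr_i (Eq. 6): requests received and successfully hit over all received."""
    return hits / total if total else 0.0

def distance(lon_i, lat_i, lon_j, lat_j):
    """Eq. (9): distance between nodes from their topology coordinates
    (a planar distance is assumed here)."""
    return math.hypot(lon_i - lon_j, lat_i - lat_j)

def energy(P_i, c_i, eps_i, tra_i, dist_ij, speed=1.0):
    """ec_i (Eq. 7), assumed form: caching energy P_i * c_i plus transmission
    energy eps_i * tra_i * t_i, with t_i = distance / speed (Eq. 8)."""
    t_i = dist_ij / speed
    return P_i * c_i + eps_i * tra_i * t_i

def net_performance(hr_i, ec_i, omega=1.0, mu=-0.01):
    """NetP_i (Eq. 10): omega * hr_i + mu * ec_i."""
    return omega * hr_i + mu * ec_i
```

Under these forms, a higher hit rate raises NetP_i while extra caching or transmission energy lowers it, matching the stated proportionality.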
In the whole heterogeneous information center network topology, the whole network performance NetPtotalIs represented as follows:
aiming at the problem of cache space allocation of an ICN node, the goal is to find a cache allocation scheme, so that the network performance is optimal for dynamic content requests, namely the overall network performance is maximized, and an optimization objective function shown in formula (12) is established:
while maximizing network performance, a single node cache space and all network cache spaces need to satisfy certain constraint conditions, as shown in formula (13), including the cache space constraint of each content router and the cache space constraint in the overall network topology:
and (3) a final network performance optimization model is shown as an equation (14):
In the above formula, c_i denotes the cache capacity allocated to the i-th content router CR_i, and C_total denotes the maximum total cache space of all content routers in the network;
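For a toy instance, the constrained maximization of formula (14) can be solved by exhaustive search. This is only feasible for very small networks, which is exactly why the method replaces it with deep Q-learning; all names are illustrative:

```python
from itertools import product

def feasible(alloc, C_max, C_total):
    """Eq. (13): per-router capacity bound and network-wide cache budget."""
    return all(0 <= c <= C_max for c in alloc) and sum(alloc) <= C_total

def best_allocation(n, C_max, C_total, netp_fn):
    """Brute-force the allocation maximising total performance (Eq. 14).
    netp_fn(i, c) is the performance of router i given capacity c."""
    best, best_val = None, float("-inf")
    for alloc in product(range(C_max + 1), repeat=n):
        if not feasible(alloc, C_max, C_total):
            continue
        val = sum(netp_fn(i, c) for i, c in enumerate(alloc))
        if val > best_val:
            best, best_val = alloc, val
    return best, best_val
```

The search space grows as (C_max + 1)^n, so even modest networks make enumeration impractical and motivate the learned policy.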
Step 4: applying the Q-learning algorithm to cache allocation of the heterogeneous information-centric network: the Q-learning algorithm is applied to each content request of the network to obtain the cache allocation scheme with optimal network performance for the content request at each moment.
Because in practice the network structure does not change over time, network dynamics is mainly embodied in request dynamics: different content requests occur in the network at different moments. Therefore, when applying Q-learning to cache allocation in the heterogeneous information-centric network, the content request at each moment is expressed as a Q-learning state. For content requests at different moments, the state is specifically expressed as Status = {s_1, s_2, …, s_t}, where s_t is the Q-learning state corresponding to content request q_t at time t. The topology model G(V, E, C, Long, Lati) of the heterogeneous information-centric network is expressed as the Q-learning environment (Environment); the cache allocation scheme for the content routers is expressed as the Q-learning action (Action); and executing a cache allocation scheme for a network content request returns a network performance value, expressed as the Q-learning reward value (Reward). During Q-learning, the action with the maximum reward value in each state is selected and executed. After the Q-learning process finishes, the obtained Q-learning policy (Policy) selects, for each input state, the action with the maximum reward value to execute.
The Q-learning action refers to the different cache allocation schemes allocated to the network; specifically, each routing node is allocated a certain cache space, subject to the constraint on routing-node cache space size. In a heterogeneous information-centric network, the cache space sizes of the nodes may be unequal. In a real network the number of nodes is often huge and the cache space of each node is selectable, so the selectable cache allocation schemes are numerous, i.e., Q-learning has many selectable actions. For network content requests at different moments, the action is specifically expressed as Action = {a_1, a_2, …, a_t}, where a_t is the Q-learning action corresponding to content request q_t at time t.
The Q-learning environment consists of the entire network; it interacts with states through actions and returns to each state the reward value corresponding to that state, which evaluates the quality of the action. For network content requests at different moments, the environment is specifically expressed as Environment = {e_1, e_2, …, e_t}, where e_t is the Q-learning environment corresponding to content request q_t at time t.
The Q-learning reward value is the value returned to a state after it performs an action that interacts with the environment, and it is represented by the network performance. Under different cache allocation schemes the network handles a request with different performance; the higher the performance, the better the corresponding cache allocation scheme, i.e., the action with the higher reward value is selected and executed. The reward value is obtained from a network performance calculation: in a known network topology model, different cache allocation schemes yield different network performance for the same content request, and the Q-learning algorithm selects the cache allocation scheme corresponding to the optimal network performance, i.e., the action with the maximum reward value. For network content requests qt at different times, the reward values are written as Reward = {r1, r2, …, rt}, where rt is the Q-learning reward value corresponding to the content request qt at time t.
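The reward described here can be sketched as a weighted trade-off between the aggregate request hit rate and the total energy consumption, in the spirit of the ω/μ weighting named in claim 3. The exact patented formula is not reproduced; the combination and the weight values below are assumptions:

```python
def network_performance(hits, requests, energy, omega=0.7, mu=0.3):
    """Hedged reward sketch: weighted hit rate minus weighted energy.
    hits[i]/requests[i]/energy[i] are per-router counters; omega and mu
    mirror the hit-rate and energy weights named in the text (values assumed)."""
    hit_rate = sum(hits) / max(sum(requests), 1)  # aggregate request hit rate
    return omega * hit_rate - mu * sum(energy)    # higher is better

# Two routers: 3 of 4 and 1 of 4 requests hit; 0.5 energy units each.
r = network_performance(hits=[3, 1], requests=[4, 4], energy=[0.5, 0.5])
```

A higher `r` marks a better cache allocation scheme, matching the rule of executing the action with the higher reward value.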
The Q-learning policy directs a state to select an action; given the policy, the cache allocation scheme for a network content request is determined. The policy is expressed as
Step 5: combine the deep neural network with Q learning and apply the combination to solving the cache allocation scheme of the heterogeneous information-centric network. Using the cache allocation schemes with optimal network performance, obtained in step 4 by Q learning for the network content request at each time, train an optimal cache allocation scheme that adapts to dynamically changing content requests.
In the deep-neural-network part, this embodiment uses a BP (back propagation) neural network, which comprises a forward propagation process and a back propagation process. The forward propagation process builds the neural network structure: as shown in fig. 3, the Q-learning state and action are taken as the input of the neural network, the reward value as its output, and the policy as its weights; the network can then train the optimal weights to fit the inputs and outputs at different times. The back propagation process adjusts the network structure, optimizing the weights by minimizing the loss value in each training round, where the loss value depends on the output value of the neural network and an estimate of that output value. Solving the cache allocation of the information-centric network under dynamic requests with the deep Q-learning algorithm comprises the following steps:
As shown in fig. 3, the Q-learning state and action serve as the input of the neural network. The input layer receives the input data, i.e., the Q-learning state and action, denoted (Status, Action). The output layer takes the network performance value returned by Q learning, i.e., the Q-learning reward value, as the output of the neural network. In state st, action at is performed according to policy(st, at) and the corresponding reward value rt is obtained, written as rt(st, at; policy(st, at)). The weights θ of the neural network correspond to the Q-learning policy.
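A minimal sketch of such a forward pass, with (Status, Action) features as input and a predicted reward as output. The layer sizes, the tanh activation, and the feature encoding are assumptions, not the patent's BP network:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_in, n_hidden, n_out):
    """theta: the weights that play the role of the Q-learning policy."""
    return {"W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out)}

def forward(theta, x):
    """Forward propagation: (state, action) features in, predicted reward out."""
    h = np.tanh(x @ theta["W1"] + theta["b1"])  # hidden layer
    return h @ theta["W2"] + theta["b2"]        # estimate of r_t(s_t, a_t; theta)

theta = init_weights(n_in=4, n_hidden=8, n_out=1)
r_hat = forward(theta, np.ones(4))  # one toy (state, action) feature vector
```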
The reward value rt is calculated with reference to equation (15):
as shown in fig. 4, the step 5 includes the following specific steps:
Step 5.1: randomly initialize the weights θ of the BP neural network;
Step 5.2: take the Q-learning state and action (st, at) at time t within a period T as the input value x_input of the neural network, and correspondingly take the maximum Q-learning reward value R(st, at; θ) and the corresponding action at as the output value y_output of the deep neural network.
Step 5.3: calculate an estimated value of the output value of the BP neural network according to the Bellman equation.
where α and γ are the learning rate and the discount rate of the Bellman equation, respectively, and a is a selectable action for state st+1.
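The Bellman estimate of step 5.3 can be sketched as the standard update that blends the previous estimate with the bootstrapped return r + γ·max_a Q(st+1, a); the patent's exact equation is not shown here, so this is a common reading, and the numeric values are assumptions:

```python
def bellman_target(q_old, reward, q_next_max, alpha=0.5, gamma=0.9):
    """Estimated output value: old estimate moved toward the bootstrapped
    return. alpha is the learning rate, gamma the discount rate."""
    return q_old + alpha * (reward + gamma * q_next_max - q_old)

# q_old=1.0, reward=2.0, max over next actions=3.0
target = bellman_target(q_old=1.0, reward=2.0, q_next_max=3.0)  # 2.85
```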
Step 5.4: calculating a corresponding loss value according to the output value of the BP neural network and the estimated value of the output value;
In the back propagation process, the BP neural network adjusts its weights according to the loss value, which is calculated from the output value of the neural network and the estimated value of that output.
The loss value is calculated with reference to equation (17):
where m is the preset number of neurons in the output layer.
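A common reading of such a loss — mean squared error over the m output-layer neurons between the network's output and its Bellman estimate — can be sketched as follows; the patent's exact equation (17) is not reproduced here:

```python
def loss_value(y_output, y_estimate):
    """Mean squared error over the m output-layer neurons between the
    network output and the Bellman-equation estimate of that output."""
    m = len(y_output)
    return sum((o - e) ** 2 for o, e in zip(y_output, y_estimate)) / m

loss = loss_value([1.0, 2.0], [0.0, 0.0])  # (1 + 4) / 2 = 2.5
```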
Step 5.5: update the weights of the BP neural network by the gradient descent method according to the loss value.
To make the weights approach the optimum, they should be updated in the direction that decreases the loss value. The weight update follows equation (18): in the back propagation process, the gradient descent method adjusts the neural network weights according to the loss value, i.e., the weights θ are updated according to loss(x_input, y_output, θ), expressed as follows:
where η is the learning rate of the gradient descent method. Since the objective of the algorithm is to find the weights corresponding to the minimum loss value, η < 0 under the sign convention of equation (18).
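Under the conventional θ ← θ − η·∇loss form of a gradient step, η > 0 moves θ toward lower loss; the patent's equation (18) appears to fold the sign into η instead (hence its η < 0). A sketch under the conventional convention:

```python
def update_weight(theta, grad, eta=0.01):
    """One gradient-descent step: theta <- theta - eta * grad.
    With this minus-sign convention, eta > 0 decreases the loss."""
    return theta - eta * grad

w = update_weight(theta=1.0, grad=2.0, eta=0.1)  # 1.0 - 0.2 = 0.8
```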
Step 5.6: repeat steps 5.2 to 5.5, iterating the update of θ, until the stopping condition t = T is met. The θ obtained is the final weight of the neural network, and this final weight constitutes the optimal cache allocation policy adapted to the dynamic requests of the period T.
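Steps 5.1–5.6 can be tied together in a toy end-to-end sketch. The environment, the (state, action) feature construction, and the linear approximator standing in for the BP network are all assumptions made for illustration:

```python
import numpy as np

def train(period_T=50, n_features=4, eta=0.05, alpha=0.5, gamma=0.9, seed=0):
    """Hedged end-to-end sketch of steps 5.1-5.6 on random toy data."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 0.1, n_features)          # step 5.1: random init
    for t in range(period_T):                         # iterate until t = T
        x = rng.normal(size=n_features)               # step 5.2: (s_t, a_t) features
        q = float(theta @ x)                          # network output
        reward = float(rng.normal())                  # performance value from environment
        q_next_max = float(theta @ rng.normal(size=n_features))
        target = q + alpha * (reward + gamma * q_next_max - q)  # step 5.3: Bellman estimate
        grad = 2.0 * (q - target) * x                 # step 5.4: gradient of squared loss
        theta = theta - eta * grad                    # step 5.5: gradient descent update
    return theta                                      # step 5.6: final weights

theta_final = train()
```

The returned weights are the analogue of the final θ that encodes the learned cache allocation policy for the period T.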
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.
Claims (7)
1. A cache allocation method of a heterogeneous information center network based on deep reinforcement learning is characterized by comprising the following steps:
step 1: abstracting a heterogeneous ICN into a topological model;
step 2: defining a dynamically changing content request in a heterogeneous ICN;
step 3: converting the cache space allocation problem of the heterogeneous ICN into a network performance optimization problem of the heterogeneous ICN, and constructing a network performance optimization model comprising an optimization objective function and corresponding constraints;
step 4: applying a Q-learning algorithm to each content request in the heterogeneous ICN to obtain the cache allocation scheme with optimal network performance corresponding to the content request at each moment;
step 5: combining the deep neural network with the Q-learning algorithm, and training an optimal cache allocation scheme adapted to the dynamically changing content requests of the heterogeneous ICN from the cache allocation schemes with optimal network performance solved by the Q-learning algorithm in step 4 for the content request at each moment.
2. The deep reinforcement learning-based cache allocation method for the heterogeneous information-centric network according to claim 1, wherein the heterogeneous ICN with n content routers is abstracted as a topology model G (V, E, C, Long, Lati):
wherein V represents the set composed of the n content routers; E represents the set of edges between content routers; C represents the set of cache capacities allocated to the content routers; Long represents the longitudes of the content routers' locations in the topology model G; Lati represents the latitudes of those locations; CRi represents the i-th content router; eij represents the path between content router CRi and the j-th content router CRj; ci represents the cache capacity allocated to content router CRi; longi represents the longitude of the location of content router CRi in the topology model G; latii represents the latitude of that location; CRi and eij can further be expressed as follows:
3. The heterogeneous information-centric network cache allocation method based on deep reinforcement learning according to claim 2, wherein the hit rate and the energy consumption of content requests are used as evaluation indexes of the performance of the heterogeneous ICN, and the optimization objective function shown in formula (12) is established:
wherein NetPtotal is the overall network performance of the heterogeneous ICN; the request hit rate of content router CRi is the number of successful cache hits at CRi divided by the total number of requests CRi receives; the energy consumption of routing node CRi comprises Pi, the fixed energy consumption of the router hardware when CRi caches content, the transmission energy consumption per byte of content passing through CRi, trai, the size of the data stream passing through CRi, and the distance between the content-requesting node CRj and the serving node CRi; ω and μ are the weights, within the network performance, of the request hit rate and of the energy consumption, respectively, for content router CRi caching unit-size content.
4. The method for cache allocation in the heterogeneous information-centric network based on deep reinforcement learning according to claim 2, wherein the constraints include a cache space constraint of each content router and a cache space constraint in the overall network topology as shown in formula (13):
wherein Cmax represents the maximum cache capacity that a content router in the heterogeneous ICN can be allocated; Ctotal represents the maximum overall cache space of all the content routers in the heterogeneous ICN.
5. The method for cache allocation in the heterogeneous information-centric network based on deep reinforcement learning according to claim 2, wherein applying the Q-learning algorithm to each content request in the heterogeneous ICN comprises: expressing the content request at each time as a Q-learning state, Status = {s1, s2, …, st}, where st is the Q-learning state corresponding to the content request qt at time t; representing the topology model G(V, E, C, Long, Lati) of the heterogeneous information-centric network as the Q-learning environment, Environment = {e1, e2, …, et}, where et is the Q-learning environment corresponding to qt; representing the cache allocation scheme for the content routers as the Q-learning action, Action = {a1, a2, …, at}, where at is the Q-learning action corresponding to qt; executing a cache allocation scheme for a network content request returns a network performance value, expressed as the Q-learning reward, Reward = {r1, r2, …, rt}, where rt is the Q-learning reward value corresponding to qt; in the Q-learning process, the action with the maximum reward value corresponding to each state is selected and executed, and after the Q-learning process is finished, the obtained Q-learning policy selects, for each input state, the action with the maximum corresponding reward value and executes it.
6. The method according to claim 5, wherein the deep neural network is a BP neural network.
7. The method for allocating the cache of the heterogeneous information-centric network based on the deep reinforcement learning of claim 6, wherein the step 5 comprises the following specific steps:
step 5.1: randomly initializing a weight theta of the BP neural network;
Step 5.2: taking the Q-learning state and action (st, at) at time t within a period T as the input value of the neural network, and correspondingly taking the maximum reward value R(st, at; θ) obtained by the Q-learning algorithm and the corresponding action at as the output value y_output of the deep neural network;
Step 5.3: calculating an estimated value of an output value of the BP neural network according to a Bellman equation;
step 5.4: calculating a corresponding loss value according to the output value of the BP neural network and the estimated value of the output value;
step 5.5: updating the weight of the BP neural network by adopting a gradient descent method according to the loss value;
Step 5.6: repeatedly executing steps 5.2 to 5.5, iteratively updating θ until the stopping condition t = T is met, to obtain the final neural network weight θ, which serves as the optimal cache allocation scheme for content requests adapting to the dynamic changes over the period T.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110843043.6A CN113596138B (en) | 2021-07-26 | 2021-07-26 | Heterogeneous information center network cache allocation method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113596138A true CN113596138A (en) | 2021-11-02 |
CN113596138B CN113596138B (en) | 2022-06-21 |
Family
ID=78250075
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110843043.6A Active CN113596138B (en) | 2021-07-26 | 2021-07-26 | Heterogeneous information center network cache allocation method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113596138B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116996921A (en) * | 2023-09-27 | 2023-11-03 | 香港中文大学(深圳) | Whole-network multi-service joint optimization method based on element reinforcement learning |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106131202A (en) * | 2016-07-20 | 2016-11-16 | 中南大学 | In Information central site network, caching based on fluid dynamic theory places decision-making methods of marking |
EP3206348A1 (en) * | 2016-02-15 | 2017-08-16 | Tata Consultancy Services Limited | Method and system for co-operative on-path and off-path caching policy for information centric networks |
CN108322352A (en) * | 2018-03-19 | 2018-07-24 | 北京工业大学 | It is a kind of based on the honeycomb isomery caching method to cooperate between group |
WO2018236723A1 (en) * | 2017-06-19 | 2018-12-27 | Northeastern University | Joint routing and caching method for content delivery with optimality guarantees for arbitrary networks |
CN110049039A (en) * | 2019-04-15 | 2019-07-23 | 哈尔滨工程大学 | A kind of information centre's network-caching contamination detection method based on GBDT |
CN110138748A (en) * | 2019-04-23 | 2019-08-16 | 北京交通大学 | A kind of network integration communication means, gateway and system |
CN111586439A (en) * | 2020-05-25 | 2020-08-25 | 河南科技大学 | Green video caching method for cognitive content center network |
CN111885648A (en) * | 2020-07-22 | 2020-11-03 | 北京工业大学 | Energy-efficient network content distribution mechanism construction method based on edge cache |
CN112995950A (en) * | 2021-02-07 | 2021-06-18 | 华南理工大学 | Resource joint allocation method based on deep reinforcement learning in Internet of vehicles |
Non-Patent Citations (4)
Title |
---|
PING ZHOU et al., "An Ant Colony Inspired Cache Allocation Mechanism", IEEE Access *
YAO Jinfa, "Dec-POMDP-based caching strategy in Named Data Networking", Information Technology and Network Security *
TIAN Ming et al., "Adaptive caching algorithm based on local content popularity in information-centric networking", Computer Science *
GUO Jianyu et al., "ICN-oriented non-cooperative game optimized caching strategy", Telecommunication Engineering *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116996921A (en) * | 2023-09-27 | 2023-11-03 | 香港中文大学(深圳) | Whole-network multi-service joint optimization method based on element reinforcement learning |
CN116996921B (en) * | 2023-09-27 | 2024-01-02 | 香港中文大学(深圳) | Whole-network multi-service joint optimization method based on element reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||