CN113596138A - Heterogeneous information center network cache allocation method based on deep reinforcement learning - Google Patents
Heterogeneous information center network cache allocation method based on deep reinforcement learning Download PDFInfo
- Publication number
- CN113596138A CN113596138A CN202110843043.6A CN202110843043A CN113596138A CN 113596138 A CN113596138 A CN 113596138A CN 202110843043 A CN202110843043 A CN 202110843043A CN 113596138 A CN113596138 A CN 113596138A
- Authority
- CN
- China
- Prior art keywords
- content
- network
- heterogeneous
- cache
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a heterogeneous information-centric network cache allocation method based on deep reinforcement learning, and relates to the technical field of network cache space allocation. The method specifically comprises the following steps: abstracting a heterogeneous ICN into a topology model; defining the dynamically changing content requests in the heterogeneous ICN; converting the cache space allocation problem of the heterogeneous ICN into a network performance optimization problem and constructing a network performance optimization model, which comprises an optimization objective function and corresponding constraints; applying a Q-learning algorithm to each content request to obtain the cache allocation scheme with optimal network performance for the content request at each moment; and combining a deep neural network with the Q-learning algorithm, training an optimal cache allocation scheme adapted to the dynamically changing content requests of the heterogeneous ICN, using the per-moment optimal cache allocation schemes solved by the Q-learning algorithm. The method can adaptively solve the cache allocation scheme with optimal network performance and better adapts to dynamically changing network requests.
Description
Technical Field
The invention relates to the technical field of heterogeneous information-centric networks, in particular to a cache allocation method for heterogeneous information-centric networks based on deep reinforcement learning.
Background
With the development of internet technology, the number of network users and their requests for network content keep growing. Information-Centric Networking (ICN) is a new type of network architecture that caches content provided by servers on routers in order to serve users. The outstanding advantage of ICN over traditional network architectures is in-network caching: each router can store content. Because the content routers in an ICN cache different contents from the servers, content requested by a user is answered by a router storing that content, which avoids the overhead of long-distance transmission from client to server and greatly improves response speed. For in-network caching in ICNs, cache allocation (the allocation of cache capacity to each content router) is the basis for caching content. In heterogeneous ICNs, each content router may be allocated a different cache capacity, which makes allocation more complex than in homogeneous ICNs. In addition, since configuring cache space on a content router is expensive and consumes energy, allocating too much cache space to a content router causes unnecessary waste, while allocating too little fails to meet users' request demands and degrades user experience and network performance. Therefore, allocating the appropriate cache space to each content router is important for optimizing heterogeneous ICN performance.
Cache allocation for heterogeneous ICNs mainly considers two aspects: first, the centrality of a router in the network topology, since the higher the centrality, the more important the node is in the topology and the more cache capacity it needs to be allocated; second, the request frequency of a node, since more frequently requested nodes need more cache space. Existing cache allocation methods for heterogeneous ICNs fall into two types: one performs cache allocation based on the importance of nodes in the network topology; the other converts the cache allocation problem into a network performance optimization problem and obtains the optimal cache allocation scheme by solving for the solution that optimizes network performance. However, these methods all target static networks; in reality network requests change dynamically, and the existing methods cannot meet this dynamic demand.
Disclosure of Invention
In order to solve the above problems, the present invention provides a heterogeneous information-centric network cache allocation method based on deep reinforcement learning, which aims to allocate a suitable cache space to each routing node in accordance with the dynamics of network requests.
The technical scheme of the invention is as follows:
a cache allocation method of a heterogeneous information center network based on deep reinforcement learning comprises the following steps:
Step 1: abstracting a heterogeneous ICN into a topology model;
Step 2: defining the dynamically changing content requests in the heterogeneous ICN;
Step 3: converting the cache space allocation problem of the heterogeneous ICN into a network performance optimization problem of the heterogeneous ICN, and constructing a network performance optimization model, which comprises an optimization objective function and corresponding constraints;
Step 4: applying a Q-learning algorithm to each content request in the heterogeneous ICN to obtain the cache allocation scheme with optimal network performance for the content request at each moment;
Step 5: combining a deep neural network with the Q-learning algorithm, and training an optimal cache allocation scheme adapted to the dynamically changing content requests of the heterogeneous ICN, using the per-moment optimal cache allocation schemes solved by the Q-learning algorithm in Step 4.
Further, according to the heterogeneous information-centric network cache allocation method based on deep reinforcement learning, the heterogeneous ICN with n content routers is abstracted into a topology model G(V, E, C, Long, Lati):
wherein V represents the set of the n content routers; E represents the set of edges between content routers; C represents the set of cache capacities allocated to the content routers; Long represents the longitudes of the content routers' locations in the topology model G; Lati represents the latitudes of the content routers' locations in the topology model G; CR_i represents the i-th content router; e_ij represents the path between content router CR_i and the j-th content router CR_j; c_i represents the cache capacity allocated to CR_i; Long_i represents the longitude of CR_i's location in the topology model G; and Lati_i represents the latitude of CR_i's location in the topology model G. CR_i and e_ij can be further expressed as follows:
wherein CR_i^(c_i) denotes the i-th content router with allocated cache capacity c_i, e_ij^(c_i,c_j) denotes the path between router CR_i^(c_i) and the j-th content router CR_j^(c_j) with allocated cache capacity c_j, and C_max denotes the maximum cache capacity that a content router can be allocated.
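The topology model above can be sketched in code. The following is a minimal illustration; the class and attribute names (ContentRouter, Topology) are assumptions chosen for clarity, not the patent's reference implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ContentRouter:
    index: int        # i in CR_i
    capacity: int     # allocated cache capacity c_i
    longitude: float  # Long_i
    latitude: float   # Lati_i

@dataclass
class Topology:
    C_max: int                                   # maximum per-router cache capacity
    routers: list = field(default_factory=list)  # the set V
    edges: set = field(default_factory=set)      # the set E, as (i, j) pairs

    def add_router(self, capacity, longitude, latitude):
        # enforce the per-router constraint 0 <= c_i <= C_max
        assert 0 <= capacity <= self.C_max
        router = ContentRouter(len(self.routers), capacity, longitude, latitude)
        self.routers.append(router)
        return router

    def add_edge(self, i, j):
        # e_ij is an undirected path between CR_i and CR_j
        self.edges.add((min(i, j), max(i, j)))
```

A topology is then built by adding routers (each carrying its capacity and coordinates) and edges between their indices.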
Further, according to the heterogeneous information-centric network cache allocation method based on deep reinforcement learning, the hit rate and energy consumption of content requests are used as evaluation indexes of heterogeneous ICN performance, and the optimization objective function shown in formula (12) is established:
wherein NetP_total is the overall network performance of the heterogeneous ICN; the hit count of CR_i denotes the number of requests that successfully hit its cache, and the request count of CR_i denotes the total number of requests it receives; hr_i denotes the request hit rate of content router CR_i; ec_i denotes the energy consumption of routing node CR_i; P_i is the fixed energy consumption of CR_i's router hardware when caching content; the per-byte term is the energy consumption corresponding to transmitting a unit byte of content through CR_i; tra_i is the size of the data stream passing through CR_i; distance_{i,j} denotes the distance between content-requesting node CR_j and service node CR_i; and ω and μ are the weight values of the network performance corresponding, respectively, to the request hit rate and the energy consumption of content router CR_i caching a unit size of content.
Further, according to the method for allocating cache of a heterogeneous information-centric network based on deep reinforcement learning, the constraints include a cache space constraint of each content router shown in formula (13) and a cache space constraint in the overall network topology:
wherein C_max represents the maximum cache capacity that a content router in the heterogeneous ICN can be allocated, and C_total represents the maximum total cache space of all content routers in the heterogeneous ICN.
Further, according to the heterogeneous information-centric network cache allocation method based on deep reinforcement learning, applying the Q-learning algorithm to each content request in the heterogeneous ICN comprises the following: the content request at each moment is expressed as a Q-learning state, Status = {s_1, s_2, …, s_t}, where s_t is the Q-learning state corresponding to content request q_t at time t; the topology model G(V, E, C, Long, Lati) of the heterogeneous information-centric network is expressed as the Q-learning environment, Environment = {e_1, e_2, …, e_t}, where e_t is the Q-learning environment corresponding to content request q_t at time t; the cache allocation scheme for the content routers is expressed as the Q-learning action, Action = {a_1, a_2, …, a_t}, where a_t is the Q-learning action corresponding to content request q_t at time t; and executing a cache allocation scheme for a network content request returns a network performance value, expressed as the Q-learning reward, Reward = {r_1, r_2, …, r_t}, where r_t is the Q-learning reward value corresponding to content request q_t at time t. During Q-learning, the action with the maximum reward value in each state is selected and executed; after the Q-learning process finishes, the obtained Q-learning policy selects and executes, for each input state, the action with the largest reward value.
Further, according to the heterogeneous information center network cache allocation method based on deep reinforcement learning, the deep neural network is a BP neural network.
Further, according to the heterogeneous information center network cache allocation method based on deep reinforcement learning, the step 5 includes the following specific steps:
step 5.1: randomly initializing a weight theta of the BP neural network;
Step 5.2: within a period T, take the Q-learning state and action (s_t, a_t) at time t as the input value of the neural network, and take the maximum reward value R(s_t, a_t; θ) obtained by the Q-learning algorithm, together with the corresponding action a_t, as the output value y_output of the deep neural network;
Step 5.3: calculating an estimated value of an output value of the BP neural network according to a Bellman equation;
step 5.4: calculating a corresponding loss value according to the output value of the BP neural network and the estimated value of the output value;
step 5.5: updating the weight of the BP neural network by adopting a gradient descent method according to the loss value;
Step 5.6: repeat steps 5.2 to 5.5, iteratively updating θ until the iteration-stopping condition T is met, to obtain the final neural network weights θ, which serve as the optimal cache allocation scheme adapted to the dynamically changing content requests of the period T.
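Steps 5.1 to 5.6 can be sketched as a small training loop. Here a simple weight table over discrete states stands in for the BP neural network, and all names, hyperparameters, and the transition format are illustrative assumptions:

```python
import numpy as np

def train_deep_q(transitions, n_states, n_actions,
                 gamma=0.9, lr=0.1, iters=300, seed=0):
    """transitions: list of (s, a, r, s_next) tuples gathered from Q-learning."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=(n_states, n_actions))  # step 5.1: random init
    for _ in range(iters):                                      # step 5.6: iterate to the stop condition
        for s, a, r, s_next in transitions:                     # step 5.2: (s_t, a_t) as input
            target = r + gamma * np.max(theta[s_next])          # step 5.3: Bellman estimate of the output
            err = theta[s, a] - target                          # step 5.4: loss = err**2 / 2
            theta[s, a] -= lr * err                             # step 5.5: gradient-descent update
    return theta
```

On a two-state toy problem where matching the action to the state yields reward 1, the learned values approach the Bellman fixed point and the greedy action per state recovers the obvious allocation.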
Compared with the prior art, the heterogeneous information-centric network cache allocation method based on deep reinforcement learning has the following beneficial effects: after modeling the heterogeneous information-centric network, the dynamics of network requests are analyzed, so the resulting dynamic network model matches reality better than existing topology models of heterogeneous information-centric networks. Deep learning and Q-learning are combined and applied to the cache allocation problem of the dynamic heterogeneous information-centric network; compared with existing cache allocation methods, the proposed method adaptively solves the cache allocation scheme with optimal network performance and better adapts to dynamically changing network requests.
Drawings
FIG. 1 is a schematic diagram of an information-centric network architecture;
fig. 2 is a schematic flow chart of a cache allocation method of a heterogeneous information center network based on deep reinforcement learning according to the embodiment;
fig. 3 is a schematic structural diagram of the deep Q learning algorithm of the present embodiment;
fig. 4 is a schematic flowchart of solving the network cache allocation scheme by deep learning according to this embodiment.
Detailed Description
To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present application are given in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
The invention provides a dynamic cache space allocation scheme for the node cache allocation problem in heterogeneous information-centric networks; in particular, it provides a cache allocation strategy for network nodes that adapts to the dynamics of the network. When modeling the cache allocation problem, the network request hit rate and energy consumption are taken as performance evaluation indexes and combined into a comprehensive performance evaluation of cache allocation schemes, and the cache allocation problem is modeled as a network performance maximization problem. To obtain the optimal cache allocation for each content request, a reinforcement learning method is applied with cache allocations as the actions selected by the agent, yielding the cache allocation scheme corresponding to the optimal performance of each request. To adapt to the dynamics of network requests, the existing content requests are used as input, the cache allocation schemes obtained by reinforcement learning as output, and the optimal cache allocation scheme adapting to the dynamic requests at different moments is obtained through training.
Fig. 1 is a schematic diagram of an information-centric network architecture, which consists of nodes and the paths between them; the nodes include request nodes, routing nodes, and service nodes. A request node is responsible for receiving a user's content request and passing it to a routing node; a routing node is responsible for transmitting requests or content and can cache content; a service node stores content and is responsible for returning requested content to the user. The paths between nodes carry requests and content. When a user sends a content request to a request node, the request node transmits the request along a path to a routing node, which judges whether it has cached the requested content: if so, it returns the content to the request node; if not, it forwards the request to the next routing node or to a service node according to its forwarding information base. Eventually the request reaches a routing node or service node that holds the requested content, and that node returns the content along the request path to the request node, completing the request. The efficiency of completing a request is proportional to network performance and is related to the cache space and cached contents of each routing node. With proper cache allocation, frequently requested content is cached at the nodes that frequently request it, which improves network performance and content request efficiency. The invention provides a heterogeneous information-centric network cache allocation method based on deep reinforcement learning, aiming to allocate a suitable cache space to each routing node.
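The request flow described above can be sketched as a toy function; the node ids, the cache map, and the return convention are all illustrative assumptions:

```python
def serve_request(path, caches, content, server_node):
    """Walk the request path toward the server; return (serving node, hop count).

    path: routing-node ids from the request node toward the server.
    caches: node id -> set of cached content names."""
    for hops, node in enumerate(path, start=1):
        if content in caches.get(node, set()):
            return node, hops              # cache hit at a routing node
    return server_node, len(path) + 1      # miss everywhere: the server answers
```

The hop count returned makes the benefit of caching visible: a hit at an early routing node completes the request in fewer hops than falling through to the server.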
Fig. 2 is a schematic flow chart of a deep reinforcement learning-based heterogeneous information centric network cache allocation method provided by the present invention, where the deep reinforcement learning-based heterogeneous information centric network cache allocation method includes the following steps:
Step 1: abstracting a heterogeneous information-centric network into a topology model;
in this embodiment, a heterogeneous information center network with n content routers is abstracted into a topology model G (V, E, C, Long, Lati), where V represents a content router set composed of the n content routers; e represents a set of edges between content routers; c represents a set of cache capacities allocated to the content routers; long represents the longitude of the location of the content router in the topology model; lati represents the latitude of the position of the content router in the topological model; each component of the heterogeneous information center network topology model is specifically expressed as follows:
wherein CR_i represents the i-th content router; e_ij represents the path between content router CR_i and the j-th content router CR_j; c_i represents the cache capacity allocated to CR_i; Long_i represents the longitude of CR_i's location in the topology model; and Lati_i represents the latitude of CR_i's location in the topology model. CR_i and e_ij can be further expressed as follows:
wherein CR_i^(c_i) denotes the i-th content router with allocated cache capacity c_i, e_ij^(c_i,c_j) denotes the path between router CR_i^(c_i) and the j-th content router CR_j^(c_j) with allocated cache capacity c_j, and C_max denotes the maximum cache capacity that a content router can be allocated.
Step 2: on the basis of a topological model of a heterogeneous information center network, defining a dynamically changing content request;
the content request at each moment is dynamically changed, and the content request Qr in the period T is defined as:
Qr = {q_t | 1 ≤ t ≤ T}    (3)
wherein q_t refers to the content requests occurring in the network at time t, each including: the content-requesting node, the longitude and latitude of its location in the topology model, the requested content, the content server node providing the requested content, and the request time.
To elaborate the dynamically changing network requests at different moments, q_t can be further expressed as:
wherein the components respectively represent, for the k-th content request in q_t, the content-requesting node, the requested content, the longitude of the requesting node's location in the network topology model, the latitude of that location, the content server node providing the requested content, and the request time.
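One possible encoding of the request tuple q_t is sketched below; all field names are assumptions chosen for illustration, not the patent's notation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentRequest:
    requester: int    # content-requesting node (index of a content router)
    content: str      # the requested content item
    longitude: float  # position of the requesting node in the topology model
    latitude: float
    server: int       # content server node providing the requested content
    time: int         # request time t

def requests_in_period(all_requests, T):
    """Qr = {q_t | 1 <= t <= T}: the dynamic requests inside one period (Eq. 3)."""
    return [q for q in all_requests if 1 <= q.time <= T]
```

Filtering by the request time selects the requests that make up one training period T.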
On the basis of a static network topology model, the dynamic analysis of the network request is added, and the requirements of the dynamically changing network request on different cache spaces can be met.
Step 3: converting the cache allocation problem into a network performance optimization problem and constructing a network performance optimization model, which comprises an optimization objective function and corresponding constraints;
In this embodiment, the cache allocation problem is converted into a network performance optimization problem, and the hit rate and energy consumption of content requests are taken as the evaluation indexes of network performance. Let E_total and H_total respectively denote the energy consumption and hit rate of the entire network, and let ec_i and hr_i respectively denote the unit energy consumption and unit hit rate of each content router CR_i; the network totals are, respectively, the sum of the energy consumption of each router and the sum of the hit rates, specifically expressed as follows:
wherein c_i ∈ {0, 1, 2, …, C_max}, with C_max the maximum cache capacity allocatable to each router; c_i = 0 means CR_i is allocated no cache, c_i = 1 means CR_i is allocated a cache of 1 preset unit, c_i = 2 means CR_i is allocated a cache of 2 preset units, and so on. hr_i denotes the request hit rate of content router CR_i and, as shown in formula (6), is calculated as the ratio of the number of requests CR_i receives and successfully hits to all requests it receives, where the number of requests received and successfully hit is the number of requests actually occurring at CR_i whose requested content is cached at CR_i, and the number of all requests received is the number of requests actually occurring at CR_i. ec_i denotes the energy consumption of routing node CR_i and is calculated according to formula (7); it comprises caching energy consumption and transmission energy consumption, reflecting the cost of ICN content caching. The caching energy consumption is the energy the router consumes to cache content and is related to the router's caching performance and the size of the cached content. The transmission energy consumption is the energy the router consumes to transmit requests and is related to the size of the transmitted content and the time consumed by transmission;
wherein the numerator in formula (6) is the number of requests that successfully hit CR_i's cache and the denominator is the total number of requests CR_i receives.
wherein P_i is the fixed energy consumption of CR_i's router hardware when caching content; the per-byte term is the energy consumption corresponding to transmitting a unit byte of content through CR_i; t_i is the running time of CR_i; and tra_i is the size of the data stream passing through CR_i.
The running time includes the time for the node to process the cache request and the transmission time for returning the requested content to the requesting node. Assuming the processing time is negligible and letting CR_j be the content-requesting node, t_i is calculated according to formula (8).
wherein distance_{i,j} denotes the distance between content-requesting node CR_j and service node CR_i, calculated from the node positions in the heterogeneous information-centric network topology model, with reference to formula (9):
NetP_i denotes the network performance corresponding to content router CR_i caching a unit size of content; it is proportional to the hit rate and inversely related to the energy consumption, and is calculated with reference to formula (10):
NetP_i = ω·hr_i + μ·ec_i    (10)
wherein ω and μ are the weight values of the network performance corresponding, respectively, to the hit rate and the energy consumption of content router CR_i caching a unit size of content.
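The per-router metrics can be sketched as follows. Since the exact formulas (6)-(9) appear as images in the source, the concrete forms below (a planar node distance, a fixed caching cost P_i·c_i plus distance-weighted transmission cost, and a negative μ so that energy lowers performance) are assumptions:

```python
import math

def hit_rate(hits, total):
    """hr_i (Eq. 6): requests received and successfully hit over all received."""
    return hits / total if total else 0.0

def distance(lon_i, lat_i, lon_j, lat_j):
    """Eq. (9): distance between nodes from their topology coordinates
    (a planar distance is assumed here)."""
    return math.hypot(lon_i - lon_j, lat_i - lat_j)

def energy(P_i, c_i, eps_i, tra_i, dist_ij, speed=1.0):
    """ec_i (Eq. 7), assumed form: caching energy P_i * c_i plus transmission
    energy eps_i * tra_i * t_i, with t_i = distance / speed (Eq. 8)."""
    t_i = dist_ij / speed
    return P_i * c_i + eps_i * tra_i * t_i

def net_performance(hr_i, ec_i, omega=1.0, mu=-0.01):
    """NetP_i (Eq. 10): omega * hr_i + mu * ec_i."""
    return omega * hr_i + mu * ec_i
```

Under these forms, a higher hit rate raises NetP_i while extra caching or transmission energy lowers it, matching the stated proportionality.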
In the whole heterogeneous information center network topology, the whole network performance NetPtotalIs represented as follows:
aiming at the problem of cache space allocation of an ICN node, the goal is to find a cache allocation scheme, so that the network performance is optimal for dynamic content requests, namely the overall network performance is maximized, and an optimization objective function shown in formula (12) is established:
while maximizing network performance, a single node cache space and all network cache spaces need to satisfy certain constraint conditions, as shown in formula (13), including the cache space constraint of each content router and the cache space constraint in the overall network topology:
and (3) a final network performance optimization model is shown as an equation (14):
In the above formula, c_i denotes the cache capacity allocated to the i-th content router CR_i, and C_total denotes the maximum total cache space of all content routers in the network;
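For a toy instance, the constrained maximization of formula (14) can be solved by exhaustive search. This is only feasible for very small networks, which is exactly why the method replaces it with deep Q-learning; all names are illustrative:

```python
from itertools import product

def feasible(alloc, C_max, C_total):
    """Eq. (13): per-router capacity bound and network-wide cache budget."""
    return all(0 <= c <= C_max for c in alloc) and sum(alloc) <= C_total

def best_allocation(n, C_max, C_total, netp_fn):
    """Brute-force the allocation maximising total performance (Eq. 14).
    netp_fn(i, c) is the performance of router i given capacity c."""
    best, best_val = None, float("-inf")
    for alloc in product(range(C_max + 1), repeat=n):
        if not feasible(alloc, C_max, C_total):
            continue
        val = sum(netp_fn(i, c) for i, c in enumerate(alloc))
        if val > best_val:
            best, best_val = alloc, val
    return best, best_val
```

The search space grows as (C_max + 1)^n, so even modest networks make enumeration impractical and motivate the learned policy.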
Step 4: applying the Q-learning algorithm to cache allocation of the heterogeneous information-centric network: the Q-learning algorithm is applied to each content request of the network to obtain the cache allocation scheme with optimal network performance for the content request at each moment.
Because in practice the network structure does not change over time, network dynamics is mainly embodied in request dynamics: different content requests occur in the network at different moments. Therefore, when applying Q-learning to cache allocation in the heterogeneous information-centric network, the content request at each moment is expressed as a Q-learning state. For content requests at different moments, the state is specifically expressed as Status = {s_1, s_2, …, s_t}, where s_t is the Q-learning state corresponding to content request q_t at time t. The topology model G(V, E, C, Long, Lati) of the heterogeneous information-centric network is expressed as the Q-learning environment (Environment); the cache allocation scheme for the content routers is expressed as the Q-learning action (Action); and executing a cache allocation scheme for a network content request returns a network performance value, expressed as the Q-learning reward value (Reward). During Q-learning, the action with the maximum reward value in each state is selected and executed. After the Q-learning process finishes, the obtained Q-learning policy (Policy) selects, for each input state, the action with the maximum reward value to execute.
The Q-learning action refers to the different cache allocation schemes allocated to the network; specifically, each routing node is allocated a certain cache space, subject to the constraint on routing-node cache space size. In a heterogeneous information-centric network, the cache space sizes of the nodes may be unequal. In a real network the number of nodes is often huge and the cache space of each node is selectable, so the selectable cache allocation schemes are numerous, i.e., Q-learning has many selectable actions. For network content requests at different moments, the action is specifically expressed as Action = {a_1, a_2, …, a_t}, where a_t is the Q-learning action corresponding to content request q_t at time t.
The Q-learning environment consists of the entire network; it interacts with states through actions and returns to each state the reward value corresponding to that state, which evaluates the quality of the action. For network content requests at different moments, the environment is specifically expressed as Environment = {e_1, e_2, …, e_t}, where e_t is the Q-learning environment corresponding to content request q_t at time t.
The Q-learning reward value is the value returned to a state after it performs an action that interacts with the environment, and it is represented by the network performance. Under different cache allocation schemes the network handles a request with different performance; the higher the performance, the better the corresponding cache allocation scheme, i.e., the action with the higher reward value is selected and executed. The reward value is obtained from a network performance calculation: in a known network topology model, different cache allocation schemes yield different network performance for the same content request, and the Q-learning algorithm selects the cache allocation scheme corresponding to the optimal network performance, i.e., the action with the maximum reward value. For network content requests qt at different times, the reward values are written as Reward = {r1, r2, …, rt}, where rt is the Q-learning reward value corresponding to the content request qt at time t.
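The reward described here can be sketched as a weighted trade-off between the aggregate request hit rate and the total energy consumption, in the spirit of the ω/μ weighting named in claim 3. The exact patented formula is not reproduced; the combination and the weight values below are assumptions:

```python
def network_performance(hits, requests, energy, omega=0.7, mu=0.3):
    """Hedged reward sketch: weighted hit rate minus weighted energy.
    hits[i]/requests[i]/energy[i] are per-router counters; omega and mu
    mirror the hit-rate and energy weights named in the text (values assumed)."""
    hit_rate = sum(hits) / max(sum(requests), 1)  # aggregate request hit rate
    return omega * hit_rate - mu * sum(energy)    # higher is better

# Two routers: 3 of 4 and 1 of 4 requests hit; 0.5 energy units each.
r = network_performance(hits=[3, 1], requests=[4, 4], energy=[0.5, 0.5])
```

A higher `r` marks a better cache allocation scheme, matching the rule of executing the action with the higher reward value.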
The Q-learning policy directs a state to select an action; given the policy, the cache allocation scheme for a network content request is determined. The policy is expressed as
Step 5: combine the deep neural network with Q learning and apply the combination to solving the cache allocation scheme of the heterogeneous information-centric network. Using the cache allocation schemes with optimal network performance, obtained in step 4 by Q learning for the network content request at each time, train an optimal cache allocation scheme that adapts to dynamically changing content requests.
In the deep-neural-network part, this embodiment uses a BP (back propagation) neural network, which comprises a forward propagation process and a back propagation process. The forward propagation process builds the neural network structure: as shown in fig. 3, the Q-learning state and action are taken as the input of the neural network, the reward value as its output, and the policy as its weights; the network can then train the optimal weights to fit the inputs and outputs at different times. The back propagation process adjusts the network structure, optimizing the weights by minimizing the loss value in each training round, where the loss value depends on the output value of the neural network and an estimate of that output value. Solving the cache allocation of the information-centric network under dynamic requests with the deep Q-learning algorithm comprises the following steps:
As shown in fig. 3, the Q-learning state and action serve as the input of the neural network. The input layer receives the input data, i.e., the Q-learning state and action, denoted (Status, Action). The output layer takes the network performance value returned by Q learning, i.e., the Q-learning reward value, as the output of the neural network. In state st, action at is performed according to policy(st, at) and the corresponding reward value rt is obtained, written as rt(st, at; policy(st, at)). The weights θ of the neural network correspond to the Q-learning policy.
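A minimal sketch of such a forward pass, with (Status, Action) features as input and a predicted reward as output. The layer sizes, the tanh activation, and the feature encoding are assumptions, not the patent's BP network:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_in, n_hidden, n_out):
    """theta: the weights that play the role of the Q-learning policy."""
    return {"W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out)}

def forward(theta, x):
    """Forward propagation: (state, action) features in, predicted reward out."""
    h = np.tanh(x @ theta["W1"] + theta["b1"])  # hidden layer
    return h @ theta["W2"] + theta["b2"]        # estimate of r_t(s_t, a_t; theta)

theta = init_weights(n_in=4, n_hidden=8, n_out=1)
r_hat = forward(theta, np.ones(4))  # one toy (state, action) feature vector
```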
The reward value rt is calculated with reference to equation (15):
as shown in fig. 4, the step 5 includes the following specific steps:
Step 5.1: randomly initialize the weights θ of the BP neural network;
Step 5.2: take the Q-learning state and action (st, at) at time t within a period T as the input value x_input of the neural network, and correspondingly take the maximum Q-learning reward value R(st, at; θ) and the corresponding action at as the output value y_output of the deep neural network.
Step 5.3: calculate an estimated value of the output value of the BP neural network according to the Bellman equation.
where α and γ are the learning rate and the discount rate of the Bellman equation, respectively, and a is a selectable action for state st+1.
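The Bellman estimate of step 5.3 can be sketched as the standard update that blends the previous estimate with the bootstrapped return r + γ·max_a Q(st+1, a); the patent's exact equation is not shown here, so this is a common reading, and the numeric values are assumptions:

```python
def bellman_target(q_old, reward, q_next_max, alpha=0.5, gamma=0.9):
    """Estimated output value: old estimate moved toward the bootstrapped
    return. alpha is the learning rate, gamma the discount rate."""
    return q_old + alpha * (reward + gamma * q_next_max - q_old)

# q_old=1.0, reward=2.0, max over next actions=3.0
target = bellman_target(q_old=1.0, reward=2.0, q_next_max=3.0)  # 2.85
```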
Step 5.4: calculating a corresponding loss value according to the output value of the BP neural network and the estimated value of the output value;
In the back propagation process, the BP neural network adjusts its weights according to the loss value, which is calculated from the output value of the neural network and the estimated value of that output.
The loss value is calculated with reference to equation (17):
where m is the preset number of neurons in the output layer.
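A common reading of such a loss — mean squared error over the m output-layer neurons between the network's output and its Bellman estimate — can be sketched as follows; the patent's exact equation (17) is not reproduced here:

```python
def loss_value(y_output, y_estimate):
    """Mean squared error over the m output-layer neurons between the
    network output and the Bellman-equation estimate of that output."""
    m = len(y_output)
    return sum((o - e) ** 2 for o, e in zip(y_output, y_estimate)) / m

loss = loss_value([1.0, 2.0], [0.0, 0.0])  # (1 + 4) / 2 = 2.5
```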
Step 5.5: update the weights of the BP neural network by the gradient descent method according to the loss value.
To make the weights approach the optimum, they should be updated in the direction that decreases the loss value. The weight update follows equation (18): in the back propagation process, the gradient descent method adjusts the neural network weights according to the loss value, i.e., the weights θ are updated according to loss(x_input, y_output, θ), expressed as follows:
where η is the learning rate of the gradient descent method. Since the objective of the algorithm is to find the weights corresponding to the minimum loss value, η < 0 under the sign convention of equation (18).
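Under the conventional θ ← θ − η·∇loss form of a gradient step, η > 0 moves θ toward lower loss; the patent's equation (18) appears to fold the sign into η instead (hence its η < 0). A sketch under the conventional convention:

```python
def update_weight(theta, grad, eta=0.01):
    """One gradient-descent step: theta <- theta - eta * grad.
    With this minus-sign convention, eta > 0 decreases the loss."""
    return theta - eta * grad

w = update_weight(theta=1.0, grad=2.0, eta=0.1)  # 1.0 - 0.2 = 0.8
```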
Step 5.6: repeat steps 5.2 to 5.5, iterating the update of θ, until the stopping condition t = T is met. The θ obtained is the final weight of the neural network, and this final weight constitutes the optimal cache allocation policy adapted to the dynamic requests of the period T.
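Steps 5.1–5.6 can be tied together in a toy end-to-end sketch. The environment, the (state, action) feature construction, and the linear approximator standing in for the BP network are all assumptions made for illustration:

```python
import numpy as np

def train(period_T=50, n_features=4, eta=0.05, alpha=0.5, gamma=0.9, seed=0):
    """Hedged end-to-end sketch of steps 5.1-5.6 on random toy data."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 0.1, n_features)          # step 5.1: random init
    for t in range(period_T):                         # iterate until t = T
        x = rng.normal(size=n_features)               # step 5.2: (s_t, a_t) features
        q = float(theta @ x)                          # network output
        reward = float(rng.normal())                  # performance value from environment
        q_next_max = float(theta @ rng.normal(size=n_features))
        target = q + alpha * (reward + gamma * q_next_max - q)  # step 5.3: Bellman estimate
        grad = 2.0 * (q - target) * x                 # step 5.4: gradient of squared loss
        theta = theta - eta * grad                    # step 5.5: gradient descent update
    return theta                                      # step 5.6: final weights

theta_final = train()
```

The returned weights are the analogue of the final θ that encodes the learned cache allocation policy for the period T.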
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.
Claims (7)
1. A cache allocation method of a heterogeneous information center network based on deep reinforcement learning is characterized by comprising the following steps:
step 1: abstracting a heterogeneous ICN into a topological model;
step 2: defining a dynamically changing content request in a heterogeneous ICN;
step 3: converting the cache space allocation problem of the heterogeneous ICN into a network performance optimization problem of the heterogeneous ICN, and constructing a network performance optimization model comprising an optimization objective function and corresponding constraints;
step 4: applying a Q-learning algorithm to each content request in the heterogeneous ICN to obtain the cache allocation scheme with optimal network performance corresponding to the content request at each moment;
step 5: combining the deep neural network with the Q-learning algorithm, and training an optimal cache allocation scheme adapted to the dynamically changing content requests of the heterogeneous ICN from the cache allocation schemes with optimal network performance solved by the Q-learning algorithm in step 4 for the content request at each moment.
2. The deep reinforcement learning-based cache allocation method for the heterogeneous information-centric network according to claim 1, wherein the heterogeneous ICN with n content routers is abstracted as a topology model G (V, E, C, Long, Lati):
wherein V represents the set composed of the n content routers; E represents the set of edges between content routers; C represents the set of cache capacities allocated to the content routers; Long represents the longitudes of the content routers' locations in the topology model G; Lati represents the latitudes of those locations; CRi represents the i-th content router; eij represents the path between content router CRi and the j-th content router CRj; ci represents the cache capacity allocated to content router CRi; longi represents the longitude of the location of content router CRi in the topology model G; latii represents the latitude of that location; CRi and eij can further be expressed as follows:
3. The heterogeneous information-centric network cache allocation method based on deep reinforcement learning according to claim 2, wherein the hit rate and the energy consumption of content requests are used as evaluation indexes of the performance of the heterogeneous ICN, and the optimization objective function shown in formula (12) is established:
wherein NetPtotal is the overall network performance of the heterogeneous ICN; the request hit rate of content router CRi is the number of successful cache hits at CRi divided by the total number of requests CRi receives; the energy consumption of routing node CRi comprises Pi, the fixed energy consumption of the router hardware when CRi caches content, the transmission energy consumption per byte of content passing through CRi, trai, the size of the data stream passing through CRi, and the distance between the content-requesting node CRj and the serving node CRi; ω and μ are the weights, within the network performance, of the request hit rate and of the energy consumption, respectively, for content router CRi caching unit-size content.
4. The method for cache allocation in the heterogeneous information-centric network based on deep reinforcement learning according to claim 2, wherein the constraints include a cache space constraint of each content router and a cache space constraint in the overall network topology as shown in formula (13):
wherein Cmax represents the maximum cache capacity that a content router in the heterogeneous ICN can be allocated; Ctotal represents the maximum overall cache space of all the content routers in the heterogeneous ICN.
5. The method for cache allocation in the heterogeneous information-centric network based on deep reinforcement learning according to claim 2, wherein applying the Q-learning algorithm to each content request in the heterogeneous ICN comprises: expressing the content request at each time as a Q-learning state, Status = {s1, s2, …, st}, where st is the Q-learning state corresponding to the content request qt at time t; representing the topology model G(V, E, C, Long, Lati) of the heterogeneous information-centric network as the Q-learning environment, Environment = {e1, e2, …, et}, where et is the Q-learning environment corresponding to qt; representing the cache allocation scheme for the content routers as the Q-learning action, Action = {a1, a2, …, at}, where at is the Q-learning action corresponding to qt; executing a cache allocation scheme for a network content request returns a network performance value, expressed as the Q-learning reward, Reward = {r1, r2, …, rt}, where rt is the Q-learning reward value corresponding to qt; in the Q-learning process, the action with the maximum reward value corresponding to each state is selected and executed, and after the Q-learning process is finished, the obtained Q-learning policy selects, for each input state, the action with the maximum corresponding reward value and executes it.
6. The method according to claim 5, wherein the deep neural network is a BP neural network.
7. The method for allocating the cache of the heterogeneous information-centric network based on the deep reinforcement learning of claim 6, wherein the step 5 comprises the following specific steps:
step 5.1: randomly initializing a weight theta of the BP neural network;
Step 5.2: taking the Q-learning state and action (st, at) at time t within a period T as the input value of the neural network, and correspondingly taking the maximum reward value R(st, at; θ) obtained by the Q-learning algorithm and the corresponding action at as the output value y_output of the deep neural network;
Step 5.3: calculating an estimated value of an output value of the BP neural network according to a Bellman equation;
step 5.4: calculating a corresponding loss value according to the output value of the BP neural network and the estimated value of the output value;
step 5.5: updating the weight of the BP neural network by adopting a gradient descent method according to the loss value;
Step 5.6: repeatedly executing steps 5.2 to 5.5, iteratively updating θ until the stopping condition t = T is met, to obtain the final neural network weight θ, which serves as the optimal cache allocation scheme for content requests adapting to the dynamic changes over the period T.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110843043.6A CN113596138B (en) | 2021-07-26 | 2021-07-26 | Heterogeneous information center network cache allocation method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113596138A true CN113596138A (en) | 2021-11-02 |
CN113596138B CN113596138B (en) | 2022-06-21 |
Family
ID=78250075
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110843043.6A Active CN113596138B (en) | 2021-07-26 | 2021-07-26 | Heterogeneous information center network cache allocation method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113596138B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116996921A (en) * | 2023-09-27 | 2023-11-03 | 香港中文大学(深圳) | Whole-network multi-service joint optimization method based on element reinforcement learning |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106131202A (en) * | 2016-07-20 | 2016-11-16 | 中南大学 | In Information central site network, caching based on fluid dynamic theory places decision-making methods of marking |
EP3206348A1 (en) * | 2016-02-15 | 2017-08-16 | Tata Consultancy Services Limited | Method and system for co-operative on-path and off-path caching policy for information centric networks |
CN108322352A (en) * | 2018-03-19 | 2018-07-24 | 北京工业大学 | It is a kind of based on the honeycomb isomery caching method to cooperate between group |
WO2018236723A1 (en) * | 2017-06-19 | 2018-12-27 | Northeastern University | Joint routing and caching method for content delivery with optimality guarantees for arbitrary networks |
CN110049039A (en) * | 2019-04-15 | 2019-07-23 | 哈尔滨工程大学 | A kind of information centre's network-caching contamination detection method based on GBDT |
CN110138748A (en) * | 2019-04-23 | 2019-08-16 | 北京交通大学 | A kind of network integration communication means, gateway and system |
CN111586439A (en) * | 2020-05-25 | 2020-08-25 | 河南科技大学 | Green video caching method for cognitive content center network |
CN111885648A (en) * | 2020-07-22 | 2020-11-03 | 北京工业大学 | Energy-efficient network content distribution mechanism construction method based on edge cache |
CN112995950A (en) * | 2021-02-07 | 2021-06-18 | 华南理工大学 | Resource joint allocation method based on deep reinforcement learning in Internet of vehicles |
Non-Patent Citations (4)
Title |
---|
PING ZHOU et al., "An Ant Colony Inspired Cache Allocation Mechanism", IEEE Access *
YAO Jinfa, "Dec-POMDP-based caching strategy in Named Data Networking", Information Technology and Network Security *
TIAN Ming et al., "Adaptive caching algorithm based on local content popularity in information-centric networking", Computer Science *
GUO Jianyu et al., "ICN-oriented non-cooperative game optimized caching strategy", Telecommunication Engineering *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116996921A (en) * | 2023-09-27 | 2023-11-03 | 香港中文大学(深圳) | Whole-network multi-service joint optimization method based on element reinforcement learning |
CN116996921B (en) * | 2023-09-27 | 2024-01-02 | 香港中文大学(深圳) | Whole-network multi-service joint optimization method based on element reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||