CN116321307A - Bidirectional cache placement method based on deep reinforcement learning in non-cellular network - Google Patents

Bidirectional cache placement method based on deep reinforcement learning in non-cellular network

Info

Publication number
CN116321307A
Authority
CN
China
Prior art keywords
content
edge server
cache
network
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310257897.5A
Other languages
Chinese (zh)
Inventor
王朝炜
于小飞
王子夜
王卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202310257897.5A priority Critical patent/CN116321307A/en
Publication of CN116321307A publication Critical patent/CN116321307A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00: Network traffic management; Network resource management
    • H04W 28/02: Traffic management, e.g. flow control or congestion control
    • H04W 28/10: Flow control between communication endpoints
    • H04W 28/14: Flow control between communication endpoints using intermediate storage
    • H04W 28/16: Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • H04W 28/18: Negotiating wireless communication parameters
    • H04W 28/20: Negotiating bandwidth
    • H04W 28/24: Negotiating SLA [Service Level Agreement]; Negotiating QoS [Quality of Service]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a bidirectional cache placement method based on deep reinforcement learning in a non-cellular network, relating to the technical fields of mobile communication and the Internet of Things. The method comprises the following steps: a utility function based on cache hit rate, cache space resource utilization, content response delay and energy consumption is established for the edge server node, and a multi-objective optimization problem based on this utility function is constructed to solve the content caching decision, the optimization objective being to maximize the cache hit rate while minimizing the system cost as far as possible; a cache resource allocation decision network is then established using a deep Q network; and the content caching decision is updated by training the Q network with experience replay and by periodically refreshing user preferences from the continuously received user requests. The method can allocate bandwidth and computing resources to users reasonably, improving the overall resource utilization of the non-cellular network while guaranteeing the quality-of-service requirements of user applications.

Description

Bidirectional cache placement method based on deep reinforcement learning in non-cellular network
Technical Field
The invention relates to the technical fields of mobile communication and the Internet of Things, and in particular to a bidirectional cache placement method based on deep reinforcement learning in a non-cellular network.
Background
The development of mobile communication technology and artificial intelligence brings challenges to Internet of Things systems such as highly dynamic topology, limited computing resources, growing network scale and quality of service (Quality of Service, QoS) guarantees. Designing cache placement strategies for mobile edge computing systems that account for the bidirectional caching characteristics of emerging 6G applications has therefore become a pressing problem. In particular, combining deep reinforcement learning (DRL) with caching services provided at the edge of the radio access network can assist the efficient use of communication and computing resources, reduce the communication overhead caused by content retransmission and relieve the burden on the backhaul link. Cell-free massive multiple-input multiple-output (Cell-Free Massive MIMO) combines the ideas of distributed MIMO and massive MIMO: complex multi-antenna macro base stations (Macro Base Station, MBS) are replaced by a group of simple distributed access points (APs), all of which are connected directly to a central processing unit (Central Processor Unit, CPU) through backhaul links and cooperate to serve all users on the same time-frequency resources, giving the system scalability and improved coverage.
In a mobile edge computing communication scenario assisted by Cell-Free Massive MIMO, multiple edge servers equipped with single antennas can act as edge APs and be flexibly deployed over a wide area. Both the edge APs and the mobile user terminals have certain computing and caching capabilities and can cache content. The set of distributed edge APs simultaneously provides caching services for all users within coverage and receives, via the CPU, content and processing tasks collected from the Internet for delivery. The CPU controls the cache status of each edge AP as well as the transfer and forwarding of content. If the content requested by a user cannot be obtained in time at the edge of the wireless network, it can be fetched from the CPU over the backhaul link.
The response delay of user requests and the computational energy consumption of terminal devices are two indicators that significantly affect the quality of user experience (Quality of Experience, QoE). Caching content at the mobile user terminal and the edge APs allows content to be processed and returned quickly, meeting the requirement of obtaining content fast while keeping device energy consumption low. When users are within the coverage of the edge APs, they can request content corresponding to the computing service of interest: if a user has pre-cached the requested content locally, the computation can be performed locally or offloaded to an edge AP; if the user terminal has not cached the requested content, the content must first be obtained from an edge AP and then either computed locally or handed to the edge AP for computation. Compared with retrieving content from the CPU, caching content at the edge effectively reduces communication cost, increases network capacity and improves the content delivery rate. However, given the limited caching capability of user terminals, the richness and diversity of content-centric entertainment and information services in the future Internet of Things place enormous pressure on spectrum resources, network capacity and user experience quality. Therefore, building a finer-grained bidirectional caching task model that predicts the future caching locations of different contents from users' historical request information, so that relevant popular content can be pre-cached at the edge APs to assist computation, is of great significance for the development of emerging immersive applications.
Disclosure of Invention
For the scenario in which an edge server assists the bidirectional computing tasks of users in a mobile edge computing system within a non-cellular network, and in order to jointly optimize the allocation of bandwidth, computing and caching resources of the whole system at a finer granularity, the invention provides a bidirectional cache placement method based on deep reinforcement learning in a non-cellular network.
The invention provides a bidirectional cache placement method based on deep reinforcement learning in a non-cellular network, which comprises the following steps:
step 1, establishing for the edge server node a utility function based on cache hit rate, cache space resource utilization, content response delay and energy consumption, and constructing a multi-objective optimization problem based on this utility function to solve the content caching decision, the optimization objective being to maximize the cache hit rate while minimizing the system cost as far as possible;
step 2, mapping the content caching decision process based on user history preference into a Markov decision process, and establishing a cache resource allocation decision network using a deep Q network;
the edge server node collects the historical requests and terminal-device resource information of all users within the base-station signal coverage, predicts the contents of interest to each user from their historical requests, generates an initial content caching decision, trains the Q network with experience replay, and periodically updates user preferences to refresh the content caching decision;
step 3, a user terminal generates a service requirement and sends a request to the edge server node; the user terminal checks whether the requested content is cached locally; if not, the requested content is obtained from the edge server node and then processed either locally or by offloading to the edge server node; if so, the content is processed locally or offloaded directly to the edge server node; and the edge server updates the content caching decision according to the continuously received user requests.
In step 1, the utility function in time slot t is expressed as

$$\mathrm{utility}(t)=\frac{P_{hit}(t)}{Y(t)}$$

wherein $P_{hit}(t)$ is the cache hit rate of the edge server node in time slot t, and Y(t) is the normalized system cost of the edge server node in time slot t, covering the cache space resource utilization, content response delay and energy consumption;

the system cost is

$$Y(t)=\omega\,\frac{T_{total}(t)}{T_{max}(t)}+\phi\,\frac{E_{total}(t)}{E_{max}(t)}+\mu\,\frac{C_{F\_M}(t)}{C_M}$$

wherein ω, φ and μ are weight proportions, $T_{total}(t)$ and $E_{total}(t)$ denote the total delay and total energy consumption of the edge server in time slot t, $T_{max}(t)$ and $E_{max}(t)$ denote the maximum delay and maximum energy consumption of the edge server in time slot t, $C_M$ denotes the cache capacity of the edge server, and $C_{F\_M}(t)$ denotes the sum of the sizes of the contents cached on the edge server in time slot t.
In step 1, K users are located within the coverage of the edge server node, with user set $\mathcal{K}=\{1,2,\ldots,K\}$. F environment-frame contents are provided, with content set $\mathcal{F}=\{f_1,f_2,\ldots,f_F\}$. The data size of content i is $S_i$, and $C_i$ denotes the cache state of content i on the edge server: $C_i=0$ means not cached, $C_i=1$ means cached, $C_i\in\{0,1\}$. The following multi-objective optimization problem is then constructed to solve the edge server content caching decision:

$$\max_{\mathcal{C}}\ \bar{U}=\frac{1}{\tau}\sum_{t=1}^{\tau}\mathrm{utility}(t)$$
$$\text{s.t.}\quad \sum_{i=1}^{F}C_i S_i \le C_M,$$
$$\qquad C_i\in\{0,1\},\ q_i\in\{0,1\},\ h_k\in\{0,1\},\ \forall i\in\mathcal{F},\ \forall k\in\mathcal{K}.$$

The optimization problem optimizes the content caching decision $\mathcal{C}=\{C_1,C_2,\ldots,C_F\}$ so as to maximize the cache hit rate while minimizing the system cost as far as possible; $\bar{U}$ denotes the average utility function; $q_i\in\{0,1\}$ indicates whether content i is requested, $q_i=1$ indicating that at least one user requested content i and $q_i=0$ indicating that no user requested it; $h_k$ indicates whether the request of user k hits the cache space of the edge server, with $h_k=1$ if the content hits and $h_k=0$ otherwise; τ denotes the current slot.
In step 2, in the deep Q network, the state of the agent is set to the content cache state of the current edge server, $s(t)=[C_1,C_2,\ldots,C_F]$; the action output by the agent is $a(t)=[a_1(t),a_2(t),\ldots,a_F(t)]$, where $a_i(t)\in\{0,1\}$ indicates whether an action is taken on content i; the evaluation network computes the reward of the state reached after the action is executed, and the reward is set to the value of the optimization objective, i.e. the average utility $\bar{U}=\frac{1}{\tau}\sum_{t=1}^{\tau}\mathrm{utility}(t)$.
Compared with the prior art, the invention has the advantages and positive effects that:
(1) The method models the system resource utilization and the service quality of different services in a non-cellular network as a multi-objective optimization problem, comprehensively considering the coupling of edge multi-dimensional resources, the bidirectional input characteristic of applications and user fairness. The edge server content caching decision process based on user history preference is mapped to a Markov decision process, a cache resource allocation decision network is established with a DQN neural network, and on this basis the multi-objective optimization problem is solved to find an optimal caching strategy. The decision is based on each user's long-term request information for service content and the caching capability of the corresponding device. In each time slot the users' historical request information is used to predict their next content demand and allocate cache resources, so that bandwidth and computing resources are allocated to users reasonably; by allocating the bandwidth, computing and caching resources of the edge server appropriately, the overall resource utilization of the non-cellular network is improved while the quality-of-service requirements of user applications are guaranteed.
(2) Experimental comparison shows that, by combining an iterative optimization method in each time slot, the method can coordinate user offloading decisions and jointly optimize the computing and communication resources of the edge server, guaranteeing the experience-quality requirements of each user and the efficient use of edge multi-dimensional resources; compared with existing caching schemes, the method achieves better results, reaching the optimization objectives of maximizing system resource utilization and guaranteeing the experience quality of all users.
Drawings
FIG. 1 is a schematic diagram of a bidirectional cache scenario of a mobile edge computing network assisted by the Cell-Free Massive MIMO technique;
FIG. 2 is a flow chart of optimizing system resource utilization and user quality of service experience using the method of the present invention;
FIG. 3 is a schematic diagram of a content caching decision for an edge server using deep Q network optimization in the method of the present invention;
FIG. 4 is a graph comparing the utility function of the method of the present invention with that of existing caching schemes as the number of contents varies;
FIG. 5 is a graph comparing the utility function of the method of the present invention with that of existing caching schemes as the number of users varies.
Detailed Description
The invention will be described in further detail with reference to the drawings and examples.
Because edge resources are limited, a mobile edge computing system facing emerging content-centric services can use the Cell-Free Massive MIMO technique together with deep reinforcement learning to improve edge resource utilization while guaranteeing the QoS of services and the QoE of users. The invention considers a bidirectional caching scenario of a mobile edge computing network assisted by Cell-Free Massive MIMO. As shown in FIG. 1, the scenario consists of one central processing unit (CPU), edge APs and mobile user terminals. A user requests content through a mobile device that has caching and computing capabilities. Edge APs equipped with MEC (mobile edge computing) servers provide caching and offloading services within a certain communication coverage. In the method of the present invention, the input of an application computing task is assumed to come from two sources: 1) data generated by the user equipment; 2) data from the Internet. The edge AP has enough storage space to store all contents. The cache space of the user terminal is limited, so it can only cache content selectively, and the computation delay and energy consumption affect the user's QoE.
Unlike the conventional unidirectional computing task model, the input data of the bidirectional computing task model of the present invention consists of two parts: 1) data generated by the user equipment, i.e. local input data, such as the current device's strategy selection and three-dimensional motion information; 2) data from the Internet, i.e. remote input data, such as map information. Because the resource coupling of a multi-user system is complex, the bandwidth, caching and computing resources of the system must be allocated dynamically according to information such as user preference and computation demand, raising resource utilization as much as possible and ultimately guaranteeing and improving the QoS of different services and the corresponding QoE of users. In the bidirectional cache placement method based on deep reinforcement learning in a non-cellular network, the edge AP jointly allocates the edge multi-dimensional resources while comprehensively considering device energy consumption, storage capacity and computation delay, so as to maximize the system content cache hit rate and storage resource utilization and improve the overall experience of the system.
As shown in fig. 2, the overall flow of optimizing system resource utilization and user service experience quality with the method in this application scenario is as follows. First, a user terminal within the coverage of an edge AP generates a service requirement, selects the required content according to the user's own preference and sends a request to the edge AP. The edge server, i.e. the edge AP, collects the device resource information and historical request data of every user, and makes resource allocation decisions based on this information after receiving the request. In the method, the edge AP predicts the content that a user is likely to be interested in from the user's historical requests to form an initial caching decision, and then trains the deep Q network (Q-network) with experience replay to obtain a stable, customized caching decision for the user. The edge server updates the content caching decision according to the continuously received user requests. If the requested content is already cached on the mobile terminal, the user may choose to compute locally or offload the computation to the edge AP. If the content is not cached, it can be obtained from the edge AP and then either computed locally or handed to the edge AP for processing. The preference of each user is updated iteratively at regular intervals by the deep Q network, continuously improving the accuracy of the system's predictions of user requests, thereby maximizing system resource utilization and guaranteeing the experience quality of all users.
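By way of illustration only, the per-request flow just described can be sketched in Python as follows; the identifiers (handle_request, user_cache, edge_cache, prefer_offload) are illustrative and not taken from the patent, and the offloading choice is reduced to a simple flag rather than the full decision logic.

```python
# Minimal sketch of the per-request flow described above (illustrative names,
# not the patent's own identifiers).

def handle_request(content_id, user_cache, edge_cache, prefer_offload):
    """Serve one user request: check the local cache, fall back to the edge AP
    cache (which may fetch from the CPU over the backhaul), then decide where
    the computation runs."""
    if content_id in user_cache:
        source = "local cache"
    elif content_id in edge_cache:
        source = "edge AP cache"
    else:
        source = "CPU via backhaul"       # edge AP fetches it and may cache it
        edge_cache.add(content_id)
    where = "edge AP" if prefer_offload else "local device"
    return source, where

# Example: content 7 is cached at the edge but not on the device,
# and the user chooses to offload the computation.
src, loc = handle_request(7, user_cache={1, 3}, edge_cache={2, 7}, prefer_offload=True)
print(src, loc)   # -> edge AP cache, edge AP
```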
Considering a single edge wireless access point, the cache capacity of the edge server is denoted $C_M$. The application provides F environment-frame contents, expressed as $\mathcal{F}=\{f_1,f_2,\ldots,f_F\}$, where the data size of content $f_i$ is denoted $S_i$ and differs between contents. The content requested by users lies within these F environment frames. K users are randomly distributed in the coverage of the AP base station and expressed as $\mathcal{K}=\{1,2,\ldots,K\}$. The system operates indefinitely and is divided into time slots, denoted t = 0, 1, 2, 3, …. Each terminal device submits at most one content request per time slot t. The content requested by the k-th user in time slot t is denoted $x_k(t)\in\mathcal{F}$, and $X(t)=[x_1(t),x_2(t),\ldots,x_K(t)]$ denotes the requested contents of the K users in time slot t. $q_i(t)\in\{0,1\}$ denotes whether content $f_i$ is requested: $q_i(t)=1$ indicates that at least one user has requested content $f_i$, and $q_i(t)=0$ indicates that content $f_i$ has no user request. The request status of all contents can then be expressed as $Q(t)=[q_1(t),q_2(t),\ldots,q_F(t)]$.
Suppose the AP must respond to the user's request and provide service before the end of the current slot. $C_i\in\{0,1\}$ represents the cache state of the i-th content on the MEC server, as follows:

$$C_i=\begin{cases}1, & \text{content } f_i \text{ is cached on the MEC server}\\ 0, & \text{otherwise}\end{cases}$$

The caching decisions for all contents are then expressed as $\mathcal{C}=[C_1,C_2,\ldots,C_F]$. MEC cache resources are limited, which imposes the following constraint:

$$C_{F\_M}=\sum_{i=1}^{F}C_i S_i \le C_M$$

wherein $C_{F\_M}$ is the sum of the sizes of the contents cached in the MEC.
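As an illustrative aid (not part of the patent), the caching-decision variables and the capacity constraint above can be represented as follows; the content sizes and the capacity value are arbitrary example numbers.

```python
import numpy as np

# Illustrative instance: F contents with sizes S_i and an edge cache capacity C_M.
rng = np.random.default_rng(0)
F = 50
S = rng.uniform(1.0, 10.0, size=F)      # content sizes S_i (e.g. in MB)
C_M = 100.0                             # edge server cache capacity

def cache_feasible(C, S, C_M):
    """Check the capacity constraint sum_i C_i * S_i <= C_M for a binary
    caching decision vector C = [C_1, ..., C_F]."""
    return float(np.dot(C, S)) <= C_M

C = np.zeros(F, dtype=int)
C[:12] = 1                              # cache the first 12 contents
print(cache_feasible(C, S, C_M))
```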
In order to evaluate the QoE of users at a finer granularity, the invention constructs an overall utility function comprising several indicators: cache hit rate, cache space resource utilization, content response delay and energy consumption. $h_k\in\{0,1\}$ indicates whether the content requested by user k hits the cache space: $h_k=1$ if the requested content of user k hits, otherwise $h_k=0$. The cache hit rate of time slot t can then be expressed as:

$$P_{hit}(t)=\frac{1}{K}\sum_{k=1}^{K}h_k(t)$$

Next, the normalized system cost Y(t) of time slot t is defined, covering the content response delay, the energy consumption and the utilization of the cache space resources:

$$Y(t)=\omega\,\frac{T_{total}(t)}{T_{max}(t)}+\phi\,\frac{E_{total}(t)}{E_{max}(t)}+\mu\,\frac{C_{F\_M}(t)}{C_M}$$

wherein ω, φ and μ represent the weight proportions of the content response delay, the energy consumption and the cache space resource utilization, respectively. $T_{max}(t)$ and $E_{max}(t)$ denote the maximum delay and maximum energy consumption of the edge server system in time slot t, and $T_{total}(t)$ and $E_{total}(t)$ denote its total delay and total energy consumption in time slot t. The invention accordingly defines a new utility function utility(t), i.e. the ratio of the cache hit rate to the normalized system cost:

$$\mathrm{utility}(t)=\frac{P_{hit}(t)}{Y(t)} \tag{6}$$
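A minimal numerical sketch of the utility computation defined above is given below; the weight values ω = 0.4, φ = 0.3, μ = 0.3 and all other inputs are illustrative assumptions, since the patent only requires the weights to sum to 1.

```python
import numpy as np

def utility(hits, T_total, T_max, E_total, E_max, cached_size, C_M,
            omega=0.4, phi=0.3, mu=0.3):
    """utility(t) = P_hit(t) / Y(t) with
    Y(t) = omega*T_total/T_max + phi*E_total/E_max + mu*cached_size/C_M.
    `hits` is the per-user hit indicator vector h_k; the weights sum to 1."""
    p_hit = float(np.mean(hits))
    y = omega * T_total / T_max + phi * E_total / E_max + mu * cached_size / C_M
    return p_hit / y

# Example slot: 7 of 10 user requests hit the edge cache.
print(utility(hits=[1] * 7 + [0] * 3, T_total=0.8, T_max=2.0,
              E_total=1.5, E_max=4.0, cached_size=60.0, C_M=100.0))
```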
the transmission load required for local computation of the terminal device is different from that of the edge server, and the corresponding computation cost is also different, so that careful design of a communication-computation coordination strategy is required to achieve equilibrium. For this purpose, the invention constructs the complete multi-objective optimization problem as follows:
Figure BDA00041303036500000515
Figure BDA00041303036500000516
Figure BDA00041303036500000517
Figure BDA00041303036500000518
Figure BDA0004130303650000061
Figure BDA0004130303650000062
Figure BDA0004130303650000069
representing the average optimization objective function +.>
Figure BDA0004130303650000063
Representing the average utility function of a segment of time slots, the utility function of each time slot, utility (t), is calculated as in equation (6). The optimization objective of the above-described multi-objective optimization problem is to maximize cache hit rate and minimize system cost, using the optimization variable +.>
Figure BDA00041303036500000610
To represent caching decisions for all content. Constraint (7 a) indicates that the sum of the sizes of the contents cached in the edge servers is ensured not to exceed the cache capacity of the edge servers. τ represents slot τ. Constraints (7 b) - (7 d) represent binary variables for content caching, content request, and user request hit cases, respectively. Constraint (7 e) indicates that the sum of the superparameters is 1.
The multi-objective optimization problem is non-convex and NP-hard, and the computing and communication resource allocation decision of the system at time t+1 is affected by the cache resource allocation at time t. The invention therefore comprehensively considers the coupling of edge multi-dimensional resources, the bidirectional input characteristic of applications and user fairness, maps the edge server content caching decision process based on user historical preference to a Markov decision process, and establishes a cache resource allocation decision network with a deep reinforcement learning network, so as to reach the joint optimization objective of maximizing system resource utilization while guaranteeing service QoS.
As shown in FIG. 3, the bidirectional cache placement method based on deep reinforcement learning treats the non-cellular edge computing system as the environment and, through interaction with the environment, selects the actions that obtain the largest reward so as to find an optimal state-action scheme. The invention uses a deep Q network (DQN) and trains the Q-network with experience replay to improve the stability of the scheme, and then selects resource allocation decisions according to the environment state. The design of the states, actions and rewards in the DQN is described in detail below.
(1) State design. The state is a description of the external environment; the agent relies on the state parameters to make subsequent decisions. The state in the decision network is defined as s and changes over time. In the embodiment of the invention, the system state at time t is the content cache state of the current edge server, $s(t)=[C_1,C_2,\ldots,C_F]$.
(2) Action design. The action is an output parameter of the agent, used to adjust variable information in the system environment; the action in the network is defined as a. In the embodiment of the invention, the network action a is the cache resource allocation decision for the predicted situation at the next time instant, and must be applied to the real system to adjust the resource variables. In each time slot, the edge server should decide which contents to cache in the user device and the server, respectively, so as to maximize the utility function. An action can thus be expressed as

$$a(t)=[a_1(t),a_2(t),\ldots,a_F(t)],\quad a_i(t)\in\{0,1\}$$

where $a_i(t)=1$ means that an action is taken on content $f_i$ and $a_i(t)=0$ means that no action is taken on content $f_i$; the action taken may be caching the content or ceasing to cache it.
(3) Reward design. The reward value of the evaluation network must reflect how good or bad the cache resource allocation decision made by the deep reinforcement learning network is for the overall performance of the system. For each slot, the environment produces a system reward value based on the current state, the action taken in the current state and the next state. The reward value should be designed according to the goal of the resource allocation decision. The invention trains with the Q-learning method, which yields the discounted cumulative reward obtained after action a is executed in state s. The agent learns to select the action with the largest Q value in each iteration and, after a number of iterations, performs actions intelligently according to the optimal solution.
The reward value is designed as follows: the system returns a reward in each state, and the invention sets the reward to the value of the optimization objective. Since the optimization objective is to maximize the utility function, the reinforcement learning reward is defined as U(X):

$$U(X)=\frac{1}{\tau}\sum_{t=1}^{\tau}\mathrm{utility}(t)$$

wherein X represents the variables that need to be optimized and τ represents the current slot.
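The state/action/reward design above can be illustrated with the following toy environment sketch; the class name, the handling of capacity-violating toggles (reverting the whole action) and the utility_fn hook are assumptions made only for illustration, not details given in the patent.

```python
import numpy as np

class EdgeCacheEnv:
    """Toy environment following the state/action/reward design above:
    state s(t) is the binary cache vector [C_1, ..., C_F], action a(t) is a
    binary vector (a_i = 1 toggles the caching of content i), and the reward
    is the running average utility over slots 1..tau.
    The per-slot utility computation is delegated to a user-supplied function."""

    def __init__(self, F, S, C_M, utility_fn):
        self.F, self.S, self.C_M, self.utility_fn = F, S, C_M, utility_fn
        self.state = np.zeros(F, dtype=int)
        self.utilities = []

    def step(self, action, requests):
        # Apply the action: toggle the cache flag of every content with a_i = 1.
        nxt = np.where(action == 1, 1 - self.state, self.state)
        # Keep the capacity constraint: revert if the toggled state overflows.
        if np.dot(nxt, self.S) > self.C_M:
            nxt = self.state
        self.state = nxt
        self.utilities.append(self.utility_fn(self.state, requests))
        reward = float(np.mean(self.utilities))   # average utility up to slot tau
        return self.state.copy(), reward
```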
In order to avoid dimension explosion, the method adopts a Q-network as the core algorithm. The mapping between the input s(τ) and the output Q(s(τ), a(τ), θ) is determined by the neural network structure, where θ represents the weight parameters of the deep neural network (Deep Neural Networks, DNN). The invention uses a DNN to approximate the nonlinear function and realize the Q-network. The DNN structure comprises three fully connected hidden layers with 256 and 512 neurons. The activation function of the first two hidden layers is the linear rectification function (ReLU), and that of the third hidden layer is the tanh function.
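An illustrative PyTorch realization of such a Q-network is sketched below; the 256/256/512 hidden-layer widths are one possible reading of the text, and the output head scores one per-content toggle, which is a simplification of the exponentially large joint action space rather than the patent's prescribed encoding.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """DNN approximation of Q(s, a): three fully connected hidden layers,
    ReLU on the first two and tanh on the third, as described above.
    The 256/256/512 split is an assumption made for this sketch."""

    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.Tanh(),
            nn.Linear(512, n_actions),      # one Q-value per candidate action
        )

    def forward(self, state):
        return self.net(state)

# Example: F = 50 contents gives a 50-dimensional binary cache state.
q = QNetwork(state_dim=50, n_actions=50)
print(q(torch.zeros(1, 50)).shape)          # torch.Size([1, 50])
```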
In addition, the Q-network is trained with experience replay to improve the stability of the scheme: the experience tuples (s(τ), a(τ), r(τ), s(τ+1)) are stored in a replay pool $\mathcal{D}$ of a given capacity, where r(τ) denotes the reward of the state reached after action a(τ) is executed at time τ. When the number of stored experience tuples is greater than $N_D$, $N_M$ experience tuples are randomly sampled from the replay pool $\mathcal{D}$ to train the network. Action a(τ) is selected with an ε-greedy strategy to balance exploitation and exploration, and the exploration rate decreases linearly from an initial value $\varepsilon_s$ to a final value $\varepsilon_e$. The relevant DRL parameters are set as follows: learning rate α = 1e-4, discount factor γ = 0.9, initial exploration rate $\varepsilon_s$ = 0.9, final exploration rate $\varepsilon_e$ = 0.001. The popularity of the requested contents is assumed to follow a Zipf distribution, so the popularity of the i-th content requested by a user is:

$$p_i=\frac{i^{-\zeta}}{\sum_{j=1}^{F}j^{-\zeta}}$$

where ζ is the shape parameter of the Zipf distribution, set to the constant value 0.56.
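The experience replay and ε-greedy elements described above, together with the Zipf request model, can be sketched as follows. The replay-pool capacity, the values of N_D and N_M, and the use of a single online network (no separate target network is mentioned in the text) are assumptions of this sketch; the optimizer would be, for instance, Adam with the stated learning rate.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn.functional as F_nn

# Hyperparameters from the description; N_D, N_M and the pool capacity are assumed.
ALPHA, GAMMA = 1e-4, 0.9
EPS_START, EPS_END = 0.9, 0.001
N_D, N_M = 500, 32
replay = deque(maxlen=10_000)            # replay pool D (capacity assumed)

def zipf_popularity(n_contents, zeta=0.56):
    """Zipf popularity p_i = i^(-zeta) / sum_j j^(-zeta) of the i-th content."""
    ranks = np.arange(1, n_contents + 1, dtype=float)
    p = ranks ** (-zeta)
    return p / p.sum()

def epsilon(step, total_steps):
    """Exploration rate decreasing linearly from EPS_START to EPS_END."""
    frac = min(1.0, step / total_steps)
    return EPS_START + frac * (EPS_END - EPS_START)

def select_action(q_net, state, eps, n_actions):
    """epsilon-greedy selection over the Q-network outputs."""
    if random.random() < eps:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1))

def train_step(q_net, optimizer):
    """One DQN update from a random minibatch of the replay pool.
    Experiences are stored as (state tensor, action index, reward, next-state tensor);
    a single online network is used here, no target network being described."""
    if len(replay) <= N_D:
        return
    s, a, r, s_next = zip(*random.sample(replay, N_M))
    s, s_next = torch.stack(s), torch.stack(s_next)
    a = torch.tensor(a, dtype=torch.long)
    r = torch.tensor(r, dtype=torch.float32)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    target = r + GAMMA * q_net(s_next).max(dim=1).values.detach()
    loss = F_nn.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example: user requests for one slot can be drawn from the Zipf popularity,
# e.g. np.random.choice(F, size=K, p=zipf_popularity(F)) for F contents and K users.
```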
In fig. 4 and fig. 5, the caching method of the present invention is abbreviated as DRL caching, and is compared with other existing caching schemes. Existing caching schemes include random caching, greedy caching, and genetic caching.
As shown in fig. 4, the impact of different caching schemes on the utility function is compared for different numbers of contents under the same environmental conditions. As the number of contents increases, the overall utility function value shows a decreasing trend, because with more contents the user requests become more dispersed, the cache hit rate drops and the delay rises. The utility function value fluctuates because contents of different sizes are generated randomly for each content count: when the total content size is smaller, the same MEC server cache space can hold more contents, which raises the hit rate, lowers the delay and increases the utility value. As can be seen from fig. 4, the caching scheme of the invention attains higher utility function values for all content counts.
As shown in fig. 5, the impact of different caching schemes on the utility function is compared for different numbers of users under the same environmental conditions. Overall, the utility function of all caching schemes decreases gradually as the number of users grows, because with more users the bandwidth allocated to each user shrinks, the transmission rate falls, the delay rises and the utility value drops. Moreover, the decrease in the utility function flattens as the number of users grows, because the rate at which the transmission rate falls becomes smaller. As can be seen from fig. 5, the caching scheme of the invention attains the highest utility function value for all user counts.
The experimental results show that, under different system resource conditions, the bidirectional cache placement method based on deep reinforcement learning in a non-cellular network improves the efficiency of system resource utilization while guaranteeing user fairness, achieves better results than existing caching schemes, and reaches the optimization objectives of maximizing system resource utilization and guaranteeing the experience quality of all users.
Apart from the technical features described in the specification, everything else is known to those skilled in the art. Descriptions of well-known components and well-known techniques are omitted so as not to obscure the invention unnecessarily. The embodiments described above do not represent all embodiments consistent with this application; on the basis of the technical solutions of the invention, those skilled in the art may make various modifications or variations without inventive effort while remaining within the scope of the invention.

Claims (3)

1. A bidirectional cache placement method based on deep reinforcement learning in a non-cellular network, characterized by comprising the following steps:
(1) establishing for the edge server node a utility function based on cache hit rate, cache space resource utilization, content response delay and energy consumption, wherein the utility function utility(t) in time slot t is:

$$\mathrm{utility}(t)=\frac{P_{hit}(t)}{Y(t)}$$

wherein $P_{hit}(t)$ is the cache hit rate of the edge server node in time slot t and Y(t) is the normalized system cost of the edge server node in time slot t, calculated as:

$$Y(t)=\omega\,\frac{T_{total}(t)}{T_{max}(t)}+\phi\,\frac{E_{total}(t)}{E_{max}(t)}+\mu\,\frac{C_{F\_M}(t)}{C_M}$$

wherein ω, φ and μ are weight proportions, $T_{total}(t)$ and $E_{total}(t)$ denote the total delay and total energy consumption of the edge server in time slot t, $T_{max}(t)$ and $E_{max}(t)$ denote the maximum delay and maximum energy consumption of the edge server in time slot t, $C_M$ denotes the cache capacity of the edge server, and $C_{F\_M}(t)$ denotes the sum of the sizes of the contents cached on the edge server in time slot t;
k users in the coverage area of the edge server node are set, and the user set is
Figure FDA0004130303630000014
Providing F environmental frame contents, wherein the content set is +.>
Figure FDA0004130303630000015
The data size of the content i is S i ,C i Representing the cache state of content i on edge server, C i =0 means uncached, C i =1 indicates buffered, ++>
Figure FDA0004130303630000016
Then construct the following multi-objective optimization problem solving edge server content caching decisions:
Figure FDA0004130303630000017
Figure FDA0004130303630000018
Figure FDA0004130303630000019
the optimization problem represents optimizing content caching decisions
Figure FDA00041303036300000110
An optimization target for maximizing the cache hit rate and minimizing the system cost as much as possible is achieved; />
Figure FDA00041303036300000111
Representing an average utility function, τ representing the current slot;
Figure FDA00041303036300000112
indicating whether content i is requested or not +.>
Figure FDA00041303036300000113
Representing content i->
Figure FDA00041303036300000114
Indicating that at least one user requested content i,
Figure FDA00041303036300000115
representation->
Figure FDA00041303036300000116
No user request; h is a k Indicating whether user k's request hits in the edge server's cache space, h if the content hits k =1, otherwise h k =0;
(2) mapping the content caching decision process based on user history preference into a Markov decision process, and establishing a cache resource allocation decision network using a deep Q network;

in the deep Q network, the state of the agent is set to the content cache state of the current edge server, $s(t)=[C_1,C_2,\ldots,C_F]$; the action output by the agent is $a(t)=[a_1(t),a_2(t),\ldots,a_F(t)]$, where $a_i(t)\in\{0,1\}$ indicates whether an action is taken on content i; the evaluation network computes the reward of the state reached after the action is executed, the reward being set to the value of the optimization objective $\bar{U}=\frac{1}{\tau}\sum_{t=1}^{\tau}\mathrm{utility}(t)$;
the edge server node collects the historical requests and terminal-device resource information of all users within the base-station signal coverage, predicts the contents of interest to each user from their historical requests, generates an initial content caching decision, trains the Q network with experience replay, and periodically updates user preferences to update the content caching decision;
(3) a user terminal generates a service requirement and sends a request to the edge server node; the user terminal checks whether the requested content is cached locally, and if not, obtains the requested content from the edge server node; if so, the content is processed locally or offloaded to the edge server node; and the edge server updates the content caching decision according to the continuously received user requests.
2. The method of claim 1, wherein the cache hit rate of the edge server node in time slot t is

$$P_{hit}(t)=\frac{1}{K}\sum_{k=1}^{K}h_k(t).$$
3. The method of claim 1, wherein the cache resource allocation decision network is established using a deep Q network, the Q network being realized by a deep neural network (DNN) approximating a nonlinear function; the DNN comprises three fully connected hidden layers, the activation function of the first two hidden layers being the linear rectification function and that of the third hidden layer being the tanh function.
CN202310257897.5A 2023-03-10 2023-03-10 Bidirectional cache placement method based on deep reinforcement learning in non-cellular network Pending CN116321307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310257897.5A CN116321307A (en) 2023-03-10 2023-03-10 Bidirectional cache placement method based on deep reinforcement learning in non-cellular network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310257897.5A CN116321307A (en) 2023-03-10 2023-03-10 Bidirectional cache placement method based on deep reinforcement learning in non-cellular network

Publications (1)

Publication Number Publication Date
CN116321307A true CN116321307A (en) 2023-06-23

Family

ID=86823708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310257897.5A Pending CN116321307A (en) 2023-03-10 2023-03-10 Bidirectional cache placement method based on deep reinforcement learning in non-cellular network

Country Status (1)

Country Link
CN (1) CN116321307A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116761152A (en) * 2023-08-14 2023-09-15 合肥工业大学 Roadside unit edge cache placement and content delivery method
CN116761152B (en) * 2023-08-14 2023-11-03 合肥工业大学 Roadside unit edge cache placement and content delivery method
CN116996921A (en) * 2023-09-27 2023-11-03 香港中文大学(深圳) Whole-network multi-service joint optimization method based on element reinforcement learning
CN116996921B (en) * 2023-09-27 2024-01-02 香港中文大学(深圳) Whole-network multi-service joint optimization method based on element reinforcement learning

Similar Documents

Publication Publication Date Title
Zhong et al. A deep reinforcement learning-based framework for content caching
Sadeghi et al. Deep reinforcement learning for adaptive caching in hierarchical content delivery networks
CN109639760B (en) It is a kind of based on deeply study D2D network in cache policy method
CN111556461B (en) Vehicle-mounted edge network task distribution and unloading method based on deep Q network
CN112995950B (en) Resource joint allocation method based on deep reinforcement learning in Internet of vehicles
CN112218337B (en) Cache strategy decision method in mobile edge calculation
CN116321307A (en) Bidirectional cache placement method based on deep reinforcement learning in non-cellular network
CN114390057B (en) Multi-interface self-adaptive data unloading method based on reinforcement learning under MEC environment
CN114553963B (en) Multi-edge node collaborative caching method based on deep neural network in mobile edge calculation
CN116489712B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN116260871A (en) Independent task unloading method based on local and edge collaborative caching
Yan et al. Distributed edge caching with content recommendation in fog-rans via deep reinforcement learning
Zhang et al. Two time-scale caching placement and user association in dynamic cellular networks
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
Somesula et al. Cooperative cache update using multi-agent recurrent deep reinforcement learning for mobile edge networks
CN112911614B (en) Cooperative coding caching method based on dynamic request D2D network
CN109587715B (en) Distributed caching method based on multi-agent reinforcement learning
Lei et al. Partially collaborative edge caching based on federated deep reinforcement learning
Chen et al. Cooperative caching for scalable video coding using value-decomposed dimensional networks
CN115580900A (en) Unmanned aerial vehicle assisted cooperative task unloading method based on deep reinforcement learning
CN112261628B (en) Content edge cache architecture method applied to D2D equipment
Li et al. A smart cache content update policy based on deep reinforcement learning
Jiang et al. Caching strategy based on content popularity prediction using federated learning for F-RAN
CN110392409B (en) WMSNs multipath QoS routing method, system and storage medium based on distribution communication network
Gao et al. Deep Reinforcement Learning Based Rendering Service Placement for Cloud Gaming in Mobile Edge Computing Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination