CN111901833B - Combined service scheduling and content caching method for unreliable channel transmission - Google Patents

Combined service scheduling and content caching method for unreliable channel transmission

Info

Publication number
CN111901833B
CN111901833B (Application CN202010677841.1A)
Authority
CN
China
Prior art keywords
content
base station
user
caching
service scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010677841.1A
Other languages
Chinese (zh)
Other versions
CN111901833A (en)
Inventor
罗晶晶
张琬璐
聂涛
高林
郑福春
张钦宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202010677841.1A priority Critical patent/CN111901833B/en
Publication of CN111901833A publication Critical patent/CN111901833A/en
Application granted granted Critical
Publication of CN111901833B publication Critical patent/CN111901833B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/10Flow control between communication endpoints
    • H04W28/14Flow control between communication endpoints using intermediate storage
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/12Wireless traffic scheduling
    • H04W72/1263Mapping of traffic onto schedule, e.g. scheduled allocation or multiplexing of flows
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/535Allocation or scheduling criteria for wireless resources based on resource usage policies
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a joint service scheduling and content caching method for unreliable channel transmission, comprising the following steps. Service scheduling: the base station with higher channel reliability is scheduled to serve the user's request, reducing the service overhead caused by retransmission. Content caching: state information shared among agents is used for collaborative caching, and caching decisions are coordinated among base stations to maximize the reduction in service overhead. The beneficial effects of the invention are as follows: simulation results show that the service scheduling step of the invention outperforms the shortest-distance-priority strategy, and, compared with the distributed multi-agent deep Q-network strategy, the proposed content caching step achieves better performance and better robustness as the number of contents and the local cache capacity increase.

Description

Combined service scheduling and content caching method for unreliable channel transmission
Technical Field
The invention relates to the technical field of wireless edge caching, in particular to a combined service scheduling and content caching method for unreliable channel transmission.
Background
With the rapid development of the mobile internet and the internet of things, data traffic in wireless networks grows exponentially. A recent Cisco report indicates that global internet traffic will exceed 4.6 ZB in 2023, 321% more than in 2017. In a cellular access network centered on base stations, a user's content request passes in turn through the base station, the S-GW, and the P-GW, then enters the Internet, where it is routed and forwarded to a remote content server. The physical distance between the user and the content server introduces network transmission delay. When there are few users and network conditions are good, the latency between the user and the remote server is not significant. However, in practical scenarios, a large number of users often initiate concentrated requests for popular content, which places enormous stress on the network and drastically degrades the users' quality of experience. In addition, repeated transmission of large amounts of popular content (especially mobile high-definition video) wastes significant communication resources. To meet users' ever-growing data demands for online video, web browsing, online gaming, and so on, providers are motivated to seek new service technologies that deliver a high-quality experience. As a key technology for next-generation mobile communications, edge caching provides a new solution: by pre-storing part of the content at edge nodes, it can effectively avoid congestion of the backhaul link and reduce the communication resource consumption caused by repeatedly downloading the same content.
Academia has produced a number of results on edge cache mechanism design. Two common edge caching approaches are: first predicting content popularity and then updating the cache, or directly learning the caching strategy. Nevertheless, many problems in edge caching remain unsolved. In practical scenarios, content delivery may fail intermittently or suffer significant delay due to mobility, fading, communication errors, and so on. In such cases the wireless communication channel is unreliable, and its reliability is unknown. Caching strategies designed for reliable channels therefore cannot be applied directly to unreliable-channel scenarios, especially in data-intensive applications. To bring users a higher quality of experience and reduce operators' service overhead, the service scheduling policy and the content caching policy need to be redesigned.
Disclosure of Invention
The invention provides a joint service scheduling and content caching method for unreliable channel transmission, which comprises the following steps:
Service scheduling: the base station with higher channel reliability is scheduled to serve the user's request, so that the service overhead caused by retransmission is reduced; the service scheduling step is a Maximal Reward Priority (MRP) policy.
Content caching: state information shared among agents is used for collaborative caching, and caching decisions are coordinated among base stations to maximize the reduction in service overhead; the content caching step is a Collaborative Multi-Agent Actor-Critic (CMA-AC) policy.
As a further improvement of the present invention, the core of the service scheduling step is to always schedule a base station with higher channel reliability to serve a user's request. However, this is challenging because the reliability of the channel is unknown. Taking service of user u as an example, we define the reward that base station n can theoretically obtain by providing content f to user u as

$$r_{n,u,f}(t)=\mathbb{I}\{b_{u,f}(t)=n\}\,d_{u,f}(t)\,a_{n,f}(t-1)\left(c_0-\frac{c_{n,u}}{p_{n,u}}\right)\qquad(4)$$
wherein b_{u,f}(t) denotes the service scheduling decision for user u's request for content f in time slot t;
d_{u,f}(t) indicates whether user u requests content f in time slot t;
a_{n,f}(t-1) denotes the caching decision of base station n for content f in time slot (t-1);
c_{n,u} denotes the service overhead of one transmission from base station n to user u;
c_0 denotes the service overhead when the core network serves the user;
p_{n,u} denotes the reliability of the communication channel between base station n and user u.
In formula (4), $\mathbb{I}\{b_{u,f}(t)=n\}=1$ indicates that user u's request for content f is served by base station n, and the term $(c_0-c_{n,u}/p_{n,u})$ represents the reduction in service overhead due to edge caching compared with directly retrieving the content from the core network. Further, the average reward obtained by base station n serving user u is

$$\bar{r}_{n,u}(t)=\frac{1}{T_{n,u}(t)}\sum_{\tau=1}^{t} r_{n,u}(\tau)\qquad(5)$$
where $T_{n,u}(t)$ denotes the total number of times user u is served by base station n in the first t time slots. Since the average reward $\bar{r}_{n,u}(t)$ reflects the reliability of the channel to a certain extent, the service scheduling step of the system is obtained through a greedy algorithm. The invention therefore names this service scheduling step the MRP service scheduling policy: among the base stations capable of providing the service, the base station with the highest average reward is always selected to serve the user's request.
As a further improvement of the present invention, in the service scheduling step, when $T_{n,u}(t)=0$, base station n has not previously served user u; to ensure that base station n serves user u at least once, the service scheduling decision at this time is denoted $b_{u,f}(t)=n$.
As a further improvement of the present invention, in the service scheduling step, when $T_{n,u}(t)>0$, the user's request is served according to the following policy:

$$b_{u,f}(t)=\begin{cases}0,&\mathcal{N}_u^f(t)=\varnothing\\ n,&\mathcal{N}_u^f(t)=\{n\}\\ l(t),&|\mathcal{N}_u^f(t)|>1\end{cases}\qquad(6)$$

where $\mathcal{N}_u^f(t)$ denotes the set of neighboring base stations of user u that cache content f in time slot t, and $l(t)=\arg\max_{n\in\mathcal{N}_u^f(t)}\bar{r}_{n,u}(t-1)$ denotes the base station with the largest average reward in time slot (t-1).
(1) User u requests content f in time slot t and none of its neighboring base stations caches content f; the request is then served by the core network.
(2) User u requests content f in time slot t and exactly one neighboring base station n caches content f; the request is then served by base station n.
(3) User u requests content f in time slot t and several neighboring base stations cache content f; the request is then served by the base station n with the largest average reward in time slot (t-1).
As a further improvement of the present invention, in the content caching step, each time slot is divided into 3 phases: a content delivery phase, an information exchange phase, and a cache update phase. In the content delivery phase, a user initiates a content request to the base station, and the base station serves the user's request according to the service scheduling policy. After the content delivery phase ends, the system enters the information exchange phase, in which different base stations exchange request state information and caching decision information with each other. In the cache update phase, each base station updates its cache according to the global state information obtained in the information exchange phase.
As a further improvement of the present invention, in the content caching step, each base station is regarded as an agent containing an Actor network and a Critic network. The Actor network is a policy network: given the state s_n observed by the current base station n, it outputs the caching decision a_n of the agent. The Critic network is an evaluation network used to estimate the total reward the system can obtain: it maps the global state s obtained in the information exchange stage to a value function. By using the Critic network to guide the parameter updates of the Actor network, each agent can update its own cache in the direction of reward maximization. More specifically, each agent maintains an experience buffer; by randomly sampling and replaying past experiences, it overcomes the correlation between adjacent experiences and learns its own caching strategy from the past.
As a further improvement of the present invention, in the content caching step, the content delivery phase: in each time slot, after receiving the users' requests, each agent serves them according to the service scheduling step. Taking base station n in time slot τ as an example, after the content delivery phase is completed, the base station obtains the state s_n(τ-1), the action taken a_n(τ-1), and the resulting reward r_n(τ), and enters the next state s_n(τ). A quadruple (s_n(τ-1), a_n(τ-1), r_n(τ), s_n(τ)) about the base station's state, reward, and caching decision is thus obtained and put into the experience pool of each base station, and k samples are drawn from the experience pool for training. After the content delivery phase ends, the system enters the information exchange phase.
As a further development of the invention, in the content caching step, the information exchange phase: exchanging request state information and cache behavior information between base stations; after the information exchange is finished, the system enters a buffer updating stage, and each base station updates the buffer of the base station according to the corresponding buffer decision.
As a further improvement of the present invention, in the content caching step, a cache update phase: the Critic network updates network parameters through the cache state information and the cache decision information about other intelligent agents obtained in the information exchange stage; meanwhile, the Critic network is utilized to guide the Actor network to update parameters, so that each base station performs cache update towards the direction of rewarding maximization.
As a further improvement of the present invention, in the content caching step, to balance exploration and exploitation, we update the caching behavior with an ε_τ-greedy policy: a random caching action is selected with probability ε_τ, and the caching action with the largest value function is selected with probability 1-ε_τ.
The beneficial effects of the invention are as follows: simulation results show that the service scheduling step of the invention outperforms the Shortest Distance Priority (SDP) strategy. Moreover, compared with the Distributed Multi-Agent Deep Q-Network (DMA-DQN) strategy, the proposed content caching step achieves better performance and better robustness as the number of contents and the local cache capacity increase.
Drawings
FIG. 1 is a system model diagram of the present invention;
FIG. 2 is a schematic diagram of key elements of the present invention across time slots;
fig. 3-5 are performance simulation graphs of the present invention in different scenarios, respectively.
Detailed Description
Aiming at the shortcomings of existing service scheduling and content caching strategies in edge caching technology, the invention provides a combined service scheduling and content caching method for unreliable channel transmission, formulating a multi-agent decision problem so as to minimize the service overhead of the system. The effectiveness of a deep reinforcement learning caching strategy is measured by the reward it obtains, and the reward design should embody the goal of minimizing service overhead. The invention therefore defines the reward as the reduction in service overhead due to edge caching compared with directly retrieving content from the core network. The optimization goal is to find the optimal service scheduling policy and content caching policy that maximize the long-term reward of the system.
In order to achieve the above purpose, the present invention is realized by the following technical scheme:
and (3) system model:
the application scenario considers a cellular network supporting caching, as shown in fig. 1. Wherein the base station may be connected to the core network through a backhaul link. There are N base stations (local cache capacity C) and U users in the area. The base station set and the user set are respectively represented asIs->Let us assume that the service area of the base station is limited, denoted as l c Users within this range can be served by the base station. We express the neighbouring base stations of user u as set +.>Similarly, the contiguous set of users of base station n is denoted +.>We assume that there is an exchange of information, such as request information and cache decision information, between the different base stations.
Channel model:
it is assumed that the communication channel between the user and the base station is unreliable. For user u, the set of communication channel reliabilities with all neighboring base stations is denoted asWherein p is n,u The reliability of the communication channel between base station n and user u, i.e. the probability of success in transmitting content from base station n to user u, is indicated. Non-contiguous base station for user uReliability p of the channel because of the inability to establish a communication link between them n,u =0. In this unreliable channel case, the requested content will be repeatedly transmitted by the base station until user u successfully retrieves the content.
Service model:
we assume that in the network model under consideration there is one content setThe time is divided into discrete time slots, each user independently requests content from the content library at each time slot, and the user preferences are unknown. Assume thatThe content requirements of all users in time slot t are denoted as d (t) = { d u,f (t)} U×F Wherein d is u,f (t) =1 means that user u requests content f in time slot t, otherwise d u,f (t) =0. Accordingly, the service scheduling policy at time slot t is expressed as b (t) = { b u,f (t)} U×F Wherein b u,f (t) ∈ { -1,0,1, …, N } is request d u,f Service policies of (t). Specifically, b u,f (t) = -1 represents d u,f (t) no service is required, b u,f (t) =0 and b u,f (t) =1, 2, …, N represents request d u,f (t) served by the core network and base stations 1,2, …, N, respectively. Meanwhile, we define a (t) = { a 1 (t),a 2 (t),...,a n (t),...,a N (t) } is a buffer decision of a time slot t system, where a n And (t) is a buffer decision of the base station n in the time slot t, and the total buffer number is not more than the buffer capacity of the base station.
The invention discloses a joint service scheduling and content caching method for unreliable channel transmission, which comprises a service scheduling step and a content caching step.
Service scheduling: the base station with higher channel reliability is scheduled to serve the user's request, so that the service overhead caused by retransmission is reduced; the service scheduling step is the MRP policy.
Content caching: state information shared among agents is used for collaborative caching, and caching decisions are coordinated among base stations to maximize the reduction in service overhead; the content caching step is the CMA-AC policy.
As shown in fig. 2, the present invention divides each slot into 3 phases: a content delivery phase, an information exchange phase and a cache update phase. In the content delivery phase, the user initiates a content request to the base station, while the base station services the user's request according to the service scheduling policy. After the content delivery phase is over, the system enters an information exchange phase. At this stage, different base stations exchange request state information, buffer decision information, and the like with each other. In the cache updating stage, each base station carries out cache updating according to the global state information obtained in the information exchange stage.
In fig. 2, taking base station n as an example, at the beginning of time slot t, i.e., after the content placement phase of time slot t-1 ends, we obtain the caching decision $a_n(t-1)$ of base station n. After the content delivery phase of time slot t ends, base station n obtains the request states $g_n(t)$ of all users and the reward $r_n(t)$ brought by serving users at the current moment. The reward $r_n(t)$ obtained by base station n in time slot t is jointly determined by the caching decision $a_n(t-1)$ and the service scheduling policy $b_n(t-1)$. We take $(a_n(t-1), g_n(t))$ as the state $s_n(t)$ of base station n in time slot t. The caching policy function $\pi_n$ maps state $s_n(t)$ to $a_n(t)$, so $a_n(t)$ can also be written as $\pi_n(s_n(t))$. Given a service policy b, the performance of base station n is measured by a state value function, expressed as

$$V_n^{\pi_n}(s_n(t)\mid b)=\mathbb{E}\left[\sum_{\tau=t}^{\infty}\gamma^{\tau-t}\,r_n\big(s_n(\tau),\pi_n(s_n(\tau))\mid b\big)\right]\qquad(2)$$

where $\gamma\in(0,1)$ is a discount factor and $r_n(s_n(\tau),\pi_n(s_n(\tau))\mid b)=r_n(\tau+1)$, i.e., the reward received in the next time slot.
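The γ-discounted sum inside the state value function can be illustrated with a short helper. This is a generic sketch of a discounted return, not code from the invention:

```python
def discounted_return(rewards, gamma):
    """Compute sum_{k>=0} gamma^k * r_k, the discounted return with
    discount factor gamma in (0, 1), as in the state value function."""
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in (0, 1)")
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total
```

With gamma = 0.5 and three unit rewards, the return is 1 + 0.5 + 0.25 = 1.75, showing how later rewards count for less.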
The optimization goal is to design an appropriate service scheduling policy b and content caching policies $\pi_n$ to maximize the overall reward of the system, expressed as

$$\max_{b\in\eta,\ \{\pi_n\}\in\psi}\ \sum_{n=1}^{N} V_n^{\pi_n}(s_n\mid b)\qquad(3)$$

where η and ψ denote the sets of all possible service scheduling policies and content caching policies, respectively.
Service scheduling:
the core of the service scheduling step is to always schedule base stations with a higher degree of channel reliability to service the user's request. However, this problem is challenging because the reliability of the channel is unknown. Taking the service user u as an example, we define that the rewards that the base station n can theoretically obtain for providing the content f to the user u can be expressed as
wherein b_{u,f}(t) denotes the service scheduling decision for user u's request for content f in time slot t;
d_{u,f}(t) indicates whether user u requests content f in time slot t;
a_{n,f}(t-1) denotes the caching decision of base station n for content f in time slot (t-1);
c_{n,u} denotes the service overhead of one transmission from base station n to user u;
c_0 denotes the service overhead when the core network serves the user;
p_{n,u} denotes the reliability of the communication channel between base station n and user u.
In formula (4), $\mathbb{I}\{b_{u,f}(t)=n\}=1$ indicates that user u's request for content f is served by base station n, and the term $(c_0-c_{n,u}/p_{n,u})$ represents the reduction in service overhead due to edge caching compared with directly retrieving the content from the core network. Furthermore, we can obtain the average reward of base station n serving user u:

$$\bar{r}_{n,u}(t)=\frac{1}{T_{n,u}(t)}\sum_{\tau=1}^{t} r_{n,u}(\tau)\qquad(5)$$
where $T_{n,u}(t)$ denotes the total number of times user u is served by base station n in the first t time slots. Since the average reward $\bar{r}_{n,u}(t)$ reflects the channel reliability to a certain extent, the service scheduling policy of the system can be obtained through a greedy algorithm. Therefore, we name this the MRP service scheduling policy: among the base stations capable of providing the service, the base station with the highest average reward is always selected to serve the user's request.
The invention divides the MRP service scheduling policy into two parts. The first part is when $T_{n,u}(t)=0$: base station n has not previously served user u, and to ensure that base station n serves user u at least once, the service scheduling decision is $b_{u,f}(t)=n$. The second part is when $T_{n,u}(t)>0$: the user's request is served according to the following policy:

$$b_{u,f}(t)=\begin{cases}0,&\mathcal{N}_u^f(t)=\varnothing\\ n,&\mathcal{N}_u^f(t)=\{n\}\\ l(t),&|\mathcal{N}_u^f(t)|>1\end{cases}\qquad(6)$$

where $\mathcal{N}_u^f(t)$ denotes the set of neighboring base stations of user u that cache content f in time slot t, and $l(t)=\arg\max_{n\in\mathcal{N}_u^f(t)}\bar{r}_{n,u}(t-1)$ is the base station with the largest average reward in time slot (t-1).
(1) User u requests content f in time slot t and none of its neighboring base stations caches content f; the request is then served by the core network.
(2) User u requests content f in time slot t and exactly one neighboring base station n caches content f; the request is then served by base station n.
(3) User u requests content f in time slot t and several neighboring base stations cache content f; the request is then served by the base station n with the largest average reward in time slot (t-1).
Content caching:
after giving the service policy b (t), equation 3 can be reduced to
The aim of the invention is to find an optimal strategy pi * To maximize the overall prize. By defining a state transition matrix, the state cost function can be recursively represented by the bellman equation
To describe the execution of a caching action in the current state, the value of executing action $a_n$ in state $s_n$ under policy $\pi_n$ is defined as the state-action value function $Q^{\pi_n}(s_n,a_n)$, shown in formula (9):

$$Q^{\pi_n}(s_n,a_n)=r_n(s_n,a_n)+\gamma\sum_{s'}P(s'\mid s_n,a_n)\,V^{\pi_n}(s'),\qquad a_n\in\mathcal{A}\qquad(9)$$

where $\mathcal{A}$ denotes the set of all possible caching decisions. However, since the transition probability $P(s'\mid s_n,a_n)$ is unknown in advance, the state-action value function cannot be obtained through policy iteration. In view of this, the invention introduces a Q-learning-based algorithm to solve this problem.
In the content caching step, each base station is regarded as an agent containing an Actor network and a Critic network. The Actor network is a policy network: given the state s_n observed by the current base station n, it outputs the caching decision a_n of the agent. The Critic network is an evaluation network used to estimate the total reward the system can obtain: it maps the global state s obtained by the system in the information exchange stage to a value function. By using the Critic network to guide the parameter updates of the Actor network, each agent can update its own cache in the direction of reward maximization. More specifically, each agent maintains an experience buffer; by randomly sampling and replaying past experiences, it overcomes the correlation between adjacent experiences and learns its own caching strategy from the past.
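The experience buffer with random sampling mentioned above is a standard replay-buffer component; a minimal sketch, with our own class name:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience pool; uniform random sampling breaks the
    correlation between consecutive transitions."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, k):
        """Draw min(k, len) transitions uniformly without replacement."""
        return random.sample(list(self.buffer), min(k, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

Each agent would push one quadruple per time slot and draw k samples for training, as described in the content delivery phase.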
The following details regarding the content caching step are:
1. Content delivery phase: in each time slot, after receiving the users' requests, each agent serves them according to the service scheduling policy. Taking base station n in time slot τ as an example, after the content delivery phase is completed, the base station obtains the state s_n(τ-1), the action taken a_n(τ-1), and the resulting reward r_n(τ), and enters the next state s_n(τ). We thus obtain a quadruple (s_n(τ-1), a_n(τ-1), r_n(τ), s_n(τ)) about the base station's state, reward, and caching decision, which is put into the experience pool of each base station, and k samples are drawn from the experience pool for training. After the content delivery phase ends, the system enters the information exchange phase.
2. Information exchange stage: the base stations exchange request state information and cache behavior information. After the information exchange is finished, the system enters a buffer updating stage, and each base station updates the buffer of the base station according to the corresponding buffer decision.
3. Cache update phase: the Critic network updates its network parameters according to formula (10). Meanwhile, the Critic network guides the Actor network to update its parameters (formula (11)), so that each base station performs its cache update in the direction of reward maximization, where $s=\{s_1,s_2,\dots,s_N\}$ denotes the global state of the system.
To balance exploration and exploitation, we update the caching behavior with an ε_τ-greedy policy: a random caching action is selected with probability ε_τ, and the caching action with the largest value function is selected with probability 1-ε_τ, as shown in formula (12):

$$a_n=\begin{cases}\text{a random caching action},&\text{with probability }\varepsilon_\tau\\ \arg\max_{a\in\mathcal{A}}Q(s,a),&\text{with probability }1-\varepsilon_\tau\end{cases}\qquad(12)$$
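The ε_τ-greedy rule of formula (12) can be sketched as follows; the decay schedule shown is hypothetical, since the patent does not specify how ε_τ evolves over time:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the
    action with the largest estimated value."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values[a])

def epsilon_schedule(t, eps0=1.0, decay=0.995, eps_min=0.05):
    """A decaying exploration rate over time slots (illustrative only)."""
    return max(eps_min, eps0 * decay ** t)
```

Early slots explore almost uniformly; as ε_τ decays, the agent increasingly exploits the caching action with the highest value estimate.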
performance simulation:
to evaluate performance, we combine two service scheduling policies and two content caching policies, respectively, to simulate four schemes.
One service scheduling strategy is the MRP strategy proposed by the invention; the other is the SDP strategy. The core of the SDP policy is to select the nearest edge node to serve the user: for each user in the area, the system selects, from the edge nodes capable of serving that user, the one closest to the user to serve the user's request.
One content caching strategy is the CMA-AC strategy proposed by the invention; the other is the DMA-DQN strategy. In DMA-DQN, each edge node is regarded as an agent, and each agent finds an optimal policy through the DQN algorithm to minimize the service overhead of the system, but the agents are independent of each other.
The four schemes are respectively as follows:
1.CMA-AC+MRP;
2.CMA-AC+SDP;
3.DMA-DQN+MRP;
4.DMA-DQN+SDP。
we first evaluate the scenario with 10 content and 2 cache capacity, and the simulation result is shown in fig. 3. The four schemes in fig. 3 have similar performance. This is because the buffer capacity of the base stations is small, resulting in a very weak degree of cooperation between the base stations. Therefore, the proposed CMA-AC caching strategy and MRP service scheduling strategy have no significant advantages.
Then, we evaluate the scenario with 15 contents and cache capacity 4; the simulation result is shown in fig. 4. Notably, when the content caching policies are the same, the schemes using the MRP service scheduling policy outperform those using the SDP policy. The reason is that the MRP policy selects the channel with the highest average reward to serve the user's request, and a higher average reward implies higher reliability, whereas the SDP policy selects the nearest base station, which may not be the one with the most reliable channel. In addition, when the service scheduling policies are the same, the schemes using the CMA-AC content caching policy outperform those using the DMA-DQN policy. This is because the CMA-AC policy coordinates cooperation between base stations by using the state information of the other agents, while the DMA-DQN policy updates each caching policy based only on each agent's own state information.
Finally, we evaluate the scenario with 20 contents and a cache capacity of 4; the simulation result is shown in fig. 5. In this scenario, the schemes employing the CMA-AC content caching policy still perform better. Furthermore, the CMA-AC strategy is more robust than the DMA-DQN strategy, because the CMA-AC strategy aims to maximize the reward of the whole system, while the DMA-DQN strategy aims to maximize the reward of each individual agent.
The beneficial effects of the invention are as follows:
1. Under the condition that user preference and channel reliability are unknown, the invention studies the content caching strategy design of multiple base stations and models the problem as a multi-agent deep reinforcement learning problem.
2. The invention proposes a service scheduling step (the MRP service scheduling strategy) to solve the service scheduling problem. When requested content is available from multiple neighboring base stations, the MRP policy schedules the base station with higher channel reliability to serve the user's request.
3. The content caching step (the CMA-AC strategy) proposed by the invention solves the content caching problem. The CMA-AC strategy effectively utilizes the state information of the other agents to coordinate collaborative caching between the base stations.
4. Simulation results show that the proposed MRP service scheduling strategy performs better than the SDP strategy, and that the proposed CMA-AC strategy outperforms the DMA-DQN strategy when the number of contents and the local cache capacity increase. Furthermore, the CMA-AC strategy is more robust than the DMA-DQN strategy.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (8)

1. An unreliable channel transmission oriented joint service scheduling and content caching method, comprising:
service scheduling: scheduling the base station with the higher channel reliability to serve the user's request, thereby reducing the service overhead caused by retransmission;
content caching: performing collaborative caching by utilizing the state information shared among the agents, so that caching decisions can be coordinated among the base stations to maximize the reduction in service overhead;
in the service scheduling step, the reward that base station n can theoretically obtain by providing content f to user u is defined by formula (4),
wherein b_u,f(t) - the service scheduling policy for the request of user u for content f in time slot t;
d_u,f(t) - the number of requests of user u for content f in time slot t;
a_n,f(t-1) - the caching decision of base station n for content f in time slot (t-1);
c_n,u - the service overhead required for one transmission from base station n to user u;
c_0 - the service overhead required for the core network to serve the user;
p_n,u - the reliability of the communication channel between base station n and user u;
in formula (4), the first term indicates that the request of user u for content f is served by base station n, and the second term represents the reduction in service overhead achieved by edge caching compared with retrieving the content directly from the core network; further, the average reward obtained by base station n serving user u can be obtained:
wherein the denominator denotes the total number of times that user u has been served by base station n in the first t time slots; since the average reward can reflect the channel reliability to a certain extent, the service scheduling step of the system is obtained through a greedy algorithm;
in the content caching step, each time slot is divided into 3 phases: a content delivery phase, an information exchange phase and a cache update phase; in the content delivery phase, a user initiates a content request to a base station, and the base station serves the user's request according to the service scheduling policy; after the content delivery phase ends, the system enters the information exchange phase, in which different base stations exchange request state information and caching decision information with each other; in the cache update phase, each base station updates its cache according to the global state information obtained in the information exchange phase.
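Formula (4) itself was rendered as an image and does not survive in this text, but the variable definitions above suggest its structure: the overhead saved by serving a request from the edge cache instead of the core network, where an unreliable channel of reliability p_n,u inflates the per-transmission overhead c_n,u. A minimal Python sketch under that assumption (the function names and the retransmit-until-success model are illustrative, not the patent's exact formula):

```python
def expected_service_overhead(c_nu: float, p_nu: float) -> float:
    """Expected overhead for base station n to deliver one content to user u.

    Illustrative assumption: the base station retransmits until success over
    a channel that succeeds with probability p_nu, so the expected number of
    transmissions is 1 / p_nu (geometric distribution)."""
    return c_nu / p_nu

def edge_reward(d_uf: int, cached: int, c0: float, c_nu: float, p_nu: float) -> float:
    """Hypothetical reward of serving d_uf requests from the edge cache
    (cached is the 0/1 caching decision a_n,f): the overhead saved versus
    fetching every request from the core network at cost c0."""
    return d_uf * cached * (c0 - expected_service_overhead(c_nu, p_nu))

# A reliable nearby cache saves overhead relative to the core network:
# c_nu / p_nu = 2.0 / 0.5 = 4.0, saving 10.0 - 4.0 = 6.0 per request.
print(edge_reward(3, 1, 10.0, 2.0, 0.5))  # 18.0
```

The sketch also shows why channel reliability matters for scheduling: as p_nu drops, the expected overhead c_nu / p_nu grows and can exceed the core-network cost c0, making the edge reward negative.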
2. The joint service scheduling and content caching method according to claim 1, wherein in the service scheduling step, when base station n has never served user u before, in order to ensure that base station n serves user u at least once, the service scheduling step in this case is denoted b_u,f(t)=n.
3. The joint service scheduling and content caching method according to claim 1, wherein in the service scheduling step, when each candidate base station has served user u at least once, the user's request is served according to the following policy:
wherein l(t) denotes the base station selected, from among the neighboring base stations of user u that have cached content f in time slot t, as the one with the largest average reward in time slot (t-1);
(1) if user u requests content f in time slot t and none of the neighboring base stations of user u has cached content f, the request of user u for content f is served by the core network;
(2) if user u requests content f in time slot t and exactly one neighboring base station n has cached content f, the request of user u for content f is served by base station n;
(3) if user u requests content f in time slot t and multiple neighboring base stations have cached content f, the request of user u for content f is served by the base station n with the largest average reward in time slot (t-1).
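The three cases above collapse into one selection rule. A minimal sketch (the function name, the `"core"` fallback token, and the default reward are hypothetical conveniences, not from the patent):

```python
def mrp_select(neighbors_with_f, avg_reward, core="core"):
    """MRP-style scheduling sketch: serve the request from the neighboring
    base station with the largest average reward, falling back to the core
    network when no neighbor has cached the content.

    neighbors_with_f -- base stations adjacent to user u that cached content f
    avg_reward       -- mapping base station -> average reward from slot t-1
    """
    if not neighbors_with_f:          # case (1): no neighbor cached f
        return core
    # cases (2) and (3): one or more candidates; max() handles both
    return max(neighbors_with_f, key=lambda n: avg_reward.get(n, 0.0))

avg = {"bs1": 4.2, "bs2": 5.7, "bs3": 3.1}
print(mrp_select([], avg))                     # core
print(mrp_select(["bs3"], avg))                # bs3
print(mrp_select(["bs1", "bs2", "bs3"], avg))  # bs2
```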
4. The joint service scheduling and content caching method according to claim 1, wherein in the content caching step, each base station is regarded as an agent comprising an Actor network and a Critic network; the Actor network is a policy network: given the state s_n observed by the current base station n, the Actor network outputs the caching decision a_n of the agent; the Critic network is an evaluation network for estimating the total reward that the system can obtain: it maps the global state s obtained by the system in the information exchange phase to a value function; by using the Critic network to guide the parameter updates of the Actor network, each agent is able to update its own cache in the direction of reward maximization.
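A minimal sketch of the two networks in claim 4, assuming for illustration a linear-softmax Actor and a linear Critic (the patent does not specify the network architectures; all shapes here are made up):

```python
import numpy as np

def actor_policy(theta, s_n):
    """Actor sketch: softmax policy over caching actions given the local
    state s_n observed by base station n."""
    logits = theta @ s_n
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

def critic_value(w, global_state):
    """Critic sketch: linear value estimate of the global state gathered
    in the information exchange phase."""
    return float(w @ global_state)

rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 3))           # 4 caching actions, 3 state features
probs = actor_policy(theta, rng.normal(size=3))
print(probs.sum())                        # ~1.0: a valid action distribution
```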
5. The joint service scheduling and content caching method according to claim 4, wherein in the content caching step, in the content delivery phase each agent, after receiving the user's request in each time slot, serves the request according to the service scheduling step; taking base station n in time slot τ as an example, after the content delivery phase is completed, the base station obtains the reward r_n(τ) resulting from taking action a_n(τ-1) in state s_n(τ-1), and enters the next state s_n(τ); a quadruple of base station state, action, reward and next state is thereby obtained and put into the experience pool of each base station, and k samples are drawn from the experience pool for training; after the content delivery phase ends, the system enters the information exchange phase.
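The quadruple storage and k-sample draw of claim 5 can be sketched with a standard replay buffer; the transition contents and buffer size below are placeholders:

```python
import random
from collections import deque

experience_pool = deque(maxlen=10_000)   # per-base-station replay buffer

# After each content delivery phase, store the quadruple
# (state, action, reward, next_state) described in the claim.
for t in range(100):
    s, a, r, s_next = (t,), t % 4, float(t), (t + 1,)   # placeholder transition
    experience_pool.append((s, a, r, s_next))

k = 16
batch = random.sample(list(experience_pool), k)   # k samples drawn for training
print(len(batch))  # 16
```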
6. The joint service scheduling and content caching method according to claim 5, wherein in the content caching step, in the information exchange phase request state information and caching behavior information are exchanged between the base stations; after the information exchange ends, the system enters the cache update phase, in which each base station updates its own cache according to the corresponding caching decision.
7. The joint service scheduling and content caching method according to claim 6, wherein in the content caching step, in the cache update phase the Critic network updates its network parameters using the cache state information and caching decision information about the other agents obtained in the information exchange phase; meanwhile, the Critic network is used to guide the parameter updates of the Actor network, so that each base station updates its cache in the direction of reward maximization.
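The Critic-guided Actor update of claim 7 can be illustrated with a policy-gradient step, again assuming a linear-softmax Actor (an illustrative choice, not specified by the patent): a positive Critic advantage moves probability mass toward the taken caching action.

```python
import numpy as np

def actor_update(theta, s, a, advantage, lr=0.1):
    """One policy-gradient step for a linear-softmax actor (sketch).

    For pi = softmax(theta @ s), d log pi(a|s) / d theta = (onehot(a) - pi) s^T,
    so a positive critic advantage increases the probability of action a."""
    logits = theta @ s
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    onehot = np.zeros_like(pi)
    onehot[a] = 1.0
    return theta + lr * advantage * np.outer(onehot - pi, s)

def prob_of(theta, s, a):
    logits = theta @ s
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    return pi[a]

rng = np.random.default_rng(1)
theta = rng.normal(size=(3, 2))          # 3 caching actions, 2 state features
s = np.array([1.0, -0.5])

before = prob_of(theta, s, a=2)
after = prob_of(actor_update(theta, s, a=2, advantage=1.0), s, a=2)
print(after > before)  # True: the rewarded caching action becomes more likely
```

This matches the claim's intent that "each base station updates its cache in the direction of reward maximization": the Critic supplies the advantage, the Actor moves along the resulting gradient.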
8. The joint service scheduling and content caching method according to claim 7, wherein in the content caching step, in order to balance exploration and exploitation, an ε-greedy strategy is used to update the caching behavior: a random caching action is selected with probability ε, and the caching action with the largest value function is selected with probability 1-ε.
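The ε-greedy rule of claim 8 is a few lines in any language; a Python sketch (the value estimates are hypothetical):

```python
import random

def epsilon_greedy_cache(values, epsilon, rng=random):
    """ε-greedy cache update sketch: with probability ε pick a random
    caching action (exploration); otherwise pick the action with the
    largest value-function estimate (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=values.__getitem__)

values = [0.2, 1.5, 0.9]   # hypothetical value estimates per caching action
print(epsilon_greedy_cache(values, epsilon=0.0))  # 1 (pure exploitation)
```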
CN202010677841.1A 2020-07-13 2020-07-13 Combined service scheduling and content caching method for unreliable channel transmission Active CN111901833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010677841.1A CN111901833B (en) 2020-07-13 2020-07-13 Combined service scheduling and content caching method for unreliable channel transmission


Publications (2)

Publication Number Publication Date
CN111901833A CN111901833A (en) 2020-11-06
CN111901833B true CN111901833B (en) 2023-07-18

Family

ID=73192796


Country Status (1)

Country Link
CN (1) CN111901833B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113098771B (en) * 2021-03-26 2022-06-14 哈尔滨工业大学 Distributed self-adaptive QoS routing method based on Q learning
CN113094982B (en) * 2021-03-29 2022-12-16 天津理工大学 Internet of vehicles edge caching method based on multi-agent deep reinforcement learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109581983A (en) * 2018-12-07 2019-04-05 航天恒星科技有限公司 The method and apparatus of TT&C Resources dispatching distribution based on multiple agent
CN109981723A (en) * 2019-01-23 2019-07-05 桂林电子科技大学 File cache processing system and method, communication system based on deeply study
CN110213796A (en) * 2019-05-28 2019-09-06 大连理工大学 A kind of intelligent resource allocation methods in car networking

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8498628B2 (en) * 2007-03-27 2013-07-30 Iocast Llc Content delivery system and method
US10505756B2 (en) * 2017-02-10 2019-12-10 Johnson Controls Technology Company Building management system with space graphs




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant