CN114143891A - FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network - Google Patents
- Publication number
- CN114143891A (application number CN202111447130.6A)
- Authority
- CN
- China
- Prior art keywords
- base station
- ddql
- fdql
- content
- optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04W72/54 — Allocation or scheduling criteria for wireless resources based on quality criteria
- H04W72/542 — Allocation or scheduling criteria for wireless resources based on quality criteria using measured or perceived quality
- G06N3/045 — Neural network architectures: combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/084 — Learning methods: backpropagation, e.g. using gradient descent
Abstract
In the prior art, the mobile edge network shows a trend toward intelligence, diversification, and integration, so the optimal allocation of multidimensional resources faces many challenges. To improve the accuracy of multidimensional resource optimization, the invention provides an FDQL-based multidimensional resource collaborative optimization method for mobile edge networks. The method constructs a multidimensional resource allocation model with maximizing the mean opinion score (MOS) as the optimization target and designs a two-layer decision scheme. First, the bottom-layer base stations use double deep Q-learning (DDQL) for local model training to obtain the optimal decision within a short period; then, the upper-layer edge nodes use federated deep Q-learning (FDQL) for global model training to reduce the deviation of distributed decisions over a long period. Experimental results show that the algorithm outperforms other methods in reducing content service delay and improving the user's quality of experience.
Description
Technical Field
The invention belongs to the technical field of communications, and particularly relates to a multidimensional resource collaborative optimization method based on federated deep reinforcement learning (FDQL) in mobile edge networks.
Background
According to the mobility report issued by Ericsson, 5G subscriptions will reach 1.9 billion by 2024 [1]. The rapidly increasing data traffic exacerbates the conflict between limited spectrum, computing, and cache resources and growing resource demand. Meanwhile, large-scale deployments of the Internet of Things (IoT) [2] and the Internet of Vehicles (IoV) [3] increase the complexity of the network environment. Network communication currently faces challenges such as diversification, integration, and intelligence, which aggravate the difficulty of resource management. In response, operators deploy part of the service-processing and resource-scheduling functions on cloud platforms to improve service performance [4].
However, facing future 100% global coverage, ultra-large-scale terminal device access, and massive data transmission with sub-millisecond delay, the traditional processing platform relying on cloud computing faces huge challenges. In particular, emerging services such as intelligent driving, Virtual Reality (VR), Augmented Reality (AR), and ultra-high-definition video streaming increasingly depend on highly reliable, low-latency real-time data processing; a cloud center far from users and terminal devices cannot process huge applications in time, and network congestion and transmission delay also seriously degrade the user experience. Mobile Edge Computing (MEC), which marginalizes and localizes network resources [5-9], is one of the key technologies for solving these problems.
MEC deploys servers with strong computing power at edge nodes close to users and macro base stations to increase local computing resources; these servers are also configured with larger caching capacity, improving the quality of novel data services such as Web browsing, multimedia, and social networking. Sinking computing and storage resources to the network edge significantly relieves bandwidth pressure, reduces backhaul load, and lowers service delay, thereby overcoming the overly long service delay of the cloud center. Although densely deployed MEC servers in a large-scale mobile network can provide ultra-low-delay, high-bandwidth services, rapidly growing user service requests still cause unbalanced resource usage. Hence, how to optimize the multidimensional resource allocation scheme while satisfying the user's Quality of Experience (QoE) becomes one of the problems MEC urgently needs to solve.
Traditional approaches such as genetic algorithms, particle swarm optimization, game theory, and graph-coloring algorithms can be used to solve the MEC resource optimization problem. Document [10] proposes a two-stage heuristic optimization algorithm based on a genetic algorithm, which decouples the joint optimization of computation offloading and resource allocation and obtains the minimum-energy allocation strategy by iteratively updating the solution. For the multiple-access characteristics and resource limitations of base stations in dense cells, document [11] models the joint optimization of offloading decisions and the allocation of spectrum, power, and computing resources as an NP-hard mixed-integer nonlinear program, and uses Particle Swarm Optimization (PSO) to obtain a computation offloading and resource scheduling scheme with low system overhead. Document [12] establishes an efficient network resource optimization strategy based on the Stackelberg game, realizing low-delay, high-reliability video service and alleviating the conflict between transmission performance and QoE. Document [13] proposes a network resource sharing scheme based on graph-coloring theory that takes system resource overhead as the optimization target, realizes efficient V2X collaborative caching, computation, and communication resource allocation, and reduces multimedia service delay. However, the big-data, dynamic, and multi-objective characteristics of the existing MEC network prevent these traditional methods from fully mining network information to generate an optimal resource allocation decision.
Artificial intelligence, represented by machine learning and deep learning, has gradually shifted from being algorithm-driven to being jointly driven by data, algorithms, and computing power, and can effectively solve various application problems. Document [14] proposes a Reinforcement Learning (RL) optimization framework based on edge-cloud computing that exploits RL's strength in adaptively handling environmental diversity and dynamics to quickly make optimal task offloading and resource allocation decisions, minimizing task delay and the energy consumption of the user's battery. With the exponential growth of MEC network size and structural complexity, RL-based resource optimization converges slowly because of the huge state space and has difficulty finding the optimal solution. Deep Reinforcement Learning (DRL) estimates the RL value function with a Deep Neural Network (DNN) to obtain an accurate approximate solution. Deep Q-Learning (DQL), as a DRL algorithm, combines the perception ability of deep learning with the decision ability of reinforcement learning and solves the perception-decision problem of complex systems through continuous trial and error [15]. For an MEC server providing three multimedia services (live streaming, buffered streaming, and low-delay enhanced mobile broadband), document [16] first designs a QoS evaluation model and then dynamically allocates network resources with DQN to satisfy user QoS to the greatest extent; its resource scheduling performance is superior to round-robin and priority scheduling algorithms.
Document [17] constructs a resource allocation model that minimizes the average task energy consumption under communication, computing, and cache resource constraints, and proposes a multidimensional resource allocation method based on Double Deep Q-Learning (DDQL). Simulation results show that, compared with a random algorithm, a greedy algorithm, particle swarm optimization, and DQL, DDQL better solves the multi-task resource allocation problem and reduces the average task energy consumption by at least 5%. Using two neural networks to estimate the cumulative delay and reward of each action, document [18] proposes an attention-based DDQL method that obtains a CPU-frequency and transmission-power scheduling strategy with minimum delay and energy consumption over a long period. Document [19], aiming to maximize long-term profit while meeting users' low-delay computation requirements, uses DDQL to jointly optimize the computation offloading and cache resource allocation of edge nodes, achieving maximum profit while guaranteeing service QoS.
Usually, DRL training relies on "big data"; however, it is easier for industry applications to collect "small data" at low cost, and these distributed small data form numerous "data islands", which greatly restricts the usability of DRL decisions. On the other hand, centralized DRL training poses a huge challenge to the computing and storage capabilities of the MEC server. Federated Learning (FL) [20] breaks data islands and obtains a privacy-preserving globally optimal model by sharing network parameters. Document [21] provides an FL-based resource management method that addresses the bottleneck of intensive usage of computing, bandwidth, energy, and data resources in MEC. To counter the decrease in average aggregation accuracy caused by large differences in participants' data volumes, document [22] proposes a fair alpha-FedAvg algorithm that re-weights the FL aggregation process with the alpha value to generate a fairer global resource optimization model, improving the efficiency of local resource allocation. Document [23] presents a collaborative edge caching framework based on Federated Deep Reinforcement Learning (FDRL), in which near-optimal local DRL parameters are uploaded to the edge node to participate in the next round of global FL training; the resulting local cache resource optimization scheme effectively reduces backhaul traffic and improves the content hit rate.
Disclosure of Invention
As the mobile edge network in the prior art becomes intelligent, diversified, and integrated, the optimal allocation of multidimensional resources faces many challenges. Aiming at the demand for massive content services in MEC and to improve the accuracy of multidimensional resource optimization, the invention models the joint optimization of spectrum, computing, and cache resources as a mixed-integer nonlinear programming problem and, based on federated deep reinforcement learning, constructs a two-layer multidimensional resource allocation framework that decouples the original problem. The bottom-layer base stations adopt DDQL to obtain a local resource collaborative optimization strategy within a short period, fully utilizing local multidimensional resources; the upper-layer edge node adopts FDQL to train the globally optimal resource collaborative optimization strategy over a long period, so that users obtain the requested content with low delay and high quality.
The invention discloses an FDQL-based multidimensional resource collaborative optimization method for mobile edge networks, which constructs a multidimensional resource allocation model with maximizing the MOS as the optimization target and designs a two-layer decision scheme. First, the bottom-layer base stations use double deep Q-learning (DDQL) for local model training to obtain the optimal decision within a short period; then, the upper-layer edge nodes use federated deep Q-learning (FDQL) for global model training to reduce the deviation of distributed decisions over a long period. Simulation experiments show that the method outperforms other methods in reducing content service delay and improving the user's quality of experience.
Drawings
FIG. 1 is a schematic diagram of a mobile edge network;
FIG. 2 is a workflow diagram of an FDQL framework;
FIG. 3 is a graph comparing the training performance of four algorithms;
FIG. 4 is a comparison graph of the resource co-optimization performance of four algorithms under different short periods;
FIG. 5 is a comparison graph of decision times of four algorithms in different short periods;
FIG. 6 is a comparison graph of the resource co-optimization performance of four algorithms for different numbers of users;
FIG. 7 is a graph of loss function comparison of centralized DDQL and FDQL;
FIG. 8 is a graph of average MOS value comparison of a centralized DDQL and an FDQL;
FIG. 9 is a graph of decision time comparison of centralized DDQL and FDQL;
FIG. 10 is a QoE performance comparison graph of centralized DDQL and FDQL for different numbers of users;
fig. 11 is a graph comparing QoE performance of centralized DDQL and FDQL under different numbers of base stations.
Detailed Description
The invention is further described with reference to the following detailed description and accompanying drawings.
1. Overview
The invention provides a multidimensional resource collaborative optimization method based on federated deep reinforcement learning (FDQL) and constructs a two-layer resource management architecture. Its main technical contributions include the following three aspects:
(1) An MOS (Mean Opinion Score) maximization model is established under wireless-rate, content-acquisition-delay, and cache-capacity constraints according to user content service requests, and the allocation of spectrum, computing, and cache resources in the MEC system is optimized on this basis.
(2) The base station converts the MOS optimization model into a Markov Decision Process (MDP) with a continuous state space and a high-dimensional action space, and designs a multidimensional resource collaborative optimization algorithm based on double deep Q-learning (DDQL) to realize local joint optimization of multidimensional resources within a short period.
(3) For the global multidimensional resource joint optimization problem, the edge node aggregates parameters from all associated base stations over a long period and implements FDQL-based resource collaborative optimization. After continuous iteration, the resource optimization decision of the converged global model approaches the decision result of centralized deep Q-learning (DQL), so that the multidimensional resource allocation meets users' QoE requirements.
2. System model
In this example, the MEC system consists of N Base Stations (BSs) capable of providing computing and caching services and an edge server; the base stations connect to the edge node and to neighbor base stations through wired optical cables, as shown in fig. 1. The cell covered by base station n contains U_n randomly distributed intelligent terminal users, and users connect to the base station through wireless links.
The edge node and the N base stations cooperatively cache a large amount of content to meet users' content service requirements. Suppose the set of contents cached by the whole system within a long period is F = {1, 2, …, F}, where the size of content f is denoted D_f. Defining the local popularity of content f at base station n as p_{n,f}, its global popularity p_f follows the Mandelbrot-Zipf (MZipf) distribution

p_f = (r_f + τ)^(−β) / Σ_{j∈F} (r_j + τ)^(−β)   (1)

where r_f is the rank of content f in the descending popularity sequence, τ is the plateau factor, and β is the skewness factor. Each base station caches a subset F_n ⊆ F of the contents in descending order of popularity, with cache status expressed as a binary vector over F, while the edge node, equipped with sufficient resources, caches the entire set F to guarantee the quality of service of the system.
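For illustration, the MZipf popularity of formula (1) can be computed with a short Python sketch (the function name and default parameter values are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def mzipf_popularity(F, tau=5.0, beta=0.8):
    """Global content popularity under a Mandelbrot-Zipf (MZipf) law.

    Rank r_f = 1..F (descending popularity); tau is the plateau factor,
    beta the skewness factor.  Returns the normalized p_f for every f.
    """
    ranks = np.arange(1, F + 1)
    weights = (ranks + tau) ** (-beta)   # (r_f + tau)^(-beta)
    return weights / weights.sum()       # normalize so probabilities sum to 1

pop = mzipf_popularity(F=100)
```

A larger β concentrates the demand on a few hot contents, while a larger τ flattens the head of the distribution.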
Within the same time period, suppose the content requests of the U_n users in the nth cell are collected into a request set; the content request status of user u is a tuple containing the requested content f and the required content acquisition rate, which is zero when content f is not requested. A content request list (Table) is therefore maintained at the base station to record each user's demand, facilitating subsequent multidimensional resource allocation and content distribution.
2.1 Communication model
In the edge caching system, a user issues a content request to its associated base station, and the base station adds it to the content request list. If the base station caches the requested content locally, it sends the content to the user directly; otherwise, if one or more neighbor base stations cache the requested content, the local base station acquires it from the nearest neighbor base station and forwards it to the user; failing that, the local base station acquires the content from the edge node and sends it to the user. Accordingly, y_{n,u,f} = 1 means the requested content f is cached at the local base station and y_{n,u,f} = 0 means it is not; similarly, z_{n,u,f} = 1 means the requested content f is cached at the nearest neighbor base station and z_{n,u,f} = 0 means it is not.
Because a user may request several contents in the same period, and services such as automatic navigation and live video have strict delay requirements, the base station serves identical content requests from different users in multicast mode, enhancing content service quality by improving spectrum utilization. To this end, the system spectrum resource W is divided into L subchannels of bandwidth B that are shared by the N cells. In general, the Signal to Interference plus Noise Ratio (SINR) of the content received by user u over subchannel l can be expressed as

SINR_{n,u,l} = P_{n,u,l} h_{n,u,l} / (σ² + Σ_{n'≠n} P_{n',u,l} h_{n',u,l})   (2)

where P_{n,u,l} is the transmission power of base station n to user u on subchannel l, h_{n,u,l} is the channel gain, and σ² is the noise power. The downlink transmission rate of subchannel l is then
v_{n,u,l} = B log₂(1 + SINR_{n,u,l})   (3)
From formula (3), the corresponding downlink transmission delay of content f is easily obtained as

td_{n,u,l} = D_f / v_{n,u,l}   (4)
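As a numerical sketch of formulas (2)-(4), the following Python function computes the SINR, the Shannon rate, and the wireless download delay for one subchannel (all parameter values in the example call are illustrative assumptions; the aggregate neighbor-cell interference is passed in as a single term):

```python
import numpy as np

def downlink_delay(P, h, noise, B, D_f, interference=0.0):
    """Per-subchannel SINR (2), Shannon rate (3), and transmission delay (4).

    P: transmit power of base station n to user u on subchannel l (W)
    h: channel gain; noise: noise power sigma^2 (W)
    B: subchannel bandwidth (Hz); D_f: content size (bits)
    interference: aggregate co-channel power from neighbor cells (W)
    """
    sinr = P * h / (noise + interference)   # formula (2)
    rate = B * np.log2(1.0 + sinr)          # formula (3): v = B log2(1 + SINR)
    delay = D_f / rate                      # formula (4): td = D_f / v
    return sinr, rate, delay

sinr, v, td = downlink_delay(P=1.0, h=1e-7, noise=1e-10, B=180e3, D_f=1e6)
```

With these assumed values the SINR is 30 dB and a 1 Mbit content downloads in roughly half a second over a single 180 kHz subchannel.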
According to the content request set and the list Table, base station n distributes the F_n requested contents over the subchannels. Introduce a binary channel connection matrix X_n = [x_{n,u,l}], where x_{n,u,l} = 1 denotes that user u receives a content transmission on subchannel l and x_{n,u,l} = 0 denotes no access. Defining the uth row vector x_{n,u} = (x_{n,u,1}, …, x_{n,u,L}) as the channel connection vector of user u, the total transmission delay of the U_n users in the nth cell can be expressed as

TD_n = Σ_{u=1}^{U_n} Σ_{l=1}^{L} x_{n,u,l} [ td_{n,u,l} + (1 − y_{n,u,f}) ( z_{n,u,f} TD_{n,n'} + (1 − z_{n,u,f}) TD_{n,m} ) ]   (5)

where 1 − y_{n,u,f} (when y_{n,u,f} = 0) indicates that the requested content f is cached at another base station or the edge node; 1 − z_{n,u,f} (when z_{n,u,f} = 0) indicates that the requested content f is cached at the edge node; and TD_{n,n'} and TD_{n,m} denote the wired transmission delays between base station n and the nearest neighbor base station n' and the edge node m, respectively.
2.2 Computation model
At present, streaming media such as panoramic video, live video, and VR video occupy about 80% of network traffic, and content response and the subsequent encoding and reconstruction all require support from computing units. The computing resource allocation scheme therefore directly affects the processing delay of content requests; for immersive interactive VR video in particular, excessive delay causes frequent picture stalls and dizziness, seriously affecting the user experience. To speed up the distribution of requested content, the base station must rationally plan the allocation of computing units. Define γ_n as the number of CPU cycles required by base station n to process a unit task; then the total computation delay of the F_n requested contents may be expressed as

CD_n = Σ_{f∈F_n} γ_n D_f / λ_{n,f}   (6)

where λ_{n,f} is the amount of computing resources allocated to content f. Combining formulas (5) and (6), the content acquisition delay of the U_n users in the nth cell comprises the transmission delay and the computation delay, that is:
CAn=TDn+CDn (7)
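The delay bookkeeping of formulas (6) and (7) can be sketched in a few lines of Python (the units and example numbers are illustrative assumptions):

```python
def content_delays(sizes, lambdas, gamma_n, TD_n):
    """Computation delay (6) and content-acquisition delay (7).

    sizes:   D_f of each requested content (bits)
    lambdas: CPU cycles per second allocated to each content (lambda_{n,f})
    gamma_n: CPU cycles needed per bit at base station n
    TD_n:    total transmission delay of the cell, from formula (5)
    """
    # (6): CD_n = sum_f gamma_n * D_f / lambda_{n,f}
    CD_n = sum(gamma_n * D / lam for D, lam in zip(sizes, lambdas))
    # (7): CA_n = TD_n + CD_n
    return TD_n + CD_n

CA = content_delays(sizes=[1e6, 2e6], lambdas=[1e9, 2e9], gamma_n=500, TD_n=0.4)
```

Doubling λ_{n,f} for a content halves its computation delay, which is what makes the computing-unit split a genuine optimization variable.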
2.3 Cache update model
Since base stations in a cache-enabled wireless network have cache devices of limited capacity, after the content service of one period ends, base station n needs to update its cache according to the content request set. First, base station n calculates the local request probability of cached content f:

q_{n,f} = c_{n,f} / Σ_{f'∈F_n} c_{n,f'}   (8)

where c_{n,f} is the number of requests for content f. Consider the average request probability of content f over T consecutive periods:

q̄_{n,f} = (1/T) Σ_{t=1}^{T} q_{n,f}^{t}   (9)

where q_{n,f}^{t} is the local request probability within period t. A cache threshold ε_q is set, by which the base station judges whether the currently requested content f should remain cached. Thus, when q̄_{n,f} ≥ ε_q, the cache update variable g_{n,f} = 1 indicates that content f should stay cached; when q̄_{n,f} < ε_q, g_{n,f} = 0 indicates that content f is no longer cached and its cache space is released. The cached content set of base station n is updated accordingly. Then, base station n looks up the global popularity p_f of each requested but uncached content f; if p_f is larger than the minimum popularity among the currently cached contents, i.e.

p_f > min_{f'∈F_n} p_{f'}   (10)

the content f is admitted into the cache.
Each base station has a cache space of different size, so the total size of cached contents cannot exceed the maximum cache capacity C_max of the base station. Meanwhile, the cache update policy should maximize the hit rate of requested contents, which can be translated into maximizing the sum of the popularities of the cached contents:

max Σ_{f∈F} g_{n,f} p_f   s.t.   Σ_{f∈F} g_{n,f} D_f ≤ C_max   (11)

In general, users' content requests have temporal continuity, so formula (11) enables users to achieve a higher QoE within one period.
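The eviction-then-admission logic of Section 2.3 can be sketched as follows (the container layout, function name, and example data are hypothetical; the patent does not fix a data structure):

```python
def update_cache(cached, req_prob_hist, eps_q, candidates, popularity, sizes, C_max):
    """Cache update sketch per Sec. 2.3.

    cached:        content ids currently cached at base station n
    req_prob_hist: {f: [q^1, ..., q^T]} local request probabilities over T periods
    eps_q:         cache threshold epsilon_q
    candidates:    requested-but-uncached content ids
    popularity:    {f: p_f} global popularity, sizes: {f: D_f}
    C_max:         cache capacity of the base station
    """
    # g_{n,f} = 1 (keep) iff the T-period average probability (9) meets eps_q
    kept = [f for f in cached
            if sum(req_prob_hist[f]) / len(req_prob_hist[f]) >= eps_q]
    # admit uncached requests more popular than the least popular kept content (10),
    # subject to the capacity constraint of (11)
    floor = min((popularity[f] for f in kept), default=0.0)
    for f in sorted(candidates, key=lambda c: popularity[c], reverse=True):
        used = sum(sizes[k] for k in kept)
        if popularity[f] > floor and used + sizes[f] <= C_max:
            kept.append(f)
    return kept

new_cache = update_cache(
    cached=[1, 2],
    req_prob_hist={1: [0.5, 0.5], 2: [0.01, 0.01]},
    eps_q=0.1,
    candidates=[3],
    popularity={1: 0.5, 2: 0.1, 3: 0.6},
    sizes={1: 1, 2: 1, 3: 1},
    C_max=2)
```

In the example, content 2 is evicted for falling below ε_q and content 3 is admitted because its global popularity exceeds the retained minimum.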
2.4 Problem formulation
Inspired by widely used QoE metrics, the Mean Opinion Score (MOS) model [24] can measure the quality of content services such as video streaming download or web browsing. Based on formula (7) and formula (11), the invention designs a linear MOS model (formula (12)) whose parameters C_{n,1}, C_{n,2} scale MOS_n into [1, 5] and whose weight factors w_{n,1}, w_{n,2} represent the respective degrees of influence of the content acquisition delay and the cache update on the MOS. Clearly, the higher the MOS_n score of the nth cell, the higher the QoE of its users.
The invention aims to maximize the system MOS through multidimensional resource cooperation to meet users' QoE requirements. The multidimensional resource optimization model is therefore

max Σ_{n=1}^{N} MOS_n   s.t.   C1-C7   (13)
where constraints C1 and C2 represent the 0-1 content delivery decisions; constraint C3 represents the 0-1 channel allocation decision; constraint C4 indicates that the transmission power allocated by the base station to subchannel l is less than the maximum transmission power; constraint C5 ensures that the user's receiving rate exceeds the predetermined acquisition rate requirement; constraint C6 indicates that the total allocated computing resources are less than the base station's maximum computing power; and constraint C7 ensures that the total content volume after the cache update is less than the base station's cache capacity.
The mixed-integer programming problem (13) requires jointly solving four classes of 0-1 decision variables x_{n,u,l}, y_{n,u,f}, z_{n,u,f}, g_{n,f} and the discrete variables λ_{n,f}, P_{n,u,l}. These decision variables make the problem non-convex, so convex-optimization solution methods cannot be used. Solving problem (13) efficiently and accurately is the basis of the multidimensional resource collaborative allocation scheme; the invention realizes the resource optimization of the mobile edge system using federated deep reinforcement learning.
3. Multi-dimensional resource collaborative optimization method based on federal deep reinforcement learning
The mobile edge system of the invention is a two-layer network architecture that solves resource collaborative optimization problems on different time scales. The bottom-layer base stations use DDQL for local model training to obtain the optimal decision within a short period; the upper-layer edge nodes then use FDQL for global model training to reduce the deviation of distributed decisions over a long period.
3.1 local multidimensional resource collaborative optimization based on DDQL
For the local multidimensional resource collaborative optimization problem, base station n acts as an agent and the problem is modeled as a Markov decision process; DDQL interacts with the environment through continuous trial and error and finds the optimal policy π* by maximizing the accumulated reward.
The MDP is expressed as a quadruple ⟨S_n, A_n, PR_n, R_n⟩, where S_n represents the state space, A_n the action space, PR_n the state transition probability, and R_n the reward function.
State space: the agent requires knowledge of user and base station information before deciding on an action, so the state space S_n consists of the user requests and the base station cache status. At time slot i, the system state s_n^i is composed of the current content request set and cache state of cell n (formula (14)).
Action space: the action space is the set of actions the agent can take. The action vector of the invention covers the allocation of spectrum resources, computing resources, and cache updates, so the action space A_n is defined by the multidimensional resource co-optimization action

a_n^i = ( X_n^i, P_n^i, λ_n^i, g_n^i )   (15)

where X_n^i represents the channel connection matrix, P_n^i the power allocation vector, λ_n^i the computing unit assignment vector, and g_n^i the content cache update vector.
Reward function: when the environment is in state s_n^i and action a_n^i is executed, the system enters the next state s_n^{i+1} and obtains the instant reward r_n^i. The optimization goal of the invention is to maximize the user's QoE, so the MOS score is set as the reward function: r_n^i = MOS_n^i (16).
A mapping π: S_n → A_n from the state space to the action space constitutes the policy. The value of taking action a_n^i = π(s_n^i) in the current state s_n^i can be expressed as

Q_π(s_n^i, a_n^i) = E[ Σ_{k=0}^{∞} γ^k r_n^{i+k} | s_n^i, a_n^i ]   (17)

where γ ∈ (0, 1) is the discount factor. According to the Bellman equation, the Q function is updated by

Q(s_n^i, a_n^i) ← Q(s_n^i, a_n^i) + η [ r_n^i + γ max_{a'} Q(s_n^{i+1}, a') − Q(s_n^i, a_n^i) ]   (18)

where η ∈ (0, 1) is a learning rate that controls the learning speed.
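A minimal tabular sketch of the Bellman update in formula (18), using a dict-of-dicts Q table (state and action labels are illustrative):

```python
def q_update(Q, s, a, r, s_next, eta=0.1, gamma=0.9):
    """One Bellman update of formula (18):
    Q(s,a) <- Q(s,a) + eta * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(Q[s_next].values())   # bootstrap from the next state
    Q[s][a] += eta * (target - Q[s][a])            # move toward the target at rate eta
    return Q[s][a]

Q = {"s0": {"a0": 0.0, "a1": 0.0}, "s1": {"a0": 1.0, "a1": 2.0}}
q_new = q_update(Q, "s0", "a0", r=1.0, s_next="s1")
```

Here the target is r + γ·max = 1 + 0.9·2 = 2.8, so Q(s0, a0) moves from 0 to 0.28 after one step.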
The invention uses DDQL to search for the local multidimensional resource collaborative optimization strategy and updates the deep neural network parameters θ so that Q(s, a; θ) approaches the Q value corresponding to the optimal strategy (formula (19)).
To break the correlation between adjacent data and maintain their independence, DDQL establishes an experience replay mechanism: each experience e_n^i = (s_n^i, a_n^i, r_n^i, s_n^{i+1}) obtained by the agent is stored in an experience replay pool, recording every interaction with the environment. When the pool is full, a new experience randomly replaces an old one. On the other hand, to overcome the over-estimation of Q values during training, DDQL constructs a current network for selecting actions and a target network for estimating the value function, with state-action value functions Q(s, a; θ) and Q(s, a; θ⁻), where θ and θ⁻ are the respective deep neural network parameters.
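The replay pool described above, including random replacement once full, can be sketched as (class name and capacity are illustrative):

```python
import random

class ReplayBuffer:
    """Experience replay pool storing e^i = (s, a, r, s').

    When the pool is full, a new experience randomly replaces an old one,
    matching the replacement rule described in the text."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pool = []

    def store(self, exp):
        if len(self.pool) < self.capacity:
            self.pool.append(exp)
        else:
            self.pool[random.randrange(self.capacity)] = exp

    def sample(self, batch):
        # mini-batch for the DDQL training step
        return random.sample(self.pool, min(batch, len(self.pool)))

buf = ReplayBuffer(capacity=2)
for i in range(5):
    buf.store((f"s{i}", "a", 0.0, f"s{i+1}"))
```

Random sampling from this pool is what breaks the temporal correlation between consecutive transitions.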
When training starts, DDQL randomly samples a mini-batch from the experience replay pool, inputs it to the current and target networks, and computes the corresponding Q values by forward propagation; it then uses the loss function

L(θ) = E[ ( r_n^i + γ Q( s_n^{i+1}, argmax_{a'} Q(s_n^{i+1}, a'; θ); θ⁻ ) − Q(s_n^i, a_n^i; θ) )² ]   (20)

to back-propagate through the current network and update its parameters. Specifically, the gradient of formula (20) with respect to θ is

∇_θ L(θ) = E[ ( r_n^i + γ Q(s_n^{i+1}, argmax_{a'} Q(s_n^{i+1}, a'; θ); θ⁻) − Q(s_n^i, a_n^i; θ) ) ∇_θ Q(s_n^i, a_n^i; θ) ]   (21)

and the parameters are updated by gradient descent, θ ← θ − α ∇_θ L(θ) (22), where α represents the learning rate.
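The key element of the double-Q target inside loss (20), decoupled action selection (current network θ) from evaluation (target network θ⁻), can be computed on a batch of Q-value rows as follows (the toy numbers are illustrative):

```python
import numpy as np

def ddql_targets(r, q_next_current, q_next_target, gamma=0.9):
    """Double-DQN target y^i of loss (20): the current network selects the
    argmax action; the target network evaluates it, mitigating Q over-estimation."""
    a_star = np.argmax(q_next_current, axis=1)          # selection with theta
    q_eval = q_next_target[np.arange(len(r)), a_star]   # evaluation with theta^-
    return r + gamma * q_eval                           # y^i = r + gamma * Q(s', a*; theta^-)

y = ddql_targets(r=np.array([1.0]),
                 q_next_current=np.array([[0.2, 0.8]]),
                 q_next_target=np.array([[0.5, 0.3]]))
```

In the example the current network picks action 1, but the target network's value 0.3 (not its own maximum 0.5) is used, which is exactly the over-estimation fix.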
The DDQL-based local multidimensional resource collaborative optimization model training method is given below.
3.2 FDQL-based global multidimensional resource collaborative optimization
As described above, each base station collects the content demand data of local users and maximizes the short-term utility of the multidimensional resource collaborative optimization strategy with DDQL, but how to help edge nodes quickly find a globally optimal strategy remains a difficult problem. If the edge node used traditional centralized DQL training, it would not only increase communication cost but also damage the confidentiality and security of the data. Therefore, the invention designs the FDQL framework on top of DDQL to construct a multidimensional resource collaborative optimization model of centralized quality. Each base station participating in federated learning uploads model parameters rather than user content demand data to the edge node, effectively reducing the data transmission burden and avoiding user privacy disclosure. For simplicity of illustration, the invention performs multidimensional resource collaborative optimization over the time periods {1, …, T, T+1, …, T+t, …, 2T, …}. In each short period t ≠ kT, every base station runs DDQL model training to obtain a locally optimal strategy. Since users' content requests have temporal continuity, the local resource allocation strategy of each base station can satisfy the QoE required by users for some time. Over a long period, however, user demand is likely to change after contents are received, so in each long period t = kT the edge node performs FDQL model training to obtain a globally optimal strategy and feeds it back to every base station to enhance the generalization ability of the local DDQL, thereby improving the user's content acquisition experience with a better resource allocation strategy.
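The two-timescale schedule described above (local DDQL every short period, federated aggregation at every long period t = kT) can be sketched with hypothetical callbacks:

```python
def run_schedule(num_periods, T, ddql_step, fed_round):
    """Two-timescale loop sketch: ddql_step runs local DDQL training every
    period; fed_round runs FDQL aggregation whenever t is a multiple of T.
    Both callbacks are placeholders for the training routines."""
    for t in range(1, num_periods + 1):
        ddql_step(t)        # short period: local DDQL at every base station
        if t % T == 0:
            fed_round(t)    # long period t = kT: edge node aggregates, feeds back

calls = []
run_schedule(6, T=3,
             ddql_step=lambda t: calls.append(("ddql", t)),
             fed_round=lambda t: calls.append(("fed", t)))
```

With six periods and T = 3, aggregation fires at t = 3 and t = 6 while DDQL runs every period.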
Fig. 2 shows the workflow of the FDQL framework. First, each base station updates its parameters according to equations (21) and (22) and uploads them to the edge node; the edge node then weights and aggregates the N uploaded parameter sets to obtain the global parameters,
where wn is the total content requested by the users in the nth cell. Next, each base station uses the fed-back global parameters to perform the next round of DDQL training. The system repeats these steps until the FDQL algorithm converges.
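The weighted aggregation performed by the edge node (equations (21)-(23) appear only as images in the source) follows the federated-averaging pattern: the global parameters are the wn-weighted mean of the N uploaded parameter sets. A minimal sketch under that assumption, with flat numpy vectors standing in for each base station's network parameters:

```python
import numpy as np

def federated_aggregate(params, weights):
    # theta_g = sum_n (w_n / sum_m w_m) * theta_n, where w_n is the total
    # content requested by the users of cell n.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize the per-cell request weights
    return sum(wi * np.asarray(p, dtype=float) for wi, p in zip(w, params))
```

Cells whose users request more content thus pull the global model more strongly toward their local parameters.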
The FDQL-based global multidimensional resource collaborative optimization model training method is provided below.
4. Simulation analysis
The following experiments verify, on a Python platform, that the multidimensional resource collaborative optimization scheme of the invention maximizes user QoE; each Monte Carlo experiment is repeated 100 times. In the simulation scenario, 1 edge node is deployed at the center of the region, 5 base stations are uniformly distributed across the edge network, each base station serves 20 users, and the content demands of the users are generated randomly. The other simulation parameters are listed in Table 1.
TABLE 1 simulation parameters
4.1 DDQL parameter selection
The invention selects a feedforward neural network with 3 hidden layers as the network model of the DDQL. The numbers of neurons in the hidden layers admit many combinations and require extensive experiments to tune. The invention runs 5 time periods in the 3rd cell and takes the average MOS value as the evaluation index; the experimental results are shown in Table 2. It can be seen that the average MOS value of the DDQL is highest when the first, second, and third hidden layers are set to 300, 600, and 300 neurons, respectively. Without loss of generality, the optimal parameter setting in Table 2 is adopted in the following experiments to ensure fast convergence of the DDQL.
TABLE 2 DDQL parameter set comparison
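The selected architecture can be sketched as a plain feedforward pass. The input (state) and output (action) dimensions below are placeholder assumptions, and numpy stands in for whatever deep learning framework the experiments actually used; only the 300-600-300 hidden sizes come from Table 2.

```python
import numpy as np

def make_mlp(sizes, rng):
    # He-initialized (W, b) pairs for consecutive layer sizes.
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    # ReLU hidden layers, linear output layer (one Q-value per action).
    for W, b in layers[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = layers[-1]
    return x @ W + b

rng = np.random.default_rng(0)
# Table 2's winning setting: hidden layers of 300, 600, 300 neurons;
# the 40-dim state and 10 actions are illustrative assumptions.
net = make_mlp([40, 300, 600, 300, 10], rng)
q_values = forward(net, rng.standard_normal(40))
```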
4.2 Performance comparison of local multidimensional resource collaborative optimization algorithm
To evaluate the performance of the proposed local multidimensional resource collaborative optimization algorithm, it is compared with the genetic algorithm (GA), particle swarm optimization (PSO) [11], and deep Q-learning (DQL), where PSO is a swarm intelligence optimization algorithm that models a bird flock by designing particles that imitate bird behavior.
FIG. 3 compares the training performance of the four algorithms. It is easy to see that the convergence performance of the GA, which follows a greedy strategy, is lower than that of the other three algorithms, because the GA only satisfies the optimal resource allocation of the current iteration and cannot guarantee that the solution is globally optimal. The PSO algorithm converges prematurely, i.e., it tends to fall into local minima, so its global optimality is lower than that of DQL, resulting in lower MOS values. Facing the complex state-action space model, the solving capability of DQL is lower than that of DDQL; DDQL converges quickly and stably, attains the highest MOS value, and shows the best performance. When the number of training rounds exceeds 600, all four algorithms reach the convergence plateau, but the MOS value of DDQL is improved by 89.58%, 44.41%, and 26.39% over GA, PSO, and DQL respectively, and the obtained multidimensional resource collaborative optimization strategy provides users with the best content service experience.
Fig. 4 tests the local multidimensional resource collaborative optimization performance of the four algorithms over 6 short periods, with 100 Monte Carlo experiments per period. As can be seen from Fig. 4, DDQL has a higher average MOS value than GA, PSO, and DQL in every period, so the user experience is better. Decision time is also a key index for measuring user QoE: higher decision delay slows down content acquisition and thus reduces network service quality. Corresponding to Fig. 4, Fig. 5 shows the average decision time of the four algorithms per period. The GA needs more iterations to find a feasible solution to the complex problem, so its average decision time is much higher than that of the other algorithms. Although the average decision time of PSO is greatly reduced compared with GA, the particles tend to homogenize, which reduces the diversity of solutions, so its convergence speed is lower than that of the reinforcement learning methods. Compared with DQL, the average decision time of DDQL increases, but its average MOS value is the highest, achieving the best multidimensional resource collaborative optimization performance. Table 3 shows that over the 6 short periods the total average decision time of DDQL is reduced by about 89.76% and 64.86% compared with GA and PSO, respectively.
TABLE 3 Total mean decision time for the four algorithms
Fig. 6 compares the resource collaborative optimization performance of the four algorithms for different numbers of users. As the number of users served by each base station increases, the total content demand rises accordingly, and the average MOS values of the four algorithms gradually decrease. This is because the higher the total content demand in the system, the fewer multidimensional resources are allocated to each user, which affects the content acquisition speed and thus the users' QoE. DDQL adopts a double-layer network and can obtain the optimal resource allocation strategy of the local optimization model (13), so its MOS value is improved by about 97.62%, 47.34%, and 30.10% over GA, PSO, and DQL, providing the best content service.
4.3 Performance comparison of Global multidimensional resource collaborative optimization Algorithm
In the invention, the edge node uses FDQL for global model training to reduce the deviation of distributed decisions over a long period. To verify the effectiveness of FDQL, it is compared with the conventional centralized DDQL, using the loss function (20) as the evaluation criterion. Fig. 7 shows that the loss function of the centralized DDQL is much higher than that of FDQL in the first 100 training rounds, while the loss functions of the two algorithms are similar in the last 100 rounds, indicating that FDQL converges faster and is more stable.
FIG. 8 tests the global multidimensional resource collaborative optimization performance of the two algorithms over 6 long periods. In the centralized DDQL, each base station uploads all of its data to the edge node, consuming a large amount of multidimensional resources and degrading the global resource collaborative optimization performance, so its average MOS value over the 6 long periods is 13.7% lower than that of FDQL. FIG. 9 shows the corresponding decision time: the Q matrix trained by the centralized DDQL is large in scale and consumes considerable decision time, whereas FDQL lets each base station train on its local data in parallel and the edge node only performs the federated fusion operation, so the total average decision time is reduced by 83.80% compared with the centralized DDQL.
Finally, fig. 10 and fig. 11 compare the global multidimensional resource collaborative optimization performance of the centralized DDQL and the FDQL under different network environments. As the number of users served by each base station increases to 26, fig. 10 shows that the global optimum solution is not easily obtained by the centralized DDQL due to the enlargement of the data size, and the average MOS value is sharply decreased. Similarly, when the number of base stations increases, i.e. the network scale increases, fig. 11 shows that the centralized DDQL still has a performance bottleneck, and cannot provide good content service for users of large-scale networks. However, the FDQL can obtain the optimal multidimensional resource optimization scheme regardless of the increase of the content demand of the users in the local cell or the increase of the global network scale, so as to provide stable content service for the users and ensure the efficient operation of the MEC system.
5. Summary of the invention
Aiming at the demand for massive content services in MEC, the invention proposes a multidimensional resource collaborative optimization algorithm based on federated deep reinforcement learning, which helps reduce the service delay of the system and improve the user experience. The method models the joint optimization of spectrum, computing, and caching resources as a mixed-integer nonlinear programming problem, and constructs a two-layer multidimensional resource allocation framework based on federated deep reinforcement learning to decouple the original problem. The base stations at the bottom layer adopt DDQL to obtain local resource collaborative optimization strategies in short periods, fully utilizing local multidimensional resources; the edge node at the upper layer adopts FDQL to train the globally optimal resource collaborative optimization strategy over long periods, so that users obtain the requested content with low delay and high quality. Simulation experiments show that the DDQL algorithm achieves higher local QoE performance than GA, PSO, and DQL, and that the method has better global decision stability than the centralized DDQL.
Reference documents:
[1] Ghosh A, Maeder A, Baker M, et al. 5G evolution: a view on 5G cellular technology beyond 3GPP release 15[J]. IEEE Access, 2019, 7(99): 127639-127651.
[2] Huang J Y, Nkenyereye L, Sung N M, et al. IoT service slicing and task offloading for edge computing[J]. IEEE Access, 2020, 8(14): 11526-11547.
[3] Cheng J, Yuan G, Zhou M, et al. Accessibility analysis and modeling for IoV in an urban scene[J]. IEEE Transactions on Vehicular Technology, 2020, 69(4): 4246-4256.
[4] He S W, Huang W, Wang J H, et al. Cache-enabled coordinated mobile edge network: opportunities and challenges[J]. IEEE Wireless Communications, 2020, 27(2): 204-211.
[5] Yang Z, Du Y, Che C, et al. Energy-efficient joint resource allocation algorithms for MEC-enabled emotional computing in urban communities[J]. IEEE Access, 2019, 7(7): 137410-137419.
[6] Guo H Z, Liu J J, Zhang J. Computation offloading for multi-access mobile edge computing in ultradense networks[J]. IEEE Communications Magazine, 2018, 56(8): 14-19.
[7] Kamel M, Hamouda W, Youssef A. Ultra-dense networks: a survey[J]. IEEE Communications Surveys & Tutorials, 2016, 18(4): 2522-2545.
[8] Abbas N, Zhang Y, Taherkordi A, et al. Mobile edge computing: a survey[J]. IEEE Internet of Things Journal, 2018, 5(1): 450-465.
[9] Zhang K, Leng S P, He Y J, et al. Mobile edge computing and networking for green and low-latency Internet of Things[J]. IEEE Communications Magazine, 2018, 56(5): 39-45.
[10] Li H, Xu H, Zhou C, et al. Joint optimization strategy of computation offloading and resource allocation in multi-access edge computing environment[J]. IEEE Transactions on Vehicular Technology, 2020, 69(9): 10214-10226.
[11] Guo F, Zhang H, Hong J, et al. An efficient computation offloading management scheme in the densely deployed small cell networks with mobile edge computing[J]. IEEE/ACM Transactions on Networking, 2018, 26(6): 2651-2664.
[12] Cao T, Xu C, Du J, et al. Reliable and efficient multimedia service optimization for edge computing-based 5G networks: game theoretic approaches[J]. IEEE Transactions on Vehicular Technology, 2020, 17(3): 1610-1625.
[13] Zhang Haibo, et al. MEC-based V2X cooperative caching and resource allocation in the Internet of Vehicles[J]. Journal on Communications, 2021, 42(02): 26-36. (in Chinese)
[14] Kiran N, Pan C, Wang S, et al. Joint resource allocation and computation offloading in mobile edge computing for SDN based wireless networks[J]. Journal of Communications and Networks, 2020, 22(1): 1-11.
[15] Peng Jun, Wang Chenglong, Jiang Fu, et al. A fast deep Q-learning network edge cloud migration strategy for vehicular services[J]. Journal of Electronics & Information Technology, 2020, 42(01): 58-64. (in Chinese)
[16] Guo B, Zhang X, Wang Y, et al. Deep-Q-network-based multimedia multi-service QoS optimization for mobile edge computing systems[J]. IEEE Access, 2019, 7: 160961-160972.
[17] Zhang Jun, Li Wenjing, Zhou Fan, et al. (Chinese-language reference; title not recoverable from the source): 148-161.
[18] Ren J, Wang H, Hou T T, et al. Collaborative edge computing and caching with deep reinforcement learning decision agents[J]. IEEE Access, 2020, 8: 120604-120612.
[19] Liu T, Zhang Y, Zhu Y, et al. Online computation offloading and resource scheduling in mobile-edge computing[J]. IEEE Internet of Things Journal, 2021, 8(8): 6649-6664.
[20] Messaoud S, Bradai A, Ahmed O, et al. Deep federated Q-learning-based network slicing for industrial IoT[J]. IEEE Transactions on Industrial Informatics, 2021, 17(8): 5572-5582.
[21] Yu R, Li P. Toward resource-efficient federated learning in mobile edge computing[J]. IEEE Network, 2021, 35(1): 148-155.
[22] Zhong Z, Zhou Y P, Wu D, et al. P-FedAvg: parallelizing federated learning with theoretical guarantees[J]. IEEE Transactions on Vehicular Technology, 2020, 69(4): 4246-4256.
[23] Wang X, Wang C, Li X, et al. Federated deep reinforcement learning for Internet of Things with decentralized cooperative edge caching[J]. IEEE Internet of Things Journal, 2020, 7(10): 9441-9455.
[24] Rugelj M, Sedlar U, Volk M, et al. Novel cross-layer QoE-aware radio resource allocation algorithms in multiuser OFDMA systems[J]. IEEE Transactions on Communications, 2014, 62(9): 3196-3208.
Claims (4)
1. A multidimensional resource collaborative optimization method based on FDQL in a mobile edge network, wherein a mobile edge computing (MEC) system comprises a plurality of base stations and an edge node, each base station communicates with the edge node and with its neighboring base stations, and both the base stations and the edge node are capable of providing computing and caching services; the method is characterized by comprising the following steps: 1) constructing a multidimensional resource allocation model to represent the allocation of spectrum and computing resources and the updating of caches; 2) optimizing the multidimensional resource allocation model;
in the step 1), the multidimensional resource allocation model is constructed by taking the maximization of the mean opinion score (MOS) as the optimization target;
wherein the parameters C_{n,1}, C_{n,2} of the linear model keep MOS_n ∈ [1,5]; the weight factors w_{n,1}, w_{n,2} respectively represent the degrees of influence of content acquisition delay and cache updating on the MOS; CA_n is the content acquisition delay of the U_n users in the nth cell, comprising transmission delay and computation delay; PS_n is the cache update performed by the base station according to the content request set of the U_n users in the nth cell; the nth cell is the range covered by base station n;
the higher the MOS_n score of the nth cell, the higher the users' quality of experience (QoE), and the multidimensional resource optimization model is max MOS_n;
in the step 2),
2.1) carrying out local model training on the base stations of the bottom layer by using double deep Q-learning (DDQL) to obtain the optimal decision within a short period:
2.1.1) taking a base station n as an agent, and modeling a local resource allocation problem into a Markov Decision Process (MDP);
2.1.2) adopting DDQL to interact with the environment through continuous trial and error, and searching for the optimal strategy by maximizing the cumulative reward;
2.2) carrying out global model training on the edge node of the upper layer by using federated deep reinforcement learning (FDQL) to reduce the deviation of distributed decisions within a long period:
performing multidimensional resource collaborative optimization according to the time periods {1, …, t, …, T, T+1, …, T+t, …, 2T, …};
in the short periods t ≠ kT, each base station performs DDQL model training to obtain a locally optimal multidimensional resource allocation strategy;
at the long-period instants t = kT, the edge node performs FDQL model training to obtain a globally optimal multidimensional resource allocation strategy, which is fed back to each base station to enhance the generalization capability of the local DDQL, thereby improving the users' content acquisition experience with a better resource allocation strategy.
2. The method as claimed in claim 1, wherein in the step 2.1.1), the Markov decision process MDP is expressed as a quadruple ⟨S_n, A_n, PR_n, R_n⟩, where S_n represents the state space, A_n represents the action space, PR_n represents the state transition probability, and R_n represents the reward function;
state space: before selecting an action, the agent needs to know the information of the users and the base station; the state space S_n consists of the user requests and the base station cache state; at time slot i, the system state consists of the request states of the 1st through U_n-th users together with the cache state of base station n, where r and c denote content requests and content caches, respectively;
action space: the action space is the set of actions taken by the agent; an action vector covers communication, computing resource allocation, and cache updating, so the action space A_n is defined by the multidimensional resource collaborative optimization mode, comprising a channel connection matrix, a power allocation vector, a computing unit allocation vector, and an updated content cache vector;
reward function: when an action is executed in the current environment state, the system enters the next state and obtains an instant reward; the MOS score is set as the reward function, and the update of the Q function follows the Bellman equation.
3. The method as claimed in claim 2, wherein in the step 2.1.2), DDQL is used to find the local multidimensional resource collaborative optimization strategy, and the parameters of the deep neural network are updated to approximate the Q value corresponding to the optimal strategy;
the DDQL establishes an experience replay mechanism, storing the experience obtained by the agent in an experience replay pool to record each of its interactions with the environment;
when the experience replay pool is full, new experience randomly replaces old experience;
the DDQL constructs a current network for action selection and a target network for value evaluation, and sets their state-action value functions accordingly, each parameterized by deep neural network parameters;
when training starts, the DDQL first randomly samples a mini-batch from the experience replay pool and feeds it into the current network and the target network, computing the corresponding Q values by forward propagation; then the loss function is back-propagated through the current network to update the network parameters;
4. The method for collaborative optimization of multidimensional resources based on FDQL in a mobile edge network as claimed in claim 3, wherein in the step 2.2), the workflow of the FDQL framework is as follows:
the edge node weights and aggregates the N uploaded parameter sets to obtain the global parameters, where the subscript g denotes the global parameter and w_n is the total content requested by the users in the nth cell;
next, each base station uses the fed-back global parameters to carry out the next round of DDQL training;
the above steps are repeated until the FDQL algorithm converges.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111447130.6A CN114143891A (en) | 2021-11-30 | 2021-11-30 | FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114143891A true CN114143891A (en) | 2022-03-04 |
Family
ID=80386175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111447130.6A Pending CN114143891A (en) | 2021-11-30 | 2021-11-30 | FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114143891A (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114449482A (en) * | 2022-03-11 | 2022-05-06 | 南京理工大学 | Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning |
CN114449482B (en) * | 2022-03-11 | 2024-05-14 | 南京理工大学 | Heterogeneous Internet of vehicles user association method based on multi-agent deep reinforcement learning |
CN114745383A (en) * | 2022-04-08 | 2022-07-12 | 浙江金乙昌科技股份有限公司 | Mobile edge calculation assisted multilayer federal learning method |
CN115002212A (en) * | 2022-04-12 | 2022-09-02 | 广州大学 | Combined caching and unloading method and system based on cross entropy optimization algorithm |
CN115002212B (en) * | 2022-04-12 | 2024-02-27 | 广州大学 | Combined caching and unloading method and system based on cross entropy optimization algorithm |
CN115361688B (en) * | 2022-07-13 | 2023-11-10 | 西安电子科技大学 | Industrial wireless edge gateway optimization layout scheme based on machine learning |
CN115361688A (en) * | 2022-07-13 | 2022-11-18 | 西安电子科技大学 | Industrial wireless edge gateway optimization layout scheme based on machine learning |
CN115208952A (en) * | 2022-07-20 | 2022-10-18 | 北京交通大学 | Intelligent collaborative content caching method |
CN115208952B (en) * | 2022-07-20 | 2023-09-26 | 北京交通大学 | Intelligent collaborative content caching method |
CN115080249A (en) * | 2022-08-22 | 2022-09-20 | 南京可信区块链与算法经济研究院有限公司 | Vehicle networking multidimensional resource allocation method and system based on federal learning |
CN115756873A (en) * | 2022-12-15 | 2023-03-07 | 北京交通大学 | Mobile edge computing unloading method and platform based on federal reinforcement learning |
CN115756873B (en) * | 2022-12-15 | 2023-10-13 | 北京交通大学 | Mobile edge computing and unloading method and platform based on federation reinforcement learning |
CN116032757A (en) * | 2022-12-16 | 2023-04-28 | 缀初网络技术(上海)有限公司 | Network resource optimization method and device for edge cloud running scene |
CN116032757B (en) * | 2022-12-16 | 2024-05-10 | 派欧云计算(上海)有限公司 | Network resource optimization method and device for edge cloud running scene |
CN116321219A (en) * | 2023-01-09 | 2023-06-23 | 北京邮电大学 | Self-adaptive honeycomb base station federation forming method, federation learning method and device |
CN116321219B (en) * | 2023-01-09 | 2024-04-19 | 北京邮电大学 | Self-adaptive honeycomb base station federation forming method, federation learning method and device |
CN116346921A (en) * | 2023-03-29 | 2023-06-27 | 华能澜沧江水电股份有限公司 | Multi-server collaborative cache updating method and device for security management and control of river basin dam |
CN116346921B (en) * | 2023-03-29 | 2024-06-11 | 华能澜沧江水电股份有限公司 | Multi-server collaborative cache updating method and device for security management and control of river basin dam |
CN116209015B (en) * | 2023-04-27 | 2023-06-27 | 合肥工业大学智能制造技术研究院 | Edge network cache scheduling method, system and storage medium |
CN116209015A (en) * | 2023-04-27 | 2023-06-02 | 合肥工业大学智能制造技术研究院 | Edge network cache scheduling method, system and storage medium |
CN117042051B (en) * | 2023-08-29 | 2024-03-08 | 燕山大学 | Task unloading strategy generation method, system, equipment and medium in Internet of vehicles |
CN117042051A (en) * | 2023-08-29 | 2023-11-10 | 燕山大学 | Task unloading strategy generation method, system, equipment and medium in Internet of vehicles |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114143891A (en) | FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network | |
Zhou et al. | Incentive-driven deep reinforcement learning for content caching and D2D offloading | |
Wei et al. | Joint optimization of caching, computing, and radio resources for fog-enabled IoT using natural actor–critic deep reinforcement learning | |
Lin et al. | Resource management for pervasive-edge-computing-assisted wireless VR streaming in industrial Internet of Things | |
CN112020103B (en) | Content cache deployment method in mobile edge cloud | |
Yang et al. | Social-energy-aware user clustering for content sharing based on D2D multicast communications | |
Zhang et al. | Joint optimization of cooperative edge caching and radio resource allocation in 5G-enabled massive IoT networks | |
Shan et al. | A survey on computation offloading for mobile edge computing information | |
Cao et al. | Reliable and efficient multimedia service optimization for edge computing-based 5G networks: game theoretic approaches | |
Ko et al. | Joint client selection and bandwidth allocation algorithm for federated learning | |
Majidi et al. | Hfdrl: An intelligent dynamic cooperate cashing method based on hierarchical federated deep reinforcement learning in edge-enabled iot | |
CN104426979A (en) | Distributed buffer scheduling system and method based on social relations | |
Chua et al. | Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach | |
Cha et al. | Fuzzy logic based client selection for federated learning in vehicular networks | |
Fu et al. | Traffic prediction-enabled energy-efficient dynamic computing resource allocation in cran based on deep learning | |
CN113918829A (en) | Content caching and recommending method based on federal learning in fog computing network | |
Xi et al. | Real-time resource slicing for 5G RAN via deep reinforcement learning | |
Balasubramanian et al. | FedCo: A federated learning controller for content management in multi-party edge systems | |
Sun et al. | A DQN-based cache strategy for mobile edge networks | |
CN113821346B (en) | Edge computing unloading and resource management method based on deep reinforcement learning | |
Seid et al. | Blockchain-empowered resource allocation in Multi-UAV-enabled 5G-RAN: a multi-agent deep reinforcement learning approach | |
Lin et al. | Joint optimization of preference-aware caching and content migration in cost-efficient mobile edge networks | |
Zhang et al. | Toward intelligent resource allocation on task-oriented semantic communication | |
Cui et al. | Multi-Agent Reinforcement Learning Based Cooperative Multitype Task Offloading Strategy for Internet of Vehicles in B5G/6G Network | |
Liu et al. | Multi-agent federated reinforcement learning strategy for mobile virtual reality delivery networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||