CN116996921A - Whole-network multi-service joint optimization method based on meta reinforcement learning - Google Patents
Whole-network multi-service joint optimization method based on meta reinforcement learning
- Publication number
- CN116996921A CN116996921A CN202311252903.4A CN202311252903A CN116996921A CN 116996921 A CN116996921 A CN 116996921A CN 202311252903 A CN202311252903 A CN 202311252903A CN 116996921 A CN116996921 A CN 116996921A
- Authority
- CN
- China
- Prior art keywords
- network
- module
- reinforcement learning
- rate
- route
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/02—Arrangements for optimising operational condition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0985—Hyperparameter optimisation; Meta-learning; Learning-to-learn
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a whole-network multi-service joint optimization method based on meta reinforcement learning, comprising the following steps: S1, building a 5G communication platform with a three-layer network structure comprising a radio access network, a transmission network and a core network, and determining the objective function of the joint optimization; S2, constructing multi-service-oriented route cache modules; S3, constructing a meta reinforcement learning model comprising a group of Actor networks, a Critic network and a task experience set caching module; S4, training and determining the parameters of the route cache modules based on the meta reinforcement learning model; S5, carrying out multi-service joint optimization across the whole 5G network. By controlling the route-caching behavior of the whole communication network at each layer of network management, the invention realizes whole-network joint optimization of multiple services.
Description
Technical Field
The invention relates to the field of communication, and in particular to a whole-network multi-service joint optimization method based on meta reinforcement learning.
Background
In recent years, the rapid development of 5G mobile communication technology has greatly improved the communication quality of end-to-end transmission services. However, as the coverage and interaction depth of terminal devices keep increasing, 5G communication networks face traffic that is large, highly jittery, and of many types and instances, which poses great challenges to whole-network resource control.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a whole-network multi-service joint optimization method based on meta reinforcement learning, which realizes whole-network joint optimization of multiple services by controlling the route-caching behavior of the whole communication network at each layer of network management.
The aim of the invention is achieved by the following technical scheme: a whole-network multi-service joint optimization method based on meta reinforcement learning, comprising the following steps:
S1, constructing a 5G communication platform with a three-layer network structure comprising a radio access network, a transmission network and a core network, and determining the joint optimization objective function;
S2, constructing multi-service-oriented route cache modules;
S3, constructing a meta reinforcement learning model comprising a group of Actor networks, a Critic network and a task experience set caching module;
S4, training and determining the parameters of the route cache modules based on the meta reinforcement learning model;
S5, carrying out multi-service joint optimization across the whole 5G network.
The beneficial effects of the invention are as follows: the invention realizes whole-network joint optimization of multiple services by controlling the route-caching behavior of the whole communication network at each layer of network management. Traditional multi-service joint optimization focuses on designing time and spectrum slice diversity for the communication resources of each individual network; it therefore struggles to approach the optimization upper bound of whole-network multi-service transmission, and copes even worse with unstable multi-service rate flows. The invention adopts a meta reinforcement learning algorithm that can learn a route-caching policy under the different traffic-flow distributions of the multiple services, greatly improving whole-network resource utilization and the quality of service of the multiple services.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical solution of the present invention will be described in further detail with reference to the accompanying drawings, but the scope of the present invention is not limited to the following description.
As shown in FIG. 1, a whole-network multi-service joint optimization method based on meta reinforcement learning comprises the following steps:
S1, constructing a 5G communication platform with a three-layer network structure comprising a radio access network, a transmission network and a core network, and determining the joint optimization objective function;
S101: building a 5G open communication platform with a three-layer network structure comprising a radio access network, a transmission network and a core network.
S1011: building the radio access network, comprising a set of wireless access terminals and an access base station. The wireless access terminals initiate communication service flows, each with a corresponding air-interface communication rate; the access base station provides wireless channel resources with corresponding channel gains. The radio access network adopts a standard 5G air-interface channel allocation scheme. After the communication service flows enter the radio access network, channels are allocated according to this scheme; based on the allocation result, the rate of each communication service flow changes at the output of the radio access network. The output rate of each communication service flow is fully determined by its input rate, the channel gains and the channel allocation scheme.
S1012: building the transmission network, comprising transmission-network routes and transmission-network links. The transmission network carries background stream traffic in several dimensions, with recorded rates, and adopts a standard 5G transmission routing protocol. After routing configuration is completed according to this protocol, the rate of each communication service flow changes correspondingly; the input and output rates of the transmission network are recorded, and each output rate is fully determined by the corresponding input rate, the background-flow rates and the routing protocol.
S1013: building the core network, comprising core-network routes and core-network links. The core network carries background stream traffic in several dimensions, with recorded rates, and adopts a standard 5G core-network routing protocol. After routing configuration is completed according to this protocol, the rate of each communication service flow changes correspondingly; the input and output rates of the core network are recorded, and each output rate is fully determined by the corresponding input rate, the background-flow rates and the routing protocol.
S102: characterizing the multi-service joint optimization objective. At each time instant, an optimization objective function is recorded for each service; its value is influenced by the air-interface rates, the background-flow rates and the rate-control actions at each network layer. The objective function of the multi-service joint optimization can be written as
(1.1)
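The exact form of equation (1.1) is rendered as an image in the source, so the per-service objective is not recoverable; the sketch below assumes, purely for illustration, a log-rate (proportional-fairness) utility summed over the services' output rates to form the whole-network objective.

```python
import math

def joint_objective(output_rates, utility=math.log):
    """Aggregate per-service utilities into one whole-network objective.

    `utility` stands in for the unspecified per-service objective of
    equation (1.1); log-rate is an assumption, not the patent's formula.
    """
    return sum(utility(r) for r in output_rates)

# Higher aggregate output rates give a higher objective value.
print(joint_objective([2.0, 4.0, 8.0]))
```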
S2, constructing a multi-service-oriented route cache module;
s201: a first route buffer module is constructed between the wireless access network and the transmission network, and the module can realize the rate control of any input flow by adopting a classical route buffer algorithm. We will input wireless into the networkThe communication traffic flows are input into the module, and the corresponding traffic flow rate is +>The output traffic flow rate is +.>;
S202: and constructing a second route buffer module between the transmission network and the core network, wherein the module can realize the rate control of any input flow by adopting a classical route buffer algorithm. We will transport the networkThe communication traffic flows are input into the module, and the corresponding traffic flow rate is +>The output traffic flow rate is +.>;
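The "classical route cache algorithm" of S201/S202 is not named in the source, so the following sketch substitutes a simple per-flow rate limiter with a bounded backlog buffer; the class and attribute names are illustrative.

```python
class RouteCacheModule:
    """Per-flow rate limiter standing in for the unspecified classical
    route-caching algorithm: each flow is clipped to a target output
    rate, and the excess is queued in a bounded per-flow backlog."""

    def __init__(self, target_rates, buffer_limit=100.0):
        self.target_rates = list(target_rates)    # permitted output rate per flow
        self.buffer_limit = buffer_limit          # max queued traffic per flow
        self.backlog = [0.0] * len(target_rates)  # currently queued traffic

    def step(self, input_rates):
        """One control interval: return the output rate of each flow."""
        outputs = []
        for i, rate_in in enumerate(input_rates):
            available = rate_in + self.backlog[i]
            rate_out = min(available, self.target_rates[i])
            # Queue what was not forwarded, dropping beyond the buffer limit.
            self.backlog[i] = min(available - rate_out, self.buffer_limit)
            outputs.append(rate_out)
        return outputs

module = RouteCacheModule(target_rates=[1.0, 2.0])
print(module.step([3.0, 1.0]))  # flow 0 is throttled, flow 1 passes through
```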
S3, constructing a meta reinforcement learning model, which comprisesThe system comprises an Actor network, a Critic network and a task experience set caching module;
s301: constructionAnd an Actor network. Each Actor network is a double-layer fully-connected neural network, which is marked by +.>The parameters of the individual Actor network are +.>The input is +.>When->At the time->The output of the personal Actor network is +.>I.e. +.>Is>Element>The rate at which the individual traffic streams are output from the first route buffer module; when->At the time->The output of the personal Actor network is +.>I.e. +.>Is the first of (2)Element>The rate at which the individual traffic streams are output from the second route buffer module;
s302: constructing a Critic network which is a double-layer fully-connected nerveNetwork and by parametersCharacterization, its input includes->,/> and />The output of which characterizes the value function of the input variable value;
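A minimal sketch of the two-layer fully connected Actor and Critic networks of S301/S302, using NumPy; the layer widths, activation, and input layout are assumptions, since the source elides them (the original symbols were rendered as images).

```python
import numpy as np

def init_mlp(sizes, rng):
    """Weights and biases of a fully connected network."""
    return [(rng.standard_normal((fan_in, fan_out)) * 0.1, np.zeros(fan_out))
            for fan_in, fan_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Two-layer forward pass: tanh hidden layer, linear output."""
    for layer, (w, b) in enumerate(params):
        x = x @ w + b
        if layer < len(params) - 1:
            x = np.tanh(x)
    return x

rng = np.random.default_rng(0)
state_dim = 6                                   # illustrative state size
actor = init_mlp([state_dim, 32, 1], rng)       # one Actor per traffic flow
critic = init_mlp([state_dim + 1, 32, 1], rng)  # value of a (state, action) pair

state = rng.standard_normal(state_dim)
action = forward(actor, state)                  # a rate for one flow
value = forward(critic, np.concatenate([state, action]))
print(action.shape, value.shape)
```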
S303: constructing the task experience set caching module: a buffer with fixed storage space, initially empty, used to store the task experiences generated during the training process of the meta reinforcement learning.
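The fixed-capacity task experience set caching module of S303 can be sketched as below; the number of task sets follows the 100 sets named in S4033, while the per-task capacity is an assumption since the source elides it.

```python
import random
from collections import deque

class TaskExperienceCache:
    """Fixed-storage cache of per-task experience sets, initially empty.

    Each task index owns a bounded deque of
    (state, action, reward, next_state) tuples; the oldest experiences
    are evicted once a task's set is full.
    """

    def __init__(self, num_tasks=100, capacity_per_task=10_000):
        self.sets = [deque(maxlen=capacity_per_task) for _ in range(num_tasks)]

    def store(self, task_id, experience):
        self.sets[task_id].append(experience)

    def sample(self, task_id, k, rng=random):
        """Draw k experiences without replacement from one task's set."""
        return rng.sample(list(self.sets[task_id]), k)

cache = TaskExperienceCache(num_tasks=2, capacity_per_task=3)
for t in range(4):  # the 4th store evicts the oldest experience
    cache.store(0, ("s%d" % t, "a", 0.0, "s'"))
print(len(cache.sets[0]))
```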
S4, training and determining parameters of a route cache module based on the meta reinforcement learning model;
S401: establishing a Markov decision model;
S4011: defining the state, the action and the reward in terms of the recorded flow rates;
S4012: determining the state transition relationship: since the state is defined in terms of the network flow rates, the values or distributions of the next-step rates can be estimated by Bayesian reasoning from the current state, the action and the background-flow rates, thereby obtaining the value or distribution of the next state;
S402: generating the task data sets;
S4021: initializing the task index;
S4022: randomly initializing the traffic-flow rate distribution of the wireless access terminals;
S4023: initializing the time step;
S4024: observing the current state and feeding it into the Actor networks; with probability 0.98, assigning the outputs of the Actor networks for the first route cache module and for the second route cache module to the corresponding entries of the action vector; with probability 0.02, assigning a set of random values to the action vector, to ensure exploration in the meta reinforcement learning algorithm;
S4025: executing the action, which sets the input rates of the transmission network and the core network;
S4026: observing and recording the resulting reward and next state;
S4027: archiving the (state, action, reward, next state) tuple as one experience and storing it in the experience set of the current task in the task experience set caching module;
S4028: if the final time step is reached, terminating the loop and proceeding to step S4029; otherwise incrementing the time step and returning to step S4024;
S4029: if the final task index is reached, terminating the loop and proceeding to step S403; otherwise incrementing the task index and returning to step S4022;
S403: training the meta reinforcement learning model;
S4031: randomly initializing the parameters of the Actor networks and the Critic network;
S4032: initializing the outer training-iteration counter;
S4033: randomly selecting one task experience set from the 100 task experience sets;
S4034: initializing the inner iteration counter;
S4035: drawing K experiences from the selected task experience set and computing the loss function
(1.2)
where the loss compares the Critic network's output for each sampled state-action pair against a target built from the recorded experience; the loss value is back-propagated to update the Critic parameters;
S4036: minimizing the Actor loss to update the parameters of each Actor network;
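Equation (1.2) is rendered as an image in the source; the sketch below assumes the standard actor-critic form — a bootstrapped TD target for the Critic and a value-maximizing objective for each Actor — which may differ in detail from the patent's actual loss.

```python
def td_target(reward, next_value, gamma=0.99):
    """Bootstrapped target y = r + gamma * V(s') (assumed form of the
    target inside equation (1.2))."""
    return reward + gamma * next_value

def critic_loss(values, targets):
    """Mean squared TD error over the K sampled experiences; its value
    is back-propagated to update the Critic parameters (S4035)."""
    return sum((v - y) ** 2 for v, y in zip(values, targets)) / len(values)

def actor_loss(values):
    """Negated mean Critic value: minimizing it pushes each Actor
    toward higher-value rate decisions (S4036)."""
    return -sum(values) / len(values)

print(td_target(1.0, 2.0, gamma=0.5))       # 1.0 + 0.5 * 2.0
print(critic_loss([1.0, 2.0], [1.0, 4.0]))  # (0 + 4) / 2
```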
S4037: if the inner iteration limit is reached, terminating the loop and proceeding to step S4038; otherwise incrementing the inner counter and returning to step S4035;
S4038: if the outer iteration limit is reached, terminating the loop and proceeding to step S404; otherwise incrementing the outer counter and returning to step S4033;
S404: deploying the first group of trained Actor networks to the first route cache module and the remaining Actor networks to the second route cache module.
S5, carrying out multi-service joint optimization of the 5G whole network;
S501: initializing the time step;
S502: observing the current state and feeding it into the route cache modules; the Actor networks in the route cache modules output the target rates for the two modules;
S503: performing the rate conversion from the input rates to the Actor-selected output rates at the first route cache module, and likewise at the second route cache module;
S504: judging whether the termination condition is met;
if it is met, terminating the loop; the whole-network multi-service joint optimization is then complete.
It should be noted that this process computes and updates the target rates at every time step; since the objective function of the multi-service joint optimization is influenced by these rates, optimizing them achieves the joint optimization of the multiple services.
Otherwise, incrementing the time step and returning to step S502.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (6)
1. A whole-network multi-service joint optimization method based on meta reinforcement learning, characterized by comprising the following steps:
S1, constructing a 5G communication platform with a three-layer network structure comprising a radio access network, a transmission network and a core network, and determining the joint optimization objective function;
S2, constructing multi-service-oriented route cache modules;
S3, constructing a meta reinforcement learning model comprising a group of Actor networks, a Critic network and a task experience set caching module;
S4, training and determining the parameters of the route cache modules based on the meta reinforcement learning model;
S5, carrying out multi-service joint optimization across the whole 5G network.
2. The whole-network multi-service joint optimization method based on meta reinforcement learning according to claim 1, wherein step S1 comprises:
S101: building a 5G communication platform with a three-layer network structure comprising a radio access network, a transmission network and a core network:
S1011: building the radio access network, comprising a set of wireless access terminals and an access base station; the wireless access terminals initiate communication service flows, each with a corresponding air-interface communication rate; the access base station provides wireless channel resources with corresponding channel gains; the radio access network adopts a standard 5G air-interface channel allocation scheme; after the communication service flows enter the radio access network, channels are allocated according to this scheme, and based on the allocation result the rate of each communication service flow changes at the output of the radio access network; the output rate of each communication service flow is fully determined by its input rate, the channel gains and the channel allocation scheme;
S1012: building the transmission network, comprising transmission-network routes and transmission-network links; the transmission network carries background stream traffic in several dimensions, with recorded rates, and adopts a standard 5G transmission routing protocol; after routing configuration is completed according to this protocol, the rate of each communication service flow changes correspondingly; the input and output rates of the transmission network are recorded, and each output rate is fully determined by the corresponding input rate, the background-flow rates and the routing protocol;
S1013: building the core network, comprising core-network routes and core-network links; the core network carries background stream traffic in several dimensions, with recorded rates, and adopts a standard 5G core-network routing protocol; after routing configuration is completed according to this protocol, the rate of each communication service flow changes correspondingly; the input and output rates of the core network are recorded, and each output rate is fully determined by the corresponding input rate, the background-flow rates and the routing protocol;
S102: characterizing the multi-service joint optimization objective: at each time instant an optimization objective function is recorded for each service, and the objective function of the multi-service joint optimization is recorded as
(1.1)
whose value is influenced by the air-interface rates, the background-flow rates and the rate-control actions at each network layer.
3. The whole-network multi-service joint optimization method based on meta reinforcement learning according to claim 1, wherein step S2 comprises:
S201: constructing a first route cache module between the radio access network and the transmission network; using a classical route-caching algorithm, the module controls the rate of any input flow; the communication service flows output by the radio access network are fed into this module, and the corresponding input and output traffic-flow rates are recorded;
S202: constructing a second route cache module between the transmission network and the core network; using a classical route-caching algorithm, the module controls the rate of any input flow; the communication service flows output by the transmission network are fed into this module, and the corresponding input and output traffic-flow rates are recorded.
4. The whole-network multi-service joint optimization method based on meta reinforcement learning according to claim 1, wherein step S3 comprises:
S301: constructing the Actor networks, each of which is a two-layer fully connected neural network with its own parameters and takes the current state as input; the Actor networks assigned to the first route cache module each output the rate at which the corresponding traffic flow is output from that module, and the Actor networks assigned to the second route cache module each output the rate at which the corresponding traffic flow is output from the second module;
S302: constructing the Critic network, a two-layer fully connected neural network characterized by its parameters; its inputs include the state and action quantities, and its output characterizes the value function of the input variables;
S303: constructing the task experience set caching module: a buffer with fixed storage space, initially empty, used to store the task experiences generated during the training process of the meta reinforcement learning.
5. The whole-network multi-service joint optimization method based on meta reinforcement learning according to claim 1, wherein step S4 comprises:
S401: establishing a Markov decision model;
S4011: defining the state, the action and the reward in terms of the recorded flow rates;
S4012: determining the state transition relationship: based on the current state, the action and the background-flow rates, the values or distributions of the next-step rates are estimated by Bayesian reasoning, thereby obtaining the value or distribution of the next state;
S402: generating the task data sets;
S4021: initializing the task index;
S4022: randomly initializing the traffic-flow rate distribution of the wireless access terminals;
S4023: initializing the time step;
S4024: observing the current state and feeding it into the Actor networks; with probability 0.98, assigning the outputs of the Actor networks to the corresponding entries of the action vector; with probability 0.02, assigning a set of random values to the action vector, to ensure exploration in the meta reinforcement learning algorithm;
S4025: executing the action, which sets the input rates of the transmission network and the core network;
S4026: observing and recording the resulting reward and next state;
S4027: archiving the (state, action, reward, next state) tuple as one experience and storing it in the experience set of the current task in the task experience set caching module;
S4028: if the final time step is reached, terminating the loop and proceeding to step S4029; otherwise incrementing the time step and returning to step S4024;
S4029: if the final task index is reached, terminating the loop and proceeding to step S403; otherwise incrementing the task index and returning to step S4022;
S403: training the meta reinforcement learning model;
S4031: randomly initializing the parameters of the Actor networks and the Critic network;
S4032: initializing the outer training-iteration counter;
S4033: randomly selecting one task experience set from the 100 task experience sets;
S4034: initializing the inner iteration counter;
S4035: drawing K experiences from the selected task experience set and computing the loss function
(1.2)
where the loss compares the Critic network's output for each sampled state-action pair against a target built from the recorded experience; the loss value is back-propagated to update the Critic parameters;
S4036: minimizing the Actor loss to update the parameters of each Actor network;
S4037: if the inner iteration limit is reached, terminating the loop and proceeding to step S4038; otherwise incrementing the inner counter and returning to step S4035;
S4038: if the outer iteration limit is reached, terminating the loop and proceeding to step S404; otherwise incrementing the outer counter and returning to step S4033;
S404: deploying the first group of trained Actor networks to the first route cache module and the remaining Actor networks to the second route cache module.
6. The whole-network multi-service joint optimization method based on meta reinforcement learning according to claim 1, wherein step S5 comprises:
S501: initializing the time step;
S502: observing the current state and feeding it into the route cache modules; the Actor networks in the route cache modules output the target rates for the two modules;
S503: performing the rate conversion from the input rates to the Actor-selected output rates at the first route cache module, and likewise at the second route cache module;
S504: judging whether the termination condition is met;
if it is met, terminating the loop, at which point the whole-network multi-service joint optimization is complete; otherwise incrementing the time step and returning to step S502.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311252903.4A CN116996921B (en) | 2023-09-27 | 2023-09-27 | Whole-network multi-service joint optimization method based on meta reinforcement learning
Publications (2)
Publication Number | Publication Date |
---|---|
CN116996921A true CN116996921A (en) | 2023-11-03 |
CN116996921B CN116996921B (en) | 2024-01-02 |
Family
ID=88525186
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311252903.4A Active CN116996921B (en) | 2023-09-27 | 2023-09-27 | Whole-network multi-service joint optimization method based on element reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116996921B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020249299A1 (en) * | 2019-06-11 | 2020-12-17 | Telefonaktiebolaget Lm Ericsson (Publ) | Methods and apparatus for data traffic routing |
CN113411826A (en) * | 2021-06-17 | 2021-09-17 | 天津大学 | Edge network equipment caching method based on attention mechanism reinforcement learning |
CN113596138A (en) * | 2021-07-26 | 2021-11-02 | 东北大学 | Heterogeneous information center network cache allocation method based on deep reinforcement learning |
CN113676513A (en) * | 2021-07-15 | 2021-11-19 | 东北大学 | Deep reinforcement learning-driven intra-network cache optimization method |
US20230171640A1 (en) * | 2021-11-30 | 2023-06-01 | Samsung Electronics Co., Ltd. | Traffic optimization module and operating method thereof |
CN116321307A (en) * | 2023-03-10 | 2023-06-23 | 北京邮电大学 | Bidirectional cache placement method based on deep reinforcement learning in non-cellular network |
Also Published As
Publication number | Publication date |
---|---|
CN116996921B (en) | 2024-01-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |