CN114021770A - Network resource optimization method and device, electronic equipment and storage medium - Google Patents

Network resource optimization method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN114021770A
CN114021770A (application CN202111089718.9A)
Authority
CN
China
Prior art keywords
gradient
decision tree
model
resource
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111089718.9A
Other languages
Chinese (zh)
Inventor
魏翼飞
公雨
李骏
郭达
张勇
滕颖蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202111089718.9A priority Critical patent/CN114021770A/en
Publication of CN114021770A publication Critical patent/CN114021770A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313Resource planning in a project environment

Abstract

The application provides a network resource optimization method and apparatus, an electronic device and a storage medium. Collected communication sample resources, calculation sample resources, cache sample resources and user terminal information are processed by a deep deterministic policy gradient model, and the input information, agent action information and reward data information are recorded. The generated data set is then used to train a gradient enhancement decision tree initial model, yielding a gradient enhancement decision tree model capable of optimizing network resources. The gradient enhancement decision tree model can thus rapidly process the current environment data information, including communication, computation and cache resources and user terminal information, to obtain a resource allocation strategy that maximizes the total utility. Network resources can therefore be allocated according to this strategy, making the allocation more reasonable and greatly improving the utilization of network resources.

Description

Network resource optimization method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of network resource allocation technologies, and in particular, to a method and an apparatus for optimizing network resources, an electronic device, and a storage medium.
Background
Network slicing refers to the flexible allocation of network resources, dividing the physical network on demand into multiple mutually isolated logical subnets with different characteristics. In a core network or a conventional cellular network, the overall system is designed to support many types of services. However, a virtual wireless network operated by a Mobile Virtual Network Operator (MVNO) can be dedicated to a single service (e.g., video transcoding or map downloading), which provides a better user experience. MVNOs mainly focus on abstracting and virtualizing the physical resources of Infrastructure Providers (InPs) into multiple network slices to satisfy the Quality of Service (QoS) requirements of network Slice Providers (SPs).
The roles of the MVNO, InP and SP are summarized below:
1) The MVNO leases resources such as physical resources and backhaul bandwidth from the InP, generates virtual resources for different slices according to different user requests, and leases the virtual resources to the SP to run its operations.
2) The InP owns the physical network radio resources (e.g., backhaul and spectrum) and may operate the physical network infrastructure.
3) The SP leases the virtual resources to provide users with different services under various QoS requirements.
However, existing network resource allocation methods are not reasonable enough, and with the rapid development of network services the volume of data transmitted over the network has increased greatly, so the network as a whole is prone to slow operation and congestion.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method, an apparatus, an electronic device and a storage medium for optimizing network resources, so as to solve or partially solve the above technical problems.
Based on the above purpose, a first aspect of the present application provides a network resource optimization method, including:
collecting communication sample resources, calculation sample resources, cache sample resources and current user terminal information in a network system;
inputting the communication sample resource, the calculation sample resource, the cache sample resource and the user terminal information into a deep deterministic strategy gradient model for processing, and outputting agent action information and reward data information;
training a gradient enhancement decision tree initial model by using the environmental data information, the agent action information and the reward data information as training samples to obtain a gradient enhancement decision tree model capable of optimizing network resources;
inputting the current environment data information, the current agent action information and the current reward data information of the network system into a trained gradient enhanced decision tree model for processing, and outputting a resource allocation strategy for maximizing the total utility of the network system by the gradient enhanced decision tree model.
A second aspect of the present application provides a network resource optimization apparatus, including:
an acquisition module configured to acquire communication sample resources, calculation sample resources and cache sample resources in a network system;
the deep certainty strategy gradient processing module is configured to input the communication sample resource, the calculation sample resource, the cache sample resource and the user terminal information into a deep certainty strategy gradient model for processing, and output agent action information and reward data information;
the decision tree training module is configured to train a gradient enhancement decision tree initial model by using the environmental data information, the agent action information and the reward data information as training samples to obtain a gradient enhancement decision tree model capable of optimizing network resources;
and the resource allocation processing module is configured to input the current communication resource, the current computing resource, the current cache resource and the current user terminal information of the network system into the gradient enhancement decision tree model for processing, and the gradient enhancement decision tree model outputs a resource allocation strategy for maximizing the total utility of the network system.
A third aspect of the application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the first aspect when executing the program.
A fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.
From the above, in the network resource optimization method and apparatus, electronic device and storage medium provided by the present application, the collected communication sample resources, calculation sample resources, cache sample resources and user terminal information are used to train the deep deterministic policy gradient model, and the agent action information and reward data information output after training are used to train the gradient enhancement decision tree initial model, yielding a gradient enhancement decision tree model capable of optimizing network resources. The gradient enhancement decision tree model can then rapidly process the current environment data information, current agent action information and current reward data information output by the deep deterministic policy gradient model to obtain the resource allocation strategy that maximizes the total utility. Network resources can therefore be allocated according to this strategy, making the allocation more reasonable and greatly improving the utilization of network resources.
Drawings
In order to more clearly illustrate the technical solutions in the present application or the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings in the following description are only embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a network resource optimization method according to an embodiment of the present application;
fig. 2 is a block diagram of a network resource optimization apparatus according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings in combination with specific embodiments.
It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present application should have the ordinary meaning understood by those skilled in the art to which the present application belongs. The terms "first", "second" and the like used in the embodiments of the present application do not denote any order, quantity or importance, but are merely used to distinguish one element from another. The word "comprising", "comprises" or the like means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
With the continuous expansion of wireless communication networks and the diversification of user application requirements, MVNOs urgently need to design systems that account for both QoS and Quality of Experience (QoE) in order to provide satisfactory services to users.
Multi-access edge computing (MEC) refers to deploying edge servers with dedicated computing and cache resources in small cells at the edge of the network; this technology can make full use of network resources to meet users' QoS. Thus, when a user requests a resource, the MEC server can perform the corresponding task in a distributed manner, which saves backhaul bandwidth. Compared with Macro Base Stations (MBS), the edge servers in small base stations are lightweight and have limited resources. Therefore, a feasible resource allocation scheme for the computation and caching tasks requested by users is strongly needed. Furthermore, although 5G technology aims to guarantee the QoE of users and the QoS of networks, finding an optimal scheme for allocating channel resources and bandwidth in a dynamic environment remains a challenge.
Deep Reinforcement Learning (DRL) is a key branch of artificial intelligence; it can identify dynamic environments and has broad application prospects for solving resource allocation problems. DRL methods can address the complex resource allocation problem in time-varying network-slicing networks. Some studies apply DRL methods to manage resources, for example Deep Q Networks (DQN), which are an effective way to jointly schedule resources for users. DQN is suited to discrete action spaces; however, the action space in our work is continuous. Therefore, the resource allocation problem is solved by adopting the Deep Deterministic Policy Gradient (DDPG) method, which combines the actor-critic framework with Deep Neural Networks (DNN).
Ensemble learning combines multiple single models to form a better model. In view of the limitations and high computational cost of DRL, ensemble learning is used to assist the DRL algorithm. The Gradient Boosting Decision Tree (GBDT) is a branch of ensemble learning, and it is proposed that a solution obtained by deep reinforcement learning can be converted into a GBDT model by a distillation method of the kind widely used in the image-processing field. Compared with the DRL method, the GBDT model can reveal the importance of the input parameters and compute its output more economically and quickly.
Based on the above theoretical basis, an embodiment of the present application provides a network resource optimization method, as shown in fig. 1, the method includes the steps of:
step 101, collecting communication sample resources, calculation sample resources, cache sample resources and user terminal information in a network system.
Step 102, inputting the communication sample resource, the calculation sample resource, the cache sample resource and the user terminal information into a deep deterministic strategy gradient model (i.e. DDPG) for processing, and outputting agent action information and reward data information.
Step 103, recording the environment data information, the agent action information and the reward data information to generate a data set.
Step 104, training a gradient enhanced decision tree initial model (GBDT initial model) by using the data set to obtain the gradient enhanced decision tree model (GBDT model) capable of optimizing network resources.
Step 105, inputting the current communication resource, the current computing resource, the current cache resource and the current user terminal information of the network system into the trained gradient enhanced decision tree model for processing, the gradient enhanced decision tree model outputting a resource allocation strategy that maximizes the total utility of the network system.
In the above scheme, the Deep Deterministic Policy Gradient (DDPG) model is an algorithm model based on edge computation and caching that takes into account the mobility of the user terminals and the dynamic communication conditions between the MEC servers and the user terminals, in order to jointly optimize task scheduling and resource allocation in a continuous action space.
In order to coordinate network functions and dynamically allocate limited resources, an improved Deep Reinforcement Learning (DRL) method is adopted that fully considers the mobility of the user terminals and the dynamic wireless channel conditions, and a maximized profit function of the Mobile Virtual Network Operator (MVNO) is obtained. Considering the slow convergence rate of the DRL algorithm, DRL is combined with ensemble learning, and the data set generated by the DDPG algorithm is used to train a gradient enhanced decision tree (GBDT) model. The trained GBDT model can closely imitate the behavior of the DDPG agent while producing results faster and more cost-efficiently.
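For illustration, a minimal Python sketch of the overall workflow (steps 101 to 105: collect environment samples, generate a data set with the DDPG agent, distill it into a GBDT model, then use the GBDT model at run time) is given below. The environment object, the DDPG agent interface and the use of scikit-learn's GradientBoostingRegressor as a stand-in for the GBDT model are assumptions for illustration and are not part of the present application.

```python
# Minimal sketch of the optimization pipeline (steps 101-105), assuming a
# hypothetical environment `env` and DDPG agent `ddpg_agent` with act()/step().

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for the GBDT model


def run_pipeline(env, ddpg_agent, num_samples=10_000):
    """Collect (state, action, reward) tuples with DDPG, then distill into GBDT."""
    states, actions, rewards = [], [], []

    # Steps 101-103: let the DDPG agent interact with the network environment
    # (communication / computation / cache resources + user-terminal info)
    # and record the resulting transitions as a data set.
    state = env.reset()
    for _ in range(num_samples):
        action = ddpg_agent.act(state)                 # agent action information
        next_state, reward, done = env.step(action)    # reward data information
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = env.reset() if done else next_state

    # Step 104: train the GBDT model to predict the reward (total utility)
    # from the recorded environment/action features.
    features = np.hstack([np.asarray(states), np.asarray(actions)])
    gbdt = GradientBoostingRegressor(n_estimators=200, max_depth=4)
    gbdt.fit(features, np.asarray(rewards))

    # Step 105: at run time, score candidate allocations with the fast GBDT
    # model and pick the one that maximizes the predicted total utility.
    return gbdt
```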
In some embodiments, the network system comprises: user terminals communicatively connected to each other, mobile communication base stations provided with controllers (i.e., macro base station MBS), and small base stations equipped with multi-access edge computing.
Step 101 specifically includes:
step 1011, the mobile communication base station determines the spectrum bandwidth allocated to the small base station according to the obtained association index between each user terminal with the service request and the small base station, the total spectrum bandwidth of the small base station and the sub-channel allocated to the user terminal, and uses the determined spectrum bandwidth allocated to the small base station as the communication sample resource.
The network system consists of an MBS with a controller and several small base stations with MEC servers; a set of user terminals and a set of small base stations are defined accordingly. The services requested by the user terminals can be divided into computation offloading and content delivery, and the service types of different services can be distinguished provided that the request packet is marked. One subset of user terminals requests computation offloading and another subset requests content delivery. Since a user terminal can only accept one service request at a time, the number of user terminals requesting a service can be defined as N + M = V, where N and M are the numbers of user terminals requesting computation offloading and content delivery, respectively. In addition, a set of requested services (SPs) is defined. All user terminals requesting a service s can be seen as a set V_s, where V = ∪_s V_s and the sets V_s are mutually disjoint.

The coverage areas of the small base stations overlap, so as to ensure that each user terminal having a service request can be associated with a small base station. A binary task establishment indicator is defined for each pair of small base station u and user terminal v_s requesting service s: it equals 1 if user terminal v requesting service s is associated with small base station u, and 0 otherwise. In particular, each user terminal can only be associated with one small base station, i.e. for each user terminal the indicators sum to 1 over all small base stations.

The total spectrum bandwidth of all small base stations may be defined as B, i.e. B = Σ_u B_u, where B_u represents the spectrum bandwidth allocated to small base station u. In practice, the B_u Hz of bandwidth of small base station u can be divided into B_u/b sub-channels that are allocated to the user terminals, where b is the bandwidth of one sub-channel and b_{u,v_s} denotes the bandwidth allocated from small base station u to user terminal v_s. Thus B_u can be expressed as the sum of the bandwidths b_{u,v_s} allocated to the user terminals associated with small base station u.
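For illustration only, the sub-channel bookkeeping described above can be sketched in Python as follows; the function name, the greedy allocation rule and the example numbers are assumptions and are not part of the present application.

```python
# Illustrative bandwidth bookkeeping for one small base station u (assumed names).

def allocate_subchannels(requests_hz, B_u, b):
    """Greedily allocate sub-channels of width b (Hz) out of a budget B_u (Hz).

    requests_hz: dict mapping user-terminal id -> requested bandwidth in Hz.
    Returns a dict user -> allocated bandwidth b_{u,v}, never exceeding B_u.
    """
    num_subchannels = int(B_u // b)          # B_u / b sub-channels in total
    allocation = {}
    for user, wanted in sorted(requests_hz.items()):
        needed = min(int(-(-wanted // b)), num_subchannels)  # ceil, capped by budget
        allocation[user] = needed * b
        num_subchannels -= needed
        if num_subchannels == 0:
            break
    assert sum(allocation.values()) <= B_u   # constraint C2: sum of b_{u,v} <= B_u
    return allocation


# Example: a 20 MHz small base station with 1 MHz sub-channels.
print(allocate_subchannels({"v1": 3e6, "v2": 5.5e6}, B_u=20e6, b=1e6))
```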
Step 1012, the mobile communication base station acquires the computing capability of the small base station allocated to the user terminal as the computing sample resource.
If the small base stations belong to different InPs, the licensed spectrum of each InP is orthogonal, so there is no interference between different small base stations. However, there is interference between user terminals that belong to the same SP and are connected to the same small base station. The average signal-to-interference-plus-noise ratio (SINR) between user terminal v_s and small base station u can be defined as

SINR_{u,v_s} = p_{v_s} h_{u,v_s} / ( Σ_{v'_s ≠ v_s} p_{v'_s} h_{u,v'_s} + σ² ),

where p_{v_s} and p_{v'_s} respectively represent the transmission power of user terminal v_s and of an interfering user terminal v'_s, h_{u,v_s} and h_{u,v'_s} are the corresponding average channel gains, and σ² is the power of the Additive White Gaussian Noise (AWGN).

In addition, the data transmission rate between small base station u and user terminal v_s can be calculated by Shannon theory, i.e.

r_{u,v_s} = b_{u,v_s} log2(1 + SINR_{u,v_s}).

The present application uses a quasi-static assumption, i.e. the environment state remains unchanged during a time slot t. The computing task requested by a user terminal that asks for computation offloading may be described by two quantities: the input data size (in bits) and the computing workload of the requested task (the total number of CPU cycles of the computing task). Further, the small base station u allocates a certain computing power (CPU cycles per second) to the user terminal, so the total execution time of the computing task at small base station u is the number of required CPU cycles divided by the allocated computing power. Thus, the computation rate of the user terminal is the input data size divided by this execution time. The total energy consumption of the computing task can be expressed as e_u multiplied by the number of required CPU cycles, where e_u represents the energy consumption of small base station u per CPU cycle.

Furthermore, the computing power of each small base station is limited, i.e. the computing power allocated to the user terminals associated with small base station u cannot exceed F_u, where F_u is the computing power allocated to small base station u. In practice, the total computing power of all small base stations may be defined as F, i.e. F = Σ_u F_u.
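For illustration, the communication and computation quantities above (SINR, Shannon rate, execution time, computation rate and energy) can be sketched as follows; the variable names and the example values are assumptions, and the formulas are the standard forms assumed in the reconstruction above.

```python
import math


def shannon_rate(bandwidth_hz, tx_power_w, channel_gain, interference_w, noise_w):
    """Data rate between small base station u and user terminal v_s (bits/s)."""
    sinr = (tx_power_w * channel_gain) / (interference_w + noise_w)   # average SINR
    return bandwidth_hz * math.log2(1.0 + sinr)                        # Shannon theory


def offload_metrics(input_bits, cpu_cycles, allocated_cycles_per_s, energy_per_cycle):
    """Execution time, computation rate and energy for one computation task."""
    exec_time_s = cpu_cycles / allocated_cycles_per_s      # required cycles / allocated power
    comp_rate_bps = input_bits / exec_time_s               # bits processed per second
    energy_j = energy_per_cycle * cpu_cycles               # e_u * required cycles
    return exec_time_s, comp_rate_bps, energy_j


# Example: 5 MHz sub-channel, 0.2 W transmit power, modest interference.
rate = shannon_rate(5e6, 0.2, 1e-7, 1e-9, 1e-10)
t, r, e = offload_metrics(input_bits=1e6, cpu_cycles=5e8,
                          allocated_cycles_per_s=2e9, energy_per_cycle=1e-9)
print(round(rate), t, round(r), e)
```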
Step 1013, the mobile communication base station uses the obtained buffer space allocated to the small base station as a buffer sample resource.
The caching task requested by a user terminal that asks for content delivery can be described by the content it requests. The storage space of a small base station is limited, and each small base station can only store a limited number of content types. The caching task is implemented in a first-in first-out manner, i.e. when the latest content is determined to be stored, the oldest stored content is deleted. The probability that a user terminal requests content f follows a Zipf distribution and is modeled as

P_f = f^(-l) / Σ_{f'} f'^(-l),

where the parameter l indicates the popularity of the content and is always a positive value. In our caching model, if the content caching tasks of the user terminals are known, the popularity of the content can be calculated directly from this formula.

In addition, the time needed to download the desired content over the backhaul is taken into account. Thus, the expected backhaul bandwidth saving achieved by caching the content can be expressed as the request probability of that content multiplied by the backhaul resource that would otherwise be consumed, where the request probability can be calculated directly by the content popularity formula above.

A caching strategy is used in the implementation in which the prices of the different contents are known. Furthermore, the buffer space of each small base station is limited, i.e. the content cached at small base station u cannot exceed C_u, where C_u is the buffer space allocated to small base station u. In practice, the total buffer space of all small base stations may be defined as C, i.e. C = Σ_u C_u.
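For illustration, the Zipf popularity model and the expected backhaul-bandwidth saving can be sketched as follows; the function names, the per-content backhaul times and the cache contents are assumptions for illustration.

```python
def zipf_popularity(num_contents, l):
    """P_f = f^(-l) / sum_{f'} f'^(-l), with l > 0 controlling popularity skew."""
    weights = [f ** (-l) for f in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_backhaul_saving(popularity, backhaul_times, cached):
    """Expected backhaul time saved by caching the contents in `cached` (0-based ids)."""
    return sum(popularity[f] * backhaul_times[f] for f in cached)


pop = zipf_popularity(num_contents=5, l=0.8)
saving = expected_backhaul_saving(pop, backhaul_times=[0.4, 0.4, 0.5, 0.6, 0.6],
                                  cached={0, 1})   # FIFO cache currently holds 2 items
print([round(p, 3) for p in pop], round(saving, 3))
```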
Based on the obtained communication sample resources, calculation sample resources, cache sample resources and user terminal information, an integrated architecture is constructed in order to maximize the total profit of the MVNO. The MVNO performs task scheduling and resource allocation, and charges each user terminal a virtual network access fee per bps. After paying the MVNO, the user terminal can access the physical resources and complete its task. On the other hand, the MVNO also pays the InP a spectrum usage cost per Hz. If the requesting user terminal asks for computation offloading, the MVNO may additionally charge that user terminal a computation fee per bps; at the same time, the MVNO pays the small base station the computation energy cost per Joule. If the task is content delivery, the MVNO may charge a content delivery fee per bps; at the same time, the MVNO pays, per byte, a cost associated with the expected savings in backhaul bandwidth. Thus, the profit function for the transmission between a user terminal v_s and a small base station u can be defined as the sum of a communication term, a computation term and a caching term, each consisting of the corresponding revenue minus the corresponding cost.
The total profit of the MVNO can be divided into three components, namely communication revenue, computation revenue and caching revenue.

Communication revenue: the first term of the profit function described above is the communication revenue. It equals the fee that the user terminal pays the MVNO for access to the virtual network, minus the bandwidth cost that the MVNO pays to the InP.

Computation revenue: the second term of the profit function described above is the computation revenue. It equals the fee that the user terminal pays the MVNO for performing the computing task, minus the energy consumption cost that the MVNO pays to the InP.

Caching revenue: the last term of the profit function described above is the caching revenue. It equals the fee that the user terminal pays the MVNO for performing the caching task, minus the cost that the MVNO pays to the InP for the cached content.
The optimization goal of the present disclosure is to maximize the total profit OP of the MVNO, i.e. to maximize the sum of the above profit functions over all user terminals and small base stations, subject to the constraints C1 to C6:

C1 denotes that a user terminal v_s can only be associated with one small base station u; C2 means that the bandwidth allocated by small base station u to all the user terminals associated with it cannot exceed the spectrum resources of small base station u; C3 and C5 respectively ensure that the communication rate and the computation rate of the user terminal v_s meet their requirements; C4 and C6 express that the computing power F_u and the buffer space C_u of each small base station u are limited.
In some embodiments, step 102 specifically includes:
step 1021, setting a first input parameter and a first output parameter of the depth certainty strategy gradient model, wherein the first input parameter at least comprises: the communication sample resource, the calculation sample resource, the buffer sample resource and the ue information, the first output parameter at least includes: agent action information and reward data information.
Step 1022, inputting the obtained communication sample resources, calculation sample resources, cache sample resources and user terminal information into the evolution network, executing cyclically over time, continuously calculating the corresponding first loss function during execution, and adjusting the parameters of the deep deterministic policy model according to the first loss function.
Wherein the depth-deterministic policy model comprises: an evolution network and an evaluation network.
Initializing parameters of an evolution network and an evaluation network in advance; performing cycle execution in the evolution network, continuously calculating a first loss function by using an evaluation network in the execution process, performing minimization processing on the first loss function, and adjusting parameters of the evaluation network according to the minimized loss function; adjusting parameters of the evolution network according to the sampled strategy gradient; and adjusting parameters of the evolution target network and the evaluation target network.
Step 1023, after all processing is finished, acquiring the specific data of the first output parameter output by the deep deterministic policy model.
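For illustration, one update step of the deep deterministic policy gradient model — the evolution (actor) network and the evaluation (critic) network described in steps 1021 to 1023 — can be sketched with PyTorch as follows; the network sizes, learning rates and other hyper-parameters are assumptions and are not values specified by the present application.

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Small fully connected network used for both the actor and the critic."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim))

    def forward(self, x):
        return self.net(x)


state_dim, action_dim, gamma, tau = 10, 8, 0.99, 0.005
actor = MLP(state_dim, action_dim)                    # "evolution" (actor) network
critic = MLP(state_dim + action_dim, 1)               # "evaluation" (critic) network
actor_t = MLP(state_dim, action_dim)                  # evolution target network
critic_t = MLP(state_dim + action_dim, 1)             # evaluation target network
actor_t.load_state_dict(actor.state_dict())
critic_t.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)


def ddpg_update(batch):
    """Minimize the critic (first) loss, then follow the sampled policy gradient."""
    s, a, r, s2 = batch
    with torch.no_grad():                              # bootstrapped target value
        q_target = r + gamma * critic_t(torch.cat([s2, actor_t(s2)], dim=1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), q_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()   # policy gradient
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    for net, tgt in ((actor, actor_t), (critic, critic_t)):        # soft target update
        for p, pt in zip(net.parameters(), tgt.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)


# Example call with a random mini-batch of 32 transitions (state, action, reward, next state).
batch = (torch.randn(32, state_dim), torch.randn(32, action_dim),
         torch.randn(32, 1), torch.randn(32, state_dim))
ddpg_update(batch)
```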
A controller deployed at the mobile communication base station can interact with the environment (i.e., collect all system state information) and obtain rewards after performing actions (i.e., after making decisions on all requests), with the goal of maximizing the long-term cumulative return. The process by which the controller explores the optimal policy is as follows: observe the state information s_t ∈ S in time slot t, and then select an action a_t ∈ A according to the policy π(a|s), which represents the probability of selecting an action in this state; after taking action a_t, the agent immediately receives an instant reward. In general, the goal of the MDP is to explore a policy π(a|s) that maximizes the value function, usually expressed as the expected discounted cumulative return calculated by the Bellman equation.
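The value function is not written out explicitly in the text; the standard discounted-return and Bellman-equation form assumed for this kind of MDP formulation is:

```latex
% Standard discounted return and Bellman expectation equation (assumed form).
\begin{aligned}
  G_t &= \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 < \gamma < 1,\\
  V^{\pi}(s) &= \mathbb{E}_{\pi}\!\left[ R_{t+1} + \gamma \, V^{\pi}(s_{t+1}) \mid s_t = s \right].
\end{aligned}
```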
Three key elements in reinforcement learning are introduced below: state space, action space, and rewards.
State space: the state space contains two components, namely the available resources of each small base station u ∈ U equipped with an MEC server and the state of each user v ∈ V. The state space at time slot t can be denoted as s_t = {F_u, B_u, C_u, Ω_v}. F_u, B_u and C_u represent the available computation, bandwidth and caching resources of each small base station u equipped with the MEC server. In addition, the user state Ω_v includes the average SINR between the user and the small base station, the input data size (bits) of the computation task, the computation workload (the total number of CPU cycles needed to complete the task), the cache capacity, the content popularity, the user location, and so on.
Action space: the action space is used for small base station selection and resource allocation, with the aim of completing the computation offloading or content delivery tasks. In time slot t, the action a_t consists of the bandwidth, the computing resources and the cache resources that the small base station equipped with the MEC server allocates to each user, together with an indicator representing whether or not the task establishment is performed.
Reward: after taking action a_t, the agent receives the reward R_t. In particular, the reward should correspond to the optimization objective function described above; thus, the reward can be defined as the total MVNO profit obtained in time slot t.
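For illustration, the state s_t = {F_u, B_u, C_u, Ω_v} and the continuous action a_t can be flattened into vectors for the agent as sketched below; the exact feature layout and the example numbers are assumptions.

```python
import numpy as np


def build_state(avail_compute, avail_bandwidth, avail_cache, user_states):
    """s_t = {F_u, B_u, C_u, Omega_v}: per-base-station resources plus per-user state."""
    base = np.concatenate([avail_compute, avail_bandwidth, avail_cache])
    users = np.concatenate([np.asarray(list(u.values()), dtype=float)
                            for u in user_states])
    return np.concatenate([base, users])


def build_action(association, bandwidth, compute, cache):
    """a_t: task-establishment indicator plus bandwidth/compute/cache allocations."""
    return np.concatenate([association, bandwidth, compute, cache])


# Example with 2 small base stations and 1 user terminal.
s_t = build_state(avail_compute=[2e9, 1.5e9], avail_bandwidth=[20e6, 10e6],
                  avail_cache=[8e9, 4e9],
                  user_states=[{"sinr": 15.0, "input_bits": 1e6,
                                "cpu_cycles": 5e8, "popularity": 0.3}])
a_t = build_action(association=[1, 0], bandwidth=[5e6, 0],
                   compute=[1e9, 0], cache=[0, 0])
print(s_t.shape, a_t.shape)
```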
Training samples are created using the DDPG method: the GBDT model trains very quickly, but it cannot learn directly from the environment. The DDPG method enables the optimization apparatus of the present application to obtain the maximum return, or to achieve a specific goal, by learning an optimal policy while continuously interacting with the environment. However, GBDT is a supervised learning model and requires correct labels from the environment. Thus, in our model, training samples containing the environment information as inputs and the output reward information as labels are first created by the DDPG.
In some embodiments, step 104 specifically includes:
step 1041, setting a second input parameter and a second output parameter of the initial model of the gradient enhanced decision tree, wherein the second input parameter includes: environmental data information, agent action information, and reward data information, the second output parameter comprising: a resource allocation policy of a network system that maximizes overall utility.
Step 1042, setting the initial value of the iteration count m to 0 and initializing the additional predictor in the gradient enhanced decision tree initial model.
Step 1043, inputting a first predetermined amount of environment data information, agent action information and reward data information output by the deep deterministic policy gradient model into the gradient enhanced decision tree initial model as training samples for training, incrementing m by 1 for each round of training, stopping the training when the value of m reaches a predetermined threshold, and taking the trained gradient enhanced decision tree initial model as the gradient enhanced decision tree model.
Step 1043 specifically includes:
a group of base learners in the initial model of the gradient enhancement decision tree is designated as a target base learner group;
inputting environment data information, agent action information and reward data information into a gradient enhancement decision tree initial model for training, and calculating a second loss function after training, wherein each training time, corresponding m is added by 1;
calculating a first negative gradient vector of the second loss function;
respectively fitting a second negative gradient vector to each base learner in the target base learner group;
determining the component that best fits the negative gradient vector according to the second negative gradient vector and the determined target base learner group;
updating parameters of an additional predictor according to the component of the most suitable negative gradient vector;
and determining that m is equal to a set threshold value, and taking the final gradient enhancement decision tree initial model as a gradient enhancement decision tree model.
The GBDT is an iterative algorithm based on decision trees. The scalable end-to-end tree boosting system known as XGBoost is an improved GBDT algorithm. In particular, GBDT uses only first-derivative information in the optimization, while the XGBoost algorithm uses the first and second derivatives to perform a second-order Taylor expansion of the cost function. In addition, the complexity of the model can be controlled by adding to the cost function a regularization term containing the number of nodes of each leaf and the score function. In the overall framework, the improved GBDT algorithm is applied to a regression task. A data set comprising n samples is given; it may be represented as D = {(x_i, y_i)} (|D| = n, x_i ∈ F ∪ B ∪ C ∪ Ω, y_i ∈ R), where y_i is expressed as the solution according to the reward function and x_i is represented as the state space of our system model.
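For illustration, fitting a gradient boosted tree regressor on the data set D — environment-state (and action) features with the DDPG reward as label — can be sketched as follows; scikit-learn's GradientBoostingRegressor is used here as a stand-in for the improved GBDT/XGBoost model, and the data shapes and placeholder labels are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Assumed shapes: each row is a flattened environment state (plus agent action),
# and the label is the reward obtained by the DDPG agent for that row.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 18))                                   # placeholder recorded states
y = X[:, 0] * 2.0 - X[:, 5] + rng.normal(scale=0.1, size=5000)   # placeholder rewards

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

gbdt = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=4)
gbdt.fit(X_train, y_train)                  # the m-iteration additive fitting loop

print("R^2 on held-out DDPG samples:", round(gbdt.score(X_test, y_test), 3))
print("Most important input feature:", int(np.argmax(gbdt.feature_importances_)))
```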
From the above, a state space composed of a large amount of dynamic environment information and an action space containing a large number of continuous values are obtained. The DDPG algorithm is employed to maximize the reward function; the DDPG method uses neural networks to evaluate and select actions, which is more complex and more costly to compute than a tree model. Therefore, combining the DDPG algorithm with the GBDT model can both accelerate convergence and achieve accurate estimation.
Training samples are created using DDPG, and in the GBDT model the environment state parameters serve as input and the rewards serve as output. Thus, with continued training, the GBDT model learns to obtain the maximum reward for the given environment information, with the goal of achieving the same level of accuracy as the DRL agent.
In some embodiments, the method further comprises:
and step A, testing the gradient enhanced decision tree model by taking the second preset amount of environment data information, agent action information and reward data information output by the depth certainty strategy gradient model as a test sample.
And step B, determining the accuracy of the gradient enhancement decision tree model according to the test result.
And step C, when the accuracy is determined to be greater than or equal to a preset accuracy threshold, the obtained gradient enhanced decision tree model is used as a final gradient enhanced decision tree model.
And step D, in response to the fact that the accuracy is smaller than the preset accuracy threshold, the obtained gradient enhancement decision tree model is retrained again by using the test sample until the obtained accuracy is smaller than the preset accuracy threshold, and the retrained gradient enhancement decision tree model is used as a final gradient enhancement decision tree model.
Through the steps, the accuracy of the obtained gradient enhancement decision tree model can be tested, so that the accuracy of the finally obtained gradient enhancement decision tree model can meet the actual requirement, and the precision of the gradient enhancement decision tree model is improved.
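For illustration, steps A to D can be sketched as an accuracy-gated retraining loop; the accuracy metric (R² here) and the threshold value are assumptions, and the commented usage line reuses the objects from the previous sketch.

```python
def ensure_accuracy(gbdt, X_test, y_test, threshold=0.95, max_rounds=3):
    """Re-fit on the test samples until the model meets the accuracy threshold."""
    for _ in range(max_rounds):
        accuracy = gbdt.score(X_test, y_test)     # step B: measure accuracy (R^2)
        if accuracy >= threshold:                 # step C: accuracy is sufficient, keep the model
            return gbdt, accuracy
        gbdt.fit(X_test, y_test)                  # step D: retrain on the test samples
    return gbdt, gbdt.score(X_test, y_test)


# model, acc = ensure_accuracy(gbdt, X_test, y_test)   # reusing objects from the previous sketch
```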
It should be noted that the method of the embodiment of the present application may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the multiple devices may only perform one or more steps of the method of the embodiment, and the multiple devices interact with each other to complete the method.
It should be noted that the above describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to the method of any embodiment, the application also provides a network resource optimization device.
Referring to fig. 2, the network resource optimization apparatus includes:
an acquisition module 21 configured to acquire communication sample resources, calculation sample resources, and cache sample resources in the network system;
a deep deterministic policy gradient processing module 22 configured to input the communication sample resource, the calculation sample resource, the cache sample resource and the user terminal information into a deep deterministic policy gradient model for processing, and output agent action information and reward data information;
a decision tree training module 23 configured to train a gradient enhancement decision tree initial model by using the environmental data information, the agent action information, and the reward data information as training samples, so as to obtain a gradient enhancement decision tree model capable of optimizing network resources;
and the resource allocation processing module 24 is configured to input the current communication resource, the current computing resource, the current cache resource and the current user terminal information of the network system into a gradient enhanced decision tree model for processing, wherein the gradient enhanced decision tree model outputs a resource allocation strategy for maximizing the total utility of the network system.
In some embodiments, the network system comprises: the system comprises user terminals which are in communication connection with each other, a mobile communication base station provided with a controller and a small base station equipped with multi-access edge calculation;
the acquisition module 21 is configured to:
the mobile communication base station determines the spectrum bandwidth allocated to the small base station according to the obtained association index between each user terminal with the service request and the small base station, the total spectrum bandwidth of the small base station and the sub-channel allocated to the user terminal, and takes the determined spectrum bandwidth allocated to the small base station as a communication sample resource; the mobile communication base station acquires the computing capacity of the small base station distributed to the user terminal as computing sample resources; and the mobile communication base station takes the obtained cache space distributed to the small base station as a cache sample resource.
In some embodiments, the depth deterministic policy gradient processing module 22 is configured to:
setting first input parameters and first output parameters of the depth deterministic strategy gradient model, wherein the first input parameters comprise at least: the communication sample resource, the calculation sample resource, the buffer sample resource and the user terminal information, the first output parameter at least includes: agent action information and reward data information; inputting the obtained communication sample resource, the calculation sample resource, the cache sample resource and the user terminal information into an evolution network, performing cyclic execution according to time, continuously calculating a corresponding first loss function in the execution process, and adjusting parameters of a depth certainty strategy model according to the first loss function; and after all the processing is finished, acquiring specific data of a first output parameter output by the depth certainty strategy model.
In some embodiments, the depth deterministic policy model comprises: an evolution network and an evaluation network;
the depth deterministic policy gradient processing module 22 is further configured to:
initializing parameters of an evolution network and an evaluation network in advance; performing cycle execution in the evolution network, continuously calculating a first loss function by using an evaluation network in the execution process, performing minimization processing on the first loss function, and adjusting parameters of the evaluation network according to the minimized loss function; adjusting parameters of the evolution network according to the sampled strategy gradient; and adjusting parameters of the evolution target network and the evaluation target network.
In some embodiments, the decision tree training module 23 is configured to:
setting a second input parameter and a second output parameter of the initial model of the gradient enhancement decision tree, wherein the second input parameter comprises: environmental data information, agent action information, and reward data information, the second output parameter comprising: a resource allocation policy of the network system that maximizes total utility; setting the initial value of the iteration count m to be 0, and initializing an additional predictor in the initial model of the gradient enhancement decision tree; and inputting a first preset amount of environmental data information, agent action information and reward data information output by the depth certainty strategy gradient model into the gradient enhancement decision tree initial model as training samples for training, wherein each training time, the corresponding m is added by 1, and the training is stopped until the value of m reaches a preset threshold value, and the trained gradient enhancement decision tree initial model is used as the gradient enhancement decision tree model.
In some embodiments, the decision tree training module 23 is further configured to:
a group of base learners in the initial model of the gradient enhancement decision tree is designated as a target base learner group; inputting environment data information, agent action information and reward data information into a gradient enhancement decision tree initial model for training, and calculating a second loss function after training, wherein each training time, corresponding m is added by 1; calculating a first negative gradient vector of the second loss function; respectively fitting a second negative gradient vector to each base learner in the target base learner group; determining a component most suitable for the negative gradient vector according to the second gradient vector and the determined target base learning group; updating parameters of an additional predictor according to the component of the most suitable negative gradient vector; and determining that m is equal to a set threshold value, and taking the final gradient enhancement decision tree initial model as a gradient enhancement decision tree model.
In some embodiments, the apparatus further comprises a test module configured to:
testing the gradient enhancement decision tree model by using a second predetermined amount of environment data information, agent action information and reward data information output by the depth certainty strategy gradient model as test samples; determining the accuracy of the gradient enhancement decision tree model according to the test result; when the accuracy is determined to be greater than or equal to a preset accuracy threshold, taking the obtained gradient enhancement decision tree model as the final gradient enhancement decision tree model; and in response to determining that the accuracy is smaller than the preset accuracy threshold, retraining the obtained gradient enhancement decision tree model with the test samples until the obtained accuracy is greater than or equal to the preset accuracy threshold, and taking the retrained gradient enhancement decision tree model as the final gradient enhancement decision tree model.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the various modules may be implemented in the same one or more software and/or hardware implementations as the present application.
The apparatus in the foregoing embodiment is used to implement the corresponding network resource optimization method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to the method of any embodiment described above, the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the network resource optimization method described in any embodiment above is implemented.
Fig. 3 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding network resource optimization method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above-mentioned embodiment methods, the present application further provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the network resource optimization method according to any of the above embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The computer instructions stored in the storage medium of the foregoing embodiment are used to enable the computer to execute the network resource optimization method according to any of the foregoing embodiments, and have the beneficial effects of corresponding method embodiments, which are not described herein again.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the context of the present application, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the application are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that the embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The present embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A method for optimizing network resources, comprising:
collecting communication sample resources, calculation sample resources, cache sample resources and user terminal information in a network system;
inputting the communication sample resource, the calculation sample resource, the cache sample resource and the user terminal information into a depth certainty strategy gradient model for processing, and outputting agent action information and reward data information;
recording the environment data information, the agent action information and the reward data information to generate a data set;
training a gradient enhancement decision tree initial model by using the data set to obtain a gradient enhancement decision tree model capable of optimizing network resources;
and inputting the current communication resource, the current computing resource, the current cache resource and the current user terminal information of the network system into the gradient enhancement decision tree model for processing, wherein the gradient enhancement decision tree model outputs a resource allocation strategy for maximizing the total utility of the network system.
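By way of illustration only (not part of the claims), the following minimal sketch shows the claimed pipeline end to end, with the deep deterministic policy gradient model replaced by a stub agent and scikit-learn's gradient boosting regressor standing in for the gradient enhancement decision tree model; all function and class names (collect_environment_state, StubAgent, and so on) are hypothetical assumptions.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.multioutput import MultiOutputRegressor

    rng = np.random.default_rng(0)

    def collect_environment_state():
        # hypothetical 6-dimensional state: spectrum bandwidth, computing capacity,
        # cache space, number of user terminals, aggregate demand, mean channel gain
        return rng.random(6)

    class StubAgent:
        # placeholder for the deep deterministic policy gradient model of claims 3-4
        def act(self, state):
            return rng.random(3)                       # fractions of bandwidth / compute / cache to allocate
        def reward(self, state, action):
            return float(np.dot(state[:3], action))    # stand-in for the total-utility reward

    # 1) run the agent and record (environment, action, reward) into a data set
    agent, records = StubAgent(), []
    for _ in range(500):
        s = collect_environment_state()
        a = agent.act(s)
        records.append((s, a, agent.reward(s, a)))

    # 2) train the tree model to map environment state to the agent's allocation action
    X = np.array([s for s, _, _ in records])
    Y = np.array([a for _, a, _ in records])
    tree_model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50))
    tree_model.fit(X, Y)

    # 3) query the trained model with the current network state to obtain an allocation
    allocation = tree_model.predict(collect_environment_state().reshape(1, -1))[0]
    print("suggested allocation fractions:", allocation)

In a real deployment the stub would be the trained agent of claims 3 and 4, and the recorded rewards could additionally be used to filter or weight the training records.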
2. The method of claim 1, wherein the network system comprises: user terminals which are in communication connection with each other, a mobile communication base station provided with a controller, and a small base station equipped with multi-access edge computing;
the acquiring of communication sample resources, calculation sample resources and cache sample resources in a network system specifically includes:
the mobile communication base station determines the spectrum bandwidth allocated to the small base station according to the obtained association index between each user terminal with the service request and the small base station, the total spectrum bandwidth of the small base station and the sub-channel allocated to the user terminal, and takes the determined spectrum bandwidth allocated to the small base station as a communication sample resource;
the mobile communication base station acquires, as a calculation sample resource, the computing capacity allocated by the small base station to the user terminal;
and the mobile communication base station takes the obtained cache space allocated to the small base station as a cache sample resource.
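As a purely illustrative aid, a single record of the sample resources described in claim 2 might be represented as follows; the field names and units are assumptions, not the patent's data model.

    from dataclasses import dataclass

    @dataclass
    class SampleResources:
        spectrum_bandwidth_hz: float    # communication sample resource: spectrum allocated to the small base station
        compute_capacity_cps: float     # calculation sample resource: CPU cycles per second allocated to the user terminal
        cache_space_bytes: int          # cache sample resource: cache space allocated to the small base station

    sample = SampleResources(20e6, 2.5e9, 512 * 2**20)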
3. The method according to claim 1, wherein the inputting the communication sample resource, the calculation sample resource, the cache sample resource, and the user terminal information into a deep deterministic policy gradient model for processing, and outputting agent action information and reward data information specifically comprises:
setting first input parameters and first output parameters of the deep deterministic policy gradient model, wherein the first input parameters at least comprise: the communication sample resource, the calculation sample resource, the cache sample resource and the user terminal information, and the first output parameters at least comprise: the agent action information and the reward data information;
inputting the obtained communication sample resource, the calculation sample resource, the cache sample resource and the user terminal information into an evolution network, performing loop execution over time, continuously calculating a corresponding first loss function during the execution, and adjusting parameters of the deep deterministic policy gradient model according to the first loss function;
and after all the processing is finished, acquiring the specific data of the first output parameters output by the deep deterministic policy gradient model.
4. The method of claim 3, wherein the deep deterministic policy gradient model comprises: an evolution network and an evaluation network;
the inputting of the obtained communication sample resource, the calculation sample resource, the cache sample resource and the user terminal information into an evolution network, performing loop execution over time, continuously calculating a corresponding first loss function during the execution, and adjusting parameters of the deep deterministic policy gradient model according to the first loss function specifically comprises:
initializing parameters of an evolution network and an evaluation network in advance;
performing loop execution in the evolution network, wherein during the execution,
continuously calculating a first loss function by using an evaluation network, minimizing the first loss function, and adjusting parameters of the evaluation network according to the minimized loss function;
adjusting parameters of the evolution network according to a sampled policy gradient;
and adjusting parameters of the evolution target network and the evaluation target network.
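A compact sketch of one training step of the model described in claims 3 and 4, assuming the "evolution network" plays the role of the actor and the "evaluation network" the role of the critic in standard deep deterministic policy gradient learning; the layer sizes, hyper-parameters, and variable names below are illustrative assumptions rather than the patented implementation.

    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM, GAMMA, TAU = 6, 3, 0.99, 0.005

    def mlp(in_dim, out_dim):
        return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

    actor, critic = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM + ACTION_DIM, 1)      # evolution / evaluation networks
    actor_t, critic_t = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM + ACTION_DIM, 1)  # their target networks
    actor_t.load_state_dict(actor.state_dict())
    critic_t.load_state_dict(critic.state_dict())
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    def ddpg_step(s, a, r, s_next):
        # 1) first loss function: temporal-difference error of the evaluation (critic) network
        with torch.no_grad():
            q_target = r + GAMMA * critic_t(torch.cat([s_next, actor_t(s_next)], dim=1))
        critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), q_target)
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

        # 2) sampled policy gradient: adjust the evolution (actor) network to increase the Q-value
        actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

        # 3) soft update of the evolution target network and the evaluation target network
        for net, tgt in ((actor, actor_t), (critic, critic_t)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1 - TAU).add_(TAU * p.data)

    # one illustrative step on a random minibatch of 32 transitions
    B = 32
    ddpg_step(torch.rand(B, STATE_DIM), torch.rand(B, ACTION_DIM),
              torch.rand(B, 1), torch.rand(B, STATE_DIM))

Under this reading, the first loss function of the claims corresponds to the critic's temporal-difference error, the sampled policy gradient to the actor update, and the final soft update adjusts the two target networks.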
5. The method according to claim 1, wherein the training of the gradient enhancement decision tree initial model by using the environmental data information, the agent action information and the reward data information as training samples to obtain a gradient enhancement decision tree model capable of optimizing network resources specifically comprises:
setting second input parameters and second output parameters of the gradient enhancement decision tree initial model, wherein the second input parameters comprise: the environmental data information, the agent action information and the reward data information, and the second output parameters comprise: a resource allocation strategy that maximizes the total utility of the network system;
setting the initial value of the iteration count m to 0, and initializing an additive predictor in the gradient enhancement decision tree initial model;
and inputting a first preset amount of environmental data information, agent action information and reward data information output by the deep deterministic policy gradient model into the gradient enhancement decision tree initial model as training samples for training, wherein m is incremented by 1 after each training iteration, the training is stopped when the value of m reaches a preset threshold, and the trained gradient enhancement decision tree initial model is taken as the gradient enhancement decision tree model.
6. The method according to claim 5, wherein the inputting of the first preset amount of environment data information, agent action information and reward data information output by the deep deterministic policy gradient model into the gradient enhancement decision tree initial model for training, incrementing m by 1 after each training iteration, stopping the training when the value of m reaches the preset threshold, and taking the trained gradient enhancement decision tree initial model as the gradient enhancement decision tree model specifically comprises:
designating a group of base learners in the gradient enhancement decision tree initial model as a target base learner group;
inputting the environment data information, the agent action information and the reward data information into the gradient enhancement decision tree initial model for training, and calculating a second loss function after the training, wherein m is incremented by 1 after each training iteration;
calculating a first negative gradient vector of the second loss function;
respectively fitting a second negative gradient vector to each base learner in the target base learner group;
determining the component that best fits the negative gradient vector according to the second negative gradient vector and the determined target base learner group;
updating parameters of the additive predictor according to the component that best fits the negative gradient vector;
and when it is determined that m equals the set threshold, taking the final gradient enhancement decision tree initial model as the gradient enhancement decision tree model.
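For reference, a from-scratch sketch of the boosting iteration in claims 5 and 6, assuming a squared-error second loss function (whose negative gradient is the residual) and a single decision-tree base learner per iteration instead of the claimed group of base learners; the iteration limit, learning rate, and names are illustrative assumptions.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.random((500, 6))                                     # environment state + agent action features
    y = X[:, :3].sum(axis=1) + 0.1 * rng.standard_normal(500)    # reward-like target

    M, LR = 100, 0.1                      # preset threshold for the iteration count m, shrinkage
    F = np.full_like(y, y.mean())         # additive predictor, initialised before the loop
    learners = []

    for m in range(M):                    # m is incremented once per training iteration
        neg_grad = y - F                  # negative gradient of the squared-error loss (the residual)
        h = DecisionTreeRegressor(max_depth=2).fit(X, neg_grad)   # fit a base learner to the negative gradient
        F += LR * h.predict(X)            # the fitted component updates the additive predictor
        learners.append(h)

    def predict(X_new):
        return y.mean() + LR * sum(h.predict(X_new) for h in learners)

    print("training MSE:", float(np.mean((y - F) ** 2)))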
7. The method of claim 1, further comprising:
testing the gradient enhancement decision tree model by taking a second preset amount of environment data information, agent action information and reward data information output by the deep deterministic policy gradient model as a test sample;
determining the accuracy of the gradient enhancement decision tree model according to the test result;
when the accuracy is determined to be greater than or equal to a preset accuracy threshold, taking the obtained gradient enhancement decision tree model as a final gradient enhancement decision tree model;
and in response to determining that the accuracy is smaller than the preset accuracy threshold, retraining the obtained gradient enhancement decision tree model by using the test sample until the obtained accuracy is greater than or equal to the preset accuracy threshold, and taking the retrained gradient enhancement decision tree model as the final gradient enhancement decision tree model.
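The accuracy check of claim 7 can be sketched as a short loop; the threshold value, the use of a coefficient-of-determination score as the accuracy measure, and the bound on retraining rounds are assumptions made for illustration.

    def evaluate_and_retrain(model, X_test, Y_test, threshold=0.9, max_rounds=10):
        # accept the model once its test accuracy reaches the preset threshold,
        # otherwise retrain it with the test sample (claim 7)
        for _ in range(max_rounds):
            accuracy = model.score(X_test, Y_test)   # R^2 score for scikit-learn regressors
            if accuracy >= threshold:
                break
            model.fit(X_test, Y_test)
        return model                                 # final gradient enhancement decision tree model

A deployment would typically also keep a separate validation set so that the score is not computed on the same data the model was just retrained on.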
8. A network resource optimization apparatus, comprising:
an acquisition module, configured to collect communication sample resources, calculation sample resources and cache sample resources in a network system;
the deep deterministic policy gradient processing module is configured to input the communication sample resource, the calculation sample resource, the cache sample resource and the user terminal information into a deep deterministic policy gradient model for processing, and output agent action information and reward data information;
the decision tree training module is configured to train a gradient enhancement decision tree initial model by using the environmental data information, the agent action information and the reward data information as training samples to obtain a gradient enhancement decision tree model capable of optimizing network resources;
and the resource allocation processing module is configured to input the current communication resource, the current computing resource, the current cache resource and the current user terminal information of the network system into the gradient enhancement decision tree model for processing, and the gradient enhancement decision tree model outputs a resource allocation strategy for maximizing the total utility of the network system.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202111089718.9A 2021-09-14 2021-09-14 Network resource optimization method and device, electronic equipment and storage medium Pending CN114021770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111089718.9A CN114021770A (en) 2021-09-14 2021-09-14 Network resource optimization method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111089718.9A CN114021770A (en) 2021-09-14 2021-09-14 Network resource optimization method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114021770A true CN114021770A (en) 2022-02-08

Family

ID=80054689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111089718.9A Pending CN114021770A (en) 2021-09-14 2021-09-14 Network resource optimization method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114021770A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114339892A (en) * 2022-03-17 2022-04-12 山东科技大学 DQN and joint bidding based two-layer slice resource allocation method
CN114760639A (en) * 2022-03-30 2022-07-15 深圳市联洲国际技术有限公司 Resource unit allocation method, device, equipment and storage medium
CN115017435A (en) * 2022-06-28 2022-09-06 中国电信股份有限公司 Method and device for determining cache resources, nonvolatile storage medium and processor
CN115412401A (en) * 2022-08-26 2022-11-29 京东科技信息技术有限公司 Method and device for training virtual network embedding model and virtual network embedding
CN115412401B (en) * 2022-08-26 2024-04-19 京东科技信息技术有限公司 Method and device for training virtual network embedding model and virtual network embedding
CN115421930B (en) * 2022-11-07 2023-03-24 山东海量信息技术研究院 Task processing method, system, device, equipment and computer readable storage medium
CN115421930A (en) * 2022-11-07 2022-12-02 山东海量信息技术研究院 Task processing method, system, device, equipment and computer readable storage medium
CN115460567B (en) * 2022-11-09 2023-03-24 清华大学 Data processing method, data processing device, computer equipment and storage medium
CN115460567A (en) * 2022-11-09 2022-12-09 清华大学 Data processing method, data processing device, computer equipment and storage medium
CN116738239A (en) * 2023-08-11 2023-09-12 浙江菜鸟供应链管理有限公司 Model training method, resource scheduling method, device, system, equipment and medium
CN116755862A (en) * 2023-08-11 2023-09-15 之江实验室 Training method, device, medium and equipment for operator optimized scheduling model
CN116738239B (en) * 2023-08-11 2023-11-24 浙江菜鸟供应链管理有限公司 Model training method, resource scheduling method, device, system, equipment and medium
CN116755862B (en) * 2023-08-11 2023-12-19 之江实验室 Training method, device, medium and equipment for operator optimized scheduling model

Similar Documents

Publication Publication Date Title
CN114021770A (en) Network resource optimization method and device, electronic equipment and storage medium
CN111031102B (en) Multi-user, multi-task mobile edge computing system cacheable task migration method
JP5664882B2 (en) User scheduling and transmission power control method and apparatus in communication system
CN112291793B (en) Resource allocation method and device of network access equipment
CN108601074B (en) Network resource allocation method and device based on heterogeneous joint cache
CN112422644B (en) Method and system for unloading computing tasks, electronic device and storage medium
CN112615731B (en) Method and device for distributing multi-operator combined network slice resources
Li et al. Method of resource estimation based on QoS in edge computing
CN111629390B (en) Network slice arranging method and device
KR101924628B1 (en) Apparatus and Method for controlling traffic offloading
CN113747450B (en) Service deployment method and device in mobile network and electronic equipment
CN114090108A (en) Computing task execution method and device, electronic equipment and storage medium
US20230156520A1 (en) Coordinated load balancing in mobile edge computing network
CN113271221B (en) Network capacity opening method and system and electronic equipment
CN109412971B (en) Data distribution method based on action value function learning and electronic equipment
CN115484304B (en) Lightweight learning-based live service migration method
CN117580132B (en) Heterogeneous network access method, device and equipment for mobile equipment based on reinforcement learning
CN115665867B (en) Spectrum management method and system for Internet of Vehicles
CN113141634B (en) VR content caching method based on mobile edge computing network
CN116739440B (en) Method and device for evaluating intelligent network, electronic equipment and storage medium
Li et al. A Task Offloading Decision and Resource Allocation Algorithm Based on DDPG in Mobile Edge Computing
CN116528004A (en) Video pushing method, device, equipment and storage medium
CN117311991A (en) Model training method, task allocation method, device, equipment, medium and system
CN117715126A (en) Network slice switching method and device, storage medium and electronic equipment
CN117651344A (en) Network resource sharing method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination