CN112291793B - Resource allocation method and device of network access equipment - Google Patents
- Publication number
- CN112291793B (application CN202011584793.8A / CN202011584793A)
- Authority
- CN
- China
- Prior art keywords
- resource allocation
- resource
- service
- allocation model
- power
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/02—Resource partitioning among network components, e.g. reuse partitioning
- H04W16/10—Dynamic resource partitioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/22—Traffic simulation tools or models
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a resource allocation method and apparatus for a network access device, wherein the method comprises the following steps: acquiring resource demand information of the current service of the network access device; determining the current power of the network access device and the state information of the cache cluster resources; inputting the resource demand information and the state information into a trained resource allocation model, the resource allocation model being a deep neural network which outputs a probability distribution over all possible actions according to the input resource demand information and state information, wherein each action corresponds to a power or cache resource allocation result; according to the probability distribution over all possible actions output by the resource allocation model, taking the action with the maximum probability as the final allocation result of the power or cache resources; and allocating power and/or cache resources to the service according to the allocation result. The invention can reduce the resource scheduling computation burden of the network device and maximize the resource utilization rate.
Description
Technical Field
The present invention relates to the field of communications network technologies, and in particular, to a method and an apparatus for resource allocation of a network access device.
Background
With the wide application of intelligent devices and Internet of Things (IoT) technologies, intelligent terminals such as notebook computers and smart phones have become an indispensable part of modern life, and people have begun to run novel services such as high-definition live video and augmented reality on intelligent terminal devices. These services provide users with a richer content experience, while also leading to unprecedented growth in network traffic. Due to the limitations of computing power and battery capacity, terminal devices cannot efficiently meet the basic low-delay, computation-intensive requirements of a large number of such novel services, while offloading compute-intensive tasks to the cloud increases transmission delay and additional network load. Mobile Edge Computing (MEC) has therefore been proposed: it migrates the computing and storage capacity of the cloud to the edge of the network and performs task computation at the edge, thereby reducing the energy consumption and execution delay of terminal devices and improving service quality.
Due to the dense deployment of Mobile Edge Computing (MEC) units and the popularity of bandwidth-intensive applications, maximizing the throughput of the access network has become an urgent challenge. With their high capacity and low latency, densely deployed Optical Network Units (ONUs) of optical networks have become the main access points of MEC; in ONUs, power and cache resource management are two major factors affecting throughput, and they require adaptive and efficient online decision-making. Resource management issues, often manifested as online decision-making problems, are pervasive in computer systems and networks, for example in network congestion control and job scheduling for computing clusters. As data in the network explodes with the growing number of connected devices, data processing can no longer be located entirely on the cloud platform of the core network: almost half of the data needs to be analyzed and processed at the edge of the network. Efficient resource allocation is therefore needed to improve resource utilization and throughput in the MEC network, and to realize self-learning joint allocation of power and cache resources in the network.
Existing resource allocation methods for the network resource management of mobile edge computing suffer from drawbacks such as low resource utilization and overly complex resource optimization procedures.
Disclosure of Invention
In view of the above, the present invention is directed to a resource allocation method and apparatus for a network access device, which adaptively learn with a global view, can reduce the resource scheduling computation burden of the network device, and maximize the resource utilization rate.
Based on the above object, the present invention provides a resource allocation method for a network access device, including:
acquiring resource demand information of the current service of the network access equipment;
determining the current power of the network access equipment and the state information of the cache cluster resources;
inputting the resource demand information and the state information into a trained resource allocation model; the resource allocation model is a deep neural network which outputs probability distribution of all possible actions according to input resource demand information and state information; wherein each action corresponds to a power or cache resource allocation result;
according to the probability distribution of all possible actions output by the resource allocation model, taking the action with the maximum probability as the final allocation result of power or cache resources; and
allocating power and/or cache resources to the service according to the allocation result.
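The decision procedure in the steps above can be sketched as follows; the policy interface and the toy probability distribution are illustrative assumptions, not the patent's actual trained network:

```python
def choose_allocation(policy, demand_info, state_info):
    """Return the power/cache allocation action with the highest probability.

    The policy maps (resource demand info, power/cache state info) to a
    probability distribution over all possible allocation actions.
    """
    probs = policy(demand_info, state_info)
    # The action with the maximum probability is the final allocation result.
    return max(range(len(probs)), key=lambda a: probs[a])

# Toy stand-in policy: a fixed distribution over four candidate actions.
def toy_policy(demand, state):
    return [0.1, 0.6, 0.2, 0.1]

best = choose_allocation(toy_policy, demand_info=[2], state_info=[5, 3])
```

Here `best` is 1, since action 1 carries the largest probability in the toy distribution.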
The resource allocation model is obtained by pre-training on training data comprising previously acquired resource demand information of services, together with the execution delay of those services; the execution delay of a service is determined after the service has been executed according to the resource allocation result output by the resource allocation model for the training data. The specific training method comprises the following steps:
using the previously collected resource demand information, power and state information of the cache cluster resources as training data;
inputting the training data to the resource allocation model;
according to the probability distribution of all possible actions output by the resource allocation model, taking the action with the maximum probability as the final allocation result of power or cache resources;
distributing power and/or cache resources for the service according to the distribution result;
after executing the service, determining the execution time delay of the service;
calculating a reward value according to the execution delay of the service;
and adjusting the parameters of the resource allocation model according to the reward value.
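The training steps above can be sketched as a toy loop; the environment, the negative-delay reward, and the simple parameter update are simplifying assumptions for illustration only:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def execute_service(action, demand):
    # Toy environment: the closer the allocated action matches the demand,
    # the smaller the measured execution delay.
    return 1.0 + abs(action - demand)

def train_step(theta, demand, alpha=0.1):
    probs = softmax(theta)                   # probability distribution over actions
    action = probs.index(max(probs))         # take the highest-probability action
    delay = execute_service(action, demand)  # execute the service, measure its delay
    reward = -delay                          # reward negatively correlated with delay
    theta[action] += alpha * reward          # adjust the model parameters
    return action, delay, reward

theta = [0.0, 0.0, 0.0]
action, delay, reward = train_step(theta, demand=1)
```

With all-zero parameters the softmax is uniform, action 0 is selected, the toy delay is 2.0, and the parameter for action 0 is nudged downward.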
Preferably, the execution delay of the service is negatively correlated with the reward value. Specifically, the reward value is calculated according to the following method: the reward value at time t is

r_t = -1/T

wherein T indicates the duration for which the current service has been executed from its start up to time t, i.e. the execution delay of the current service.
Preferably, the adjusting the parameter of the resource allocation model according to the reward value specifically includes:
updating parameters of the resource allocation model according to the following formula 1:
Wherein,to representtTime of day the resource allocation modelIs determined by the parameters of (a) and (b),indicating updatedtResource allocation model at +1 momentIs determined by the parameters of (a) and (b),is a coefficient constant with a value of 0-1;
is composed oftObjective function value at time:whereinis at presenttThe value of the prize at the time of day,γa reward attenuation factor;
Wherein,to representtResource demand information of a service unit of a current service, and status information of power and cache cluster resources, which are input into the resource allocation model at a time,representing the resource allocation model fortInput of time of dayAn action of outputting the highest probability;representing a cost function calculation based on information of the business units input into the resource allocation model;
representing the resource allocation model fort-input of time 1An action of outputting the highest probability;express according tot-1, the largest value among the values of the cost function calculated separately for the information of each business unit input into the resource allocation model;
wherein,taccumulated prize value for a timeAccording totPrize value for time of dayCalculated by the following formula 2:
Wherein,indicating that the action has been performedAfter, follow upnDue to action at one momentThe result of the reward is that,IErepresenting a cumulative discount reward for the monte carlo method in reinforcement learning.
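For a finite reward sequence, the cumulative discounted reward of formula 2 can be computed as in this small sketch:

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted reward R_t = sum_n gamma**n * r_{t+n} (formula 2)."""
    total = 0.0
    for n, r in enumerate(rewards):
        total += (gamma ** n) * r
    return total

# With gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
value = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```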
The invention also provides a resource allocation device of the network access equipment, which comprises:
a demand information acquisition module, configured to acquire resource demand information of a current service of the network access device;
a status information determining module, configured to determine status information of the current power and/or cache cluster resource of the network access device;
an action determining module for inputting the resource demand information and the state information into a trained resource allocation model; the resource allocation model is a deep neural network which outputs probability distribution of all possible actions according to input resource demand information and state information; wherein each action corresponds to a power or cache resource allocation result;
the allocation result determining module is used for taking the action with the maximum probability as the final allocation result of the power or the cache resource according to the probability distribution of all possible actions output by the resource allocation model; and
and the resource allocation module is used for allocating power and/or cache resources to the service according to the allocation result.
The present invention also provides an electronic device comprising a central processing unit, a signal processing and storage unit, and a computer program stored on the signal processing and storage unit and operable on the central processing unit, wherein the central processing unit performs the resource allocation method of the network access device as described above.
The present invention also provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing the computer to execute the resource allocation method of a network access device as described above.
In the technical scheme of the invention, the resource demand information of the current service of the network access device is acquired; the current power of the network access device and the state information of the cache cluster resources are determined; the resource demand information and the state information are input into a trained resource allocation model, which is a deep neural network that outputs a probability distribution over all possible actions according to the input resource demand information and state information, each action corresponding to a power or cache resource allocation result; according to the probability distribution over all possible actions output by the resource allocation model, the action with the maximum probability is taken as the final allocation result of the power or cache resources; and power and/or cache resources are allocated to the service according to the allocation result. Through a resource allocation model that can autonomously learn how to perform efficient resource allocation, the resource demand information of the input service and the state information of the power and cache cluster resources are automatically analyzed and resources are allocated automatically, which can reduce the resource scheduling computation burden of the network device; and because the model is trained on the fed-back execution delay of services, the minimum average service delay can be obtained, thereby maximizing the resource utilization rate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a resource allocation method of a network access device according to an embodiment of the present invention;
FIG. 2 is a flowchart of a resource allocation model training method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of resource allocation model training according to an embodiment of the present invention;
fig. 4 is a block diagram of an internal structure of a resource allocation apparatus of a network access device according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention;
FIG. 6 is a diagram of service and environment resource status information provided by an embodiment of the present invention;
fig. 7 is a schematic diagram of a deep neural network structure of a resource allocation model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present invention should have the ordinary meanings as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
The technical scheme of the invention provides a PaC (Power and Cache) resource allocation model based on deep reinforcement learning, which automatically analyzes and allocates an input data set without human intervention, creating a system capable of autonomously learning how to allocate resources efficiently, reducing the resource scheduling computation burden of the network device, and maximizing the resource utilization rate.
The technical solution of the embodiments of the present invention is described in detail below with reference to the accompanying drawings.
The resource allocation method of the network access device provided by the embodiment of the invention has the flow as shown in fig. 1, and comprises the following steps:
step S101: the network access equipment inputs the resource demand information of the current service, the current power and the state information of the cache cluster resource into the resource allocation model.
In this step, a network access device (e.g., an ONU in an MEC) acquires resource demand information of a current service, determines current power and state information of a cache cluster resource, and inputs the resource demand information and the state information into a trained resource allocation model.
Specifically, the network access device may receive request information for multiple services from one or more users, and correctly decodes the received request information according to the known distance, moving speed, channel condition, and the like of the user's mobile terminal. Assuming that the service information data of the received requests are PaC (Power and Cache) data, the resource requirement of each service is known when the service arrives, and each service is described by a resource demand vector (s_i, d_i), wherein i is the number of the service class, i = 0, 1, 2, ..., n, and n is the total number of service classes; s_i indicates the service of class i; for example, the service classes may include video service, call service, download service, upload service, and so on; d_i indicates the number of resource units required by service class i. For example, d_i = 1 indicates that the current service needs only one power (or cache) unit; thus (s_1, 1) indicates that service s_1 requires one power (or cache) resource unit, and (s_i, d_i) indicates that service s_i requires d_i power (or cache) resource units.
The network access equipment packs the request information into a plurality of services, each service consists of a plurality of service units, and one service unit of one service can contain the requirement information of one time step of the service and the requirement information of one resource unit.
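The packing of a service into service units described above might be represented as follows; the class and field names are hypothetical, chosen only to illustrate the one-resource-unit, one-time-step granularity:

```python
from dataclasses import dataclass

@dataclass
class ServiceUnit:
    service_id: int   # which service this unit belongs to
    kind: str         # "power" or "cache"
    time_step: int    # which time step of the service this unit covers

def pack_service(service_id, kind, demand_units, duration_steps):
    """Split one service into per-resource-unit, per-time-step service units."""
    return [ServiceUnit(service_id, kind, t)
            for t in range(duration_steps)
            for _ in range(demand_units)]

# A service needing 2 power units for 3 time steps yields 6 service units.
units = pack_service(service_id=1, kind="power", demand_units=2, duration_steps=3)
```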
In practical applications, the network access device may, at each time step, sequentially input the resource demand information (i.e., the demand information of the resource units) in each service unit, together with the power and the state information of the cache cluster resources at the current time step, into the resource allocation model. The information input into the resource allocation model at time t may include the resource demand information in a service unit, and the power and the state information of the cache cluster resources at time t; as shown in Fig. 6, this can be represented by a state space, denoted as s_t. The power and cache cluster resources specifically refer to the power and cache cluster resources that can be used by the service.
In practical application, the resource demand information in a plurality of service units, together with the power and the state information of the cache cluster resources at time t, may also be input into the resource allocation model within a single time step.
The network access device may be an ONU in the MEC, or a base station, or other network access devices for accessing the intelligent terminal to the network.
Step S102: and according to the resource allocation result output by the resource allocation model, allocating power and cache resources for the service and executing the service.
In this step, the network access device takes the action with the maximum probability as the final power or cache resource distribution result according to the probability distribution of all possible actions output by the resource distribution model; and distributing power and/or cache resources for the service according to the distribution result and executing the service.
Specifically, the resource allocation model is a deep neural network, and the structure of the deep neural network can be as shown in fig. 7; the resource allocation model is a policy network, and a policy function in the policy network can output probability distribution of all possible actions according to the input of the resource allocation model; each action corresponds to a power or cache resource distribution result, wherein the action with the maximum probability is used as the final power or cache resource distribution result; that is, the output of the resource allocation model is represented by an action space, corresponding to the result of allocation of resources; the resource allocation result may be specifically that a position is allocated in the resource slot map for the service unit. The resource slot map specifically includes a power and cached resource slot map. For example, after the resource allocation model outputs an action according to the policy, the network access device allocates a service unit to a position corresponding to the action in the resource slot map according to the action with the highest probability output by the resource allocation model. For example, the action a =1 with the highest probability of the resource allocation model output indicates that the service unit is allocated to the first position of the corresponding first row in the resource slot map.
For example, the resource demand information of a service unit of the current service input into the resource allocation model at time t, together with the power and the state information of the cache cluster resources at time t, is denoted as s_t. The action with the highest output probability of the resource allocation model can then be expressed as a_t = argmax_a π_θ(a | s_t); that is, a_t indicates that the service unit corresponding to s_t is allocated a position in the resource slot map, which is the allocation result of the power or cache resource allocation of that service unit; wherein π_θ represents the policy function in the resource allocation model.
In practical application, if the resource demand information in a plurality of service units is input into the resource allocation model within a time step, andtthe power at that moment and the state information of the cache cluster resources, the resource allocation model may output results for these service units, respectively, that is, for each service unit, the resource allocation model outputs a probability distribution of all possible actions.
The above-mentioned action space can be given by A = {∅, 1, ..., M}, wherein a = i indicates "schedule a service unit to the i-th position of the resource slot map", and a = ∅ indicates an "invalid" operation, meaning that no position is scheduled at the current time step. "Valid" means that the position of each service is arranged in the resource slot map according to the resource requirements and required time steps of the arriving services, so as to achieve the minimum required time steps and the maximum resource utilization. In fact, upon selecting a = ∅, the resource slot map is shifted up by one time step as a whole, and all newly arrived services are thereafter processed by the network access device. By decoupling the decision order of the resource allocation model from real time (i.e., at each time step, time is frozen until the resource allocation model selects the invalid operation), the system can let the resource allocation model schedule multiple service units at the same time step while keeping the action space linear.
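A minimal sketch of these slot-map action semantics, with an assumed two-column layout and illustrative function names:

```python
EMPTY = 0

def apply_action(slot_map, action, unit_id):
    """Action i > 0: place a service unit at position i-1 of the first row.
    Action 0 (the 'invalid' no-op): shift the slot map up one time step."""
    if action == 0:
        # Advance time: drop the first row, append an empty row at the bottom.
        width = len(slot_map[0])
        return slot_map[1:] + [[EMPTY] * width]
    new_map = [row[:] for row in slot_map]
    new_map[0][action - 1] = unit_id
    return new_map

m = [[EMPTY, EMPTY], [EMPTY, EMPTY]]
m = apply_action(m, action=1, unit_id=7)   # schedule unit 7 at the first position
m = apply_action(m, action=0, unit_id=0)   # invalid op: time moves forward one step
```

The key point mirrored here is that any number of placement actions can be taken before a single no-op advances time.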
The resource allocation model is obtained by pre-training: the resource allocation model is obtained by pre-training according to the training data comprising the previously acquired resource demand information, power and state information of the cache cluster resources and the execution delay of the service; and the execution time delay of the service is determined after the service is executed according to the resource allocation result after the resource allocation result output by the resource allocation model according to the training data.
The resource allocation model can be trained in an online manner or an offline manner.
When the training is carried out in an online mode, the resource allocation model can be trained in the network access equipment by using the request information of the service received by the network access equipment as training data;
when training in an off-line mode, the resource demand information of the service needs to be collected in advance as training data: in the scenario environment, each scenario will have a fixed number of services arriving and scheduled according to a policy, and when all services complete scheduling, the scenario will terminate. In order to train a general strategy, a plurality of examples of service arrival sequences are considered, a service set is formed, and resource requirement information of each service in the service set is used as training data. In each training iteration, multiple episodes may be simulated for each business set to explore a probability space of possible actions to take using the current policy function, and the resulting results used to refine the policy function for all business sets.
In the above method for training a resource allocation model, at each time step, the model parameters in an iteration process may be adjusted according to the following method, and a specific flow is shown in fig. 2, and specifically includes the following steps:
step S201: and inputting the previously acquired resource demand information, power and state information of the cache cluster resources as training data into the resource allocation model.
Specifically, each service in the training data may be composed of a plurality of service units, and one service unit of one service may include requirement information of one time step of the service and requirement information of one resource unit.
In practical application, the resource demand information in each service unit, together with the power and the state information of the cache cluster resources at the current time step, may be sequentially input into the resource allocation model at each time step. For example, the information input into the resource allocation model at time t may include the resource demand information in a service unit, and the power and the state information of the cache cluster resources at time t; as shown in Fig. 3, this can be represented by a state space, denoted as s_t. The power and cache cluster resources specifically refer to the power and cache cluster resources that can be used by the service.
Step S202: and executing the service according to the resource allocation result output by the resource allocation model according to the training data, and determining the execution time delay of the service.
In this step, according to the probability distribution of all possible actions output by the resource allocation model, the action with the highest probability is used as the final allocation result of power or cache resources; distributing power and/or cache resources for the service according to the distribution result, and executing the service; and determining the execution delay of the service.
Specifically, as shown in Fig. 3, given the input s_t, the action with the highest output probability of the resource allocation model can be expressed as a_t = argmax_a π_θ(a | s_t), wherein π_θ represents the policy function in the resource allocation model.
The objective of the allocation strategy embodied in the policy function of the resource allocation model is to minimize the average delay of the traffic, and the objective of the reinforcement learning is to maximize the expected cumulative discounted reward E[Σ_t γ^t·R_t], where R_t is the reward value at time t and γ is the reward attenuation factor, taking values in the interval (0, 1].
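For a finite observed reward sequence, the expected cumulative discounted reward reduces to Σ_t γ^t·R_t; a minimal sketch:

```python
def discounted_return(rewards, gamma):
    """Cumulative discounted reward sum_t gamma**t * R_t over one
    observed reward sequence; gamma is the attenuation factor in (0, 1]."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```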
After the resource allocation model outputs the allocation result and the service is executed accordingly, the delay of the service can be determined from T_f, the completion time of the service (i.e., the time between its arrival and completion), and T_e, the execution time of the service s.
A reward value is set for each time step; the reward value at time t is computed from T, the duration from the start of execution of the current service up to time t, i.e., the execution delay of the current service. When the current service comprises a plurality of services, T is specifically the average, over those services, of the duration from each service's start of execution up to time t; S represents the current set of services in the system. Note that the cumulative reward is inversely related to the execution delay of the services over time, so maximizing the cumulative reward minimizes the average delay.
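The exact reward expression is lost in this translation; one common form consistent with the description (a per-time-step reward over the current service set S whose accumulation is inversely related to delay) is R_t = Σ_{s∈S} (−1/T) — an assumption sketched here, not necessarily the patent's formula:

```python
def step_reward(delays):
    """Per-time-step reward, assumed as -sum over the current services S
    of 1/T, where T is each service's execution delay so far. Summed over
    time steps, this penalizes every extra step a service stays in the
    system, so maximizing cumulative reward minimizes average delay."""
    return sum(-1.0 / t for t in delays)
```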
Step S203: adjust the parameters of the resource allocation model according to the determined execution delay of the service.
Specifically, as shown in fig. 3, the parameters of the resource allocation model, that is, the parameters of the policy function, may be updated by gradient descent according to the reward value calculated from the execution delay of the service; for example, according to the following Formula 1:

μ_{t+1} = μ_t − α(v_t − y_t)·d_t    (Formula 1)

In Formula 1, μ_t is the parameter of the resource allocation model (policy function) π_μ at time t, μ_{t+1} is the updated parameter of the resource allocation model (policy function) π_μ at time t+1, and α is a coefficient constant with a value between 0 and 1;
y_t is the objective function value at time t: y_t = R_t + γ·max(v_π(s_{t−1}, a_{t−1}, μ_{t−1})), i.e., the reward value R_t obtained at the current time t plus, attenuated by the reward attenuation factor γ, the largest among the cost-function values calculated separately at the previous time t−1 for the information of each service unit input into the resource allocation model; here v_π is the cost function;
and v_t = E[G_t] = v_π(s_t, a_t, μ_t), with v_t = R_t + γ·v_π(s_{t−1}, a_{t−1}, μ_{t−1}); where s_t represents the resource demand information of a service unit of the current service, together with the state information of power and cache cluster resources, input into the resource allocation model at time t; a_t represents the action with the highest probability output by the resource allocation model for the input s_t at time t; and v_π(s_t, a_t, μ_t) specifically represents the cost-function calculation performed on the information of the service units input into the resource allocation model;
a_{t−1} represents the action with the highest probability output by the resource allocation model for the input s_{t−1} at time t−1; max(v_π(s_{t−1}, a_{t−1}, μ_{t−1})) represents the largest among the cost-function values calculated separately at time t−1 for the information of each service unit input into the resource allocation model;
where E[G_t] represents the expected value of G_t, and E[G_{t−1}] = v_π(s_{t−1}, a_{t−1}, μ_{t−1}) represents the expected value of G_{t−1}.
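Formula 1 as a single scalar update step (scalars stand in for the full parameter vector of the policy network; this is a sketch, not the patent's implementation):

```python
def formula1_update(mu_t, alpha, v_t, y_t, d_t):
    """One step of Formula 1: mu_{t+1} = mu_t - alpha * (v_t - y_t) * d_t.
    When the cost value v_t matches the objective value y_t, the update
    term vanishes and the parameter stops changing."""
    return mu_t - alpha * (v_t - y_t) * d_t
```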
Since the currently executed action a_t influences the state of the resource allocation system at subsequent times, maximizing the cumulative reward value obtained at each subsequent time after the system allocates according to the policy serves as the basis for modelling the allocation policy π_μ. The cumulative reward value G_t at time t can be calculated from the reward value R_t at time t, as shown in Formula 2:

G_t = IE[Σ_n γ^n·R_{t−n}] = R_t + γ·G_{t−1}    (Formula 2)
Here R_{t−n} represents the reward obtained n time steps after the action a_t is executed, as a consequence of a_t. Expanding the expression shows that the cumulative reward G_t equals the reward R_t earned immediately after executing a_t, plus the cumulative reward value available at the previous time t−1, G_{t−1}, multiplied by the reward attenuation factor γ.
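The recursion G_t = R_t + γ·G_{t−1} can be applied forward over an observed reward sequence; a sketch:

```python
def cumulative_reward(rewards, gamma):
    """Fold rewards (in time order R_1 .. R_t) through the recursion of
    Formula 2, G_t = R_t + gamma * G_{t-1}; each earlier reward picks up
    one more factor of gamma, matching the discounted sum."""
    g = 0.0
    for r in rewards:
        g = r + gamma * g
    return g
```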
In Formula 1 above, d_t represents the derivative of the cost function v_π with respect to the policy parameter μ, as shown in Formula 3:

d_t = ∇_μ v_π(s_t, a_t, μ_t)    (Formula 3)
Through the flow shown in fig. 2, multiple iterations can be performed, continuously updating the value of the cost function v_π and the parameter of the resource allocation model (policy function) π_μ, until the currently calculated cost-function value equals the objective-function value, at which point the iterative process stops. At that point the resource allocation model is stable, the optimal cost function is obtained, and the cumulative reward is maximized; the policy function of the neural network determined at that time can be used as the policy function of the trained resource allocation model, so that the cumulative reward value of service resource scheduling reaches its maximum and the minimum average service delay is obtained.
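The iterate-until-the-cost-value-matches-the-objective-value loop can be sketched as a fixed-point iteration (the update callable, tolerance, and iteration cap are assumptions for illustration):

```python
def train_until_stable(update_step, mu0, tol=1e-6, max_iters=10000):
    """Repeat a Formula-1-style update until successive parameters stop
    changing, i.e. until the computed cost value matches the objective
    value and the update term vanishes."""
    mu = mu0
    for _ in range(max_iters):
        new_mu = update_step(mu)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# A contracting update with fixed point 1.0 converges there:
mu_star = train_until_stable(lambda m: 0.5 * m + 0.5, 0.0)
```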
Based on the foregoing resource allocation method of the network access device, an internal structure of a resource allocation apparatus of the network access device provided in the embodiment of the present invention is shown in fig. 4, and includes: a requirement information acquisition module 401, a state information determination module 402, an action determination module 403, an allocation result determination module 404, and a resource allocation module 405.
The requirement information acquiring module 401 is configured to acquire resource requirement information of a current service of the network access device;
the status information determining module 402 is configured to determine the current power of the network access device and/or the status information of the cache cluster resource;
the action determining module 403 is configured to input the resource requirement information and the status information into a trained resource allocation model; the resource allocation model is a deep neural network which outputs probability distribution of all possible actions according to input resource demand information and state information; wherein each action corresponds to a power or cache resource allocation result;
the allocation result determining module 404 is configured to take the action with the highest probability as the final allocation result of power or cache resources according to the probability distribution of all possible actions output by the resource allocation model; and
the resource allocation module 405 is configured to allocate power and/or cache resources to the service according to the allocation result.
Further, a resource allocation apparatus for a network access device provided in an embodiment of the present invention may further include: a model training module 406.
The model training module 406 takes the previously collected resource demand information, power, and cache cluster resource state information of services as training data; inputs the training data into the resource allocation model; takes the action with the highest probability, according to the probability distribution of all possible actions output by the resource allocation model, as the final power or cache resource allocation result; allocates power and/or cache resources to the service according to the allocation result; determines the execution delay of the service after executing it; calculates a reward value from that execution delay; and adjusts the parameters of the resource allocation model according to the reward value.
The specific implementation method for functions of each module in the resource allocation apparatus of the network access device provided in the embodiment of the present invention may refer to the method of each step in the flows shown in fig. 1 and 2, and is not described herein again.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the resource allocation method of the network access device provided in the embodiment of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used to connect an input/output module and may be connected to a nonlinear receiver to receive information from it, thereby realizing information input and output. The I/O module may be configured as a component within the device (not shown in the figure) or attached externally to the device to provide the corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone, and various sensors; output devices may include a display, speaker, vibrator, indicator lights, and the like.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device may be specifically disposed in a network access device such as an ONU and a base station.
Furthermore, an embodiment of the present invention provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions for causing the computer to execute the resource allocation method of the network access device as described above.
In the technical solution of the invention, the resource demand information of the current service of the network access device is acquired; the current power of the network access device and the state information of the cache cluster resources are determined; and the resource demand information and the state information are input into a trained resource allocation model. The resource allocation model is a deep neural network that outputs a probability distribution over all possible actions according to the input resource demand information and state information, each action corresponding to a power or cache resource allocation result; according to that probability distribution, the action with the highest probability is taken as the final power or cache resource allocation result, and power and/or cache resources are allocated to the service accordingly. A resource allocation model that autonomously learns efficient resource allocation automatically analyzes the input service's resource demand information and the state information of power and cache cluster resources and performs the allocation automatically, which reduces the resource-scheduling computation burden on the network device; and because the model is trained on the fed-back execution delay of services, the minimum average service delay can be obtained, thereby maximizing resource utilization.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic ram (dram)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (9)
1. A resource allocation method of a network access device is characterized by comprising the following steps:
acquiring resource demand information of the current service of the network access equipment;
determining the current power of the network access equipment and the state information of the cache cluster resources;
inputting the resource demand information and the state information into a trained resource allocation model; the resource allocation model is a deep neural network which outputs probability distribution of all possible actions according to input resource demand information and state information; wherein each action corresponds to a power or cache resource allocation result;
according to the probability distribution of all possible actions output by the resource allocation model, taking the action with the maximum probability as the final allocation result of power or cache resources; and
distributing power and/or cache resources for the service according to the distribution result;
the resource allocation model is obtained by performing parameter adjustment and pre-training according to previously acquired training data and a reward value calculated according to execution delay of a service, and specifically, parameters of the resource allocation model are updated according to the following formula 1:
μ_{t+1} = μ_t − α(v_t − y_t)·d_t    (Formula 1)

wherein μ_t represents the parameter of the resource allocation model π_μ at time t, μ_{t+1} represents the updated parameter of the resource allocation model π_μ at time t+1, and α is a coefficient constant with a value between 0 and 1;

y_t is the objective function value at time t: y_t = R_t + γ·max(v_π(s_{t−1}, a_{t−1}, μ_{t−1})), wherein R_t is the reward value at the current time t and γ is the reward attenuation factor;

the value of the cost function at time t is v_t = E[G_t] = v_π(s_t, a_t, μ_t);

and v_t = R_t + γ·v_π(s_{t−1}, a_{t−1}, μ_{t−1});

wherein s_t represents the resource demand information of a service unit of the current service input into said resource allocation model at time t, together with the state information of power and cache cluster resources; a_t represents the action with the highest probability output by the resource allocation model for the input s_t at time t; v_π represents the cost-function calculation based on the information of the service units input into the resource allocation model;

a_{t−1} represents the action with the highest probability output by the resource allocation model for the input s_{t−1} at time t−1; max(v_π(s_{t−1}, a_{t−1}, μ_{t−1})) represents the largest among the cost-function values calculated separately from the information of each service unit input into the resource allocation model at time t−1;

E[G_t] represents the expected value of G_t, and E[G_{t−1}] = v_π(s_{t−1}, a_{t−1}, μ_{t−1}) represents the expected value of G_{t−1};

wherein the accumulated reward value G_t at time t is calculated from the reward value R_t at time t by the following Formula 2:

G_t = IE[Σ_n γ^n·R_{t−n}] = R_t + γ·G_{t−1}    (Formula 2)

wherein R_{t−n} represents the reward obtained n time steps after action a_t is executed, as a consequence of a_t, and IE represents the cumulative discounted reward of the Monte Carlo method in reinforcement learning.
2. The method of claim 1, wherein the resource allocation model is specifically trained as follows:
using the collected resource demand information, power and state information of the cache cluster resource of the service as the training data;
inputting the training data to the resource allocation model;
according to the probability distribution of all possible actions output by the resource allocation model, taking the action with the maximum probability as the final allocation result of power or cache resources;
distributing power and/or cache resources for the service according to the distribution result;
after executing the service, determining the execution time delay of the service;
calculating a reward value according to the execution delay of the service;
and adjusting the parameters of the resource allocation model according to the reward value.
3. The method of claim 2, wherein the execution delay of the service is inversely related to the reward value.
4. The method of claim 1, wherein the obtaining resource requirement information of the current service of the network access device comprises:
the network access equipment receives request information of a plurality of services of one or more mobile terminals;
the network access equipment decodes the received request information of a plurality of services according to the distance from the network access equipment to the mobile terminal, the moving speed of the mobile terminal and the channel condition;
the network access equipment packs the decoded request information of the plurality of services into a plurality of services, each service being composed of a plurality of service units; wherein a service unit of a service comprises the service's demand information for one time step and for one resource unit.
5. The method of claim 4, wherein inputting the resource demand information and the state information into a trained resource allocation model comprises:
and the network access equipment sequentially inputs the requirement information of the resource units in each service unit, the power of the current time step and the state information of the cache cluster resources into the resource allocation model at each time step.
6. The method of claim 5, wherein the probability distribution of all possible actions output according to the resource allocation model, and taking the action with the highest probability as the final power or buffer resource allocation result comprises:
taking the action with the highest probability in the probability distribution of all possible actions output by the resource allocation model as the corresponding position of the service unit in the resource slot map, and as the allocation result of power and/or cache resource allocation; wherein the resource slot map comprises a power resource slot map and a cache resource slot map.
7. A resource allocation apparatus of a network access device, comprising:
a demand information acquisition module, configured to acquire resource demand information of a current service of the network access device;
a status information determining module, configured to determine status information of the current power and/or cache cluster resource of the network access device;
an action determining module for inputting the resource demand information and the state information into a trained resource allocation model; the resource allocation model is a deep neural network which outputs probability distribution of all possible actions according to input resource demand information and state information; wherein each action corresponds to a power or cache resource allocation result;
the allocation result determining module is used for taking the action with the maximum probability as the final allocation result of the power or the cache resource according to the probability distribution of all possible actions output by the resource allocation model; and
the resource allocation module is used for allocating power and/or cache resources to the service according to the allocation result;
the resource allocation model is obtained by performing parameter adjustment and pre-training according to previously acquired training data and a reward value calculated according to execution delay of a service, and specifically, parameters of the resource allocation model are updated according to the following formula 1:
μ_{t+1} = μ_t − α(v_t − y_t)·d_t    (Formula 1)

wherein μ_t represents the parameter of the resource allocation model π_μ at time t, μ_{t+1} represents the updated parameter of the resource allocation model π_μ at time t+1, and α is a coefficient constant with a value between 0 and 1;

y_t is the objective function value at time t: y_t = R_t + γ·max(v_π(s_{t−1}, a_{t−1}, μ_{t−1})), wherein R_t is the reward value at the current time t and γ is the reward attenuation factor;

the value of the cost function at time t is v_t = E[G_t] = v_π(s_t, a_t, μ_t);

and v_t = R_t + γ·v_π(s_{t−1}, a_{t−1}, μ_{t−1});

wherein s_t represents the resource demand information of a service unit of the current service input into said resource allocation model at time t, together with the state information of power and cache cluster resources; a_t represents the action with the highest probability output by the resource allocation model for the input s_t at time t; v_π represents the cost-function calculation based on the information of the service units input into the resource allocation model;

a_{t−1} represents the action with the highest probability output by the resource allocation model for the input s_{t−1} at time t−1; max(v_π(s_{t−1}, a_{t−1}, μ_{t−1})) represents the largest among the cost-function values calculated separately from the information of each service unit input into the resource allocation model at time t−1;

E[G_t] represents the expected value of G_t, and E[G_{t−1}] = v_π(s_{t−1}, a_{t−1}, μ_{t−1}) represents the expected value of G_{t−1};

wherein the accumulated reward value G_t at time t is calculated from the reward value R_t at time t by the following Formula 2:

G_t = IE[Σ_n γ^n·R_{t−n}] = R_t + γ·G_{t−1}    (Formula 2)

wherein R_{t−n} represents the reward obtained n time steps after action a_t is executed, as a consequence of a_t, and IE represents the cumulative discounted reward of the Monte Carlo method in reinforcement learning.
8. An electronic device comprising a central processing unit, a signal processing and storage unit, and a computer program stored on the signal processing and storage unit and executable on the central processing unit, characterized in that the central processing unit implements the method according to any of claims 1-6 when executing the program.
9. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing the computer to perform the resource allocation method of any one of claims 1 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011584793.8A CN112291793B (en) | 2020-12-29 | 2020-12-29 | Resource allocation method and device of network access equipment |
JP2021074502A JP7083476B1 (en) | 2020-12-29 | 2021-04-26 | Network access device resource allocation method and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011584793.8A CN112291793B (en) | 2020-12-29 | 2020-12-29 | Resource allocation method and device of network access equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112291793A CN112291793A (en) | 2021-01-29 |
CN112291793B true CN112291793B (en) | 2021-04-06 |
Family
ID=74426534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011584793.8A Active CN112291793B (en) | 2020-12-29 | 2020-12-29 | Resource allocation method and device of network access equipment |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP7083476B1 (en) |
CN (1) | CN112291793B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112512070B (en) * | 2021-02-05 | 2021-05-11 | 之江实验室 | Multi-base-station cooperative wireless network resource allocation method based on graph attention mechanism reinforcement learning |
CN113220452A (en) * | 2021-05-10 | 2021-08-06 | 北京百度网讯科技有限公司 | Resource allocation method, model training method, device and electronic equipment |
CN113515385A (en) * | 2021-07-30 | 2021-10-19 | 盛景智能科技(嘉兴)有限公司 | Resource scheduling method and device, electronic equipment and storage medium |
CN113840333B (en) * | 2021-08-16 | 2023-11-10 | 国网河南省电力公司信息通信公司 | Power grid resource allocation method and device, electronic equipment and storage medium |
CN114465883B (en) * | 2022-01-06 | 2024-09-24 | 北京全路通信信号研究设计院集团有限公司 | Automatic service resource distribution system and method based on SDN network |
CN114760639A (en) * | 2022-03-30 | 2022-07-15 | 深圳市联洲国际技术有限公司 | Resource unit allocation method, device, equipment and storage medium |
CN115499319B (en) * | 2022-09-19 | 2024-09-06 | 北京达佳互联信息技术有限公司 | Resource distribution method, device, electronic equipment and storage medium |
CN115331796B (en) * | 2022-10-17 | 2022-12-27 | 中科厚立信息技术(成都)有限公司 | Intensive learning-based sickbed resource allocation optimization method, system and terminal |
CN115696403B (en) * | 2022-11-04 | 2023-05-16 | 东南大学 | Multi-layer edge computing task unloading method assisted by edge computing nodes |
CN115987797A (en) * | 2022-12-07 | 2023-04-18 | 广西通信规划设计咨询有限公司 | MEC-based optimized Internet of things node resource allocation method and system |
CN118368740A (en) * | 2023-01-17 | 2024-07-19 | 华为技术有限公司 | Wireless resource allocation method and device |
CN116795066B (en) * | 2023-08-16 | 2023-10-27 | 南京德克威尔自动化有限公司 | Communication data processing method, system, server and medium of remote IO module |
CN117992217B (en) * | 2024-01-08 | 2024-07-23 | 中国科学院沈阳自动化研究所 | Method and system for distributing operation resources of industrial control system, medium and terminal |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013110966A1 (en) * | 2012-01-27 | 2013-08-01 | Empire Technology Development Llc | Parameterized dynamic model for cloud migration |
CN108989099A (en) * | 2018-07-02 | 2018-12-11 | 北京邮电大学 | Federated resource distribution method and system based on software definition Incorporate network |
CN111556518A (en) * | 2020-06-12 | 2020-08-18 | 国网经济技术研究院有限公司 | Resource allocation method and system for improving network quality in multi-slice network |
CN111866953A (en) * | 2019-04-26 | 2020-10-30 | 中国移动通信有限公司研究院 | Network resource allocation method, device and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111031102B (en) * | 2019-11-25 | 2022-04-12 | 哈尔滨工业大学 | Multi-user, multi-task mobile edge computing system cacheable task migration method |
CN112134916B (en) * | 2020-07-21 | 2021-06-11 | 南京邮电大学 | Cloud edge collaborative computing migration method based on deep reinforcement learning |
CN111953759B (en) * | 2020-08-04 | 2022-11-11 | 国网河南省电力公司信息通信公司 | Collaborative computing task unloading and transferring method and device based on reinforcement learning |
- 2020-12-29: CN application CN202011584793.8A filed, granted as patent CN112291793B (status: Active)
- 2021-04-26: JP application JP2021074502A filed, granted as patent JP7083476B1 (status: Active)
Also Published As
Publication number | Publication date |
---|---|
JP2022104776A (en) | 2022-07-11 |
CN112291793A (en) | 2021-01-29 |
JP7083476B1 (en) | 2022-06-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112291793B (en) | Resource allocation method and device of network access equipment | |
CN111835827B (en) | Internet of things edge computing task unloading method and system | |
CN113950066B (en) | Single server part calculation unloading method, system and equipment under mobile edge environment | |
CN109829332B (en) | Joint calculation unloading method and device based on energy collection technology | |
CN109002358B (en) | Mobile terminal software self-adaptive optimization scheduling method based on deep reinforcement learning | |
CN114021770B (en) | Network resource optimization method and device, electronic equipment and storage medium | |
CN110956202B (en) | Image training method, system, medium and intelligent device based on distributed learning | |
CN110389816B (en) | Method, apparatus and computer readable medium for resource scheduling | |
CN113268341B (en) | Distribution method, device, equipment and storage medium of power grid edge calculation task | |
CN112766497B (en) | Training method, device, medium and equipment for deep reinforcement learning model | |
CN113778691B (en) | Task migration decision method, device and system | |
CN110968366A (en) | Task unloading method, device and equipment based on limited MEC resources | |
CN111580974B (en) | GPU instance allocation method, device, electronic equipment and computer readable medium | |
CN110768861B (en) | Method, device, medium and electronic equipment for obtaining overtime threshold | |
CN118210609A (en) | Cloud computing scheduling method and system based on DQN model | |
CN114007231A (en) | Heterogeneous unmanned aerial vehicle data unloading method and device, electronic equipment and storage medium | |
CN111694670B (en) | Resource allocation method, apparatus, device and computer readable medium | |
Lin et al. | Learning-Based Query Scheduling and Resource Allocation for Low-Latency Mobile Edge Video Analytics | |
CN116560832A (en) | Resource allocation method oriented to federal learning and related equipment | |
CN113052312B (en) | Training method and device of deep reinforcement learning model, medium and electronic equipment | |
CN115080233A (en) | Resource allocation management method, device, equipment and storage medium for application software | |
CN112148448A (en) | Resource allocation method, device, equipment and computer readable medium | |
CN118093145B (en) | Task scheduling method and device based on calculation force and computer program product | |
CN117251035B (en) | Heat dissipation control method, heat dissipation control device, electronic equipment and computer readable medium | |
CN115297361B (en) | Transcoding task processing method and device, transcoding system, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |