CN112291793A - Resource allocation method and device of network access equipment - Google Patents

Resource allocation method and device of network access equipment

Info

Publication number
CN112291793A
CN112291793A
Authority
CN
China
Prior art keywords
resource allocation
resource
service
allocation model
power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011584793.8A
Other languages
Chinese (zh)
Other versions
CN112291793B (en)
Inventor
杨辉
李雪婷
姚秋彦
包博文
李超
孙政洁
张杰
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202011584793.8A
Publication of CN112291793A
Application granted
Publication of CN112291793B
Priority to JP2021074502A (JP7083476B1)
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/02 Resource partitioning among network components, e.g. reuse partitioning
    • H04W 16/10 Dynamic resource partitioning
    • H04W 16/22 Traffic simulation tools or models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a resource allocation method and device for a network access device. The method comprises the following steps: acquiring resource demand information of a current service of the network access device; determining state information of the current power and cache cluster resources of the network access device; inputting the resource demand information and the state information into a trained resource allocation model, the resource allocation model being a deep neural network that outputs a probability distribution over all possible actions according to the input resource demand information and state information, wherein each action corresponds to a power or cache resource allocation result; taking the action with the maximum probability, according to the probability distribution over all possible actions output by the resource allocation model, as the final allocation result of power or cache resources; and allocating power and/or cache resources to the service according to the allocation result. The invention can reduce the resource scheduling computation burden of the network device and maximize the resource utilization rate.

Description

Resource allocation method and device of network access equipment
Technical Field
The present invention relates to the field of communications network technologies, and in particular, to a method and an apparatus for resource allocation of a network access device.
Background
With the wide application of intelligent devices and Internet of Things (IoT) technologies, intelligent terminals such as notebook computers and smart phones have become an indispensable part of modern life, and novel services such as high-definition live video and augmented reality are being developed on these terminal devices. These services provide a richer content experience for users, but also lead to unprecedented growth in network traffic. Due to limits on computing power and battery capacity, terminal devices cannot efficiently meet the low-delay, computation-intensive requirements of many novel services, while offloading compute-intensive tasks to the cloud increases transmission delay and adds network load. Mobile Edge Computing (MEC) has therefore been proposed: it migrates the computing and storage capacity of the cloud to the edge of the network and performs task computation at the edge, thereby reducing the energy consumption and execution delay of terminal equipment and improving service quality.
Due to the dense deployment of Mobile Edge Computing (MEC) units and the popularity of bandwidth-intensive applications, maximizing the throughput of the access network has become an urgent challenge. Thanks to their high capacity and low latency, densely deployed Optical Network Units (ONUs) have become the main access points of MEC, and power and cache resource management are the two major factors affecting their throughput; this calls for adaptive and efficient online decision making for power and cache resource management. Resource management problems, which often manifest as online decision-making problems, are pervasive in computer systems and networks, for example network congestion control and job scheduling for computing clusters. As data in the network grows explosively with the number of connected devices, data processing can no longer be located entirely on the cloud platform of the core network; almost half of the data needs to be analyzed and processed at the edge of the network. Efficient resource allocation is therefore needed to improve resource utilization and throughput in the MEC network, and to realize self-learning joint allocation of power and cache resources in the network.
Existing resource allocation methods for network resource management in mobile edge computing suffer from drawbacks such as low resource utilization and overly complex resource optimization.
Disclosure of Invention
In view of the above, the present invention is directed to a method and an apparatus for resource allocation of a network access device, which adaptively learn with a global view, can reduce the resource scheduling computation burden of the network device, and maximize the resource utilization rate.
Based on the above object, the present invention provides a resource allocation method for a network access device, including:
acquiring resource demand information of the current service of the network access equipment;
determining the current power of the network access equipment and the state information of the cache cluster resources;
inputting the resource demand information and the state information into a trained resource allocation model; the resource allocation model is a deep neural network which outputs probability distribution of all possible actions according to input resource demand information and state information; wherein each action corresponds to a power or cache resource allocation result;
according to the probability distribution of all possible actions output by the resource allocation model, taking the action with the maximum probability as the final allocation result of power or cache resources; and
and distributing power and/or cache resources for the service according to the distribution result.
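The inference steps above can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the policy network is a hypothetical stand-in (`toy_policy_net`), and the state is simply the concatenation of the demand and resource-state vectors.

```python
import math

def softmax(logits):
    """Convert raw network outputs into a probability distribution over actions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def allocate(policy_net, demand, power_state, cache_state):
    """Greedy inference step: feed the state into the policy network and
    take the highest-probability action as the allocation result."""
    state = demand + power_state + cache_state
    probs = softmax(policy_net(state))
    action = max(range(len(probs)), key=lambda a: probs[a])
    return action, probs

# Hypothetical stand-in for the trained deep neural network:
# it maps a flattened state vector to one logit per action.
def toy_policy_net(state):
    return [sum(state) % 3, 1.0, 0.5]
```

A call such as `allocate(toy_policy_net, demand, power, cache)` then yields both the chosen action index and the full action distribution, mirroring the "argmax over the output probability distribution" rule of the method.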
The resource allocation model is obtained by pre-training on training data comprising previously acquired resource demand information of services and the execution delay of those services; the execution delay of a service is determined after the service has been executed according to the resource allocation result output by the resource allocation model for the training data. The specific training method comprises the following steps:
using the previously collected resource demand information, power and state information of the cache cluster resources as training data;
inputting the training data to the resource allocation model;
according to the probability distribution of all possible actions output by the resource allocation model, taking the action with the maximum probability as the final allocation result of power or cache resources;
distributing power and/or cache resources for the service according to the distribution result;
after executing the service, determining the execution time delay of the service;
calculating a reward value according to the execution delay of the service;
and adjusting the parameters of the resource allocation model according to the reward value.
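The steps above can be sketched as one training iteration, under the assumption (made explicit later in the description) that the per-step reward is the negative reciprocal of the execution delay and that returns are discounted by a factor gamma; `ToyAllocator` and its methods are illustrative stand-ins, not the patent's deep neural network.

```python
class ToyAllocator:
    """Hypothetical stand-in for the resource allocation model."""
    def __init__(self):
        self.theta = 0.0
    def best_action(self, state):
        # argmax of the action probability distribution (here: of the state itself)
        return max(range(len(state)), key=lambda a: state[a])
    def execute(self, state, action):
        # pretend the chosen slot position determines the execution delay
        return 1.0 + action
    def update(self, returns, alpha):
        self.theta += alpha * sum(returns)

def train_step(model, episode, alpha=0.01, gamma=0.9):
    """One training iteration following the listed steps: run an episode,
    compute per-step rewards from execution delays, then adjust the
    model parameters with the discounted returns."""
    rewards = []
    for state in episode:                      # steps 1-2: feed training data
        action = model.best_action(state)      # step 3: action with max probability
        delay = model.execute(state, action)   # steps 4-5: allocate, execute, measure delay
        rewards.append(-1.0 / delay)           # step 6: reward inversely related to delay
    # discounted returns v_t = sum_n gamma^n * r_{t+n}
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    model.update(returns, alpha)               # step 7: parameter adjustment
    return returns
```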
Preferably, the execution delay of the service is inversely related to the reward value. Specifically, the reward value at time t is calculated according to the following method:

r_t = -1/T;

wherein T denotes the duration for which the current service has been executed, from its start up to time t, i.e. the execution delay of the current service.

Preferably, the adjusting of the parameters of the resource allocation model according to the reward value specifically comprises updating the parameters of the resource allocation model according to the following Formula 1:

θ_{t+1} = θ_t + α·∇_θ L_t    (Formula 1)

wherein θ_t represents the parameters of the policy function π of the resource allocation model at time t, θ_{t+1} represents the updated parameters of π at time t+1, and α is a coefficient constant taking a value between 0 and 1; L_t is the objective function value at time t:

L_t = E[v_t] - E[Q(s_t, a_t)]

wherein r_t is the reward value at the current time t and γ is a reward attenuation factor; the value of the cost function at time t is Q(s_t, a_t), and

Q(s_t, a_t) = r_t + γ·max_a Q(s_{t-1}, a)

wherein s_t represents the resource demand information of a service unit of the current service, together with the state information of the power and cache cluster resources, input into the resource allocation model at time t; a_t = π(s_t) represents the action with the highest probability output by the resource allocation model for the input s_t at time t; Q(·) represents the cost function calculated from the information of the service units input into the resource allocation model; a_{t-1} = π(s_{t-1}) represents the action with the highest probability output for the input s_{t-1} at time t-1; max_a Q(s_{t-1}, a) represents the largest among the cost function values calculated separately for the information of each service unit input into the resource allocation model at time t-1; E[v_t] represents the expected value of v_t, and E[Q(s_t, a_t)] represents the expected value of Q(s_t, a_t).

The accumulated reward value v_t at time t is calculated from the reward value r_t according to the following Formula 2:

v_t = Σ_{n=0}^{∞} γ^n·r_{t+n}    (Formula 2)

wherein γ^n·r_{t+n} denotes the reward obtained at the n-th subsequent time step as a result of the action a_t, after the action a_t has been performed; E represents the cumulative discount reward of the Monte Carlo method in reinforcement learning.
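One possible reading of this parameter update is a score-function (policy-gradient) step: the parameters are nudged in the direction that raises the log-probability of the chosen action, scaled by an advantage (the accumulated reward minus a cost-function baseline). The tiny softmax policy below, with one weight per action, is an assumption for illustration only, not the patent's deep neural network.

```python
import math

def policy_probs(theta, state):
    """Tiny softmax policy pi_theta(a|s): one weight per action,
    logit_a = theta_a * sum(state). Hypothetical stand-in for the DNN."""
    s = sum(state)
    logits = [t * s for t in theta]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def update_theta(theta, state, action, advantage, alpha=0.1):
    """One gradient step of the Formula-1 shape theta' = theta + alpha * grad:
    the score-function gradient of log pi(action|s), scaled by the advantage."""
    probs = policy_probs(theta, state)
    s = sum(state)
    new_theta = []
    for a, t in enumerate(theta):
        indicator = 1.0 if a == action else 0.0
        grad_log = (indicator - probs[a]) * s   # d/d theta_a of log pi(action|s)
        new_theta.append(t + alpha * advantage * grad_log)
    return new_theta
```

With a positive advantage, the update increases the probability the policy assigns to the chosen action, which is the intended effect of maximizing the objective.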
The invention also provides a resource allocation device of the network access equipment, which comprises:
a demand information acquisition module, configured to acquire resource demand information of a current service of the network access device;
a status information determining module, configured to determine status information of the current power and/or cache cluster resource of the network access device;
an action determining module for inputting the resource demand information and the state information into a trained resource allocation model; the resource allocation model is a deep neural network which outputs probability distribution of all possible actions according to input resource demand information and state information; wherein each action corresponds to a power or cache resource allocation result;
the allocation result determining module is used for taking the action with the maximum probability as the final allocation result of the power or the cache resource according to the probability distribution of all possible actions output by the resource allocation model; and
and the resource allocation module is used for allocating power and/or cache resources to the service according to the allocation result.
The present invention also provides an electronic device comprising a central processing unit, a signal processing and storage unit, and a computer program stored on the signal processing and storage unit and operable on the central processing unit, wherein the central processing unit performs the resource allocation method of the network access device as described above.
The present invention also provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing the computer to execute the resource allocation method of a network access device as described above.
In the technical scheme of the invention, resource demand information of the current service of the network access device is acquired; the current power of the network access device and the state information of the cache cluster resources are determined; the resource demand information and the state information are input into a trained resource allocation model, the resource allocation model being a deep neural network that outputs a probability distribution over all possible actions according to the input resource demand information and state information, wherein each action corresponds to a power or cache resource allocation result; according to the probability distribution over all possible actions output by the resource allocation model, the action with the maximum probability is taken as the final allocation result of power or cache resources; and power and/or cache resources are allocated to the service according to the allocation result. Through a resource allocation model that independently learns how to perform efficient resource allocation, the resource demand information of the input service and the state information of the power and cache cluster resources are analyzed automatically and resources are allocated automatically, which reduces the resource scheduling computation burden of the network device; and because the model is trained on the fed-back execution delay of services, the minimum average service delay can be obtained, thereby maximizing the resource utilization rate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a resource allocation method of a network access device according to an embodiment of the present invention;
FIG. 2 is a flowchart of a resource allocation model training method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of resource allocation model training according to an embodiment of the present invention;
fig. 4 is a block diagram of an internal structure of a resource allocation apparatus of a network access device according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention;
FIG. 6 is a diagram of service and environment resource status information provided by an embodiment of the present invention;
fig. 7 is a schematic diagram of a deep neural network structure of a resource allocation model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present invention should have the ordinary meanings as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
The technical scheme of the invention provides a PaC (Power and Cache) resource allocation model based on deep reinforcement learning, which automatically analyzes and allocates an input data set without human intervention, creating a system that autonomously learns how to allocate resources efficiently, reducing the resource scheduling computation burden of network devices and maximizing the resource utilization rate.
The technical solution of the embodiments of the present invention is described in detail below with reference to the accompanying drawings.
The resource allocation method of the network access device provided by the embodiment of the invention has the flow as shown in fig. 1, and comprises the following steps:
step S101: the network access equipment inputs the resource demand information of the current service, the current power and the state information of the cache cluster resource into the resource allocation model.
In this step, a network access device (e.g., an ONU in an MEC) acquires resource demand information of a current service, determines current power and state information of a cache cluster resource, and inputs the resource demand information and the state information into a trained resource allocation model.
Specifically, the network access device may receive request information for multiple services from one or more users, and correctly decodes the received request information according to the known mobile-terminal distance, moving speed, channel conditions and the like of each user. Assuming that the service request information received by the network access device is PaC (Power and Cache) data, the resource requirement of each service s is known when the service arrives, and the services are described by a resource demand vector d = (d_1, d_2, …, d_n), wherein i is the number of the service category, i = 0, 1, 2 … n, n being the total number of service categories; s_i denotes the service with number i (for example, the service categories may include video service, call service, download service, upload service, and so on); d_i denotes the number of resource units required per time step by service category i. For example, d_i = 1 indicates that service s_i requires only one power (or cache) resource unit, and in general service s_i requires d_i power (or cache) resource units.

The network access device packs the request information into a number of services, each service consisting of several service units; one service unit of a service contains the demand information of one time step and of one resource unit of that service.

In practical applications, the network access device may, at each time step, input the resource demand information in each service unit (i.e. the demand information of a resource unit), the power at the current time step, and the state information of the cache cluster resources into the resource allocation model. The information input into the resource allocation model at time t may include the resource demand information in a service unit and the state information of the power and cache cluster resources at time t; as shown in fig. 6, this can be represented by a state space, denoted s_t. The power and cache cluster resources specifically refer to the power and cache cluster resources that can be used by the service.

In practical applications, the resource demand information in multiple service units may also be input into the resource allocation model within the time step of time t, together with the power at time t and the state information of the cache cluster resources.
The network access device may be an ONU in the MEC, or a base station, or other network access devices for accessing the intelligent terminal to the network.
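The packing of request information into services and one-unit, one-time-step service units described above can be sketched as a small data structure; the class and function names are illustrative assumptions, not terms from the patent.

```python
from dataclasses import dataclass

@dataclass
class ServiceUnit:
    """One time step's demand of one resource unit for a service."""
    service_id: int
    category: int      # e.g. video, call, download, upload
    units: int         # always 1 here: one resource unit per service unit

def pack_requests(demands):
    """Pack request info into service units: a service of category c with
    per-step demand d becomes d one-unit entries, mirroring the text above."""
    units = []
    for sid, (category, d) in enumerate(demands):
        for _ in range(d):
            units.append(ServiceUnit(sid, category, 1))
    return units
```

For instance, a service with demand d_i = 3 expands into three `ServiceUnit` entries, each of which is fed to the model together with the current power and cache state.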
Step S102: and according to the resource allocation result output by the resource allocation model, allocating power and cache resources for the service and executing the service.
In this step, the network access device takes the action with the maximum probability, according to the probability distribution over all possible actions output by the resource allocation model, as the final power or cache resource allocation result; it then allocates power and/or cache resources to the service according to the allocation result and executes the service.
Specifically, the resource allocation model is a deep neural network, and the structure of the deep neural network can be as shown in fig. 7; the resource allocation model is a policy network, and a policy function in the policy network can output probability distribution of all possible actions according to the input of the resource allocation model; each action corresponds to a power or cache resource distribution result, wherein the action with the maximum probability is used as the final power or cache resource distribution result; that is, the output of the resource allocation model is represented by an action space, corresponding to the result of allocation of resources; the resource allocation result may be specifically that a position is allocated in the resource slot map for the service unit. The resource slot map specifically includes a power and cached resource slot map. For example, after the resource allocation model outputs an action according to the policy, the network access device allocates a service unit to a position corresponding to the action in the resource slot map according to the action with the highest probability output by the resource allocation model. For example, the action a =1 with the highest probability of the resource allocation model output indicates that the service unit is allocated to the first position of the corresponding first row in the resource slot map.
For example, denote by s_t the resource demand information of a service unit of the current service, together with the power and cache cluster resource state information, input into the resource allocation model at time t. The action with the highest probability output by the resource allocation model can then be expressed as a_t = π(s_t); that is, a_t is the position allocated for the service unit corresponding to s_t in the resource slot map, namely the allocation result of power or cache resources for that service unit; wherein π represents the policy function in the resource allocation model.

In practical applications, if the resource demand information in several service units is input into the resource allocation model within one time step, together with the power and cache cluster resource state information at time t, the resource allocation model may output a result for each of these service units; that is, for each service unit, the resource allocation model outputs a probability distribution over all possible actions.

The above-mentioned action space can be given by A = {∅, a_1, a_2, …}, wherein a_i denotes "schedule a service unit to the i-th position of the resource slot map", and ∅ denotes an "invalid" operation, indicating that no position is scheduled at the current time step. "Valid" means that the position of each service is arranged in the resource slot map according to the resource requirements and required time steps of the arriving services, so as to achieve the minimum required time steps and the maximum resource utilization. In fact, when ∅ is selected, the resource slot map is shifted up by one time step in its entirety, and all newly arrived services are thereafter processed by the network access device. By decoupling the decision order of the resource allocation model from real time (i.e., at each time step, time is frozen until the resource allocation model selects the invalid operation), the system allows the resource allocation model to schedule multiple service units within the same time step while keeping the action space linear.
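The action space and the "invalid action advances time" mechanic can be sketched as follows; representing the slot map as a list of rows, and the particular function shape, are illustrative assumptions.

```python
def apply_action(slot_map, action, unit):
    """Action i > 0 places the service unit at position i-1 of the first row
    of the resource slot map; action 0 is the 'invalid' operation, which
    advances time by shifting the whole slot map up one time step.
    Returns the new slot map and whether a unit was placed."""
    if action == 0:
        # advance one time step: drop the oldest row, append an empty one
        width = len(slot_map[0])
        return slot_map[1:] + [[None] * width], False
    row = list(slot_map[0])
    row[action - 1] = unit     # schedule the unit at the chosen position
    return [row] + slot_map[1:], True
```

This mirrors the time-freezing trick: the scheduler keeps placing units (actions i > 0) within a frozen time step until it emits the invalid action, which shifts the map and admits newly arrived services.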
The resource allocation model is obtained by pre-training: it is pre-trained on training data comprising previously acquired resource demand information, power and cache cluster resource state information, and the execution delay of services; the execution delay of a service is determined after the service has been executed according to the resource allocation result output by the resource allocation model for the training data.
The resource allocation model can be trained in an online manner or an offline manner.
When the training is carried out in an online mode, the resource allocation model can be trained in the network access equipment by using the request information of the service received by the network access equipment as training data;
when training in an off-line mode, the resource demand information of the service needs to be collected in advance as training data: in the scenario environment, each scenario will have a fixed number of services arriving and scheduled according to a policy, and when all services complete scheduling, the scenario will terminate. In order to train a general strategy, a plurality of examples of service arrival sequences are considered, a service set is formed, and resource requirement information of each service in the service set is used as training data. In each training iteration, multiple episodes may be simulated for each business set to explore a probability space of possible actions to take using the current policy function, and the resulting results used to refine the policy function for all business sets.
In the above method for training a resource allocation model, at each time step, the model parameters in an iteration process may be adjusted according to the following method, and a specific flow is shown in fig. 2, and specifically includes the following steps:
step S201: and inputting the previously acquired resource demand information, power and state information of the cache cluster resources as training data into the resource allocation model.
Specifically, each service in the training data may be composed of a plurality of service units, and one service unit of one service may include requirement information of one time step of the service and requirement information of one resource unit.
In practical applications, the resource demand information in each service unit, the power at the current time step, and the state information of the cache cluster resources may be input into the resource allocation model in turn at each time step. For example, the information input into the resource allocation model at time t may include the resource demand information in a service unit and the state information of the power and cache cluster resources at time t; as shown in fig. 3, this can be represented by a state space, denoted s_t. The power and cache cluster resources specifically refer to the power and cache cluster resources that can be used by the service.
Step S202: and executing the service according to the resource allocation result output by the resource allocation model according to the training data, and determining the execution time delay of the service.
In this step, according to the probability distribution of all possible actions output by the resource allocation model, the action with the highest probability is used as the final allocation result of power or cache resources; distributing power and/or cache resources for the service according to the distribution result, and executing the service; and determining the execution delay of the service.
Specifically, as shown in FIG. 3, for the input s_t, the action output by the resource allocation model with the highest probability can be expressed as a_t = π(s_t), where π represents the policy function in the resource allocation model.
The objective of the allocation strategy of the policy function in the resource allocation model is to minimize the average delay of the traffic, and the objective of the reinforcement learning is to maximize the expected cumulative discounted reward value:

E[ Σ_{t=0}^{∞} γ^t r_t ]

where r_t is the reward value at time t, and γ is the reward attenuation factor, taking a value in the interval (0, 1].
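For a finite trajectory, the cumulative discounted reward can be computed as in the following minimal sketch (the reward values are made up for illustration):

```python
def discounted_return(rewards, gamma):
    # cumulative discounted reward R = sum_t gamma**t * r_t,
    # with the attenuation factor gamma in (0, 1]
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# with gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
R = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

Smaller γ makes the agent care mostly about immediate rewards; γ close to 1 weights the long-run delay of the whole service.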
After the resource allocation model outputs the resource allocation result and the service is executed according to that result, the delay of the service can be determined as

d_s = C_s / T_s

where C_s is the completion time of service s (i.e., the time between its arrival and its completion) and T_s is the execution time of service s. The reward value per time step is set to -1/T_s, so the reward value at time t is

r_t = Σ_{s∈S} (-1/T_s)

Here, T_s indicates the duration from the beginning of execution of service s up to time t, i.e., the execution delay of the current service; for the case where the current service comprises a plurality of services, the average of these durations at time t is used; S represents the current set of services in the system. It may be noted that the cumulative reward is inversely related to the execution delay of the services over time, so maximizing the cumulative reward minimizes the average delay.
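Assuming the per-step reward form reconstructed above, the reward at one time step can be computed as follows (the durations are toy values):

```python
def step_reward(durations):
    # durations: for each service s in the current set S, the time T_s
    # that the service has been executing so far (its execution delay);
    # the per-step reward is -1/T_s summed over S, so longer delays
    # yield a more negative reward
    return sum(-1.0 / T for T in durations)

# two services that have been running for 2 and 4 time steps:
# r_t = -(1/2) - (1/4) = -0.75
r_t = step_reward([2.0, 4.0])
```

Because every running service contributes a negative term at every step, the accumulated reward shrinks as services linger, which is exactly the inverse relation to delay noted in the text.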
Step S203: and adjusting the parameters of the resource allocation model according to the determined execution delay of the service.
Specifically, as shown in FIG. 3, the parameters of the resource allocation model, that is, the parameters of the policy function, may be updated by gradient descent according to the reward value calculated from the execution delay of the service; for example, the parameters of the policy function, i.e., the parameters of the resource allocation model, may be updated according to the following formula 1:

θ_{t+1} = θ_t + α [ y_t - Q(s_t, a_t) ] ∇_{θ_t} Q(s_t, a_t)    (formula 1)

In formula 1, θ_t is the parameter of the resource allocation model (policy function π) at time t, θ_{t+1} is the updated parameter of the resource allocation model (policy function π) at time t+1, and α is a coefficient constant with a value between 0 and 1; y_t is the objective function value at time t:

y_t = r_t + γ max_a Q(s_{t-1}, a)

that is, the reward value r_t obtained at the current time t plus, attenuated by the reward attenuation factor γ, the largest value among the values of the cost function calculated separately for the information of each business unit input into the resource allocation model at the previous time t-1; here Q is the cost function.

The value of the cost function at time t is Q(s_t, a_t), and Q(s_t, a_t) = E[R_t]. In these expressions, s_t represents the resource demand information of a service unit of the current service, and the state information of the power and cache cluster resources, input into the resource allocation model at time t; a_t represents the action with the highest probability output by the resource allocation model for the input s_t at time t; Q(·) specifically represents the cost function calculation performed based on the information of the business units input into the resource allocation model; a_{t-1} represents the action with the highest probability output by the resource allocation model for the input s_{t-1} at time t-1; max_a Q(s_{t-1}, a) represents the largest value among the values of the cost function calculated separately for the information of each business unit input into the resource allocation model at time t-1; E[R_t] represents the expected value of the cumulative reward R_t, and E[R_{t-1}] represents the expected value of R_{t-1}.
Since the currently performed action a_t affects the state of the resource allocation system at subsequent times, maximizing the cumulative reward value obtained at each subsequent time after the resource allocation system performs allocation according to the allocation strategy serves as the basis for modeling the allocation strategy π; here, the cumulative reward value R_t at time t can be calculated from the reward value r_t at time t, and its specific expression is shown in formula 2:

R_t = E[ Σ_{n=0}^{∞} γ^n r_{t+n} ]    (formula 2)

where r_{t+n} represents the reward obtained, due to action a_t, at the n-th subsequent time after action a_t has been performed. Deriving the expression of the cumulative reward value shows that the cumulative reward R_t equals the reward r_t earned immediately after executing action a_t, plus the cumulative reward value available at the previous time, i.e., time t-1, R_{t-1}, multiplied by the reward attenuation factor γ.
In the above formula 1, ∇_{θ_t} Q(s_t, a_t) represents the differentiation of the cost function Q(s_t, a_t) with respect to the policy parameter θ_t, as shown in formula 3:

∇_{θ_t} Q(s_t, a_t) = ∂Q(s_t, a_t) / ∂θ_t    (formula 3)
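As a minimal sketch of the update in formula 1, a linear cost function Q(s, a) = θ[a]·s can stand in for the deep network; for a linear model the gradient in formula 3 is simply the state vector. All names, shapes, and constants here are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def update_parameters(theta, s_t, a_t, r_t, q_prev_max, alpha=0.1, gamma=0.9):
    # theta: one weight vector per action; Q(s, a) = theta[a] . s is a
    # linear stand-in for the deep network's cost function
    q = theta[a_t] @ s_t
    # objective value y_t = r_t + gamma * max_a Q(s_{t-1}, a)
    y = r_t + gamma * q_prev_max
    # for the linear model, grad_theta Q(s_t, a_t) w.r.t. theta[a_t] is
    # s_t, so formula 1 reduces to a simple additive correction
    theta = theta.copy()
    theta[a_t] += alpha * (y - q) * s_t
    return theta

theta0 = np.zeros((2, 3))
s_t = np.array([1.0, 0.0, 0.0])
theta1 = update_parameters(theta0, s_t, a_t=0, r_t=1.0, q_prev_max=0.0)
```

One call moves only the weights of the executed action, in the direction that shrinks the gap between the objective value y_t and the current estimate Q(s_t, a_t).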
Through the flow shown in FIG. 2, a plurality of iterations can be performed, continuously updating the value of the cost function Q and the parameters of the resource allocation model (policy function π), until the value of the currently calculated cost function equals the value of the objective function, at which point the iterative process stops. At this point, the resource allocation model is stable, the optimal cost function Q has been obtained, and the cumulative reward R_t is thereby maximized; the policy function π of the neural network determined at this time is optimal and can be used as the policy function in the trained resource allocation model, so that the cumulative reward value of service resource scheduling reaches its maximum and the minimum average service delay is obtained.
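Putting steps S201 to S203 together, one training iteration might look like the following sketch (the linear model, function names, and toy values are assumptions for illustration, not the patent's implementation):

```python
import numpy as np

def softmax(z):
    # turn action scores into a probability distribution
    e = np.exp(z - np.max(z))
    return e / e.sum()

def train_iteration(theta, states, rewards, alpha=0.1, gamma=0.9):
    # states: one state vector per business unit / time step (step S201)
    # rewards: reward values computed from the measured execution delays
    # after the service is executed (step S202)
    q_prev_max = 0.0
    for s_t, r_t in zip(states, rewards):
        probs = softmax(theta @ s_t)          # distribution over actions
        a_t = int(np.argmax(probs))           # highest-probability action
        q = theta[a_t] @ s_t                  # linear stand-in for Q
        y = r_t + gamma * q_prev_max          # objective value y_t
        theta[a_t] += alpha * (y - q) * s_t   # formula 1 update (step S203)
        q_prev_max = max(theta[a] @ s_t for a in range(theta.shape[0]))
    return theta

theta = np.zeros((2, 3))
states = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
theta = train_iteration(theta, states, rewards=[-0.5, -0.5])
```

Repeating such iterations until the estimated cost function stops changing corresponds to the convergence condition described above.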
Based on the foregoing resource allocation method of the network access device, an internal structure of a resource allocation apparatus of the network access device provided in the embodiment of the present invention is shown in fig. 4, and includes: a requirement information acquisition module 401, a state information determination module 402, an action determination module 403, an allocation result determination module 404, and a resource allocation module 405.
The requirement information acquiring module 401 is configured to acquire resource requirement information of a current service of the network access device;
the status information determining module 402 is configured to determine the current power of the network access device and/or the status information of the cache cluster resource;
the action determining module 403 is configured to input the resource requirement information and the status information into a trained resource allocation model; the resource allocation model is a deep neural network which outputs probability distribution of all possible actions according to input resource demand information and state information; wherein each action corresponds to a power or cache resource allocation result;
the allocation result determining module 404 is configured to take the action with the highest probability as the final allocation result of power or cache resources according to the probability distribution of all possible actions output by the resource allocation model; and
the resource allocation module 405 is configured to allocate power and/or cache resources to the service according to the allocation result.
Further, a resource allocation apparatus for a network access device provided in an embodiment of the present invention may further include: a model training module 406.
The model training module 406 takes the resource demand information, power and state information of the cache cluster resource of the previously collected service as training data; inputting the training data to the resource allocation model; according to the probability distribution of all possible actions output by the resource allocation model, taking the action with the maximum probability as the final allocation result of power or cache resources; distributing power and/or cache resources for the service according to the distribution result; after executing the service, determining the execution time delay of the service; calculating a reward value according to the execution delay of the service; and adjusting the parameters of the resource allocation model according to the reward value.
The specific implementation method for functions of each module in the resource allocation apparatus of the network access device provided in the embodiment of the present invention may refer to the method of each step in the flows shown in fig. 1 and 2, and is not described herein again.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the resource allocation method of the network access device provided in the embodiment of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module, and can be connected with a nonlinear receiver to receive information from the nonlinear receiver, so as to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device may be specifically disposed in a network access device such as an ONU and a base station.
Furthermore, an embodiment of the present invention provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions for causing the computer to execute the resource allocation method of the network access device as described above.
In the technical solution of the present invention, the resource demand information of the current service of the network access device is acquired; the current power of the network access device and the state information of the cache cluster resources are determined; and the resource demand information and the state information are input into a trained resource allocation model, where the resource allocation model is a deep neural network that outputs a probability distribution over all possible actions according to the input resource demand information and state information, each action corresponding to a power or cache resource allocation result. According to the probability distribution over all possible actions output by the resource allocation model, the action with the highest probability is taken as the final allocation result of power or cache resources, and power and/or cache resources are allocated to the service accordingly. Through this resource allocation model, which can autonomously learn how to perform efficient resource allocation, the resource demand information of the input service and the state information of the power and cache cluster resources are analyzed automatically and resources are allocated automatically, which can reduce the resource scheduling computation burden of the network device; and because the model is trained on the fed-back execution delay of the services, the minimum average service delay can be obtained, thereby maximizing the resource utilization rate.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, also features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic ram (dram)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A resource allocation method of a network access device is characterized by comprising the following steps:
acquiring resource demand information of the current service of the network access equipment;
determining the current power of the network access equipment and the state information of the cache cluster resources;
inputting the resource demand information and the state information into a trained resource allocation model; the resource allocation model is a deep neural network which outputs probability distribution of all possible actions according to input resource demand information and state information; wherein each action corresponds to a power or cache resource allocation result;
according to the probability distribution of all possible actions output by the resource allocation model, taking the action with the maximum probability as the final allocation result of power or cache resources; and
and distributing power and/or cache resources for the service according to the distribution result.
2. The method of claim 1, wherein the resource allocation model is pre-trained according to previously collected training data and execution delay of the service; the execution time delay of the service is determined after the service is executed according to the resource allocation result after the resource allocation result output by the resource allocation model according to the training data; the specific training method comprises the following steps:
using the collected resource demand information, power and state information of the cache cluster resource of the service as the training data;
inputting the training data to the resource allocation model;
according to the probability distribution of all possible actions output by the resource allocation model, taking the action with the maximum probability as the final allocation result of power or cache resources;
distributing power and/or cache resources for the service according to the distribution result;
after executing the service, determining the execution time delay of the service;
calculating a reward value according to the execution delay of the service;
and adjusting the parameters of the resource allocation model according to the reward value.
3. The method of claim 2, wherein the execution delay of the service is inversely related to the reward value; wherein the reward value is calculated according to the following method: the reward value at time t is

r_t = Σ_{s∈S} (-1/T_s)

wherein T_s indicates the duration from the beginning of execution of the current service s to time t, i.e., the execution delay of the current service, and S represents the current set of services.
4. The method of claim 3, wherein adjusting the parameters of the resource allocation model according to the reward values comprises:
updating parameters of the resource allocation model according to the following formula 1:

θ_{t+1} = θ_t + α [ y_t - Q(s_t, a_t) ] ∇_{θ_t} Q(s_t, a_t)    (formula 1)

wherein θ_t represents the parameter of the resource allocation model π at time t, θ_{t+1} represents the updated parameter of the resource allocation model π at time t+1, and α is a coefficient constant with a value between 0 and 1; y_t is the objective function value at time t, y_t = r_t + γ max_a Q(s_{t-1}, a), wherein r_t is the reward value at the current time t and γ is the reward attenuation factor; the value of the cost function at time t is Q(s_t, a_t), and Q(s_t, a_t) = E[R_t], wherein s_t represents the resource demand information of a service unit of the current service, and the state information of the power and cache cluster resources, input into the resource allocation model at time t; a_t represents the action with the highest probability output by the resource allocation model for the input s_t at time t; Q(·) represents the cost function calculation based on the information of the business units input into the resource allocation model; a_{t-1} represents the action with the highest probability output by the resource allocation model for the input s_{t-1} at time t-1; max_a Q(s_{t-1}, a) represents the largest value among the values of the cost function calculated separately for the information of each business unit input into the resource allocation model at time t-1; E[R_t] represents the expected value of R_t, and E[R_{t-1}] represents the expected value of R_{t-1};
wherein the cumulative reward value R_t at time t is calculated from the reward value r_t at time t by the following formula 2:

R_t = E[ Σ_{n=0}^{∞} γ^n r_{t+n} ]    (formula 2)

wherein r_{t+n} represents the reward obtained, due to the action a_t, at the n-th subsequent time after the action a_t has been performed, and E represents the expectation of the cumulative discounted reward in the Monte Carlo method of reinforcement learning.
5. The method of claim 1, wherein the obtaining resource requirement information of the current service of the network access device comprises:
the network access equipment receives request information of a plurality of services of one or more mobile terminals;
the network access equipment decodes the received request information of a plurality of services according to the distance from the network access equipment to the mobile terminal, the moving speed of the mobile terminal and the channel condition;
the network access equipment packs the decoded request information of the plurality of services into a plurality of services; each service is composed of a plurality of service units; wherein one service unit of a service comprises: the demand information of the service for one time step and one resource unit.
6. The method of claim 5, wherein inputting the resource demand information and the state information into a trained resource allocation model comprises:
and the network access equipment sequentially inputs the requirement information of the resource units in each service unit, the power of the current time step and the state information of the cache cluster resources into the resource allocation model at each time step.
7. The method according to claim 6, wherein the probability distribution of all possible actions output according to the resource allocation model, and taking the action with the highest probability as the final power or buffer resource allocation result comprises:
and taking, according to the probability distribution of all possible actions output by the resource allocation model, the action with the highest probability as the corresponding position of the service unit in a resource slot image, and taking the corresponding position as the allocation result of the power and/or cache resource allocation, wherein the resource slot image comprises a power resource slot image and a cache resource slot image.
8. A resource allocation apparatus of a network access device, comprising:
a demand information acquisition module, configured to acquire resource demand information of a current service of the network access device;
a status information determining module, configured to determine status information of the current power and/or cache cluster resource of the network access device;
an action determining module for inputting the resource demand information and the state information into a trained resource allocation model; the resource allocation model is a deep neural network which outputs probability distribution of all possible actions according to input resource demand information and state information; wherein each action corresponds to a power or cache resource allocation result;
the allocation result determining module is used for taking the action with the maximum probability as the final allocation result of the power or the cache resource according to the probability distribution of all possible actions output by the resource allocation model; and
and the resource allocation module is used for allocating power and/or cache resources to the service according to the allocation result.
9. An electronic device comprising a central processing unit, a signal processing and storage unit, and a computer program stored on the signal processing and storage unit and executable on the central processing unit, characterized in that the central processing unit implements the method according to any of claims 1-7 when executing the program.
10. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing the computer to perform the resource allocation method of any one of claims 1 to 7.
CN202011584793.8A 2020-12-29 2020-12-29 Resource allocation method and device of network access equipment Active CN112291793B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011584793.8A CN112291793B (en) 2020-12-29 2020-12-29 Resource allocation method and device of network access equipment
JP2021074502A JP7083476B1 (en) 2020-12-29 2021-04-26 Network access device resource allocation method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011584793.8A CN112291793B (en) 2020-12-29 2020-12-29 Resource allocation method and device of network access equipment

Publications (2)

Publication Number Publication Date
CN112291793A true CN112291793A (en) 2021-01-29
CN112291793B CN112291793B (en) 2021-04-06

Family

ID=74426534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011584793.8A Active CN112291793B (en) 2020-12-29 2020-12-29 Resource allocation method and device of network access equipment

Country Status (2)

Country Link
JP (1) JP7083476B1 (en)
CN (1) CN112291793B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112512070A (en) * 2021-02-05 2021-03-16 之江实验室 Multi-base-station cooperative wireless network resource allocation method based on graph attention mechanism reinforcement learning
CN113220452A (en) * 2021-05-10 2021-08-06 北京百度网讯科技有限公司 Resource allocation method, model training method, device and electronic equipment
CN113840333A (en) * 2021-08-16 2021-12-24 国网河南省电力公司信息通信公司 Power grid resource allocation method and device, electronic equipment and storage medium
CN114465883A (en) * 2022-01-06 2022-05-10 北京全路通信信号研究设计院集团有限公司 SDN network-based automatic service resource allocation system and method
CN115499319A (en) * 2022-09-19 2022-12-20 北京达佳互联信息技术有限公司 Resource distribution method, device, electronic equipment and storage medium

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN115331796B (en) * 2022-10-17 2022-12-27 中科厚立信息技术(成都)有限公司 Intensive learning-based sickbed resource allocation optimization method, system and terminal
CN115696403B (en) * 2022-11-04 2023-05-16 东南大学 Multi-layer edge computing task unloading method assisted by edge computing nodes
CN116795066B (en) * 2023-08-16 2023-10-27 南京德克威尔自动化有限公司 Communication data processing method, system, server and medium of remote IO module

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2013110966A1 (en) * 2012-01-27 2013-08-01 Empire Technology Development Llc Parameterized dynamic model for cloud migration
CN108989099A (en) * 2018-07-02 2018-12-11 北京邮电大学 Federated resource distribution method and system based on software definition Incorporate network
CN111556518A (en) * 2020-06-12 2020-08-18 国网经济技术研究院有限公司 Resource allocation method and system for improving network quality in multi-slice network
CN111866953A (en) * 2019-04-26 2020-10-30 中国移动通信有限公司研究院 Network resource allocation method, device and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN111031102B (en) * 2019-11-25 2022-04-12 哈尔滨工业大学 Multi-user, multi-task mobile edge computing system cacheable task migration method
CN112134916B (en) * 2020-07-21 2021-06-11 Nanjing University of Posts and Telecommunications Cloud-edge collaborative computing migration method based on deep reinforcement learning
CN111953759B (en) * 2020-08-04 2022-11-11 State Grid Henan Electric Power Co. Information and Communication Co. Collaborative computing task offloading and migration method and device based on reinforcement learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112512070A (en) * 2021-02-05 2021-03-16 Zhejiang Lab Multi-base-station cooperative wireless network resource allocation method based on graph attention mechanism reinforcement learning
CN113220452A (en) * 2021-05-10 2021-08-06 Beijing Baidu Netcom Science and Technology Co., Ltd. Resource allocation method, model training method, device and electronic equipment
CN113840333A (en) * 2021-08-16 2021-12-24 State Grid Henan Electric Power Co. Information and Communication Co. Power grid resource allocation method and device, electronic equipment and storage medium
CN113840333B (en) * 2021-08-16 2023-11-10 State Grid Henan Electric Power Co. Information and Communication Co. Power grid resource allocation method and device, electronic equipment and storage medium
CN114465883A (en) * 2022-01-06 2022-05-10 CRSC Research and Design Institute Group Co., Ltd. SDN-based automatic service resource allocation system and method
CN115499319A (en) * 2022-09-19 2022-12-20 Beijing Dajia Internet Information Technology Co., Ltd. Resource allocation method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
JP2022104776A (en) 2022-07-11
JP7083476B1 (en) 2022-06-13
CN112291793B (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN112291793B (en) Resource allocation method and device of network access equipment
CN109976909B (en) Learning-based low-delay task scheduling method in edge computing network
CN113950066B (en) Single-server partial computation offloading method, system and device in a mobile edge environment
CN109829332B (en) Joint computation offloading method and device based on energy harvesting
CN111835827A (en) Internet of Things edge computing task offloading method and system
CN110956202B (en) Image training method, system, medium and intelligent device based on distributed learning
CN113268341B (en) Allocation method, device, equipment and storage medium for power grid edge computing tasks
CN110389816B (en) Method, apparatus and computer readable medium for resource scheduling
CN112422644B (en) Method and system for offloading computing tasks, electronic device and storage medium
Kiani et al. Hierarchical capacity provisioning for fog computing
CN113778691B (en) Task migration decision method, device and system
CN110968366A (en) Task offloading method, device and equipment based on limited MEC resources
CN111580974B (en) GPU instance allocation method, device, electronic equipment and computer readable medium
CN116263681A (en) Mobile edge computing task offloading method, device, equipment and storage medium
CN114240506A (en) Modeling method of multi-task model, promotion content processing method and related device
CN114007231A (en) Heterogeneous unmanned aerial vehicle data offloading method and device, electronic equipment and storage medium
CN114090108A (en) Computing task execution method and device, electronic equipment and storage medium
CN112949850B (en) Hyperparameter determination method, device, deep reinforcement learning framework, medium and equipment
CN111694670B (en) Resource allocation method, apparatus, device and computer readable medium
CN117311991B (en) Model training method, task allocation method, device, equipment, medium and system
CN117251035B (en) Heat dissipation control method, heat dissipation control device, electronic equipment and computer readable medium
CN112148448B (en) Resource allocation method, apparatus, device and computer readable medium
CN111330269B (en) Application difficulty adjustment and strategy determination method, device, system, equipment and medium
Lin et al. Learning-Based Query Scheduling and Resource Allocation for Low-Latency Mobile Edge Video Analytics
CN113992706A (en) Method and device for requesting content placement in Internet of vehicles scene and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant