CN117395202A

CN117395202A - DPU resource scheduling method and device for flow processing

Info

Publication number: CN117395202A
Application number: CN202311148992.8A
Authority: CN
Inventors: 郭少勇; 亓峰; 王帅潮; 熊翱; 陈锦前; 邱雪松
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2023-09-06
Filing date: 2023-09-06
Publication date: 2024-01-12

Abstract

The application provides a DPU resource scheduling method and device for traffic processing. The method comprises: acquiring current traffic information and environment information of a wireless network; and using a D3QN-based DPU resource scheduling model equipped with a prioritized experience replay pool and a deep neural network to solve a multi-objective constrained optimization function corresponding to the wireless network according to the current environment information and traffic information, obtaining a DPU resource scheduling decision for the current wireless network, and selecting server nodes in an infrastructure layer of the wireless network for DPU resource deployment, so that the traffic information is processed with the deployed DPU resources. By exploiting the hardware advantages of the DPU chip, the method can build an intelligent traffic management mode under low delay and high load, effectively reduce the equipment cost required for resource scheduling, make full use of hardware resources to improve resource utilization, and improve the real-time performance, adaptability and scalability of DPU resource scheduling.

Description

DPU resource scheduling method and device for flow processing

Technical Field

The present disclosure relates to the field of traffic processing technologies, and in particular, to a method and an apparatus for scheduling DPU resources for traffic processing.

Background

With the development of information technology, data centers have become the hub where enterprises, financial institutions and others process data and provide services. The stability, continuity and reliability of data centers are key factors in economic benefit and social well-being. However, data centers carry the operational risk of a single point of failure: once an infrastructure fault, a power outage or a natural disaster occurs, business is often disrupted and is difficult to recover in the short term.

Currently, mainstream traffic scheduling relies on large numbers of Layer 7 network devices, which is costly and resource-intensive and struggles to cope with major disasters, leaving data centers facing serious traffic management challenges. With the rapid growth of Internet and mobile services, the traffic processed by data centers is growing exponentially. Meanwhile, traffic types are increasingly diverse, delay and throughput requirements keep rising, and the spatio-temporal distribution of traffic is increasingly complex. Conventional traffic management schemes rely on Layer 7 devices and link load balancing, and these independently deployed devices can hardly perceive traffic information, status and demand changes in full. Such schemes are also very costly and scale poorly. Existing traffic scheduling algorithms are mainly rule-based or simulation-based and have limited effect: they can hardly make full use of hardware resources, and their parameters cannot be dynamically adjusted and optimized in real time.

Disclosure of Invention

In view of this, embodiments of the present application provide a method and apparatus for scheduling DPU resources for traffic processing, to eliminate or mitigate one or more drawbacks of the prior art.

One aspect of the present application provides a method for scheduling DPU resources for traffic processing, including:

acquiring current traffic information and environment information of a wireless network;

and using a D3QN-based DPU resource scheduling model equipped with a prioritized experience replay pool and a deep neural network to solve a multi-objective constrained optimization function corresponding to the wireless network according to its current environment information and traffic information, obtaining a DPU resource scheduling decision for the current wireless network, and selecting server nodes in an infrastructure layer of the wireless network based on that decision for DPU resource deployment, so that the traffic information is processed with the deployed DPU resources.

In some embodiments of the present application, before the acquiring the current traffic information and the environment information of the wireless network, the method further includes:

constructing an objective function that minimizes the weighted sum of the total power consumed by the infrastructure layer of the wireless network in processing all traffic, the total delay of processing all tasks corresponding to the traffic information, and the number of rejected flows, wherein the total power consumption, the total delay of all tasks and the number of rejected flows each have a different weight coefficient;

constructing a plurality of constraints corresponding to the objective function to form a multi-objective constrained optimization function;

wherein the plurality of constraints involve: the total delay of all tasks, the number of rejected flows, the number of flows corresponding to the current traffic information, the total amount of DPU resources of each server node, the DPU resource usage of each server node, the preset maximum delay, the constraint on the DPU resource amount of each virtual machine in each server node, the resource scheduling decision and the task deployment decision; the resource scheduling decision specifies which server node's resources are scheduled for a flow, and the task deployment decision specifies to which virtual machine, in the server node selected by the resource scheduling decision, the flow is deployed.

In some embodiments of the present application, the D3QN-based DPU resource scheduling model equipped with a prioritized experience replay pool and a deep neural network includes a main network, a target network and the prioritized experience replay pool, wherein both the main network and the target network use the deep neural network as their network structure;

the main network is used to generate a DPU resource scheduling decision according to the current environment state, as the optimal action corresponding to the current optimal action value, and the main network is decoupled into a state-value network and a constrained advantage network;

the target network is used for calculating the target value of the optimal action;

the prioritized experience replay pool is used to store experiences generated during learning, each comprising a current state, an action, a reward and a next state; the state space of the current and next states comprises the total DPU resources and resource utilization of each server node in the wireless network and the current resource demand of the traffic information; the action space comprises the DPU resource scheduling decisions; and the reward is negatively correlated with the traffic transmission delay, the power consumption and the number of rejected tasks during traffic transmission.
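The "D3" in D3QN usually denotes a dueling, double deep Q-network, which matches the main network's decoupling into a state-value part and an advantage part and the separate target network described above. The sketch below is a plain-Python illustration under assumptions, not the application's network code: it assumes the common mean-subtracted dueling aggregation and the standard double-DQN target, and the function names are hypothetical.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).

    Subtracting the mean advantage keeps the two streams identifiable;
    this aggregation is an assumption, not taken from the application.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward: float, gamma: float, done: bool,
                      q_main_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double-DQN target: the main network picks the next action,
    the target network supplies that action's value."""
    if done:
        return reward
    best_action = max(range(len(q_main_next)), key=lambda a: q_main_next[a])
    return reward + gamma * q_target_next[best_action]


# Example: three candidate scheduling actions (e.g. three server nodes).
q = dueling_q_values(1.0, [0.5, -0.5, 0.0])
y = double_dqn_target(-2.0, 0.9, False, [0.1, 0.7, 0.3], [1.0, 2.0, 3.0])
```

Letting the main network choose the action while the target network evaluates it is what reduces the overestimation bias of plain DQN.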

In some embodiments of the present application, the acquiring current traffic information and environment information of the wireless network includes:

acquiring, from users and terminal devices, each flow to be transmitted in the wireless network to obtain the current traffic information;

and acquiring, based on a preset resource monitoring service module, the DPU resource quantization result data of each server node in the infrastructure layer of the wireless network to obtain the current environment information.

In some embodiments of the present application, after the deployed DPU resources are used to process the traffic information, the method further includes:

obtaining DPU resource quantization result data corresponding to each server node in the infrastructure layer;

acquiring, based on the DPU resource quantization result data, the traffic transmission delay and the power consumption during traffic transmission, and acquiring the number of rejected tasks corresponding to flows rejected by the server nodes;

and calculating the current reward according to the traffic transmission delay, the power consumption and the task acceptance rate, and adding an experience containing the reward to the prioritized experience replay pool.
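The reward step above can be sketched as follows. The application only states that the reward is negatively correlated with delay, power consumption and rejected tasks, so the linear penalty, the default weights, the `Experience` record and the pool capacity below are all hypothetical choices; this sketch penalizes the count of rejected tasks rather than an acceptance rate.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Sequence, Tuple


@dataclass(frozen=True)
class Experience:
    state: Tuple[float, ...]       # e.g. per-node DPU totals, utilization, traffic demand
    action: int                    # index of the chosen scheduling decision
    reward: float
    next_state: Tuple[float, ...]


def compute_reward(delay: float, power: float, rejected: int,
                   w_delay: float = 1.0, w_power: float = 1.0,
                   w_reject: float = 1.0) -> float:
    """Reward that falls as delay, power and rejected tasks rise.

    The linear penalty and the default weights are illustrative assumptions.
    """
    return -(w_delay * delay + w_power * power + w_reject * rejected)


# Hypothetical bounded pool; a prioritized pool would also keep a priority per item.
replay_pool: Deque[Experience] = deque(maxlen=10_000)


def record_step(state: Sequence[float], action: int, delay: float, power: float,
                rejected: int, next_state: Sequence[float]) -> float:
    """Compute the reward for one scheduling step and store the experience."""
    reward = compute_reward(delay, power, rejected)
    replay_pool.append(Experience(tuple(state), action, reward, tuple(next_state)))
    return reward
```

In practice the three terms have different units, so the weights would need to normalize them to comparable scales.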

In some embodiments of the present application, the DPU resource quantization result data includes: DPU communication capability quantization result data and DPU storage capability quantization result data;

correspondingly, DPU resource quantization result data corresponding to each server node are calculated by each server node based on a preset quantization model;

Wherein the quantization model comprises:

a DPU communication capability quantization function formed from the network bandwidth and memory bandwidth of the DPU on the server node;

and a DPU storage capability quantization function formed from the storage bandwidth of the DPU on the server node and its number of read/write operations per second (IOPS).
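The application names the inputs of the two quantization functions but not their form. As a purely hypothetical sketch, each capability can be scored as a weighted sum of its inputs normalized by reference values; the reference values and weights below are placeholders, not figures from the application.

```python
def quantize_communication(net_bw_gbps: float, mem_bw_gbps: float,
                           ref_net_bw_gbps: float = 100.0, ref_mem_bw_gbps: float = 100.0,
                           w_net: float = 0.5, w_mem: float = 0.5) -> float:
    """Hypothetical DPU communication score: weighted sum of network and
    memory bandwidth, each normalized by a reference value."""
    return w_net * net_bw_gbps / ref_net_bw_gbps + w_mem * mem_bw_gbps / ref_mem_bw_gbps


def quantize_storage(storage_bw_gbps: float, iops: float,
                     ref_storage_bw_gbps: float = 50.0, ref_iops: float = 1_000_000.0,
                     w_bw: float = 0.5, w_iops: float = 0.5) -> float:
    """Hypothetical DPU storage score from storage bandwidth and IOPS."""
    return w_bw * storage_bw_gbps / ref_storage_bw_gbps + w_iops * iops / ref_iops


# Example report a server node might send to the resource monitoring service.
node_report = {
    "communication": quantize_communication(100.0, 50.0),
    "storage": quantize_storage(25.0, 500_000.0),
}
```

Normalizing against reference values keeps scores from heterogeneous DPUs on a common scale, which is convenient when they are fed into the agent's state vector.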

In some embodiments of the present application, acquiring the traffic transmission delay and the power consumption during traffic transmission based on the DPU resource quantization result data includes:

calculating the transmission delay of the first flow according to the DPU resource quantization result data of the two flows in the traffic information and the amount of data transmitted from the first flow to the other flow;

and determining the resource utilization of each server node participating in the current traffic transmission in the infrastructure layer from its total DPU resource amount and the DPU resource amount occupied by the virtual machines on it, determining the dynamic power of each server node from a preset optimal utilization threshold and that resource utilization, and determining the power consumption of each server node from its static power and dynamic power.
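The power calculation above can be sketched as below. Utilization (occupied over total DPU resources) and power as static plus dynamic follow the text; the shape of the dynamic-power curve (linear up to the optimal utilization threshold, with an added quadratic penalty above it) and all default constants are hypothetical assumptions.

```python
from typing import Sequence, Tuple


def node_utilization(total_dpu: float, vm_allocations: Sequence[float]) -> float:
    """Resource utilization = DPU resources occupied by VMs / total DPU resources."""
    if total_dpu <= 0:
        raise ValueError("total_dpu must be positive")
    return sum(vm_allocations) / total_dpu


def node_power(utilization: float, static_w: float, alpha_w: float = 100.0,
               beta_w: float = 400.0, u_opt: float = 0.7) -> float:
    """Static + dynamic power. Hypothetical dynamic model: linear up to the
    optimal utilization u_opt, plus a quadratic penalty above it."""
    if utilization <= u_opt:
        dynamic = alpha_w * utilization
    else:
        dynamic = alpha_w * u_opt + beta_w * (utilization - u_opt) ** 2
    return static_w + dynamic


def infrastructure_power(nodes: Sequence[Tuple[float, Sequence[float], float]]) -> float:
    """Sum node power over (total_dpu, vm_allocations, static_w) tuples."""
    return sum(node_power(node_utilization(t, v), s) for t, v, s in nodes)
```

The penalty above the threshold reflects the intuition behind an "optimal utilization" setting: running a node past it costs disproportionately more power.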

Another aspect of the present application provides a traffic-processing-oriented DPU resource scheduling apparatus, including:

a traffic receiving and resource monitoring service module, configured to acquire current traffic information and environment information of the wireless network;

a D3QN deep reinforcement learning agent module, configured to use a D3QN-based DPU resource scheduling model equipped with a prioritized experience replay pool and a deep neural network to solve the multi-objective constrained optimization function corresponding to the wireless network according to its current environment information and traffic information, obtain a DPU resource scheduling decision for the current wireless network, and select server nodes in the infrastructure layer of the wireless network based on that decision for DPU resource deployment, so that the traffic information is processed with the deployed DPU resources.

A third aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the traffic-processing-oriented DPU resource scheduling method when executing the computer program.

A fourth aspect of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the traffic-processing-oriented DPU resource scheduling method.

The traffic-processing-oriented DPU resource scheduling method acquires current traffic information and environment information of a wireless network; uses a D3QN-based DPU resource scheduling model equipped with a prioritized experience replay pool and a deep neural network to solve the multi-objective constrained optimization function corresponding to the wireless network according to its current environment information and traffic information; obtains a DPU resource scheduling decision for the current wireless network; selects server nodes in the infrastructure layer of the wireless network based on that decision for DPU resource deployment; and processes the traffic information with the deployed DPU resources. Introducing prioritized experience replay into D3QN addresses the joint resource scheduling optimization problem under multiple constraints, with the policy determined from the traffic information and the DPU resources in the servers. Samples with larger temporal-difference errors (TD-errors) are assigned higher priority and thus a higher sampling probability, so that the algorithm focuses more on high-value, hard-to-learn samples. Compared with conventional uniform random sampling, prioritized experience replay changes not only how parameters are updated but also the distribution of the sampled data. In this way, the method combines the hardware advantages of the DPU chip to build an intelligent traffic management mode under low delay and high load, effectively reduces the equipment cost required for resource scheduling, makes full use of hardware resources to improve resource utilization, and improves the real-time performance, adaptability and scalability of DPU resource scheduling.

Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.

It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present application are not limited to the above-detailed description, and that the above and other objects that can be achieved with the present application will be more clearly understood from the following detailed description.

Drawings

The accompanying drawings are included to provide a further understanding of the application, and are incorporated in and constitute a part of this application. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the application. Corresponding parts in the drawings may be exaggerated, i.e. made larger relative to other parts in an exemplary device actually manufactured according to the present application, for convenience in showing and describing some parts of the present application. In the drawings:

Fig. 1 is a first schematic flowchart of a traffic-processing-oriented DPU resource scheduling method in an embodiment of the present application.

Fig. 2 is a schematic architecture diagram of a traffic-processing-oriented DPU resource scheduling system in an embodiment of the present application.

Fig. 3 is a second schematic flowchart of a traffic-processing-oriented DPU resource scheduling method in an embodiment of the present application.

Fig. 4 is a schematic diagram of the implementation procedure of the D3QN-CN algorithm in an example of the present application.

Fig. 5 is a third schematic flowchart of a traffic-processing-oriented DPU resource scheduling method in an embodiment of the present application.

Fig. 6 is a schematic structural diagram of a traffic-processing-oriented DPU resource scheduling apparatus in an embodiment of the present application.

Fig. 7 is an exemplary schematic diagram of a scenario in which server nodes cooperate to process traffic in a wireless network according to an application example of the present application.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail with reference to the embodiments and the accompanying drawings. The exemplary embodiments of the present application and their descriptions are used herein to explain the present application, but are not intended to be limiting of the present application.

It should be noted here that, in order to avoid obscuring the present application due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present application are shown in the drawings, while other details not greatly related to the present application are omitted.

It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.

It is also noted herein that the term "coupled" may refer to not only a direct connection, but also an indirect connection in which an intermediate is present, unless otherwise specified.

Hereinafter, embodiments of the present application will be described with reference to the drawings. In the drawings, the same reference numerals represent the same or similar components, or the same or similar steps.

In one existing traffic scheduling approach, a load balancing method is designed for a load balancer. Received traffic is split, and a load balancing policy, i.e., a way of distributing the traffic load across the nodes, is configured; the split traffic is then forwarded to a public cloud and/or a private cloud according to that policy. Resource information of the traffic in the public and private clouds, including cloud type, memory, CPU information, IP, disk and bandwidth, is collected and updated, and the traffic type is analyzed and recorded. The traffic scheduling policy, i.e., the way different types of traffic are combined for forwarding, is then reconfigured according to the recorded traffic types, and the load balancing policy is adjusted to improve resource utilization and reduce user cost. However, the policy is adjusted only according to traffic type, without fully considering node resource status and link status, which can hinder efficient resource use. The approach also does not address dynamically changing traffic patterns, so it may struggle to handle traffic peaks efficiently.

In another existing traffic scheduling approach, when a traffic request from the user terminal of a target user is received, user data of the target user and server data are acquired; a user traffic profile is built from the user data, and a server capability profile is obtained for each of a plurality of servers; the user data, the server data, the user traffic profile and the server capability profiles are input into a first traffic matching model to obtain a target server for the target user; and the traffic request is dispatched to the target server. The aim is to optimize the allocation of network traffic resources. However, the approach depends on the accuracy of the user data and server data, and any deviation in these data may degrade the allocation. It also does not consider real-time performance, so low-delay requirements may be hard to meet.

A third existing approach is a time-sensitive network (TSN) traffic scheduling method based on a survival-of-the-fittest mechanism. Traffic information and network information of all TSN applications initiating transmission requests are collected centrally; encoding and population initialization are performed according to the characteristics of the TSN transmission problem, with each gene corresponding to the current node position of one time-sensitive task flow; and each individual computes a fitness value describing its quality according to the scheduling constraints, the fitness being expressed as end-to-end delay. Parameters and weights can be adjusted adaptively in different network environments to meet the scheduling requirements of mobile scenarios such as the Internet of Things and vehicular networks. With minimum end-to-end delay as the optimization objective, and combining TSN scheduling constraints with routing, a crossover and mutation method with route load optimization is used to search for a near-optimal solution, and low-fitness individuals in the population are replaced according to their fitness values; after iterative search, the near-optimal solution is decoded and output in the form of a gate control list. However, a near-optimal search is limited in effectiveness and can hardly reach the true optimum. Moreover, the method does not make full use of artificial intelligence techniques to improve the scheduling algorithm, and redundant repeated searches may occur.

According to IDC data, global data volume has grown at a compound annual growth rate of approximately 50% over the past 10 years, so a dedicated computing chip, the Data Processing Unit (DPU), is needed to provide faster growth in computing power. The DPU is a data-driven special-purpose processor that supports resource virtualization and application traffic identification and scheduling. DPU technology can enable low-cost traffic management, but DPU resource scheduling algorithms for traffic are still immature. Research on a new traffic scheduling method built on the DPU chip is therefore significant. A DPU resource scheduling algorithm for traffic processing is expected to achieve low-cost, effective traffic identification and edge computing and to support dynamic resource optimization based on deep reinforcement learning, so that a more intelligent traffic network and resource management scheme can be built with DPU technology. To address these difficulties, the present application performs overall scheduling of the DPU resources occupied by traffic processing.

The DPU chip therefore offers a new approach to these problems. Building on prioritized experience replay and deep reinforcement learning, which can be implemented with the DPU chip, a traffic-oriented dynamic resource scheduling algorithm can be constructed, and by combining the hardware advantages of the DPU chip, an intelligent traffic management scheme with low delay and high load can be built. The application aims to overcome the shortcomings of the prior art and provides a deep-reinforcement-learning-based DPU resource scheduling method for traffic processing.

To construct an intelligent traffic management mode under low delay and high load, embodiments of the present application respectively provide a traffic-processing-oriented DPU resource scheduling method, a traffic-processing-oriented DPU resource scheduling apparatus for executing the method, a physical device, and a computer-readable storage medium.

The following examples are provided to illustrate the invention in more detail.

Based on this, an embodiment of the present application provides a traffic-processing-oriented DPU resource scheduling method that may be implemented by a traffic-processing-oriented DPU resource scheduling apparatus. Referring to Fig. 1, the method specifically includes the following:

Step 100: acquiring current traffic information and environment information of the wireless network.

In one or more embodiments of the present application, the traffic information refers to each flow currently to be transmitted in the wireless network; these flows originate from users, terminal devices and the like. Here, a user refers to a device held by the user, such as a mobile terminal, that can transmit data over a wireless link.

In one or more embodiments of the present application, the environmental information includes DPU resource quantization result data corresponding to each server node in an infrastructure layer of the wireless network, where the DPU resource quantization result data may include DPU communication capability quantization result data and DPU storage capability quantization result data.

Correspondingly, the traffic-processing-oriented DPU resource scheduling apparatus may be implemented by a D3QN Agent and a resource monitoring service, which together form the resource scheduling layer of the traffic-processing-oriented DPU resource scheduling system (see Fig. 2). The resource monitoring service acquires the environment information of the wireless network, i.e., it receives in real time the DPU resource quantization result data sent by each server node in the infrastructure layer of the wireless network. The infrastructure layer is controlled by the resource scheduling layer. The resource scheduling layer receives traffic from users and terminal devices and obtains resource information from the resource monitoring service, and the D3QN Agent then makes the scheduling decision. Both the DPU requirements of the traffic and its expected completion time or deadline affect the decision of the D3QN Agent.

Step 200: using a D3QN-based DPU resource scheduling model equipped with a prioritized experience replay pool and a deep neural network, solving the multi-objective constrained optimization function corresponding to the wireless network according to its current environment information and traffic information, obtaining a DPU resource scheduling decision for the current wireless network, and selecting server nodes in the infrastructure layer of the wireless network based on that decision for DPU resource deployment, so that the traffic information is processed with the deployed DPU resources.

In step 200, prioritized experience replay is introduced into D3QN to solve the joint resource scheduling optimization problem under multiple constraints. The D3QN Agent may determine the policy based on the traffic information and the DPU resources within the servers. The model uses a network structure consisting of a main network and a target network; unlike a deep Q-network built from fully connected layers, the neural network may use a robust network (reducing networks) with an added residual structure. With prioritized experience replay, samples with larger temporal-difference errors (TD-errors) are assigned higher priority and thus a higher sampling probability, so that the algorithm focuses more on high-value, hard-to-learn samples. Compared with conventional uniform random sampling, prioritized experience replay changes not only how parameters are updated but also the distribution of the sampled data.
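The TD-error-based prioritization described above can be sketched with the standard proportional variant of prioritized experience replay. This is a minimal list-based illustration with O(n) sampling, not the application's implementation: the class name, the exponents alpha and beta, the small constant eps, and normalizing importance-sampling weights by the batch maximum are all assumptions (production versions typically use a sum tree).

```python
import random
from typing import Any, List, Optional, Sequence, Tuple


class PrioritizedReplay:
    """Proportional prioritized experience replay (simple list version, O(n) sampling).

    priority p_i = |TD-error| + eps
    P(i) = p_i**alpha / sum_k p_k**alpha
    IS weight w_i = (N * P(i))**(-beta), normalized by the batch maximum
    """

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6) -> None:
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.items: List[Any] = []
        self.priorities: List[float] = []
        self.next_idx = 0  # slot to overwrite once the pool is full

    def add(self, experience: Any) -> None:
        # New experiences get the current maximum priority so each is replayed at least once.
        priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(experience)
            self.priorities.append(priority)
        else:
            self.items[self.next_idx] = experience
            self.priorities[self.next_idx] = priority
        self.next_idx = (self.next_idx + 1) % self.capacity

    def probabilities(self) -> List[float]:
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size: int, beta: float = 0.4,
               rng: Optional[random.Random] = None) -> Tuple[List[int], List[Any], List[float]]:
        if rng is None:
            rng = random.Random()
        probs = self.probabilities()
        n = len(self.items)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        return idxs, [self.items[i] for i in idxs], [w / max_w for w in weights]

    def update_priorities(self, idxs: Sequence[int], td_errors: Sequence[float]) -> None:
        # Larger |TD-error| -> higher priority -> higher sampling probability.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

After each training step, the new TD-errors of the sampled batch are written back with `update_priorities`, which is how the sampling distribution shifts toward hard-to-learn samples.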

In one or more embodiments of the present application, the DPU resource scheduling decision includes at least: path selection between the source node (i.e., a user or terminal) and the destination (i.e., the destination terminal) of each flow in the traffic information, selection of server nodes for subtask deployment, and resource scheduling on those server nodes. In the embodiments, a task refers to a flow and a subtask to a sub-flow: if the DPU performance or resources of one or more server nodes are insufficient, a flow may be divided into sub-flows distributed to several server nodes for processing.
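The server-node-selection part of the decision can be illustrated with epsilon-greedy action selection. The source says the advantage network is "constrained" but not how; restricting choices to nodes whose idle DPU resources cover the flow's demand (action masking) is one plausible reading and is an assumption here, as is the hypothetical `select_server` helper. Path selection and sub-flow splitting are omitted.

```python
import random
from typing import Optional, Sequence


def select_server(q_values: Sequence[float], idle_dpu: Sequence[float], demand: float,
                  epsilon: float, rng: random.Random) -> Optional[int]:
    """Epsilon-greedy choice of a server node, restricted to nodes whose idle DPU
    resources cover the flow's demand. Returns None if no node is feasible."""
    feasible = [i for i, idle in enumerate(idle_dpu) if idle >= demand]
    if not feasible:
        return None  # caller may reject the flow or split it into sub-flows
    if rng.random() < epsilon:
        return rng.choice(feasible)  # explore among feasible nodes only
    return max(feasible, key=lambda i: q_values[i])  # exploit: highest Q among feasible
```

Masking keeps exploration from wasting samples on decisions that the constraints would reject anyway.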

As described above, the traffic-processing-oriented DPU resource scheduling method provided in the embodiments of the present application combines the hardware advantages of the DPU chip to build an intelligent traffic management mode under low delay and high load, effectively reduces the equipment cost required for resource scheduling, makes full use of hardware resources to improve resource utilization, and improves the real-time performance, adaptability and scalability of DPU resource scheduling.

To further improve the effectiveness and accuracy of traffic-processing-oriented DPU resource scheduling, in the method provided in an embodiment of the present application, referring to Fig. 3, the method specifically includes the following:

Step 010: constructing an objective function that minimizes the weighted sum of the total power consumed by the infrastructure layer of the wireless network in processing all traffic, the total delay of processing all tasks corresponding to the traffic information, and the number of rejected flows, where the total power consumption, the total delay of all tasks and the number of rejected flows each have a different weight coefficient.

Specifically, the objective function is as follows:

$$\min\; a \cdot \mathrm{Time} + b \cdot \mathrm{Energy} + c \cdot \frac{N_{rej}}{num} \qquad (1)$$

where Energy is the total power consumed in processing all flows, computed from the static and dynamic power of all servers; Time is the total delay of processing all tasks; $N_{rej}$ is the number of rejected flows and num is the total number of flows; a, b and c are weight coefficients.
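The weighted objective can be evaluated for a candidate schedule as below. This sketch assumes the rejection term is the ratio of rejected flows to all flows, as the definitions of N_rej and num suggest; the function name and inputs (per-task delays and per-node powers) are hypothetical.

```python
from typing import Sequence


def scheduling_objective(task_delays: Sequence[float], node_powers: Sequence[float],
                         n_rejected: int, n_total: int,
                         a: float, b: float, c: float) -> float:
    """Weighted objective a*Time + b*Energy + c*(N_rej / num); lower is better.

    Time: total delay over all tasks; Energy: total power over all servers.
    Treating the rejection term as the ratio N_rej/num is an assumption.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    time_total = sum(task_delays)
    energy_total = sum(node_powers)
    return a * time_total + b * energy_total + c * n_rejected / n_total
```

Comparing this value across candidate decisions shows how the weights a, b and c trade delay against energy against rejections.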

Step 020: constructing a plurality of constraints corresponding to the objective function to form a multi-objective constrained optimization function.

That is, the D3QN Agent makes a reasonable decision according to the current environment state and deploys each flow to the most suitable server for processing. The optimization goal is to minimize transmission delay and reduce power consumption while satisfying users' quality of service, so the problem can be formulated as a multi-objective constrained optimization problem.

Specifically, a number of the constraints of equation (one) are as follows:

$$D_{r,i} \in \{\, s_i \mid s_i \in S \,\} \qquad (9)$$

Here the wireless network comprises N users U = {u_1, u_2, u_3, …, u_N} and M server nodes. S denotes the set of server nodes in the wireless network, S = {s_1, s_2, s_3, …, s_M}, where s_i is the i-th server node in the set and i ≤ M. Each user u ∈ U holds a traffic flow t_i to be processed. User u uploads t_i to the network over a wireless link, and after several server nodes cooperatively process all of that user's traffic, the computation result is transmitted to the target terminal. Traffic t_i specifies a minimum threshold of DPU resources, and its completion time must not exceed its maximum delay. When a server's idle resources are insufficient or the delay would be too large, t_i is rejected. For traffic t_i, a resource scheduling decision D_{r,i} is defined, i.e., which server's resources are scheduled, and a task deployment decision D_{t,i} is defined, i.e., to which virtual machine the traffic is deployed.


Each server node hosts several virtual machines (vm), and each traffic flow runs on one vm; R_D denotes the amount of DPU resources allocated to the virtual machine.

|T_u| is the number of traffic flows in the current traffic information. The constraints also involve the total DPU resources of each of the M servers in S and the DPU resource usage of each of those servers.
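The admission rule stated above (a flow is rejected when the selected server lacks idle DPU resources or the delay would be too large) can be sketched as follows. The class and function names (`Server`, `Flow`, `admit`) are hypothetical, and the estimated completion time is assumed to be supplied by a separate delay model.

```python
from dataclasses import dataclass


@dataclass
class Server:
    total_dpu: float  # total DPU resources of the server
    used_dpu: float   # DPU resources already in use


@dataclass
class Flow:
    min_dpu: float    # minimum DPU resource threshold of the flow
    max_delay: float  # maximum tolerable completion time


def admit(flow: Flow, server: Server, est_completion: float) -> bool:
    """Accept the flow only if the server has enough idle DPU resources
    and the estimated completion time stays within the flow's maximum delay."""
    idle = server.total_dpu - server.used_dpu
    return idle >= flow.min_dpu and est_completion <= flow.max_delay


# Counting rejections over a batch gives N_rej for the objective function.
requests = [
    (Flow(2.0, 5.0), Server(10.0, 7.0), 4.0),  # accepted
    (Flow(4.0, 5.0), Server(10.0, 7.0), 4.0),  # rejected: not enough idle DPU
    (Flow(2.0, 5.0), Server(10.0, 7.0), 6.0),  # rejected: delay too large
]
n_rej = sum(not admit(f, s, t) for f, s, t in requests)
```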

The present application makes the following additional assumptions about this system model: 1) no malicious server returns an erroneous result; 2) there is no resource contention that would interrupt traffic being transmitted; 3) servers are stable within each scheduling period, i.e., no server joins or leaves the system during that period. Multi-objective optimization in this scenario is NP-hard. Environment parameters in the network vary dynamically, and as the number of user terminals and servers grows, the state information and decision space of the environment expand exponentially.

To further improve the effectiveness of DPU resource scheduling decisions and the training stability of flow processing-oriented DPU resource scheduling, in an embodiment of the method provided in the present application, the D3QN-based DPU resource scheduling model provided with the priority experience playback pool and the deep neural network comprises a main network, a target network and the priority experience playback pool, where both the main network and the target network use the deep neural network structure.

D3QN introduces the idea of Double DQN (DDQN) into Dueling DQN. The main network is used to find the action with the optimal action value in a given state, and the target network then computes the value of that action to obtain the target value. This mitigates the overestimation problem of Dueling DQN and improves training stability. D3QN splits the action-value function into a state-value function and an advantage function, so compared with DDQN it better models the relative importance of different actions and handles changes in the state value and the advantage separately. The present application therefore uses D3QN to make real-time decisions on scheduling the DPU resources that process traffic in the network.
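The Double DQN step described above (the main network selects the action, the target network evaluates it) can be illustrated on plain lists of Q-values. This is a minimal sketch of the standard DDQN target, not the application's implementation; the discount factor `gamma` and the terminal flag `done` are standard assumptions not stated in the text, and the function names are hypothetical.

```python
from typing import Sequence


def double_dqn_target(reward: float,
                      q_main_next: Sequence[float],
                      q_target_next: Sequence[float],
                      gamma: float = 0.9,
                      done: bool = False) -> float:
    """Target y = r + gamma * Q_target(s', argmax_a Q_main(s', a))."""
    if done:
        # Terminal transition: no bootstrapped future value
        return reward
    # 1) The main network selects the best action in the next state ...
    best_action = max(range(len(q_main_next)), key=lambda a: q_main_next[a])
    # 2) ... and the target network evaluates that action.
    return reward + gamma * q_target_next[best_action]


def dqn_target(reward: float, q_target_next: Sequence[float],
               gamma: float = 0.9) -> float:
    """Plain DQN target, for comparison: max taken directly over target values."""
    return reward + gamma * max(q_target_next)


main_q = [0.2, 0.9, 0.5]
target_q = [0.3, 0.4, 2.0]
y_double = double_dqn_target(1.0, main_q, target_q)  # evaluates action 1
y_plain = dqn_target(1.0, target_q)                  # picks the inflated 2.0
```

Decoupling selection from evaluation means a single overestimated entry in the target network (2.0 here) does not automatically enter the target, which is the overestimation problem the paragraph refers to.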

Specifically, referring to Fig. 4, the D3QN-CN algorithm includes three modules: a main network, a target network and a priority experience playback pool. The main network generates a resource allocation decision from the current environment state s_t. The main network is further decoupled into two parts: a state network V(s_t; θ_V) and an advantage network A(s_t, a_t; θ_A). To improve data efficiency, the present application imposes a constraint on A(s_t, a_t; θ_A) so that the network tends to rely on V(s_t; θ_V) to solve the problem. In Fig. 4, θ_V denotes the neural network weights of the state network, θ_A denotes the neural network weights of the advantage network, and weight denotes the experience weights in the priority experience playback; Qtarget(s_t, a_t) denotes the predicted value of the target network; VM0 to VM4 denote different virtual machines, node0 to node3 denote different server nodes, and 0 to 4 denote different traffic flows.

Here Q(s_t, a_t) denotes the predicted value of the main network. The advantage values are summed over the entire action space and divided by its size |A|, so as to normalize A(s_t, a_t; θ_A).
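The text describes the main network's output as the state value combined with an advantage normalized over the action space. The standard dueling-network aggregation, which subtracts the mean advantage, is sketched below; that this is exactly the formula used in the application is an assumption, and the function name is hypothetical.

```python
from typing import List, Sequence


def dueling_q(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a'))."""
    # Sum of advantages over the action space divided by |A|
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


q = dueling_q(1.0, [0.0, 4.0])
```

Subtracting the mean forces the advantages to average to zero, so the average of the Q-values equals V(s_t); this is one way of realizing the constraint that pushes the network to explain as much as possible through the state network.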

The target network is used to calculate the target value of the optimal action; it has the same structure as the main network but different parameters. Its state network and advantage network are denoted V′(s_t; θ′_V) and A′(s_t, a_t; θ′_A). The priority experience playback pool stores the experiences generated while the D3QN Agent learns, each comprising the current state s_t, the action a_t, the reward value reward and the next state s_{t+1}. When the pool is full, the oldest experiences are overwritten by new ones.
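A priority experience playback pool with the overwrite behaviour described above can be sketched as follows. The text does not say how priorities are computed, so this sketch assumes proportional prioritization (sampling probability proportional to priority to the power `alpha`) with importance-sampling weights controlled by `beta`, as in common prioritized experience replay implementations. The class and method names are hypothetical.

```python
import random
from typing import Any, List, Sequence, Tuple

Experience = Tuple[Any, Any, float, Any]  # (s_t, a_t, reward, s_{t+1})


class PriorityReplayPool:
    """Fixed-capacity pool; the oldest experience is overwritten when full.

    ASSUMPTIONS: proportional prioritization (p ** alpha) and importance-
    sampling weights (N * P(i)) ** -beta, normalized by their maximum.
    """

    def __init__(self, capacity: int, alpha: float = 0.6) -> None:
        self.capacity = capacity
        self.alpha = alpha
        self.data: List[Experience] = []
        self.priorities: List[float] = []
        self.pos = 0  # index of the slot written next

    def add(self, state: Any, action: Any, reward: float, next_state: Any) -> None:
        # New experiences get the current maximum priority so they are likely to be replayed
        priority = max(self.priorities, default=1.0)
        item = (state, action, reward, next_state)
        if len(self.data) < self.capacity:
            self.data.append(item)
            self.priorities.append(priority)
        else:
            self.data[self.pos] = item  # overwrite the oldest experience
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, k: int, beta: float = 0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = random.choices(range(len(self.data)), weights=probs, k=k)
        n = len(self.data)
        # Importance-sampling weights correct the bias of non-uniform sampling
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        return idx, [self.data[i] for i in idx], [w / max_w for w in weights]

    def update_priorities(self, idx: Sequence[int], td_errors: Sequence[float],
                          eps: float = 1e-6) -> None:
        # Priority follows the magnitude of the TD error; eps keeps it positive
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + eps
```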

The state space mainly comprises the total DPU resources of each of the M servers in S, their DPU resource usage, the server resource utilization, and the resource requirement of the current traffic flow. The state space State is formed from these four components.
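One possible encoding of these four components as the input vector of the deep neural network is a flat concatenation, sketched below. The layout (totals, then usages, then utilizations, then the current flow's demand) and the function name are assumptions for illustration.

```python
from typing import List, Sequence


def build_state(total_dpu: Sequence[float], used_dpu: Sequence[float],
                flow_demand: float) -> List[float]:
    """Flat state: [totals..., usages..., utilizations..., current flow demand]."""
    if len(total_dpu) != len(used_dpu):
        raise ValueError("expected one total and one usage value per server")
    # Server resource utilization = used / total (0 for a server with no DPU resources)
    utilization = [u / t if t > 0 else 0.0 for t, u in zip(total_dpu, used_dpu)]
    return list(total_dpu) + list(used_dpu) + utilization + [flow_demand]


state = build_state([10.0, 20.0], [5.0, 5.0], 3.0)
```

With M servers this gives a vector of length 3M + 1, so the network's input size grows linearly with the number of servers.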

The action space consists of the resource scheduling decision D_r and the task deployment decision D_t. In the scenario of the present application, when the ready queue is not empty, the task at the head of the queue is dequeued. The Agent then schedules the resources of one server and selects one virtual machine on that node to deploy the current task. The action space is defined as:

$$\mathrm{Action} = \{\, a_t \mid a_t = \langle D_r, D_t \rangle \,\}$$
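Since a Q-network outputs one value per discrete action, the pair ⟨D_r, D_t⟩ has to be mapped to a single output index. A simple mapping is sketched below, assuming (hypothetically) that every server hosts the same number of virtual machines; the function names are illustrative.

```python
from typing import Tuple


def encode_action(server: int, vm: int, vms_per_server: int) -> int:
    """Flatten <D_r, D_t> = (server index, VM index) into one Q-network output index."""
    return server * vms_per_server + vm


def decode_action(index: int, vms_per_server: int) -> Tuple[int, int]:
    """Recover (D_r, D_t) from a flat action index."""
    server, vm = divmod(index, vms_per_server)
    return server, vm


# E.g. 4 servers with 5 VMs each give 20 actions; index 7 is server 1, VM 2.
chosen = decode_action(7, 5)
```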

The scheduling algorithm of the present application aims to optimize latency, power consumption and the number of task rejections, while the ED3QN-CN algorithm aims to maximize the reward obtained after performing actions. The reward is therefore inversely related to latency, power consumption and the number of task rejections.

$$R_T = -T_t / 10.0$$

where R_T is the reward term for delay and T_t is the transmission delay of the task; R_E is the reward term for power consumption, where server_num is the number of servers, and Pwr_s and Pwr_d are the static power and dynamic power of server s, respectively.
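The reward terms can be sketched as follows. Only R_T = -T_t/10.0 is taken directly from the description. The power term (negative average total power per server, divided by a hypothetical constant `scale`), the fixed rejection penalty and the additive combination are assumptions made for illustration only.

```python
from typing import Sequence


def reward_delay(t_t: float) -> float:
    # R_T = -T_t / 10.0, as given in the description
    return -t_t / 10.0


def reward_power(static_pwr: Sequence[float], dynamic_pwr: Sequence[float],
                 scale: float = 100.0) -> float:
    # ASSUMPTION: negative average of (Pwr_s + Pwr_d) over server_num servers,
    # divided by a hypothetical normalizing constant `scale`.
    server_num = len(static_pwr)
    total = sum(s + d for s, d in zip(static_pwr, dynamic_pwr))
    return -(total / server_num) / scale


def total_reward(t_t: float, static_pwr: Sequence[float],
                 dynamic_pwr: Sequence[float], rejected: bool,
                 reject_penalty: float = 1.0) -> float:
    # ASSUMPTION: hypothetical additive combination with a fixed rejection penalty
    r = reward_delay(t_t) + reward_power(static_pwr, dynamic_pwr)
    return r - reject_penalty if rejected else r
```

Every term is non-positive, so lower delay, lower power and fewer rejections all move the reward toward zero, matching the inverse relationship stated above.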

On this basis, Table 1 shows a concrete example of the algorithm process in which the D3QN-based DPU resource scheduling model provided with the priority experience playback pool and the deep neural network solves the multi-objective constrained optimization function of the wireless network from its current environment information and traffic information, yielding the DPU resource scheduling decision for the current wireless network.

TABLE 1

To further improve the timeliness and effectiveness of acquiring the current traffic information and environment information of the wireless network, in an embodiment of the flow processing-oriented DPU resource scheduling method provided in the present application, referring to Fig. 3, step 100 of the method specifically includes the following:

Step 110: Acquire, from users and terminal devices, each traffic flow to be transmitted in the wireless network, to obtain the current traffic information;

and Step 120: Acquire, through a preset resource monitoring service module, the DPU resource quantization result data of each server node in the infrastructure layer of the wireless network, to obtain the current environment information.

Correspondingly, to further improve the effectiveness and reliability of the scheduling decisions generated by the D3QN-based DPU resource scheduling model provided with the priority experience playback pool and the deep neural network, in an embodiment of the flow processing-oriented DPU resource scheduling method provided in the present application, referring to Fig. 3, the method further includes the following after step 200:

Step 300: Acquire, through the preset resource monitoring service module, the DPU resource quantization result data of each server node in the infrastructure layer;

Step 400: Based on the DPU resource quantization result data, obtain the traffic transmission delay and the power consumption during traffic transmission, and obtain the number of task rejections corresponding to traffic rejected by the server nodes;

Step 500: Calculate the current reward value from the traffic transmission delay, the power consumption and the task acceptance rate, and add the experience containing the reward value to the priority experience playback pool.

To improve the validity of the DPU resource quantization result data of each server node in the infrastructure layer used in steps 120 and 300, in an embodiment of the flow processing-oriented DPU resource scheduling method provided in the present application, the DPU resource quantization result data includes DPU communication capability quantization result data and DPU storage capability quantization result data;

wherein the quantization model comprises:

Specifically, in a wireless network, the goal of quantization is to associate and integrate resources and to achieve unified, collaborative resource management, so that DPU resources can be scheduled flexibly to meet the needs of specific traffic and ubiquitous resources can be used efficiently. To describe the DPU within a server node, the present application considers three aspects (communication, memory and storage) and designs a quantization model that jointly considers the performance indicators of these aspects, so that DPUs of different models can be compared quantitatively.

The present application models the communication capability of the DPU on server s_i, selected by resource scheduling decision D_{r,i}, from its network bandwidth and memory bandwidth, denoted b_net and b_cap respectively, both in kbit/s (kilobits per second). The DPU communication capability quantization function is as follows:

where the result is the communication capability, i.e., the DPU communication capability quantization result data.

The present application also describes the storage capability of the DPU on server s_i, selected by resource scheduling decision D_{r,i}, using its memory bandwidth and its number of read/write operations per second (Input/Output Operations Per Second, IOPS), denoted b_sto and iops respectively. The DPU storage capability quantization function is as follows:

where the result is the storage capability, i.e., the DPU storage capability quantization result data.

Here w_i, i ∈ {0, 1, 2, 3}, are trainable hyperparameters. With this model, DPUs of different models can be mapped onto a unified scale.
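The exact quantization functions are not given in this text. Purely as an illustration of how four weights w_0 to w_3 could map heterogeneous DPUs onto one scale, the sketch below assumes two weighted linear combinations, one per capability; both the linear form and the names `DpuSpec` and `quantize` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class DpuSpec:
    b_net: float  # network bandwidth, kbit/s
    b_cap: float  # memory bandwidth used in the communication model, kbit/s
    b_sto: float  # memory bandwidth used in the storage model, kbit/s
    iops: float   # read/write operations per second


def quantize(spec: DpuSpec, w: Sequence[float]) -> Tuple[float, float]:
    """ASSUMED linear forms: comm = w0*b_net + w1*b_cap, storage = w2*b_sto + w3*iops."""
    w0, w1, w2, w3 = w
    communication = w0 * spec.b_net + w1 * spec.b_cap
    storage = w2 * spec.b_sto + w3 * spec.iops
    return communication, storage


comm, sto = quantize(DpuSpec(1000.0, 2000.0, 3000.0, 400.0), (0.5, 0.25, 0.5, 2.0))
```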

Based on the quantization model, to further improve the accuracy and effectiveness of obtaining the traffic transmission delay and the power consumption during traffic transmission, in an embodiment of the flow processing-oriented DPU resource scheduling method provided in the present application, referring to Fig. 5, step 400 of the method specifically includes the following:

Step 410: Calculate the transmission delay of a first traffic flow from the DPU resource quantization result data associated with two traffic flows in the traffic information and the amount of data transmitted from the first flow to the other flow.

Specifically, the quantization model gives the communication capability and the storage capability, in kbit/s, of the DPUs on the servers selected by resource scheduling decisions D_{r,i} and D_{r,j}. Given the amount of data transmitted from traffic t_i to t_j, the transmission time T_t between t_i and t_j can be expressed as:

From this, the transmission delay T_t of traffic t_i is obtained.
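The transmission-time formula itself is not reproduced in this text. A common simplification, used here only as an assumption, is to divide the data volume by the bottleneck capability of the two endpoints; the function name is hypothetical.

```python
def transmission_delay(data_kbit: float, comm_i: float, comm_j: float) -> float:
    """ASSUMPTION: T_t = data volume / bottleneck communication capability.

    data_kbit is in kbit and the capabilities are in kbit/s, so the result is in seconds.
    """
    bottleneck = min(comm_i, comm_j)
    if bottleneck <= 0:
        raise ValueError("communication capability must be positive")
    return data_kbit / bottleneck


# 800 kbit between DPUs rated 400 kbit/s and 200 kbit/s: the slower side dominates.
t_t = transmission_delay(800.0, 400.0, 200.0)
```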

Step 420: For each server node in the infrastructure layer that participates in the current traffic transmission, determine its resource utilization from its total DPU resources and the DPU resources occupied on its virtual machines; determine its dynamic power from a preset optimal utilization threshold and its resource utilization; and determine its power consumption from its static power and dynamic power.

Specifically, server power consumption is not linear in utilization: energy efficiency is best at about 70% utilization, and power consumption rises markedly beyond that point. To minimize server power consumption, the present application aims to keep server resource utilization near this optimum while reducing the number of active servers, provided that the real-time requirements of tasks are met. The present application therefore builds a power consumption model and estimates the current power from each server's resource allocation status.

The power consumption of a server consists of two parts: the static power Pwr_s and the dynamic power Pwr_d. The static power is fixed while the node is running, and the dynamic power varies with resource utilization.

The parameters are α = 0.5 and β = 10, and the optimal utilization is U* = 0.7. The total power of node s is Pwr_s + Pwr_d. U is the resource utilization of the server, where C_D denotes the total DPU resources of the server and R_D denotes the resources occupied on its virtual machines.
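The dynamic-power formula is not reproduced in this text. The sketch below uses the stated parameters (α = 0.5, β = 10, U* = 0.7) in a piecewise form that is often used for this kind of model: linear in U below U*, plus a quadratic penalty above it. Treating this as the application's formula is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def utilization(occupied_per_vm: Sequence[float], total_dpu: float) -> float:
    """U = DPU resources occupied on the server's VMs (R_D) / total DPU resources (C_D)."""
    return sum(occupied_per_vm) / total_dpu


def dynamic_power(u: float, alpha: float = 0.5, beta: float = 10.0,
                  u_opt: float = 0.7) -> float:
    """ASSUMED piecewise form: alpha*U below U*, plus beta*(U - U*)^2 above it."""
    if u < u_opt:
        return alpha * u
    return alpha * u_opt + beta * (u - u_opt) ** 2


def node_power(static_pwr: float, u: float) -> float:
    """Total node power Pwr_s + Pwr_d."""
    return static_pwr + dynamic_power(u)
```

Under this form, raising utilization by 0.1 above U* costs twice as much dynamic power as the same step below it (0.1 versus 0.05), which reflects the marked increase beyond the optimum described above.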

and Step 430: Obtain the number of task rejections corresponding to traffic rejected by the server nodes.

From the software aspect, the present application further provides a flow processing-oriented DPU resource scheduling device for executing all or part of the flow processing-oriented DPU resource scheduling method. Referring to Fig. 6, the device specifically includes the following:

The traffic acceptance and resource monitoring service module 10 is configured to obtain the current traffic information and environment information of the wireless network.

The D3QN deep reinforcement learning agent module 20 is configured to use the D3QN-based DPU resource scheduling model provided with a priority experience playback pool and a deep neural network to solve the multi-objective constrained optimization function of the wireless network from its current environment information and traffic information, obtain a DPU resource scheduling decision for the current wireless network, and, based on that decision, select a server node in the infrastructure layer of the wireless network for DPU resource deployment, so that the deployed DPU resources process the traffic information.

The D3QN deep reinforcement learning Agent module may be abbreviated as D3QN Agent.

The embodiment of the flow processing-oriented DPU resource scheduling device provided in the present application may be used to execute the processing flow of the method embodiment above. Its functions are not repeated here; reference may be made to the detailed description of the method embodiment.

The part of the flow processing-oriented DPU resource scheduling device that performs the scheduling may be implemented in a server or in a client device, chosen according to the processing capability of the client device and the constraints of the user's usage scenario. The present application is not limited in this regard. If all operations are completed in the client device, the client device may further include a processor that performs the flow processing-oriented DPU resource scheduling.

The client device may have a communication module (i.e., a communication unit) and may be communicatively connected to a remote server to exchange data with it. The server may include a server on the task scheduling center side and, in other implementations, a server of an intermediate platform, such as a third-party server platform that has a communication link with the task scheduling center server. The server may be a single computer device, a server cluster formed by several servers, or a distributed server structure.

Any suitable network protocol may be used for communication between the server and the client device, including protocols not yet developed at the filing date of this application. The network protocols may include, for example, TCP/IP, UDP/IP, HTTP and HTTPS. They may also include protocols used on top of these, such as RPC (Remote Procedure Call) and REST (Representational State Transfer).

As can be seen from the above description, the DPU resource scheduling device for flow processing provided in the embodiments of the present application can combine the hardware advantages of the DPU chip, can construct an intelligent flow management mode under low delay and high load, can effectively save the equipment cost required by resource scheduling, can fully utilize hardware resources to improve the resource utilization rate, and can improve the real-time performance, the dynamic performance and the expandability of DPU resource scheduling.

To further explain the above flow processing-oriented DPU resource scheduling method, the present application further provides a flow processing-oriented DPU resource scheduling method implemented by a flow processing-oriented DPU resource scheduling system. Referring to Fig. 7, the system includes: a D3QN Agent that executes the flow processing-oriented DPU resource scheduling method; an infrastructure layer containing the server nodes equipped with DPU resources; terminal devices that generate traffic, serving as the data sources of the traffic; and terminal devices that receive traffic, serving as its destinations. In Fig. 7, 0 to 6 denote different traffic flows to be processed.

Fig. 7 illustrates a scenario in which server nodes in a wireless network cooperatively process traffic. Traffic is generated on a terminal device (e.g., a camera, mobile phone or drone), cooperatively analyzed and processed by several servers, and then transmitted to the terminal device that receives the result. After the D3QN Agent perceives the current environment and traffic information, it decides on the path between the data source and the destination, the nodes on which subtasks are deployed, and the resource scheduling of the servers. After the subtasks are deployed, the delay model calculates the transmission delay and the power consumption model calculates the power during traffic transmission. The present application also runs the quantization model on the servers to quantize their DPU resources. The reward function then calculates the reward value from the processing delay, the power consumption and the task acceptance rate, and the D3QN Agent adds the learned experience to the experience playback pool. Finally, the D3QN Agent samples from the priority experience playback pool, updates the parameters of the deep neural network, and generates a new policy.

In summary, the flow processing-oriented DPU resource scheduling method provided by the application examples of the present application has the following beneficial effects:

(A) A D3QN-based DPU resource scheduling method is proposed to solve the multi-objective constrained optimization problem, improving resource utilization and reducing the delay, power consumption and number of rejected traffic flows of the scheduling policy.

(B) Considering server heterogeneity, a quantization model is designed to describe the computing, storage and communication capabilities of DPUs of different models, enabling DPU resources to be measured on a common scale and enhancing the robustness of the algorithm.

(C) A resource monitoring service is designed to dynamically perceive traffic and resource status, and a reward function that evaluates processing delay, power consumption and traffic acceptance rate is defined to drive the D3QN Agent's policy learning.

An embodiment of the present application further provides an electronic device, which may include a processor, a memory, a receiver and a transmitter. The processor is configured to perform the flow processing-oriented DPU resource scheduling method of the foregoing embodiments. The processor and the memory may be connected by a bus or in other ways; a bus connection is taken as an example. The receiver may be connected to the processor and the memory by wire or wirelessly.

The processor may be a central processing unit (Central Processing Unit, CPU). The processor may also be any other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof.

The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the flow processing-oriented DPU resource scheduling method in the embodiments of the present application. The processor runs the non-transitory software programs, instructions and modules stored in the memory to execute its various functional applications and data processing, thereby implementing the flow processing-oriented DPU resource scheduling method of the method embodiment above.

The memory may include a program storage area and a data storage area. The program storage area may store an operating system and the application program required by at least one function; the data storage area may store data created by the processor, etc. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may optionally include memory located remotely from the processor and connected to it through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory and, when executed by the processor, perform the flow processing-oriented DPU resource scheduling method of the embodiments.

In some embodiments of the present application, the user equipment may include a processor, a memory, and a transceiver unit, where the transceiver unit may include a receiver and a transmitter, and the processor, the memory, the receiver, and the transmitter may be connected by a bus system, the memory storing computer instructions, and the processor executing the computer instructions stored in the memory to control the transceiver unit to transmit and receive signals.

As an implementation manner, the functions of the receiver and the transmitter in the present application may be considered to be implemented by a transceiver circuit or a dedicated chip for transceiver, and the processor may be considered to be implemented by a dedicated processing chip, a processing circuit or a general-purpose chip.

As another implementation manner, a manner of using a general-purpose computer may be considered to implement the server provided in the embodiments of the present application. I.e. program code for implementing the functions of the processor, the receiver and the transmitter are stored in the memory, and the general purpose processor implements the functions of the processor, the receiver and the transmitter by executing the code in the memory.

The embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the aforementioned flow processing-oriented DPU resource scheduling method. The computer-readable storage medium may be a tangible storage medium such as random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a floppy disk, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. The particular implementation is hardware or software dependent on the specific application of the solution and the design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.

It should be clear that the present application is not limited to the particular arrangements and processes described above and illustrated in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions, or change the order between steps, after appreciating the spirit of the present application.

Features illustrated and/or described in the context of one embodiment may be used in the same or a similar way in one or more other embodiments, in combination with or instead of the features of those embodiments.

The foregoing description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and variations may be made to the embodiment of the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims

1. The DPU resource scheduling method for the flow processing is characterized by comprising the following steps of:

2. The flow processing oriented DPU resource scheduling method of claim 1, further comprising, prior to said obtaining current flow information and environment information of a wireless network:

3. The DPU resource scheduling method for traffic processing according to claim 1, wherein the D3QN-based DPU resource scheduling model provided with a priority experience playback pool and a deep neural network comprises a main network, a target network and the priority experience playback pool, wherein the network structures of the main network and the target network adopt the deep neural network;

4. The flow processing oriented DPU resource scheduling method of claim 1, wherein the obtaining current flow information and environment information of the wireless network comprises:

5. The flow processing oriented DPU resource scheduling method of claim 1, further comprising, after the deployed DPU resources are used to process the flow information:

6. The DPU resource scheduling method for traffic processing according to claim 4 or 5, wherein the DPU resource quantization result data comprises: DPU communication capability quantization result data and DPU storage capability quantization result data;

wherein the quantization model comprises:

7. The DPU resource scheduling method for traffic processing of claim 5, wherein the obtaining the traffic transmission delay and the power consumption in the traffic transmission process based on the DPU resource quantization result data respectively comprises:

8. A DPU resource scheduling apparatus for traffic processing, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the flow processing oriented DPU resource scheduling method of any one of claims 1 to 7 when executing the computer program.

10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a flow processing oriented DPU resource scheduling method as claimed in any one of claims 1 to 7.