CN117255126A - Data-intensive task edge service combination method based on multi-objective reinforcement learning - Google Patents

Data-intensive task edge service combination method based on multi-objective reinforcement learning

Info

Publication number
CN117255126A
Authority
CN
China
Prior art keywords
service
task
data
subtask
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311038793.1A
Other languages
Chinese (zh)
Inventor
程良伦
黄诗卿
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Nengge Knowledge Technology Co ltd
Guangdong University of Technology
Original Assignee
Guangdong Nengge Knowledge Technology Co ltd
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Nengge Knowledge Technology Co ltd, Guangdong University of Technology filed Critical Guangdong Nengge Knowledge Technology Co ltd
Priority to CN202311038793.1A priority Critical patent/CN117255126A/en
Publication of CN117255126A publication Critical patent/CN117255126A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/56Provisioning of proxy services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/60Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data-intensive task edge service combination method based on multi-objective reinforcement learning. The method comprises: receiving a service request and acquiring task request data; decomposing the task request data to obtain a plurality of subtask data; constructing a dependency graph of the task execution flow; determining an optimal service scheduling strategy in combination with a service combination optimization algorithm; and performing service scheduling to execute the data-intensive task. The service combination optimization algorithm makes full use of the advantages of edge computing: it reasonably combines and coordinates the computing resources and network resources of the edge nodes, uses limited resources effectively, avoids resource waste, and reduces resource-occupancy imbalance among edge nodes. It optimizes the efficiency and performance of task processing while balancing node load, achieves real-time and efficient processing of data-intensive tasks, and improves production efficiency and resource utilization. The method is widely applicable in the technical field of data processing.

Description

Data-intensive task edge service combination method based on multi-objective reinforcement learning
Technical Field
The application relates to the technical field of data processing, in particular to a data-intensive task edge service combination method based on multi-objective reinforcement learning.
Background
In the last decades, urban and industrial internet environments have undergone rapid development and revolution. With the popularity of internet of things (IoT) technology, sensors, devices, and systems generate large amounts of data, covering a variety of areas from urban infrastructure to industrial production. These data are widely used for real-time monitoring, prediction, optimization, decision making, and the like.
However, when facing massive real-time data processing tasks, the conventional cloud computing mode has certain limitations. Because data may encounter problems such as network delay, bandwidth limitation, and potential security risks during transmission, it may be impractical to transmit all data to the cloud for centralized processing. In particular, for tasks with high real-time requirements, such as intelligent traffic management, smart factory automation, and real-time environment monitoring, the latency of conventional cloud computing may degrade performance; moreover, data-intensive tasks place high demands on computing, storage, and network resources, so conventional cloud computing cannot meet the requirement of efficient processing.
Term definition:
Data-intensive task: a computer program task that involves a large amount of data analysis and processing, characterized by a large data volume, high data complexity, and a high rate of data change.
Disclosure of Invention
In order to solve at least one technical problem in the related art, an embodiment of the present application provides a data-intensive task edge service combination method based on multi-objective reinforcement learning, which aims to efficiently process data-intensive tasks across distributed edge service components, balance the load of the edge nodes, and achieve higher overall performance.
The data-intensive task edge service combination method based on multi-objective reinforcement learning provided by the embodiment of the application comprises the following steps:
receiving a service request and acquiring task request data; the task request data is relevant task data corresponding to the data-intensive task sending out the service request;
decomposing the task request data to obtain a plurality of subtask data;
constructing a dependency graph of a task execution flow according to the plurality of subtask data;
determining an optimal service scheduling strategy according to a preset service combination optimization algorithm and the dependency graph;
and carrying out service scheduling according to the optimal service scheduling strategy so as to execute the data-intensive task.
In some embodiments, the subtask data includes subtask category data, subtask number data, and subtask execution data.
In some embodiments, the step of constructing a dependency graph of the task execution flow according to the plurality of subtask data specifically includes:
initializing the dependency graph;
acquiring task dependency relations among all the subtasks according to the plurality of subtask data;
and constructing the dependency graph according to the task dependency.
In some embodiments, the step of determining the optimal service scheduling policy according to a preset service combination optimization algorithm and the dependency graph specifically includes:
constructing a multi-objective Markov decision process (MDP) model, and training the MDP model by means of deep reinforcement learning;
and traversing the dependency graph, and selecting services and nodes through the MDP model to obtain the optimal service scheduling strategy.
In some embodiments, the dependency graph is specifically represented by the following formula:
G_de = (N, V, W)
N = RS
V = {v_ij, v_jk, ...}
wherein G_de denotes the dependency graph, RS is the task, N is the node set in which each node is one subtask of the task RS, and V is the edge set, in which v_ij indicates that there is a precedence relation between subtask SF_i and subtask SF_j, with SF_i as the start point and as a front task of SF_j; SF_i and SF_j are respectively the i-th and j-th subtasks of RS; W is the node additional information, whose entries respectively record, for the current subtask SF_i, the comprehensive processing time of SF_i, the result waiting time after SF_i finishes processing, the service that processes SF_i, the node carrying that service, and the execution state of SF_i.
In some embodiments, the MDP model is specifically represented by the following formula:
MOMDP = {SS, A, P, R, γ, ω, D_ω}
wherein MOMDP denotes the MDP model; SS is the state set of the service scheduling process; A is the action; P is the probability that executing action A in state SS yields the reward R and changes the state; R is the reward for executing action A in state SS; γ is the discount rate, γ ∈ [0,1], which expresses the influence of historical rewards on the future (the larger γ is, the more important past selections are); ω is the task allocation preference parameter, ω = <ω_processing/responding, ω_network, ω_resource>, which expresses the system's degree of preference for each optimization objective, where ω_processing/responding is the preference parameter for the processing time and response time of the service, ω_network is the preference parameter for the network condition when the service is executed, and ω_resource is the preference parameter for the resource allocation condition when the service is executed; D_ω is the probability distribution of ω;
wherein SS_k is the k-th state in SS; SF_k is the subtask currently to be executed corresponding to state SS_k; SF_{k-1} is the front task of the current subtask SF_k; the state also records the data amount to be processed by subtask SF_{k-1}, the service entity node that executed the front task SF_{k-1} (N denotes the set containing all service entity nodes), the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type, the minimum resource requirement for executing subtask SF_k, an indicator of whether an atomic micro-service satisfying the minimum resource requirement of SF_k exists, the available resources of each service entity node n, an indicator of whether an available service entity node n exists, the bandwidths BA of all links, the network reachability matrix PA, the packet queuing delays QT of all nodes, and the post task SF_{k+1} of the current subtask SF_k;
wherein A_k is the action for executing subtask SF_k: it selects, from the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type (k identifying the subtask and j identifying the position of an atomic micro-service in that set), an atomic micro-service that may be deployed on a service entity node n to provide the service, N denoting the set containing all service entity nodes;
wherein P is the probability that, in the current state SS_k, selecting action A_kj yields the reward and the state is observed to change to SS_{k+1};
wherein R is a reward vector representing the instant reward obtained by executing action A_kj in state SS_k; its components are the user's satisfaction with the processing time of the atomic micro-service, the user's satisfaction with the time taken to respond to the user request, and the user's satisfaction with the time the data packet is transmitted in the network after the atomic micro-service finishes execution.
In some embodiments, the step of constructing a multi-objective Markov decision process (MDP) model and training the MDP model by means of deep reinforcement learning specifically includes:
setting an optimization target and constraint conditions;
establishing a service scheduling database;
acquiring historical service scheduling data and storing the historical service scheduling data into the service scheduling database;
invoking the historical service scheduling data from the service scheduling database by adopting a deep reinforcement learning mode, performing iterative training on the MDP model according to the optimization target and the constraint condition, and calculating a loss function;
And stopping training the MDP model when the error result reflected by the loss function is in a preset range or the iterative training frequency reaches a preset frequency threshold value.
In some embodiments, the loss function is set by using the homotopy optimization method, and is specifically expressed by the following formula:
L(θ) = (1 − λ)L_A(θ) + λL_B(θ), λ ∈ [0,1]
wherein L_A(θ) is the temporal-difference mean-square error of the Q values output by the network; θ is the model parameter; L_B(θ) is a smoother auxiliary loss on the values; ω and ω' are the sampled preference vectors; Q(ss, a, ω; θ) is the value of selecting action a in the current task state ss; ss and ss' are task states; a is the action; y is the target approximation of Q(ss, a, ω; θ); Q'(ss', a, ω'; θ_k) denotes the Q value of the next state whose inner product with y is largest under the currently selected preference ω; r is the current reward value; λ is 0 initially and is then increased exponentially with step, the number of iterations of the model, according to a discount factor.
In some embodiments, the step of traversing the dependency graph, and selecting a service and a node through the MDP model to obtain the optimal service scheduling policy specifically includes:
Setting a time period of service scheduling; the time period comprises a state setting period, a service scheduling period and an information storage period;
when the state setting period is in, detecting the service execution state corresponding to each subtask, modifying the task state of each subtask, and updating the task state;
when the service scheduling period is in, traversing the dependency graph, detecting node additional information corresponding to each subtask, determining an arranging task, selecting a service and a node for executing the arranging task through the MDP model, and determining a service scheduling strategy of the arranging task; the scheduling task is a subtask which is used for carrying out service scheduling in the service scheduling period and is executed after the service scheduling period is finished;
collecting and storing historical service schedule data while in the information storage period; the historical service schedule data is used for training by the MDP model.
In some embodiments, when executing the step of detecting the service execution state corresponding to each subtask while in the state setting period, modifying the task state of each subtask, and updating the task state, the method further includes:
And stopping executing service scheduling operation on the current task when the state of service execution corresponding to each subtask is detected to be completed, and receiving task request data corresponding to the next task sending out a service request.
According to the data-intensive task edge service combination method based on multi-objective reinforcement learning provided by the embodiments of the present application, a service request is received and task request data is acquired; the task request data is decomposed to obtain a plurality of subtask data; a dependency graph of the task execution flow is constructed; an optimal service scheduling strategy is determined in combination with a service combination optimization algorithm; and service scheduling is performed to execute the task. The service combination optimization algorithm makes full use of the advantages of edge computing: it reasonably combines and coordinates the computing resources and network resources of the edge nodes, uses limited resources effectively, avoids resource waste, reduces resource-occupancy imbalance among edge nodes, optimizes the efficiency and performance of task processing while balancing node load, achieves real-time and efficient processing of data-intensive tasks, and improves production efficiency and resource utilization.
Drawings
FIG. 1 is a flow chart of a method for data-intensive task edge service composition based on multi-objective reinforcement learning provided in an embodiment of the present application;
FIG. 2 is a schematic algorithm flow diagram of a model training mode of an MDP model in an embodiment of the present application;
FIG. 3 is a flow chart of determining an optimal service scheduling policy using a dependency graph and MDP model in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
Referring to fig. 1, fig. 1 is an optional flowchart of a data-intensive task edge service combining method based on multi-objective reinforcement learning according to an embodiment of the present application, which may include, but is not limited to, steps S101 to S105:
step S101, receiving a service request and acquiring task request data;
step S102, decomposing task request data to obtain a plurality of subtask data;
step S103, constructing a dependency graph of a task execution flow according to the plurality of subtask data;
step S104, determining an optimal service scheduling strategy according to a preset service combination optimization algorithm and a dependency graph;
step S105, service scheduling is carried out according to the optimal service scheduling strategy to execute the data-intensive tasks.
In step S101 of some embodiments, the task request data is the task-related data corresponding to the data-intensive task RS that issued the service request, and it contains a plurality of subtask data. Specifically, the task request data of RS is obtained, and RS can be divided into a plurality of smaller subtasks SF_i, each subtask SF_i having its own subtask data, namely:
RS = {SF_1, SF_2, ..., SF_k, ..., SF_j}
When all SF_i have finished executing, RS is considered to have finished executing. It is assumed here that only sequential execution relations exist between subtasks, and each subtask can be completed by relying on one atomic micro-service (the smallest unit of service provided). Meanwhile, to mask problems caused by system heterogeneity, every atomic micro-service (i identifying the subtask and j its position in the candidate set) is packaged in the form of a docker image. When needed, the task dispatch center asks a service entity node (which may be a stand-alone machine or an edge node cluster that appears externally as a single node providing resources) to dynamically load, instantiate, and immediately execute the image; one notation denotes the service entity node that executes task SF_j, and a binary indicator denotes whether an atomic micro-service is deployed on node n (n ∈ N), where N = {1, 2, 3, ...} denotes the set containing all service entity nodes.
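For illustration only, the decomposition entities described above can be pictured with the following minimal Python sketch; the class names, fields, and example values are assumptions made for this sketch and are not the notation of the application.

```python
from dataclasses import dataclass, field
from typing import List, Dict

@dataclass
class AtomicMicroservice:
    """Smallest service unit, packaged as a docker image (SLA fields simplified)."""
    name: str
    image: str                 # docker image reference
    min_resources: float       # minimum resources needed to run it
    max_data_volume: float     # largest data volume it may process

@dataclass
class Subtask:
    """One subtask SF_i obtained by decomposing the data-intensive task RS."""
    index: int
    data_volume: float                              # data amount D to process
    candidates: List[AtomicMicroservice] = field(default_factory=list)

@dataclass
class ServiceEntityNode:
    """A stand-alone machine or an edge cluster exposed as a single node."""
    node_id: int
    available_resources: float

    def can_host(self, ms: AtomicMicroservice) -> bool:
        # a node accepts a service only if it can cover its minimum requirement
        return self.available_resources >= ms.min_resources

# RS = {SF_1, ..., SF_j}: the task is simply the ordered collection of its subtasks
rs: List[Subtask] = [Subtask(1, data_volume=120.0), Subtask(2, data_volume=80.0)]
nodes: Dict[int, ServiceEntityNode] = {1: ServiceEntityNode(1, 4.0),
                                       2: ServiceEntityNode(2, 2.0)}
```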
In some embodiments, the following definitions are made for the application scenario and related formulas of the service combination optimization algorithm in the embodiments of the present application:
Each atomic micro-service described above predefines its own SLA (Service Level Agreement), a contract or agreement that defines the service level requirements of the service provider. The SLA defines, for the atomic micro-service, its maximum overall delay, its minimum overall delay, the maximum data volume it can process, the minimum data volume, and the minimum resource requirement for its execution, specifically as follows:
Considering hardware differences and the performance loss that may occur when a service is executed (including instruction virtualization and the like), different resource footprints may be required to achieve the same performance; therefore, the minimum resources required for execution may also differ across machines. The optional resource requirement for executing an atomic micro-service, as well as the maximum and minimum data volumes, are set according to practical experience, so that a balance between resource demand and processing efficiency is maintained in the reference running environment.
The maximum overall delay can be subdivided into a maximum processing delay, a maximum response delay, and a maximum transmission delay:
the maximum processing delay is the expected upper bound on the delay, in the reference running environment, from the moment the atomic micro-service starts processing data of the corresponding size until the correct result is obtained; the maximum response delay is the upper bound, in the reference running environment, on the delay from the complete acquisition of the input data to the moment processing actually starts; the maximum transmission delay is the expected upper bound, in the reference running environment, on the delay from the start of transmission of the result data until the last frame of result data has been transmitted. The reference running environment, comprising computing resources and network resources, can be set according to the service characteristics, and the maximum response delay and maximum transmission delay are determined according to the characteristics of the service itself and may also be given in combination with actual experience.
The minimum overall delay can likewise be subdivided into a minimum processing delay, a minimum response delay, and a minimum transmission delay:
the minimum processing delay is the expected delay, in the reference running environment, from the moment the atomic micro-service starts processing data of the corresponding size until the correct result is obtained; the minimum response delay is the lower bound, in the reference running environment, on the delay from the complete acquisition of the input data to the moment processing actually starts; the minimum transmission delay is the expected minimum delay, in the reference running environment, from the start of transmission of the result data until the last frame of result data has been transmitted. The minimum response delay and minimum transmission delay are determined according to the characteristics of the service itself and may also be given in combination with actual experience.
In some embodiments, for a determined subtask SF_i and the subtask data amount D that it needs to process, the system generates a set of selectable micro-services that are capable of processing SF_i and whose maximum processable data volume satisfies D.
Any atomic micro-service in the set of selectable micro-services for processing subtask SF_i can independently complete subtask SF_i, and all of them have the same input-output type.
When the service is deployed to a node, host n must allocate, from its currently available resources, no less than the minimum resource requirement for executing subtask SF_i in order to support the service operation; otherwise host n refuses to provide the service.
When service execution is completed, i.e., after subtask SF_i has been completed, host n immediately reclaims the resources it occupied, and the execution result is kept on host n (if the execution ended abnormally, the saved result is the input data of the failed service, so that the dispatch center can later assign another machine to complete the service). If the task dispatch center does not respond to the result within a fixed length of time (the result waiting time η), i.e., does not indicate the machine to be executed next so that the current machine can upload this step's result to the next executing machine and form a workflow, host n discards the result.
The network topology between the service entity nodes is known: the available bandwidth of the network link from service entity node i to service entity node k is known, and so is the packet queuing delay of service entity node i. The available bandwidths, the network topology, and the packet queuing delays are refreshed at a fixed time interval, ensuring that the service entity nodes keep track of the network state. The matrix BA represents the available bandwidth of all network links; PA = (pa_ik) ∈ {0,1}^(N×N) is the network reachability matrix (pa_ik = 1 if there is a direct link between service entity node i and service entity node k, and 0 otherwise); QT = {qt_i | i ∈ N} represents the packet queuing delay of all service entity nodes.
The best path for transmitting data of size O from service entity node i to service entity node k (i ≠ k), and the corresponding total transmission delay, are obtained as follows:
If pa_ik = 1, the best path is the direct link from i to k, and the total delay is given by formula (1).
If pa_ik = 0, a directed graph NET(N, V) is built from PA, where N represents the service entity nodes, m and n are both service entity nodes, and the edge weight v_mn represents the total delay from service entity node m to service entity node n.
Here pa_mn denotes the network reachability entry from service entity node m to service entity node n; if v_mn is not 0, there is an edge of weight v_mn from service entity node m to service entity node n, and its value is the total time for transmitting O from service entity node m to service entity node n.
After G = NET(N, V) is obtained, the Dijkstra algorithm is used to find the optimal transmission path of the data from service entity node i to service entity node k, and the total delay of transmitting O from i to k is obtained at the same time, as in formula (2).
Here i, m, n, and k are all service entity nodes; when i = k, the current transmission does not involve the network, and the corresponding transmission delay is 0.
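The path selection described above can be sketched as below. Because the concrete delay formulas (1) and (2) are shown only in the original drawings, the sketch assumes a simple per-hop cost of O / bandwidth plus the sender's queuing delay; that weight, and all names, are illustrative assumptions.

```python
import heapq

def transmission_delay(i, k, O, BA, PA, QT):
    """Best-path delay for sending O units of data from node i to node k.

    BA[m][n]: available bandwidth of link m->n, PA[m][n]: 1 if a direct link
    exists, QT[m]: packet queuing delay at node m.  The per-hop weight
    O / BA[m][n] + QT[m] is an assumed stand-in for formula (1).
    """
    if i == k:                     # transmission does not touch the network
        return 0.0
    n_nodes = len(PA)
    def hop(m, n):
        return O / BA[m][n] + QT[m]
    if PA[i][k]:                   # direct link: formula (1)
        return hop(i, k)
    # otherwise build the weighted digraph NET(N, V) and run Dijkstra: formula (2)
    dist = {i: 0.0}
    pq = [(0.0, i)]
    while pq:
        d, m = heapq.heappop(pq)
        if m == k:
            return d
        if d > dist.get(m, float("inf")):
            continue
        for n in range(n_nodes):
            if PA[m][n]:
                nd = d + hop(m, n)
                if nd < dist.get(n, float("inf")):
                    dist[n] = nd
                    heapq.heappush(pq, (nd, n))
    return float("inf")            # k is unreachable from i

# toy 3-node example: route 0 -> 1 -> 2
PA = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
BA = [[1, 10.0, 1], [1, 1, 5.0], [1, 1, 1]]
QT = [0.1, 0.2, 0.0]
print(transmission_delay(0, 2, 50.0, BA, PA, QT))   # 15.3
```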
From the above scenario and formula definitions, the optimization objectives and constraint conditions of the service combination optimization algorithm in the embodiments of the present application are obtained; they are expressed by formulas (3) to (9), which are explained below.
Formula (3) above is an optimization objective: it requires maximizing the accumulated processing-time satisfaction and response-time satisfaction.
The processing-time satisfaction represents the user's degree of satisfaction with the processing time of task SF_i (the total execution time excluding the time for data preparation, waiting for processing, and data transmission); the larger its value, the better. It is computed from a binary label (0 or 1) marking whether the given atomic micro-service processes SF_i, the actual processing delay of the atomic micro-service (the time spent from starting to process SF_i, after the preparation work is completed, until SF_i is processed), the maximum processing delay, and the minimum processing delay.
The response-time satisfaction represents the user's degree of satisfaction with the speed at which service SF_i responds to the user request; the larger its value, the better. It is computed from a binary label (0 or 1) marking whether the given atomic micro-service processes SF_i, the actual response delay of the atomic micro-service (the delay between receiving the processing request and actually starting to process SF_i), the maximum response delay, and the minimum response delay.
Formula (4) above is an optimization objective: it requires maximizing the accumulated transmission satisfaction, i.e., the user's satisfaction with the time taken to transmit the execution result of SF_i over the network to the node processing SF_{i+1}; the larger its value, the better. It is computed from a binary label (0 or 1) marking whether the given atomic micro-service processes SF_i and the transmission delay of the atomic micro-service (the delay, after execution of SF_i is completed, for transmitting the result data completely from m to n), which can be obtained by calculation from formula (1) and formula (2) above.
Formula (5) above is an optimization objective: it requires the resource allocation of the system to be balanced, where P_n (n ∈ N) represents the proportion of the system resources occupied by the resource sum of service entity node n, the available resources of each service entity node n are taken into account, and N denotes the set containing all service entity nodes.
Formula (6) above is a constraint: for any task SF_i, an atomic micro-service that can satisfy the task's function must be found in the set of selectable micro-services for processing subtask SF_i.
Formula (7) above is a constraint, in which a binary indicator denotes whether an atomic micro-service is deployed on node n (n ∈ N): an atomic micro-service can be deployed on a node only when the available resources of the current computing node are greater than the resources the service requires.
Formula (8) above is a constraint, using the same indicator: for task SF_i, it must be possible to find an atomic micro-service that is deployed on some service entity node n to execute the task request of SF_i.
Formula (9) above is a constraint, in which indicators denote whether an atomic micro-service is deployed on node n (n ∈ N) and on node m (m ∈ N): SF_i is processed only by an atomic micro-service deployed on a single service entity node n.
With the above scenario, optimization objectives, and constraint conditions defined, the system models the above optimization problem as an MDP problem and solves it by means of reinforcement learning. Since the optimization objectives are related to the choice of micro-services and nodes, the reinforcement learning model generated from the MDP problem can, under the condition that the constraints are satisfied, find a combination of an atomic micro-service and a service entity node n that maximizes the optimization objectives, namely formula (3), formula (4), and formula (5), as far as possible.
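Since the concrete satisfaction formulas appear only in the original drawings, the following sketch assumes a simple linear normalization of each measured delay between its SLA minimum and maximum, clipped to [0, 1], purely to illustrate how the three satisfaction terms behind objectives (3) and (4) could be assembled into a reward vector.

```python
def satisfaction(measured, d_min, d_max):
    """Assumed satisfaction model: 1 at the SLA minimum delay, 0 at the maximum."""
    if d_max <= d_min:
        return 1.0
    s = (d_max - measured) / (d_max - d_min)
    return max(0.0, min(1.0, s))

def reward_vector(proc, resp, trans, sla):
    """Reward vector with the three satisfaction terms described for R above.

    `sla` holds (min, max) delay bounds declared by the atomic micro-service.
    """
    return [satisfaction(proc, *sla["processing"]),
            satisfaction(resp, *sla["response"]),
            satisfaction(trans, *sla["transmission"])]

sla = {"processing": (1.0, 5.0), "response": (0.2, 1.0), "transmission": (0.5, 3.0)}
print(reward_vector(proc=2.0, resp=0.4, trans=1.0, sla=sla))  # [0.75, 0.75, 0.8]
```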
In some embodiments, step S103 may include, but is not limited to including, step S201 to step S203:
step S201, initializing a dependency graph;
step S202, task dependency relations among all the subtasks are obtained according to the subtask data;
Step S203, constructing a dependency graph according to the task dependency.
In steps S201 to S203 of some embodiments, the task RS is decomposed to obtain a plurality of subtasks SF_i and the corresponding subtask data, including subtask category data, subtask number data, subtask execution data, and the like, and the dependency graph is constructed according to the task dependency relations among all the subtasks. The dependency graph is specifically represented by the following formulas:
G_de = (N, V, W)
N = RS
V = {v_ij, v_jk, ...}
wherein G_de denotes the dependency graph, RS is the task, N is the node set in which each node is one subtask of the task RS, and V is the edge set, in which v_ij indicates that there is a precedence relation between subtask SF_i and subtask SF_j, with SF_i as the start point and as a front task of SF_j; SF_i and SF_j are respectively the i-th and j-th subtasks of RS; W is the node additional information, whose entries respectively record, for the current subtask SF_i, the comprehensive processing time of SF_i, the result waiting time after SF_i finishes processing, the service that processes SF_i, the node carrying that service, and the execution state of SF_i.
The execution states of SF_i mentioned above include:
To be executed: the current task has an uncompleted front task and does not meet the execution condition.
Executable: all front tasks are completed, the task meets the execution condition and can be scheduled.
Executing: the task has been scheduled and its execution has not yet finished.
Normal execution ended: the current task has executed normally and the correct result has been obtained.
Abnormal execution ended: the current task ended abnormally and the correct result was not obtained.
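A minimal sketch of the dependency graph G_de = (N, V, W) and the five execution states listed above is given below; the field names inside the node additional information W and the update rule are illustrative assumptions.

```python
from enum import Enum, auto

class ExecState(Enum):
    TO_BE_EXECUTED = auto()    # an unfinished front task exists
    EXECUTABLE = auto()        # all front tasks finished, can be scheduled
    EXECUTING = auto()         # scheduled, not yet finished
    NORMAL_END = auto()        # finished with a correct result
    ABNORMAL_END = auto()      # finished abnormally, no correct result

class DependencyGraph:
    """G_de = (N, V, W): nodes are subtasks, edges are precedence relations."""

    def __init__(self):
        self.edges = {}   # V: subtask -> set of successor subtasks
        self.info = {}    # W: subtask -> additional info (state, timings, ...)

    def add_subtask(self, sf):
        self.edges.setdefault(sf, set())
        self.info[sf] = {"state": ExecState.TO_BE_EXECUTED,
                         "processing_time": None, "result_wait": None,
                         "service": None, "node": None}

    def add_dependency(self, sf_i, sf_j):
        # v_ij: SF_i must finish before SF_j may start
        self.edges[sf_i].add(sf_j)

    def refresh_executable(self):
        # a subtask becomes executable once every predecessor ended normally
        for sf, meta in self.info.items():
            preds = [p for p, succ in self.edges.items() if sf in succ]
            if meta["state"] == ExecState.TO_BE_EXECUTED and all(
                    self.info[p]["state"] == ExecState.NORMAL_END for p in preds):
                meta["state"] = ExecState.EXECUTABLE

g = DependencyGraph()
for sf in ("SF1", "SF2"):
    g.add_subtask(sf)
g.add_dependency("SF1", "SF2")
g.refresh_executable()
print(g.info["SF1"]["state"], g.info["SF2"]["state"])  # EXECUTABLE, TO_BE_EXECUTED
```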
In some embodiments, step S104 may include, but is not limited to including, step S301 to step S302:
step S301, constructing a multi-objective Markov decision process MDP model, and training the MDP model in a deep reinforcement learning mode;
step S302, traversing the dependency graph, and selecting services and nodes through an MDP model to obtain an optimal service scheduling strategy.
In step S301 of some embodiments, considering the Markov property that naturally exists between task executions, task scheduling balances the load of the nodes while maximizing the relevant QoE indicators as far as possible, so as to meet the optimization objectives as far as possible. The optimization problem is modeled as an action-selection problem: for the current task SF_i, whose data volume to be processed is known, how should a suitable <service, node> action be selected so that SF_i can be executed while the processing-time, response-time, and transmission satisfactions above are maximized as far as possible and system load balancing is guaranteed.
The multi-objective Markov decision process MDP model is specifically represented by the following formula:
MOMDP = {SS, A, P, R, γ, ω, D_ω}
wherein MOMDP denotes the MDP model; SS is the state set of the service scheduling process; A is the action; P is the probability that executing action A in state SS yields the reward R and changes the state; R is the reward for executing action A in state SS; γ is the discount rate, γ ∈ [0,1], which expresses the influence of historical rewards on the future (the larger γ is, the more important past selections are); ω is the task allocation preference parameter, ω = <ω_processing/responding, ω_network, ω_resource>, which expresses the system's degree of preference for each optimization objective, where ω_processing/responding is the preference parameter for the processing time and response time of the service, ω_network is the preference parameter for the network condition when the service is executed, and ω_resource is the preference parameter for the resource allocation condition when the service is executed; D_ω is the probability distribution of ω;
wherein SS_k is the k-th state in SS; SF_k is the subtask currently to be executed corresponding to state SS_k; SF_{k-1} is the front task of the current subtask SF_k; the state also records the data amount to be processed by subtask SF_{k-1}, the service entity node that executed the front task SF_{k-1} (N denotes the set containing all service entity nodes), the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type, the minimum resource requirement for executing subtask SF_k, an indicator of whether an atomic micro-service satisfying the minimum resource requirement of SF_k exists, the available resources of each service entity node n, an indicator of whether an available service entity node n exists, the bandwidths BA of all links, the network reachability matrix PA, the packet queuing delays QT of all nodes, and the post task SF_{k+1} of the current subtask SF_k;
wherein A_k is the action for executing subtask SF_k: it selects, from the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type (k identifying the subtask and j identifying the position of an atomic micro-service in that set), an atomic micro-service that may be deployed on a service entity node n to provide the service, N denoting the set containing all service entity nodes;
wherein P is the probability that, in the current state SS_k, selecting action A_kj yields the reward and the state is observed to change to SS_{k+1};
wherein R is a reward vector representing the instant reward obtained by executing action A_kj in state SS_k; its components are the user's satisfaction with the processing time of the atomic micro-service, the user's satisfaction with the time taken to respond to the user request, and the user's satisfaction with the time the data packet is transmitted in the network after the atomic micro-service finishes execution;
wherein P_n represents the proportion of the system resources occupied by the resource sum of service entity node n, the available resources of each service entity node n are taken into account, and N denotes the set containing all service entity nodes.
In the embodiments of the present application, the multiple objectives are represented by a preference vector, i.e., the task allocation preference parameter ω is added. Most conventional reinforcement learning has no concept of preference: the reward function it uses is a value function which, given a state and an action, outputs only a single value. Here the user preference vector is selected by the user from the preset task allocation preference distribution, and the single-valued reward function in the traditional sense can be obtained by taking the inner product of the vector output by the reward function and the user preference vector.
By introducing the user preference vector, the user can select the emphasis of the task to be completed, such as shorter time or lower system load, and then input the user preference vector, the subtask to be completed, and the system state into the trained model, obtaining a subtask processing scheme that satisfies the user preference as far as possible on the premise of completing the user's task. For example, if the reward function outputs [goal 1, goal 2, goal 3], the user's preference may look like [1, 2, 3], where 1, 2, and 3 are the weights the user assigns to goal 1, goal 2, and goal 3 respectively, so this preference vector indicates that the user cares most about the third goal.
Each subtask can be given a different preference, and the system selects a suitable scheme to execute according to the subtask that currently needs to be executed and the corresponding user preference vector. Since the combined services in this edge service combination method are provided by edge devices, the real-time performance of data processing is enhanced.
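The inner product described above, which turns the reward vector and the user preference vector into a conventional single-valued reward, can be written in a few lines; the numbers are only an example.

```python
def scalarize(reward_vec, omega):
    """Traditional single-valued reward = <reward vector, preference vector>."""
    assert len(reward_vec) == len(omega)
    return sum(r * w for r, w in zip(reward_vec, omega))

reward = [0.75, 0.75, 0.8]        # e.g. the three satisfaction terms above
omega = [1.0, 2.0, 3.0]           # user weights for goal 1, goal 2, goal 3
print(scalarize(reward, omega))   # 0.75 + 1.5 + 2.4 = 4.65
```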
In some embodiments, step S301 may include, but is not limited to including, step S401 to step S405:
step S401, setting an optimization target and constraint conditions;
Step S402, a service scheduling database is established;
step S403, acquiring historical service scheduling data, and storing the historical service scheduling data into a service scheduling database;
step S404, adopting a deep reinforcement learning mode to call historical service scheduling data from a service scheduling database, carrying out iterative training on the MDP model according to an optimization target and constraint conditions, and calculating a loss function;
and step S405, when the error result reflected by the loss function is within a preset range or the iterative training times reach a preset time threshold, stopping training the MDP model.
In step S405 of some embodiments, to ensure the smoothness of the loss function, the loss function is set by using the homotopy optimization method, specifically expressed by the following formula:
L(θ) = (1 − λ)L_A(θ) + λL_B(θ), λ ∈ [0,1]
Considering that the optimal-solution frontier of L_A(θ) is discrete, the resulting function is steep and harder to optimize; therefore L_B(θ) is introduced as a smoothing term. Here L_A(θ) is the temporal-difference mean-square error of the Q values output by the network; θ is the model parameter; L_B(θ) is a smoother auxiliary loss on the values; ω and ω' are the sampled preference vectors; Q(ss, a, ω; θ) is the value of selecting action a in the current task state ss; ss and ss' are task states; a is the action; y is the target approximation of Q(ss, a, ω; θ); Q'(ss', a, ω'; θ_k) denotes the Q value of the next state whose inner product with y is largest under the currently selected preference ω; r is the current reward value; λ is 0 initially and is then increased exponentially with step, the number of iterations of the model, according to a discount factor.
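Since the exact forms of L_A(θ) and L_B(θ) appear only in the original drawings, the sketch below assumes that L_A is the mean-squared temporal-difference error on the Q vectors against their targets y, that L_B is a smoother loss on the preference-scalarized values, and that λ ramps exponentially from 0 towards 1 with the iteration count; it only illustrates the homotopy mixing L(θ) = (1 − λ)L_A(θ) + λL_B(θ).

```python
import numpy as np

def homotopy_loss(q, y, omega, step, decay=0.99):
    """Assumed sketch of L = (1 - lambda) * L_A + lambda * L_B.

    q, y  : (batch, n_objectives) predicted Q vectors and their targets
    omega : (n_objectives,) preference vector currently sampled
    lambda starts at 0 and grows towards 1 as training iterations accumulate.
    """
    lam = 1.0 - decay ** step                      # assumed exponential ramp
    l_a = np.mean((q - y) ** 2)                    # TD mean-squared error on vectors
    l_b = np.mean((q @ omega - y @ omega) ** 2)    # smoother loss on scalarized values
    return (1.0 - lam) * l_a + lam * l_b

q = np.array([[0.7, 0.6, 0.5], [0.4, 0.9, 0.2]])
y = np.array([[0.8, 0.5, 0.5], [0.5, 0.8, 0.3]])
omega = np.array([0.5, 0.3, 0.2])
print(homotopy_loss(q, y, omega, step=100))
```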
In some embodiments, step S302 may include, but is not limited to including, step S501 through step S504:
step S501, setting a time period of service scheduling;
step S502, when the state setting period is in, detecting the service execution state corresponding to each subtask, modifying the task state of each subtask, and updating the task state;
step S503, traversing the dependency graph when the service scheduling period is in, detecting node additional information corresponding to each subtask, determining a scheduling task, selecting a service and a node for executing the scheduling task through an MDP model, and determining a service scheduling strategy of the scheduling task;
step S504, when in the information storage period, collects and stores history service schedule data.
In step S501 of some embodiments, the time period ψ includes a state setting period ψ_{i,check}, a service scheduling period ψ_{i,chor}, and an information storage period ψ_{i,model}; every time a fixed time period ψ elapses, the system checks the system execution state once and arranges the next task to be executed according to the task execution states.
In step S502 of some embodiments, during the state setting period ψ_{i,check}, each subtask is handled according to the state of its service execution, which includes normal completion of execution and abnormal ending of execution, specifically:
Normal completion of execution: the service has finished processing the task; at this point the state of the task SF_k it was executing is changed from executing to normal execution ended, a timer θ_k is initialized, and for every edge with SF_k as its start point the task at the corresponding end point is changed to the executable state.
Abnormal ending of execution: an exception occurred during service execution, including the service exiting abnormally or the total service processing time exceeding the maximum-delay threshold η. At this point the state of SF_k is changed from executing to abnormal execution ended, and a timer θ_k is initialized.
In step S503 of some embodiments, the orchestration task is a subtask for which service scheduling is performed during the service scheduling period and which is executed after the service scheduling period ends; the historical service scheduling data is used for training the MDP model. After the task state update of step S502 is completed, the service scheduling period ψ_{i,chor} is entered: the dependency graph is traversed, and the node additional information W of the node corresponding to each subtask is checked to determine whether any task whose execution ended abnormally exists; the subtask SF_k with the highest corresponding θ_k is selected; if there is none, W is traversed again to search for subtasks in the executable state, and these are taken as the tasks needing scheduling (orchestration tasks).
Subsequently, assuming SF_k is determined this time to be the task that needs scheduling (the orchestration task), the system collects the system state, including:
the result data of the front task SF_{k-1} that needs to be processed;
the node temporarily holding that result data, determined according to the state of SF_k: if SF_k is executable, the data should be held on the node that executed the front task SF_{k-1}; otherwise it is held on the machine that previously failed to execute SF_k;
the set of selectable atomic micro-services for SF_k, the minimum amount of resources required by the service, and the available resources of the nodes;
the system network conditions: BA, PA, and QT.
After the system state is collected, the system sets an appropriate task allocation preference parameter ω according to the current execution time of task RS and the system load, inputs the collected system state and ω into the model, and selects the next suitable combination of an atomic micro-service and a service entity node.
After the combination is obtained, node n is required to load the corresponding image and to allocate sufficient resources; the node temporarily holding the result data is then instructed to transmit the data to n; the timing of θ_k is ended, timing is restarted for the execution of SF_{k+1}, and the orchestration of SF_{k+1} is completed. If no executable task is currently found, or no appropriate task execution combination is found, this stage is skipped.
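The choice of a <service, node> combination can be pictured as follows; the value table standing in for the trained model's Q values and every name here are placeholders, and the resource check mirrors the spirit of constraints (7) to (9).

```python
def select_action(candidates, nodes, omega, q_values):
    """Pick the feasible (micro-service, node) pair with the best preference-weighted score.

    candidates: list of (ms_name, min_resources) for the current subtask
    nodes:      dict node_id -> available resources
    q_values:   dict (ms_name, node_id) -> value vector (placeholder for the model)
    """
    best, best_score = None, float("-inf")
    for ms_name, need in candidates:
        for node_id, avail in nodes.items():
            if avail < need:                      # not enough resources on this node
                continue
            vec = q_values.get((ms_name, node_id), [0.0] * len(omega))
            score = sum(v * w for v, w in zip(vec, omega))
            if score > best_score:
                best, best_score = (ms_name, node_id), score
    return best

candidates = [("ms_a", 2.0), ("ms_b", 1.0)]
nodes = {1: 1.5, 2: 4.0}
omega = [0.5, 0.3, 0.2]
q_values = {("ms_a", 2): [0.9, 0.7, 0.4], ("ms_b", 1): [0.6, 0.8, 0.9],
            ("ms_b", 2): [0.5, 0.5, 0.5]}
print(select_action(candidates, nodes, omega, q_values))   # ('ms_a', 2)
```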
In step S504 of some embodiments, after the service scheduling of the orchestration task is completed, the system enters the information storage period ψ_{i,model}. During this period the currently collected system state is stored as historical service scheduling data, facilitating later iterative training of the model. The historical service scheduling data includes information such as the state observed by the current system and the accumulated service running time. Specifically, in a given ψ_{i,model} period, if the system orchestrated task SF_k during the ψ_{i,chor} period of the same time period, the observed subtask state SS_k and the action A_kj taken are stored in the service scheduling database; and, during the execution of the atomic micro-service that processes the front task SF_{k-1}, the reward vector corresponding to the front task SF_{k-1} is calculated by combining formula (11), formula (13), formula (15), and formula (16) above and stored in the service scheduling database.
The MDP model is then trained by randomly sampling consecutive state-action pairs (historical service scheduling data) from the service scheduling database, specifically tuples of the form (SS_{k-1}, A_{k-1p}, R_{k-1}, SS_k),
wherein SS_{k-1} is the task state of the front task SF_{k-1}, A_{k-1p} is the action selected for the front task SF_{k-1}, R_{k-1} is the reward vector obtained by the front task SF_{k-1} selecting action A_{k-1p} in state SS_{k-1}, and SS_k is the task state of the current task SF_k.
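Storing and randomly sampling the consecutive state-action pairs can be sketched as a plain replay buffer; the tuple layout follows the (SS_{k-1}, A_{k-1p}, R_{k-1}, SS_k) form described above, while the class and parameter names are illustrative.

```python
import random
from collections import deque

class ServiceScheduleBuffer:
    """Service scheduling database used as a replay buffer for MDP training."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state_prev, action, reward_vec, state_next):
        # one consecutive state-action pair (SS_{k-1}, A_{k-1p}, R_{k-1}, SS_k)
        self.buffer.append((state_prev, action, reward_vec, state_next))

    def sample(self, batch_size):
        # random sampling breaks temporal correlation before each training step
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

db = ServiceScheduleBuffer()
db.store({"subtask": "SF1"}, ("ms_a", 2), [0.8, 0.7, 0.6], {"subtask": "SF2"})
print(db.sample(4))
```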
Referring to fig. 2, fig. 2 is an optional algorithm flow diagram of the model training manner of the MDP model in the embodiment of the present application. After a sufficient number of consecutive state-action pairs have been stored in the service scheduling database, the MDP model starts to draw data from this action-experience set for training; when the error result reflected by the loss function is within a preset range, or the number of training iterations reaches a preset threshold, the training of the MDP model is stopped and training ends.
In step S502 of some embodiments, when it is detected that the service execution state corresponding to each subtask is completed, the execution of the service scheduling operation on the current task RS is stopped, and task request data corresponding to the next task that issues a service request is received.
Referring to fig. 3, fig. 3 is an optional flowchart for determining an optimal service scheduling policy by using a dependency graph and an MDP model according to an embodiment of the present application, where the method includes the following steps:
decomposing task request data of the data-intensive tasks, and establishing a dependency relationship graph according to task dependency relationships among all subtasks;
Detecting service execution states of all the subtasks, and updating task states of all the subtasks;
traversing the dependency graph, and determining an orchestration task;
determining an optimal service scheduling combination for arranging tasks through an MDP model;
calculating historical service scheduling data generated by selecting the optimal service scheduling combination, and storing the historical service scheduling data into a service scheduling database;
performing service scheduling of arranging tasks;
detecting the service execution state of each subtask in the next time period;
and judging whether the data-intensive task is executed or not.
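To tie the flow of fig. 3 together, the following toy loop simulates the per-period control cycle on a three-subtask workflow; the three period handlers are drastically simplified stand-ins for steps S502 to S504, the scheduling decision is a stub, and all names are hypothetical.

```python
def run_periods(subtask_states, dependencies, max_periods=10):
    """Toy per-period loop: check states, orchestrate one task, store history."""
    history = []
    for period in range(max_periods):
        # state setting period: promote subtasks whose front tasks have finished
        for sf, preds in dependencies.items():
            if subtask_states[sf] == "to_be_executed" and all(
                    subtask_states[p] == "done" for p in preds):
                subtask_states[sf] = "executable"
        # service scheduling period: pick one executable subtask (stub policy)
        ready = [sf for sf, st in subtask_states.items() if st == "executable"]
        if ready:
            chosen = ready[0]
            subtask_states[chosen] = "done"      # pretend it executed instantly
            # information storage period: keep the (period, action) record
            history.append((period, chosen))
        if all(st == "done" for st in subtask_states.values()):
            break                                 # the data-intensive task finished
    return history

states = {"SF1": "executable", "SF2": "to_be_executed", "SF3": "to_be_executed"}
deps = {"SF1": [], "SF2": ["SF1"], "SF3": ["SF2"]}
print(run_periods(states, deps))   # [(0, 'SF1'), (1, 'SF2'), (2, 'SF3')]
```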
According to the data-intensive task edge service combination method based on multi-objective reinforcement learning provided by the embodiments of the present application, a service request is received and task request data is acquired; the task request data is decomposed to obtain a plurality of subtask data; a dependency graph of the task execution flow is constructed; an optimal service scheduling strategy is determined in combination with a service combination optimization algorithm; and service scheduling is performed to execute the task. The service combination optimization algorithm makes full use of the advantages of edge computing: it reasonably combines and coordinates the computing resources and network resources of the edge nodes, uses limited resources effectively, avoids resource waste, reduces resource-occupancy imbalance among edge nodes, optimizes the efficiency and performance of task processing while balancing node load, achieves real-time and efficient processing of data-intensive tasks, and improves production efficiency and resource utilization.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and as those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the technical solutions shown in the figures do not constitute limitations of the embodiments of the present application, and may include more or fewer steps than shown, or may combine certain steps, or different steps.
Preferred embodiments of the present application are described above with reference to the accompanying drawings, and thus do not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A data-intensive task edge service combining method based on multi-objective reinforcement learning, comprising:
receiving a service request and acquiring task request data; the task request data is relevant task data corresponding to the data-intensive task sending out the service request;
Decomposing the task request data to obtain a plurality of subtask data;
constructing a dependency graph of a task execution flow according to the plurality of subtask data;
determining an optimal service scheduling strategy according to a preset service combination optimization algorithm and the dependency graph;
and carrying out service scheduling according to the optimal service scheduling strategy so as to execute the data-intensive task.
2. The data-intensive task edge service combining method of claim 1, wherein the subtask data comprises subtask category data, subtask number data, and subtask execution data.
3. The method for combining data-intensive task edge services according to claim 1, wherein the step of constructing a dependency graph of task execution flow from a plurality of subtask data specifically comprises:
initializing the dependency graph;
acquiring task dependency relations among all the subtasks according to the plurality of subtask data;
and constructing the dependency graph according to the task dependency.
4. The method for combining data-intensive task edge services according to claim 1, wherein the step of determining an optimal service scheduling policy according to a preset service combination optimization algorithm and the dependency graph specifically comprises:
constructing a multi-objective Markov decision process MDP model, and training the MDP model by means of deep reinforcement learning;
and traversing the dependency graph and selecting services and nodes through the MDP model to obtain the optimal service scheduling strategy.
5. The method of claim 4, wherein the dependency graph is specifically represented by the following formula:
G_de = (N, V, W)
N = RS
V = {v_ij, v_jk, ...}
wherein G_de denotes the dependency graph; RS is the task; N is the node set, each node in the node set being one subtask of the task RS; V is the edge set, where v_ij indicates that there is an execution order between subtask SF_i and subtask SF_j, SF_i being the starting point and a predecessor task of SF_j; SF_i and SF_j are respectively the i-th and j-th subtasks of RS; and W is the node additional information, whose elements respectively record, for the current subtask SF_i, the integrated processing time of SF_i, the waiting time for the result once SF_i has been processed, the service that processes SF_i, the node carrying that service, and the execution state of SF_i.
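Purely for illustration and not as part of claim 5, the triple G_de = (N, V, W) could be held in memory as in the following sketch; the field names used for the node additional information W are assumptions, since the original symbols are not reproduced in this text.

```python
# Sketch of the dependency graph G_de = (N, V, W); field names are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeInfo:                      # W: additional information attached to subtask SF_i
    processing_time: float           # integrated processing time of SF_i
    result_wait_time: float          # waiting time for the result once SF_i is processed
    service: Optional[str] = None    # service chosen to process SF_i
    node: Optional[str] = None       # edge node carrying that service
    state: str = "pending"           # execution state of SF_i


@dataclass
class DependencyGraph:               # G_de = (N, V, W)
    nodes: set                       # N: one element per subtask SF_i of the task RS
    edges: set                       # V: (i, j) means SF_i is a predecessor of SF_j
    info: dict                       # W: subtask id -> NodeInfo

    def predecessors(self, j: int) -> set:
        """All subtasks that must complete before SF_j can start."""
        return {i for (i, k) in self.edges if k == j}
```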
6. The data-intensive task edge service combining method of claim 4, wherein the MDP model is specifically represented by the following formula:
MOMDP = {SS, A, P, R, γ, ω, D_ω}
wherein MOMDP denotes the MDP model; SS is the state set of the service scheduling process; A is the action; P is the probability that executing action A in state SS yields the reward R and changes the state; R is the reward for executing action A in state SS; γ is the discount rate, γ ∈ [0,1], describing the influence of historical rewards on the future, a larger γ meaning that past selections matter more; ω is the task allocation preference parameter, ω = <ω_proc, ω_net, ω_res>, denoting the system's degree of preference for each optimization objective, where ω_proc is the preference parameter for the processing time and response time of the service, ω_net is the preference parameter for the network condition when the service is executed, and ω_res is the preference parameter for the resource allocation situation when the service is executed; and D_ω is the probability distribution of ω;
wherein SS_k is the k-th state in SS; SF_k is the subtask currently to be executed in state SS_k; SF_{k-1} is the predecessor task of the current subtask SF_k; the state further records the data volume still to be processed for subtask SF_{k-1}; the node executing the predecessor task SF_{k-1}, with N denoting the set containing all service entity nodes; the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type; the minimum resource requirement for executing subtask SF_k; an indicator of whether an atomic micro-service satisfying the minimum resource requirement of subtask SF_k exists; the available resources of each service entity node n; an indicator of whether an available service entity node n exists; BA, the bandwidth of all links; PA, the network reachability matrix; QT, the packet queuing delay of all nodes; and SF_{k+1}, the successor task of the current subtask SF_k;
wherein A_k is the action of executing subtask SF_k; each candidate atomic micro-service belongs to the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type, k identifying the subtask and j identifying the position of the atomic micro-service within that set; the action A_kj states that the j-th such atomic micro-service may be deployed on a service entity node n, taken from the set N of all service entity nodes, to provide the service;
wherein the transition probability denotes the probability that, in the current state SS_k, action A_kj is selected, the corresponding reward is obtained, and the state is observed to change to SS_{k+1};
wherein the reward of action A_kj is a reward vector denoting the immediate reward obtained by executing action A_kj in state SS_k; its components are the user's satisfaction with the processing time of the executed atomic micro-service, the user's satisfaction with the time taken to respond to the user request, and the user's satisfaction with the time the data packet is transmitted in the network after the atomic micro-service has been executed.
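As a non-limiting illustration of the tuple MOMDP = {SS, A, P, R, γ, ω, D_ω}, a state SS_k, an action A_kj, and a preference-weighted reward could be laid out as follows; the attribute names and the linear scalarization are assumptions, not the claimed formulation.

```python
# Sketch of one state SS_k, one action A_kj and a preference-weighted reward; names are assumptions.
from dataclasses import dataclass
import numpy as np


@dataclass
class State:                          # one element SS_k of the state set SS
    current_subtask: int              # SF_k, the subtask to be executed in this state
    predecessor: int                  # SF_{k-1}
    pending_data: float               # data volume still to be processed for SF_{k-1}
    candidate_services: tuple         # atomic micro-services able to complete SF_k
    node_resources: dict              # available resources of every service entity node in N
    bandwidth: np.ndarray             # BA: bandwidth of all links
    reachability: np.ndarray          # PA: network reachability matrix
    queue_delay: np.ndarray           # QT: packet queuing delay of all nodes
    successor: int                    # SF_{k+1}


@dataclass
class Action:                         # A_kj: deploy a candidate micro-service on a node for SF_k
    service: str
    node: str


def scalarize(reward: np.ndarray, omega: np.ndarray) -> float:
    """Weight the reward vector (processing/response, network, resource terms) by the preference ω."""
    return float(np.dot(omega, reward))
```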
7. The method for data-intensive task edge service combining as defined in claim 4, wherein the step of constructing a multi-objective Markov decision process MDP model and training the MDP model by deep reinforcement learning comprises:
setting an optimization target and constraint conditions;
establishing a service scheduling database;
acquiring historical service scheduling data and storing the historical service scheduling data into the service scheduling database;
retrieving the historical service scheduling data from the service scheduling database, performing iterative training on the MDP model in a deep reinforcement learning manner according to the optimization target and the constraint conditions, and calculating a loss function;
and stopping training the MDP model when the error reflected by the loss function falls within a preset range or the number of training iterations reaches a preset threshold.
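As a rough, non-limiting reading of this training procedure, the loop below iterates over batches drawn from a history database until either stopping condition is met; the database API (a sample() method), the optimizer choice, and the threshold values are assumptions.

```python
# Sketch of the iterative training loop of claim 7; the database API and stopping values are assumptions.
import torch


def train_mdp_model(model: torch.nn.Module, history_db, loss_fn,
                    max_steps: int = 10_000, tol: float = 1e-3, lr: float = 1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for step in range(max_steps):
        batch = history_db.sample()               # historical service scheduling data
        loss = loss_fn(model, batch, step)        # e.g. the preference-weighted loss of claim 8
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < tol:                     # error within the preset range -> stop training
            break
    return model
```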
8. The method of claim 7, wherein the loss function is set by means of a homotopy optimization method and is specifically represented by the following formula:
L(θ) = (1 - λ)L_A(θ) + λL_B(θ), λ ∈ [0,1]
wherein L_A(θ) is the temporal-difference mean square error of the Q value output by the network; θ is the model parameter; L_B(θ) is the term that makes the value function smoother; ω and ω' are the set preference vectors; Q(ss, a, ω; θ) is the value of selecting action a under the current task state ss and preference ω; ss and ss' are task states; a is the action; y is an approximation of Q(ss, a, ω; θ) formed from the current reward value r and the discounted value of the next state; Q'(ss', a, ω'; θ_k) denotes the Q value of the next state that has the maximum inner product with y under the currently selected preference ω; r is the current reward value; and λ is initially 0 and is thereafter increased exponentially with step, where step is the number of training iterations of the model.
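A minimal PyTorch sketch of this homotopy-style blend of the two loss terms is given below; the exponential schedule for λ and its base are assumptions, since the exact update rule is not fully reproduced in this text.

```python
# Sketch of L(θ) = (1 - λ)·L_A(θ) + λ·L_B(θ); the λ schedule base is an assumption.
import torch


def homotopy_loss(q_pred: torch.Tensor, td_target: torch.Tensor,
                  aligned_target: torch.Tensor, lam: float) -> torch.Tensor:
    """Blend the temporal-difference MSE on Q (L_A) with the smoother, preference-aligned term (L_B)."""
    loss_a = torch.mean((q_pred - td_target) ** 2)       # L_A(θ): TD mean square error
    loss_b = torch.mean((q_pred - aligned_target) ** 2)  # L_B(θ): smoother, ω-aligned target
    return (1.0 - lam) * loss_a + lam * loss_b


def lam_schedule(step: int, base: float = 1.0005) -> float:
    """λ starts at 0 and grows exponentially with the iteration count, capped at 1 (base is illustrative)."""
    return min(1.0, base ** step - 1.0)
```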
9. The method for combining data-intensive task edge services according to claim 5, wherein the step of traversing the dependency graph, performing service and node selection through the MDP model, and obtaining the optimal service scheduling policy specifically comprises:
setting a time period of service scheduling; the time period comprises a state setting period, a service scheduling period and an information storage period;
during the state setting period, detecting the service execution state corresponding to each subtask, modifying the task state of each subtask, and updating the task state;
during the service scheduling period, traversing the dependency graph, detecting the node additional information corresponding to each subtask, determining an orchestration task, selecting a service and a node for executing the orchestration task through the MDP model, and determining a service scheduling strategy for the orchestration task; the orchestration task is a subtask for which service scheduling is carried out within the service scheduling period and which is executed after the service scheduling period ends;
during the information storage period, collecting and storing historical service scheduling data; the historical service scheduling data is used for training the MDP model.
10. The method of claim 9, wherein the step of detecting, during the state setting period, the service execution state corresponding to each subtask, modifying the task state of each subtask, and updating the task state further comprises:
stopping the service scheduling operation for the current task when the service execution state corresponding to every subtask is detected to be completed, and receiving the task request data corresponding to the next task that issues a service request.
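As a final non-limiting illustration, the three-phase cycle of claims 9 and 10 (state setting, service scheduling, information storage) could be organised as in the sketch below; the polling helper, the period length, and the history container are assumptions, and the DependencyGraph shape is the one sketched after claim 5.

```python
# Sketch of the state-setting / service-scheduling / information-storage cycle of claims 9 and 10.
# poll_state, the period length, and the history container are assumptions.
import time


def scheduling_cycle(graph, mdp_policy, history_db: list, poll_state, period: float = 1.0):
    while True:
        # 1) State-setting period: refresh the execution state recorded for every subtask.
        for task_id, info in graph.info.items():
            info.state = poll_state(task_id)
        if all(info.state == "completed" for info in graph.info.values()):
            break                                  # claim 10: stop scheduling, accept the next request

        # 2) Service-scheduling period: pick a service and node for every subtask that is ready.
        ready = [t for t, info in graph.info.items()
                 if info.state == "pending"
                 and all(graph.info[p].state == "completed" for p in graph.predecessors(t))]
        for task_id in ready:
            graph.info[task_id].service, graph.info[task_id].node = mdp_policy.select(task_id)
            graph.info[task_id].state = "scheduled"

        # 3) Information-storage period: log the decisions as historical service scheduling data.
        history_db.extend((t, graph.info[t].service, graph.info[t].node) for t in ready)
        time.sleep(period)
```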
CN202311038793.1A 2023-08-16 2023-08-16 Data-intensive task edge service combination method based on multi-objective reinforcement learning Pending CN117255126A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311038793.1A CN117255126A (en) 2023-08-16 2023-08-16 Data-intensive task edge service combination method based on multi-objective reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311038793.1A CN117255126A (en) 2023-08-16 2023-08-16 Data-intensive task edge service combination method based on multi-objective reinforcement learning

Publications (1)

Publication Number Publication Date
CN117255126A true CN117255126A (en) 2023-12-19

Family

ID=89128417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311038793.1A Pending CN117255126A (en) 2023-08-16 2023-08-16 Data-intensive task edge service combination method based on multi-objective reinforcement learning

Country Status (1)

Country Link
CN (1) CN117255126A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180041905A1 (en) * 2016-08-05 2018-02-08 Nxgen Partners Ip, Llc Ultra-broadband virtualized telecom and internet
CN113282368A (en) * 2021-05-25 2021-08-20 国网湖北省电力有限公司检修公司 Edge computing resource scheduling method for substation inspection
CN113641447A (en) * 2021-07-16 2021-11-12 北京师范大学珠海校区 Online learning type scheduling method based on container layer dependency relationship in edge calculation
CN113822456A (en) * 2020-06-18 2021-12-21 复旦大学 Service combination optimization deployment method based on deep reinforcement learning in cloud and mist mixed environment
CN113873022A (en) * 2021-09-23 2021-12-31 中国科学院上海微系统与信息技术研究所 Mobile edge network intelligent resource allocation method capable of dividing tasks
US20220035878A1 (en) * 2021-10-19 2022-02-03 Intel Corporation Framework for optimization of machine learning architectures
WO2022171082A1 (en) * 2021-02-10 2022-08-18 中国移动通信有限公司研究院 Information processing method, apparatus, system, electronic device and storage medium
CN115022332A (en) * 2022-05-30 2022-09-06 广西师范大学 Dynamic service placement method based on deep reinforcement learning in edge calculation
CN115604768A (en) * 2022-11-30 2023-01-13 成都中星世通电子科技有限公司(Cn) Electromagnetic perception task dynamic migration method, system and terminal based on resource state
CN115714820A (en) * 2022-11-14 2023-02-24 北方工业大学 Distributed micro-service scheduling optimization method
WO2023091664A1 (en) * 2021-11-19 2023-05-25 Intel Corporation Radio access network intelligent application manager
CN116489226A (en) * 2023-04-25 2023-07-25 重庆邮电大学 Online resource scheduling method for guaranteeing service quality


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
乐光学; 戴亚盛; 杨晓慧; 刘建华; 游真旭; 朱友康: "Modeling of Trustworthy Collaborative Service Strategies for Edge Computing", Journal of Computer Research and Development, no. 05, 15 May 2020 (2020-05-15) *
刘伟; 黄宇成; 杜薇; 王伟: "Resource-Constrained Serial Task Offloading Strategy in Mobile Edge Computing", Journal of Software, no. 06, 8 June 2020 (2020-06-08) *
白昱阳; 黄彦浩; 陈思远; 张俊; 李柏青; 王飞跃: "Cloud-Edge Intelligence: Edge Computing Methods for Power System Operation and Control and Their Application Status and Prospects", Acta Automatica Sinica, no. 03, 15 March 2020 (2020-03-15) *

Similar Documents

Publication Publication Date Title
CN109491790B (en) Container-based industrial Internet of things edge computing resource allocation method and system
Zhang et al. A-SARSA: A predictive container auto-scaling algorithm based on reinforcement learning
CN113824489B (en) Satellite network resource dynamic allocation method, system and device based on deep learning
CN111274036A (en) Deep learning task scheduling method based on speed prediction
WO2020186872A1 (en) Expense optimization scheduling method for deadline constraint under cloud scientific workflow
CN114253735B (en) Task processing method and device and related equipment
CN116069512B (en) Serverless efficient resource allocation method and system based on reinforcement learning
CN116662010B (en) Dynamic resource allocation method and system based on distributed system environment
CN110086855A (en) Spark task Intellisense dispatching method based on ant group algorithm
CN115934333A (en) Historical data perception-based cloud computing resource scheduling method and system
Ibn-Khedher et al. Edge computing assisted autonomous driving using artificial intelligence
CN106502790A (en) A kind of task distribution optimization method based on data distribution
KR20230007941A (en) Edge computational task offloading scheme using reinforcement learning for IIoT scenario
Badri et al. A sample average approximation-based parallel algorithm for application placement in edge computing systems
AlOrbani et al. Load balancing and resource allocation in smart cities using reinforcement learning
CN117349026B (en) Distributed computing power scheduling system for AIGC model training
Bensalem et al. Scaling Serverless Functions in Edge Networks: A Reinforcement Learning Approach
CN113205128A (en) Distributed deep learning performance guarantee method based on serverless computing
CN117311973A (en) Computing device scheduling method and device, nonvolatile storage medium and electronic device
Bensalem et al. Towards optimal serverless function scaling in edge computing network
CN117255126A (en) Data-intensive task edge service combination method based on multi-objective reinforcement learning
CN109298932B (en) OpenFlow-based resource scheduling method, scheduler and system
CN116109058A (en) Substation inspection management method and device based on deep reinforcement learning
CN115766475A (en) Semi-asynchronous power federal learning network based on communication efficiency and communication method thereof
CN111367632B (en) Container cloud scheduling method based on periodic characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination