CN116578403A - RPA flow scheduling method and system based on deep reinforcement learning - Google Patents

RPA flow scheduling method and system based on deep reinforcement learning

Info

Publication number
CN116578403A
CN116578403A (application CN202310834496.1A)
Authority
CN
China
Prior art keywords
rpa
task
executor
actuator
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310834496.1A
Other languages
Chinese (zh)
Inventor
袁水平
陈伟雄
汤旭贤
谢帅宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Sigao Intelligent Technology Co ltd
Original Assignee
Anhui Sigao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Sigao Intelligent Technology Co ltd filed Critical Anhui Sigao Intelligent Technology Co ltd
Priority to CN202310834496.1A priority Critical patent/CN116578403A/en
Publication of CN116578403A publication Critical patent/CN116578403A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/103Workflow collaboration or project management
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Factory Administration (AREA)

Abstract

The invention provides an RPA flow scheduling method and system based on deep reinforcement learning, comprising the following steps: acquiring a task feature set from an RPA task demand list uploaded by a user; acquiring the executor cluster state of the cluster machines from the resource fluctuation and hardware information of the RPA executors during execution; constructing an RPA flow scheduling model and optimizing it with the task feature set and the executor cluster state to obtain an optimized RPA flow scheduling model; and performing RPA flow scheduling tasks with the optimized RPA flow scheduling model. The invention fully considers the constraint rules and task characteristics of the RPA task demand list and monitors, in real time, the resource fluctuation and hardware information of each registered execution machine in the cluster for optimizing the RPA flow scheduling model; a multi-objective scheduling strategy over execution time and resource cost is formulated from the task feature set and the executor cluster state, which reduces the average execution time of RPA jobs and the consumption of the cluster machines.

Description

RPA flow scheduling method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of distributed task scheduling, in particular to an RPA flow scheduling method and system based on deep reinforcement learning.
Background
Robotic Process Automation (RPA) is an automation technology that has developed rapidly in recent years, and the progress of artificial intelligence has prompted people to reconsider which work should be automated and which should be done by humans. RPA is one of the important developments: it uses software robots to simulate and replicate the highly repetitive tasks that humans perform in the user interface (UI) of their applications. Its greatest advantage is that complex business links are implemented by imitating the corresponding human behavior and completing the business in batches, rather than by writing large amounts of code and scripts. After a user submits a batch of RPA tasks, the tasks need to be scheduled onto the executor cluster to specify which executor executes each task. This is a classical distributed task scheduling problem, and common scheduling algorithms include: the round-robin method, which numbers the cluster nodes and distributes tasks to them one by one in turn; the greedy method, which distributes tasks according to task resource demands and node loads; meta-heuristic algorithms, which model the scheduling problem mathematically and iteratively optimize a scheduling decision that satisfies the resource constraints; and reinforcement learning algorithms, which select scheduling decisions from the resource status of the cluster while considering several optimization targets.
RPA tasks emphasize low machine overhead during scheduling: the resource consumption of the executor cluster should be reduced while the task execution time is kept under control, which makes the multi-objective optimization offered by deep reinforcement learning a natural fit. However, existing algorithms fail to consider the characteristics of RPA tasks, such as the dynamic change of executor resources and the division of task resource requirements into strong constraints and weak constraints. Strongly constrained resources must be satisfied, for example when an exclusive display is required; weakly constrained resources are estimates of what a task needs during execution, such as how much memory it will occupy, and such resources may be over-committed during scheduling, although doing so affects the resource consumption of the cluster. Because existing algorithms do not account for the influence of task resource requirements on the scheduling strategy, their scheduling accuracy is insufficient.
Disclosure of Invention
In order to solve the technical problems, the invention provides an RPA flow scheduling method based on deep reinforcement learning, which comprises the following steps:
S1: acquiring a task feature set through an RPA task demand list uploaded by a user;
S2: acquiring the executor cluster state of the cluster machines through the resource fluctuation and hardware information of the RPA executors during execution;
s3: constructing an RPA flow scheduling model, and optimizing the RPA flow scheduling model through a task feature set and an executor cluster state to obtain an optimized RPA flow scheduling model;
s4: and performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
Preferably, step S1 specifically includes:
acquiring the constraint rules and task characteristics of the RPA task demand list, wherein the task characteristics indicate whether a task is network/IO intensive, CPU intensive or memory intensive, and converting the constraint rules and the task characteristics into the task feature set.
Preferably, step S2 specifically includes:
s21: when the RPA actuator is started, the registration center automatically senses immediately, the RPA actuator is registered to the unified management center, and the running state of the RPA actuator is maintained in the whole process;
s22: when the RPA executor starts a background execution flow, automatically and regularly reporting hardware resources to a registration center, and acquiring hardware information of the RPA executor in real time;
s23: the background execution flow of the RPA executor automatically reports the software environment of the RPA executor to a registration center;
s24: in the running process of the RPA actuator, each assigned task audit log is recorded at any time, the running state of the actuator is judged through the task audit log, an actuator resource calculation model based on the audit log is constructed, and resource fluctuation is obtained through real-time calculation of the actuator resource calculation model and is uploaded to a resource management center;
s25: and acquiring the actuator cluster state of the clustered machine through the resource fluctuation and the hardware information.
Preferably, the step S3 specifically includes:
S31: allocating corresponding executors in the executor cluster state for each task in the task feature set with the aim of minimizing the RPA job running time and the resource consumption of the cluster machine, and constructing a scheduling decision sequence of an RPA flow scheduling model;
S32: setting a system parameter beta for the multi-objective optimization problem to specify the optimization priority of the RPA flow scheduling model, and simultaneously reducing the resource consumption of the executor cluster state and the average execution time of all tasks;
S33: setting the key components of deep reinforcement learning for the RPA flow scheduling model, wherein the key components comprise the agent, the episode, the state space and the action space;
S34: setting the reward function of the RPA flow scheduling model, wherein the reward function comprises an immediate reward and an episode reward;
s35: constructing a loss function, and calculating to obtain a state-action cost function of the RPA flow scheduling model through the loss function and the DQN algorithm;
s36: and optimizing the RPA flow scheduling model through steps S31-S35 to obtain an optimized RPA flow scheduling model.
Preferably, the task feature set is expressed as T = {t_1, t_2, ..., t_N}, where N is the total number of tasks;
the executor cluster state is expressed as E = {e_1, e_2, ..., e_M}, where M is the total number of executors;
executor e_m has a strongly constrained resource capacity Cs_m, a weakly constrained resource capacity Cw_m and a unit price P_m; task t_n has a strongly constrained resource requirement Rs_n and a weakly constrained resource requirement Rw_n; the execution time of task t_n on executor e_m is T_nm; n is the index of the task, m is the index of the executor, p is the number of strongly constrained resource types, and q is the number of weakly constrained resource types.
Preferably, the scheduling decision sequence is expressed as X = [x_1, x_2, ..., x_N];
where x_n = e(t_n) indicates that task t_n is dispatched to its corresponding executor e(t_n) for execution;
x_n = ∅ indicates that task t_n is not assigned to any executor; K(e_m) denotes the set of tasks assigned to executor e_m within its run length Q_m, and the sum of the strongly constrained resource requirements of all tasks in K(e_m) cannot exceed the remaining strongly constrained capacity of e_m;
the resource consumption of the executor cluster state is Cost_T;
the average running time of the tasks is Avg_T.
preferably, the curtain rewards are set by the following steps:
s341: and calculating the maximum value of the state consumption of the executor cluster generated by all tasks in one curtain, wherein the calculation formula is as follows:
wherein i is the number of the task, j and k are the numbers of the executor;
s342: carrying out normalization operation on actual resource consumption in the actuator cluster state, wherein the calculation formula is as follows:
Cost T resource consumption for the executor cluster state;
s343: calculating the maximum average running time and the minimum average running time of all tasks, wherein the calculation formula is as follows:
normalizing the average running time of the actual operation in the actuator cluster state, wherein the calculation formula is as follows:
wherein ,AvgT The average running time of the task is;
s344: the calculation formula of the curtain rewards is as follows:
wherein ,Rfixed Is an adjustable system parameter for controlling the size of the curtain rewards.
Preferably, step S35 specifically includes:
S351: constructing the loss function L_i(θ_i) over the data (s, a, r, s') sampled from the experience pool of the DQN algorithm, where s denotes the current state, a the action, r the reward and s' the next state; y_i denotes the target function, Q the state-action value function and θ_i the parameters of the neural network, i being the index of the task; E denotes the mathematical expectation, i.e. the loss is the expected squared difference between y_i and Q(s, a; θ_i);
the target function y_i is the reward r plus the discounted maximum of the state-action value function Q in the next state s', where γ denotes the discount factor and a' denotes the action with the maximum value in state s';
s352: and adjusting parameters of the RPA flow scheduling model to enable the value of the loss function to be smaller than a preset value, and obtaining a state-action cost function Q (s, a).
An RPA process scheduling system based on deep reinforcement learning, comprising:
the task feature set acquisition module is used for acquiring a task feature set through the RPA task demand list uploaded by the user;
the executor cluster state acquisition module is used for acquiring the executor cluster state of the cluster machine through the resource fluctuation and the hardware information when the RPA executor executes;
the scheduling model optimization module is used for constructing an RPA flow scheduling model, optimizing the RPA flow scheduling model through the task feature set and the executor cluster state, and obtaining an optimized RPA flow scheduling model;
and the scheduling module is used for performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
The invention has the following beneficial effects:
the invention fully considers the constraint rule and task characteristics of the RPA task demand list, and monitors the resource fluctuation and hardware information of each machine in the cluster from the registered execution machine in real time, and is used for optimizing the RPA flow scheduling model; and a multi-objective optimization scheduling strategy for execution time and resource cost is formulated through the task feature set and the executor cluster state, so that the average execution time of the RPA job is reduced, and the cluster machine consumption is reduced.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, the invention provides an RPA flow scheduling method based on deep reinforcement learning, comprising the following steps:
S1: acquiring a task feature set through an RPA task demand list uploaded by a user;
S2: acquiring the executor cluster state of the cluster machines through the resource fluctuation and hardware information of the RPA executors during execution;
s3: constructing an RPA flow scheduling model, and optimizing the RPA flow scheduling model through a task feature set and an executor cluster state to obtain an optimized RPA flow scheduling model;
s4: and performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
Further, the step S1 specifically includes:
acquiring the constraint rules and task characteristics of the RPA task demand list, wherein the task characteristics indicate whether a task is network/IO intensive, CPU intensive or memory intensive, and converting the constraint rules and the task characteristics into the task feature set.
Specifically, according to the RPA task demand list uploaded by the user, the key constraint rules in the list are acquired and packaged into a corresponding RPA model, and the constraint rules in the RPA model are converted into RPA task features. The RPA model carries both strongly constrained rules and weakly constrained rules. Strongly constrained rules are the hardware basis and hard user requirements that an RPA run must satisfy, mainly including whether the software is exclusive, whether the display is exclusive, whether the task may be interrupted, the display resolution and the software version. Weakly constrained rules are typically expressed on resources, mainly including the estimated run length and the CPU, memory and bandwidth requirements. Strongly constrained rules directly govern whether the RPA flow can run: once violated, they necessarily cause the RPA flow scheduling to fail.
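As an illustration only (not part of the patent text), the strongly and weakly constrained rules and the task-type feature described above could be represented with a structure such as the following; every field name and default value is an assumption chosen for readability:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class TaskType(Enum):
    NETWORK_IO_INTENSIVE = "network_io"
    CPU_INTENSIVE = "cpu"
    MEMORY_INTENSIVE = "memory"


@dataclass
class StrongConstraints:
    """Hard requirements; violating any of them must make the schedule fail."""
    exclusive_software: bool = False     # task needs exclusive use of an application
    exclusive_display: bool = False      # task needs exclusive use of the display
    interruptible: bool = True           # task may be interrupted
    min_resolution: str = "1920x1080"    # required display resolution
    software_versions: Dict[str, str] = field(default_factory=dict)  # e.g. {"excel": "2016"}


@dataclass
class WeakConstraints:
    """Estimated resource demands; may be over-committed, at a cost."""
    est_runtime_s: float = 0.0
    cpu_cores: float = 1.0
    memory_mb: float = 512.0
    bandwidth_mbps: float = 10.0


@dataclass
class RpaTaskFeature:
    """One entry of the task feature set T built from the task demand list."""
    task_id: str
    task_type: TaskType
    strong: StrongConstraints
    weak: WeakConstraints
```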
For a one-time scheduling process, the M executors E and the N RPA tasks T are expressed as:
E = {e_1, e_2, ..., e_M}
T = {t_1, t_2, ..., t_N}
Executor e_m has a strongly constrained resource capacity Cs_m and a weakly constrained resource capacity Cw_m; task t_n has a strongly constrained resource requirement Rs_n and a weakly constrained resource requirement Rw_n. The component-wise expressions of these vectors are reconstructed below.
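The component-wise expressions referred to above appear only as images in the source and are not reproduced here; a plausible reconstruction, consistent with the indices p (number of strongly constrained resource types) and q (number of weakly constrained resource types) used later, is the following and should be read as an assumption:

```latex
\begin{gather*}
E = \{e_1, e_2, \dots, e_M\}, \qquad T = \{t_1, t_2, \dots, t_N\},\\[4pt]
Cs_m = \bigl(Cs_m^1, \dots, Cs_m^p\bigr), \qquad Cw_m = \bigl(Cw_m^1, \dots, Cw_m^q\bigr),\\[4pt]
Rs_n = \bigl(Rs_n^1, \dots, Rs_n^p\bigr), \qquad Rw_n = \bigl(Rw_n^1, \dots, Rw_n^q\bigr).
\end{gather*}
```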
further, the step S2 specifically includes:
s21: when the RPA actuator is started, the registration center automatically senses immediately, the RPA actuator is registered to the unified management center, and the running state of the RPA actuator is maintained in the whole process;
specifically, for better automatic operation and maintenance management, an actuator automatic registration is provided; when the executor is automatically registered, the registration center of the executor immediately and automatically senses when the RPA executor is started and registers the executor to the unified management center, so that the global non-sensing dynamic capacity expansion and contraction of an administrator are realized; the whole process maintains various running states of the executor, and realizes high availability, high expandability and high operability of the executor in the cluster;
s22: when the RPA executor starts a background execution flow, automatically and regularly reporting hardware resources to a registration center, and acquiring hardware information of the RPA executor in real time;
specifically, aiming at better measuring the current execution capacity and the maximum execution capacity of the executor, the method is used for more accurate and efficient scheduling, and provides real-time dynamic collection of hardware-level resource information of the executor; the executor starts background execution flow, automatically reports hardware resources of the executor machine to a registration center at regular time, and achieves the purpose of collecting the maximum memory capacity, the memory utilization rate, the maximum CPU core number, the CPU utilization rate, the network bandwidth and the disk IO rate of the executor machine in real time; collecting computing hardware resources, and collecting common external equipment hardware resources such as display resolution, display quantity, camera quantity, microphone quantity and the like so as to support more fine-grained, accurate and reasonable RPA task scheduling and distribution;
the CPU utilization rate calculation method comprises the following steps:
wherein kernel is the difference between the current operating system kernel clock and the operating system kernel clock before the specific time, user is the difference between the current user process clock and the user process clock before the specific time, and idle is the interval length between the starting time point and the ending time point which need to be counted when the utilization rate is calculated;
the memory usage calculation formula is:
wherein total refers to the total physical memory capacity of the actuator machine, buffered refers to the buffer size of the operating system for the block device, cached refers to the buffer size of the operating system for the file system, and free refers to the free memory capacity of the physical memory of the current actuator machine;
the IO rate is expressed in terms of IO Time IO_Time once:
seek_time refers to the average addressing time of the disk, rotation_speed refers to the average rotation delay of the disk, IO_chunk_size refers to the size of single IO data amount of the disk, and transfer_rate refers to the maximum read-write rate of the disk;
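The three formulas referenced above are shown only as images in the source. A reconstruction that matches the variable descriptions and the usual definitions of these quantities is sketched below as an assumption; note that the source labels the average rotational delay "rotation_speed", written here as rotation_delay:

```latex
\begin{gather*}
\mathrm{CPU\_usage} = \frac{\mathrm{kernel} + \mathrm{user}}{\mathrm{kernel} + \mathrm{user} + \mathrm{idle}},\\[4pt]
\mathrm{Mem\_usage} = \frac{\mathrm{total} - \mathrm{free} - \mathrm{buffered} - \mathrm{cached}}{\mathrm{total}},\\[4pt]
\mathrm{IO\_Time} = \mathrm{seek\_time} + \mathrm{rotation\_delay} + \frac{\mathrm{IO\_chunk\_size}}{\mathrm{transfer\_rate}}.
\end{gather*}
```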
s23: the background execution flow of the RPA executor automatically reports the software environment of the RPA executor to a registration center;
specifically, for determining the execution environment of the executor software, the ecological detection of the executor software is provided, and the background execution flow of the executor is automatically reported to an executor registry; the specific detected software environment comprises a browser kernel version, a browser release version, an excel version, a word version and the like;
s24: in the running process of the RPA actuator, each assigned task audit log is recorded at any time, the running state of the actuator is judged through the task audit log, an actuator resource calculation model based on the audit log is constructed, and resource fluctuation is obtained through real-time calculation of the actuator resource calculation model and is uploaded to a resource management center;
specifically, aiming at the conditions of task load and resource use of an executor, statistics and automatic collection of the executor task are provided. During the operation of the executor, each assigned task audit log is recorded at any time, and the method comprises the following steps: recording when the tasks are distributed, recording when the tasks are run, and recording when the tasks are finished. Judging the running state of the actuator through the audit log, constructing an actuator resource calculation method based on the audit log, calculating the condition that the actuator occupies calculation resources in real time and reporting the condition to a resource management center;
s25: and acquiring the actuator cluster state of the clustered machine through the resource fluctuation and the hardware information.
Further, the step S3 specifically includes:
s31: allocating corresponding executors in the executor cluster state for each task in the task feature set with the aim of minimizing the RPA job running time and the resource consumption of the cluster machine, and constructing a scheduling decision sequence of an RPA flow scheduling model;
specifically, the RPA flow scheduling is modeled into a variable-size vector boxing problem, and under the resource limit meeting a strong constraint rule, the corresponding executor is allocated to each RPA task with the aim of minimizing the running time of the RPA job and the resource consumption of the executor cluster state;
s32: setting a system parameter beta for the multi-objective optimization problem to specify the optimization priority of the RPA flow scheduling model, and simultaneously reducing the resource consumption of the executor cluster state and the average execution time of all tasks;
specifically, the calculation formula of the average execution time is:
s33: setting key components of the deep reinforcement learning of an RPA flow scheduling model, wherein the key components comprise an intelligent agent, a curtain, a state space and an action space;
specifically, defining an agent as a job scheduler in the cluster, during each time step, the agent observing the cluster state and taking action, receiving rewards and the next observable state from the environment based on the action; wherein the time steps are discrete, event driven;
defining a curtain as the time required by the agent to finish scheduling all tasks in a task list, wherein if any scheduling action violates a strong constraint rule, the current curtain is terminated in advance;
setting a state space as a state of a current actuator cluster state, wherein the state space comprises occupation conditions of CPU and memory resources of all actuators and unit price of each actuator;
setting an action space, wherein actions represent allocation of an actuator for a task and also comprise waiting actions due to insufficient resources of the cluster;
s34: setting a reward function of the RPA flow scheduling model, wherein the reward function comprises: instant rewards and curtain rewards;
specifically, the intelligent agent obtains instant rewards after each scheduling action, specifically: if the agent successfully schedules a task, a positive instant reward is obtained, and if waiting is selected, a negative instant reward is obtained; at the end of each curtain, the intelligence will get a curtain prize: a positive curtain reward is obtained if all tasks are successfully scheduled; a negative curtain prize is obtained if the curtain ends prematurely due to violation of strong constraint rules;
s35: constructing a loss function, and calculating to obtain a state-action cost function of the RPA flow scheduling model through the loss function and the DQN algorithm;
specifically, the input of the RPA flow scheduling model is the resource consumption condition in the current executor cluster state, including the occupancy rate of the CPU and the memory, and the information of the RPA model; outputting all possible scheduling actions and corresponding values thereof under the current state; for all returned dispatch actions, adoptIs scheduled to act; after the scheduling action is completed, updating parameters of the RPA flow scheduling model according to the rewarding function;
s36: optimizing the RPA flow scheduling model through steps S31-S35 to obtain an optimized RPA flow scheduling model;
specifically, for an RPA task sequence, each new taskThe arrival will trigger a new scheduling decision +.>Wherein the scheduling decision is considered an action, the scheduler uses a state-action cost function in combination with the current state of the executor cluster state to select the scheduling decision; after completing the scheduling decisions of all RPA tasks, a scheduling decision sequence X is returned that minimizes task execution time and cluster resource consumption.
Further, the task feature set is expressed as T = {t_1, t_2, ..., t_N}, where N is the total number of tasks;
the executor cluster state is expressed as E = {e_1, e_2, ..., e_M}, where M is the total number of executors;
executor e_m has a strongly constrained resource capacity Cs_m, a weakly constrained resource capacity Cw_m and a unit price P_m; task t_n has a strongly constrained resource requirement Rs_n and a weakly constrained resource requirement Rw_n; the execution time of task t_n on executor e_m is T_nm; n is the index of the task, m is the index of the executor, p is the number of strongly constrained resource types, and q is the number of weakly constrained resource types.
Further, the scheduling decision sequence is expressed as X = [x_1, x_2, ..., x_N];
where x_n = e(t_n) indicates that task t_n is dispatched to its corresponding executor e(t_n) for execution;
x_n = ∅ indicates that task t_n is not assigned to any executor; K(e_m) denotes the set of tasks assigned to executor e_m within its run length Q_m, and the sum of the strongly constrained resource requirements of all tasks in K(e_m) cannot exceed the remaining strongly constrained capacity of e_m;
the resource consumption of the executor cluster state is Cost_T;
the average running time of the tasks is Avg_T.
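The concrete formulas are shown only as images in the source; a reconstruction consistent with the surrounding definitions (run length Q_m, unit price P_m, execution time T_nm) is given below and should be treated as an assumption, in particular the form of Cost_T:

```latex
\begin{gather*}
K(e_m) = \{\, t_n \mid x_n = e_m \,\}, \qquad
\sum_{t_n \in K(e_m)} Rs_n \;\preceq\; \widehat{Cs}_m \;\;\text{(remaining strong capacity of } e_m\text{)},\\[4pt]
Cost_T = \sum_{m=1}^{M} P_m \cdot Q_m, \qquad
Avg_T = \frac{1}{N} \sum_{n=1}^{N} T_{n,\,e(t_n)}.
\end{gather*}
```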
further, the curtain rewards are set up as follows:
s341: and calculating the maximum value of the state consumption of the executor cluster generated by all tasks in one curtain, wherein the calculation formula is as follows:
wherein i is the number of the task, j and k are the numbers of the executor;
s342: carrying out normalization operation on actual resource consumption in the actuator cluster state, wherein the calculation formula is as follows:
Cost T resource consumption for the executor cluster state;
s343: calculating the maximum average running time and the minimum average running time of all tasks, wherein the calculation formula is as follows:
normalizing the average running time of the actual operation in the actuator cluster state, wherein the calculation formula is as follows:
wherein ,AvgT The average running time of the task is;
s344: the calculation formula of the curtain rewards is as follows:
wherein ,Rfixed Is an adjustable system parameter for controlling the size of the curtain rewards.
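The formulas of steps S341-S344 are not reproduced in the source either; a reconstruction consistent with the prose (normalize the actual consumption by the worst case, normalize the average running time to [0, 1], and scale by R_fixed) is sketched below as an assumption:

```latex
\begin{gather*}
Cost_{norm} = \frac{Cost_T}{Cost_{max}}, \qquad
Avg_{norm} = \frac{Avg_T - Avg_{min}}{Avg_{max} - Avg_{min}},\\[4pt]
R_{episode} = R_{fixed}\,\bigl(1 - \beta\, Cost_{norm} - (1 - \beta)\, Avg_{norm}\bigr).
\end{gather*}
```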
Further, the step S35 specifically includes:
S351: constructing the loss function L_i(θ_i) over the data (s, a, r, s') sampled from the experience pool of the DQN algorithm, where s denotes the current state, a the action, r the reward and s' the next state; y_i denotes the target function, Q the state-action value function and θ_i the parameters of the neural network, i being the index of the task; E denotes the mathematical expectation, i.e. the loss is the expected squared difference between y_i and Q(s, a; θ_i);
the target function y_i is the reward r plus the discounted maximum of the state-action value function Q in the next state s', where γ denotes the discount factor and a' denotes the action with the maximum value in state s';
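The loss and target expressions are not reproduced in the source text; the standard DQN forms that match the description (experience-replay tuples (s, a, r, s'), discount factor γ, and target-network parameters θ_i^-, the latter being an assumption since the source only mentions θ_i) are:

```latex
\begin{gather*}
L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim D}\Bigl[\bigl(y_i - Q(s, a; \theta_i)\bigr)^{2}\Bigr],\\[4pt]
y_i = r + \gamma \max_{a'} Q\bigl(s', a'; \theta_i^{-}\bigr).
\end{gather*}
```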
s352: and adjusting parameters of the RPA flow scheduling model to enable the value of the loss function to be smaller than a preset value, and obtaining a state-action cost function Q (s, a).
The invention provides an RPA flow scheduling system based on deep reinforcement learning, which comprises the following modules:
the task feature set acquisition module is used for acquiring a task feature set through the RPA task demand list uploaded by the user;
the executor cluster state acquisition module is used for acquiring the executor cluster state of the cluster machine through the resource fluctuation and the hardware information when the RPA executor executes;
the scheduling model optimization module is used for constructing an RPA flow scheduling model, optimizing the RPA flow scheduling model through the task feature set and the executor cluster state, and obtaining an optimized RPA flow scheduling model;
and the scheduling module is used for performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for description and do not indicate that one embodiment is better or worse than another. In unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The terms first, second, third and the like do not denote any order; they are used only as labels.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (9)

1. The RPA flow scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
S1: acquiring a task feature set through an RPA task demand list uploaded by a user;
S2: acquiring the executor cluster state of the cluster machines through the resource fluctuation and hardware information of the RPA executors during execution;
s3: constructing an RPA flow scheduling model, and optimizing the RPA flow scheduling model through a task feature set and an executor cluster state to obtain an optimized RPA flow scheduling model;
s4: and performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
2. The RPA process scheduling method based on deep reinforcement learning according to claim 1, wherein step S1 specifically comprises:
acquiring the constraint rules and task characteristics of the RPA task demand list, wherein the task characteristics indicate whether a task is network/IO intensive, CPU intensive or memory intensive, and converting the constraint rules and the task characteristics into the task feature set.
3. The RPA process scheduling method based on deep reinforcement learning according to claim 1, wherein step S2 specifically comprises:
s21: when the RPA actuator is started, the registration center automatically senses immediately, the RPA actuator is registered to the unified management center, and the running state of the RPA actuator is maintained in the whole process;
s22: when the RPA executor starts a background execution flow, automatically and regularly reporting hardware resources to a registration center, and acquiring hardware information of the RPA executor in real time;
s23: the background execution flow of the RPA executor automatically reports the software environment of the RPA executor to a registration center;
s24: in the running process of the RPA actuator, each assigned task audit log is recorded at any time, the running state of the actuator is judged through the task audit log, an actuator resource calculation model based on the audit log is constructed, and resource fluctuation is obtained through real-time calculation of the actuator resource calculation model and is uploaded to a resource management center;
s25: and acquiring the actuator cluster state of the clustered machine through the resource fluctuation and the hardware information.
4. The RPA process scheduling method based on deep reinforcement learning according to claim 1, wherein step S3 specifically comprises:
S31: allocating corresponding executors in the executor cluster state for each task in the task feature set with the aim of minimizing the RPA job running time and the resource consumption of the cluster machine, and constructing a scheduling decision sequence of an RPA flow scheduling model;
S32: setting a system parameter beta for the multi-objective optimization problem to specify the optimization priority of the RPA flow scheduling model, and simultaneously reducing the resource consumption of the executor cluster state and the average execution time of all tasks;
S33: setting the key components of deep reinforcement learning for the RPA flow scheduling model, wherein the key components comprise the agent, the episode, the state space and the action space;
S34: setting the reward function of the RPA flow scheduling model, wherein the reward function comprises an immediate reward and an episode reward;
s35: constructing a loss function, and calculating to obtain a state-action cost function of the RPA flow scheduling model through the loss function and the DQN algorithm;
s36: and optimizing the RPA flow scheduling model through steps S31-S35 to obtain an optimized RPA flow scheduling model.
5. The RPA process scheduling method based on deep reinforcement learning of claim 4, wherein the task feature set is expressed as T = {t_1, t_2, ..., t_N}, N being the total number of tasks;
the executor cluster state is expressed as E = {e_1, e_2, ..., e_M}, M being the total number of executors;
executor e_m has a strongly constrained resource capacity Cs_m, a weakly constrained resource capacity Cw_m and a unit price P_m; task t_n has a strongly constrained resource requirement Rs_n and a weakly constrained resource requirement Rw_n; the execution time of task t_n on executor e_m is T_nm; n is the index of the task, m is the index of the executor, p is the number of strongly constrained resource types, and q is the number of weakly constrained resource types.
6. The deep reinforcement learning-based RPA process scheduling method according to claim 5, wherein the scheduling decision sequence is expressed as X = [x_1, x_2, ..., x_N];
wherein x_n = e(t_n) indicates that task t_n is dispatched to its corresponding executor e(t_n) for execution;
x_n = ∅ indicates that task t_n is not assigned to any executor; K(e_m) denotes the set of tasks assigned to executor e_m within its run length Q_m, and the sum of the strongly constrained resource requirements of all tasks in K(e_m) cannot exceed the remaining strongly constrained capacity of e_m;
the resource consumption of the executor cluster state is Cost_T;
the average running time of the tasks is Avg_T.
7. The RPA process scheduling method based on deep reinforcement learning according to claim 5, wherein the episode reward is set by the following steps:
S341: calculating the maximum executor cluster consumption that all the tasks in one episode can generate, wherein i is the index of the task and j and k are indices of executors;
S342: normalizing the actual resource consumption of the executor cluster state, wherein Cost_T is the resource consumption of the executor cluster state;
S343: calculating the maximum and minimum average running times of all tasks, and normalizing the actual average running time in the executor cluster state, wherein Avg_T is the average running time of the tasks;
S344: calculating the episode reward, wherein R_fixed is an adjustable system parameter that controls the magnitude of the episode reward.
8. The RPA process scheduling method based on deep reinforcement learning according to claim 5, wherein step S35 specifically comprises:
S351: constructing the loss function L_i(θ_i) over the data (s, a, r, s') sampled from the experience pool of the DQN algorithm, wherein s denotes the current state, a the action, r the reward and s' the next state; y_i denotes the target function, Q the state-action value function and θ_i the parameters of the neural network, i being the index of the task; E denotes the mathematical expectation, i.e. the loss is the expected squared difference between y_i and Q(s, a; θ_i);
the target function y_i is the reward r plus the discounted maximum of the state-action value function Q in the next state s', wherein γ denotes the discount factor and a' denotes the action with the maximum value in state s';
s352: and adjusting parameters of the RPA flow scheduling model to enable the value of the loss function to be smaller than a preset value, and obtaining a state-action cost function Q (s, a).
9. An RPA process scheduling system based on deep reinforcement learning is characterized by comprising the following modules:
the task feature set acquisition module is used for acquiring a task feature set through the RPA task demand list uploaded by the user;
the executor cluster state acquisition module is used for acquiring the executor cluster state of the cluster machine through the resource fluctuation and the hardware information when the RPA executor executes;
the scheduling model optimization module is used for constructing an RPA flow scheduling model, optimizing the RPA flow scheduling model through the task feature set and the executor cluster state, and obtaining an optimized RPA flow scheduling model;
and the scheduling module is used for performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
CN202310834496.1A 2023-07-10 2023-07-10 RPA flow scheduling method and system based on deep reinforcement learning Pending CN116578403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310834496.1A CN116578403A (en) 2023-07-10 2023-07-10 RPA flow scheduling method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310834496.1A CN116578403A (en) 2023-07-10 2023-07-10 RPA flow scheduling method and system based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116578403A true CN116578403A (en) 2023-08-11

Family

ID=87536118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310834496.1A Pending CN116578403A (en) 2023-07-10 2023-07-10 RPA flow scheduling method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116578403A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388484A (en) * 2018-08-16 2019-02-26 广东石油化工学院 A kind of more resource cloud job scheduling methods based on Deep Q-network algorithm
CN110489223A (en) * 2019-08-26 2019-11-22 北京邮电大学 Method for scheduling task, device and electronic equipment in a kind of isomeric group
CN111722910A (en) * 2020-06-19 2020-09-29 广东石油化工学院 Cloud job scheduling and resource allocation method
CN114281528A (en) * 2021-12-10 2022-04-05 重庆邮电大学 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster
CN114842507A (en) * 2022-05-20 2022-08-02 中国电子科技集团公司第五十四研究所 Reinforced pedestrian attribute identification method based on group optimization reward
CN115408136A (en) * 2022-11-01 2022-11-29 安徽思高智能科技有限公司 RPA flow scheduling method based on genetic algorithm

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350681A (en) * 2023-12-04 2024-01-05 尚恰实业有限公司 Automatic approval method and system based on RPA robot sharing center
CN117350681B (en) * 2023-12-04 2024-03-08 尚恰实业有限公司 Automatic approval method and system based on RPA robot sharing center
CN117634867A (en) * 2024-01-26 2024-03-01 杭州实在智能科技有限公司 RPA flow automatic construction method and system combining large language model and reinforcement learning
CN117634867B (en) * 2024-01-26 2024-05-24 杭州实在智能科技有限公司 RPA flow automatic construction method and system combining large language model and reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230811