CN116578403A - RPA flow scheduling method and system based on deep reinforcement learning - Google Patents

RPA flow scheduling method and system based on deep reinforcement learning

Info

Publication number
CN116578403A
CN116578403A (application CN202310834496.1A)
Authority
CN
China
Prior art keywords
rpa
task
executor
actuator
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310834496.1A
Other languages
Chinese (zh)
Inventor
袁水平
陈伟雄
汤旭贤
谢帅宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Sigao Intelligent Technology Co ltd
Original Assignee
Anhui Sigao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Sigao Intelligent Technology Co ltd filed Critical Anhui Sigao Intelligent Technology Co ltd
Priority to CN202310834496.1A priority Critical patent/CN116578403A/en
Publication of CN116578403A publication Critical patent/CN116578403A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/103Workflow collaboration or project management
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Factory Administration (AREA)

Abstract

The invention provides an RPA flow scheduling method and system based on deep reinforcement learning, comprising the following steps: acquiring a task feature set from an RPA task demand list uploaded by a user; acquiring the executor cluster state of the cluster machines from the resource fluctuation and hardware information of the RPA executors during execution; constructing an RPA flow scheduling model and optimizing it with the task feature set and the executor cluster state to obtain an optimized RPA flow scheduling model; and performing RPA flow scheduling tasks with the optimized RPA flow scheduling model. The invention fully considers the constraint rules and task characteristics of the RPA task demand list and monitors, in real time, the resource fluctuation and hardware information of each registered execution machine in the cluster for optimizing the RPA flow scheduling model; a multi-objective scheduling strategy over execution time and resource cost is formulated from the task feature set and the executor cluster state, which reduces the average execution time of RPA jobs and the consumption of the cluster machines.

Description

RPA flow scheduling method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of distributed task scheduling, in particular to an RPA flow scheduling method and system based on deep reinforcement learning.
Background
Robotic Process Automation (RPA) is an automation technology that has developed rapidly in recent years, and the progress of artificial intelligence has prompted people to reconsider which work should be automated and which should be done by humans. RPA is one of the important developments: it uses software robots to simulate and replicate the highly repetitive tasks that humans perform in the user interface (UI) of their applications. Its greatest advantage is that complex business links are implemented by imitating the corresponding human behavior and completing the business in batches, rather than by writing large amounts of code and scripts. After a user submits a batch of RPA tasks, the tasks need to be scheduled onto the executor cluster to specify which executor executes each task. This is a classical distributed task scheduling problem, and common scheduling algorithms include: the round-robin method, which numbers the cluster nodes and distributes tasks to them one by one in turn; the greedy method, which distributes tasks according to task resource demands and node loads; meta-heuristic algorithms, which model the scheduling problem mathematically and iteratively optimize a scheduling decision that satisfies the resource constraints; and reinforcement learning algorithms, which select scheduling decisions from the resource status of the cluster while considering several optimization targets.
RPA tasks emphasize low machine overhead during scheduling: the resource consumption of the executor cluster should be reduced while the task execution time is kept under control, which makes the multi-objective optimization offered by deep reinforcement learning a natural fit. However, existing algorithms fail to consider the characteristics of RPA tasks, such as the dynamic change of executor resources and the division of task resource requirements into strong constraints and weak constraints. Strongly constrained resources must be satisfied, for example when an exclusive display is required; weakly constrained resources are estimates of what a task needs during execution, such as how much memory it will occupy, and such resources may be over-committed during scheduling, although doing so affects the resource consumption of the cluster. Because existing algorithms do not account for the influence of task resource requirements on the scheduling strategy, their scheduling accuracy is insufficient.
Disclosure of Invention
In order to solve the technical problems, the invention provides an RPA flow scheduling method based on deep reinforcement learning, which comprises the following steps:
S1: acquiring a task feature set through an RPA task demand list uploaded by a user;
S2: acquiring the executor cluster state of the cluster machines through the resource fluctuation and hardware information of the RPA executors during execution;
s3: constructing an RPA flow scheduling model, and optimizing the RPA flow scheduling model through a task feature set and an executor cluster state to obtain an optimized RPA flow scheduling model;
s4: and performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
Preferably, step S1 specifically includes:
acquiring the constraint rules and task characteristics of the RPA task demand list, wherein the task characteristics indicate whether a task is network/IO intensive, CPU intensive or memory intensive, and converting the constraint rules and the task characteristics into the task feature set.
Preferably, step S2 specifically includes:
s21: when the RPA actuator is started, the registration center automatically senses immediately, the RPA actuator is registered to the unified management center, and the running state of the RPA actuator is maintained in the whole process;
s22: when the RPA executor starts a background execution flow, automatically and regularly reporting hardware resources to a registration center, and acquiring hardware information of the RPA executor in real time;
s23: the background execution flow of the RPA executor automatically reports the software environment of the RPA executor to a registration center;
s24: in the running process of the RPA actuator, each assigned task audit log is recorded at any time, the running state of the actuator is judged through the task audit log, an actuator resource calculation model based on the audit log is constructed, and resource fluctuation is obtained through real-time calculation of the actuator resource calculation model and is uploaded to a resource management center;
s25: and acquiring the actuator cluster state of the clustered machine through the resource fluctuation and the hardware information.
Preferably, the step S3 specifically includes:
S31: allocating corresponding executors in the executor cluster state for each task in the task feature set with the aim of minimizing the RPA job running time and the resource consumption of the cluster machine, and constructing a scheduling decision sequence of an RPA flow scheduling model;
S32: setting a system parameter beta for the multi-objective optimization problem to specify the optimization priority of the RPA flow scheduling model, and simultaneously reducing the resource consumption of the executor cluster state and the average execution time of all tasks;
S33: setting the key components of deep reinforcement learning for the RPA flow scheduling model, wherein the key components comprise the agent, the episode, the state space and the action space;
S34: setting the reward function of the RPA flow scheduling model, wherein the reward function comprises an immediate reward and an episode reward;
s35: constructing a loss function, and calculating to obtain a state-action cost function of the RPA flow scheduling model through the loss function and the DQN algorithm;
s36: and optimizing the RPA flow scheduling model through steps S31-S35 to obtain an optimized RPA flow scheduling model.
Preferably, the task feature set is expressed as T = {t_1, t_2, ..., t_N}, where N is the total number of tasks;
the executor cluster state is expressed as E = {e_1, e_2, ..., e_M}, where M is the total number of executors;
executor e_m has a strongly constrained resource capacity Cs_m, a weakly constrained resource capacity Cw_m and a unit price P_m; task t_n has a strongly constrained resource requirement Rs_n and a weakly constrained resource requirement Rw_n; the execution time of task t_n on executor e_m is T_nm; n is the index of the task, m is the index of the executor, p is the number of strongly constrained resource types, and q is the number of weakly constrained resource types.
Preferably, the scheduling decision sequence is expressed as X = [x_1, x_2, ..., x_N];
where x_n = e(t_n) indicates that task t_n is dispatched to its corresponding executor e(t_n) for execution;
x_n = ∅ indicates that task t_n is not assigned to any executor; K(e_m) denotes the set of tasks assigned to executor e_m within its run length Q_m, and the sum of the strongly constrained resource requirements of all tasks in K(e_m) cannot exceed the remaining strongly constrained capacity of e_m;
the resource consumption of the executor cluster state is Cost_T;
the average running time of the tasks is Avg_T.
preferably, the curtain rewards are set by the following steps:
s341: and calculating the maximum value of the state consumption of the executor cluster generated by all tasks in one curtain, wherein the calculation formula is as follows:
wherein i is the number of the task, j and k are the numbers of the executor;
s342: carrying out normalization operation on actual resource consumption in the actuator cluster state, wherein the calculation formula is as follows:
Cost T resource consumption for the executor cluster state;
s343: calculating the maximum average running time and the minimum average running time of all tasks, wherein the calculation formula is as follows:
normalizing the average running time of the actual operation in the actuator cluster state, wherein the calculation formula is as follows:
wherein ,AvgT The average running time of the task is;
s344: the calculation formula of the curtain rewards is as follows:
wherein ,Rfixed Is an adjustable system parameter for controlling the size of the curtain rewards.
Preferably, step S35 specifically includes:
S351: constructing the loss function L_i(θ_i) over the data (s, a, r, s') sampled from the experience pool of the DQN algorithm, where s denotes the current state, a the action, r the reward and s' the next state; y_i denotes the target function, Q the state-action value function and θ_i the parameters of the neural network, i being the index of the task; E denotes the mathematical expectation, i.e. the loss is the expected squared difference between y_i and Q(s, a; θ_i);
the target function y_i is the reward r plus the discounted maximum of the state-action value function Q in the next state s', where γ denotes the discount factor and a' denotes the action with the maximum value in state s';
s352: and adjusting parameters of the RPA flow scheduling model to enable the value of the loss function to be smaller than a preset value, and obtaining a state-action cost function Q (s, a).
An RPA process scheduling system based on deep reinforcement learning, comprising:
the task feature set acquisition module is used for acquiring a task feature set through the RPA task demand list uploaded by the user;
the executor cluster state acquisition module is used for acquiring the executor cluster state of the cluster machine through the resource fluctuation and the hardware information when the RPA executor executes;
the scheduling model optimization module is used for constructing an RPA flow scheduling model, optimizing the RPA flow scheduling model through the task feature set and the executor cluster state, and obtaining an optimized RPA flow scheduling model;
and the scheduling module is used for performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
The invention has the following beneficial effects:
the invention fully considers the constraint rule and task characteristics of the RPA task demand list, and monitors the resource fluctuation and hardware information of each machine in the cluster from the registered execution machine in real time, and is used for optimizing the RPA flow scheduling model; and a multi-objective optimization scheduling strategy for execution time and resource cost is formulated through the task feature set and the executor cluster state, so that the average execution time of the RPA job is reduced, and the cluster machine consumption is reduced.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, the invention provides an RPA flow scheduling method based on deep reinforcement learning, comprising the following steps:
S1: acquiring a task feature set through an RPA task demand list uploaded by a user;
S2: acquiring the executor cluster state of the cluster machines through the resource fluctuation and hardware information of the RPA executors during execution;
s3: constructing an RPA flow scheduling model, and optimizing the RPA flow scheduling model through a task feature set and an executor cluster state to obtain an optimized RPA flow scheduling model;
s4: and performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
Further, the step S1 specifically includes:
acquiring the constraint rules and task characteristics of the RPA task demand list, wherein the task characteristics indicate whether a task is network/IO intensive, CPU intensive or memory intensive, and converting the constraint rules and the task characteristics into the task feature set.
Specifically, according to the RPA task demand list uploaded by the user, the key constraint rules in the list are acquired and packaged into a corresponding RPA model, and the constraint rules in the RPA model are converted into RPA task features. The RPA model carries both strongly constrained rules and weakly constrained rules. Strongly constrained rules are the hardware basis and hard user requirements that an RPA run must satisfy, mainly including whether the software is exclusive, whether the display is exclusive, whether the task may be interrupted, the display resolution and the software version. Weakly constrained rules are typically expressed on resources, mainly including the estimated run length and the CPU, memory and bandwidth requirements. Strongly constrained rules directly govern whether the RPA flow can run: once violated, they necessarily cause the RPA flow scheduling to fail.
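As an illustration only (not part of the patent text), the strongly and weakly constrained rules and the task-type feature described above could be represented with a structure such as the following; every field name and default value is an assumption chosen for readability:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class TaskType(Enum):
    NETWORK_IO_INTENSIVE = "network_io"
    CPU_INTENSIVE = "cpu"
    MEMORY_INTENSIVE = "memory"


@dataclass
class StrongConstraints:
    """Hard requirements; violating any of them must make the schedule fail."""
    exclusive_software: bool = False     # task needs exclusive use of an application
    exclusive_display: bool = False      # task needs exclusive use of the display
    interruptible: bool = True           # task may be interrupted
    min_resolution: str = "1920x1080"    # required display resolution
    software_versions: Dict[str, str] = field(default_factory=dict)  # e.g. {"excel": "2016"}


@dataclass
class WeakConstraints:
    """Estimated resource demands; may be over-committed, at a cost."""
    est_runtime_s: float = 0.0
    cpu_cores: float = 1.0
    memory_mb: float = 512.0
    bandwidth_mbps: float = 10.0


@dataclass
class RpaTaskFeature:
    """One entry of the task feature set T built from the task demand list."""
    task_id: str
    task_type: TaskType
    strong: StrongConstraints
    weak: WeakConstraints
```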
For a one-time scheduling process, the M executors E and the N RPA tasks T are expressed as:
E = {e_1, e_2, ..., e_M}
T = {t_1, t_2, ..., t_N}
Executor e_m has a strongly constrained resource capacity Cs_m and a weakly constrained resource capacity Cw_m; task t_n has a strongly constrained resource requirement Rs_n and a weakly constrained resource requirement Rw_n. The component-wise expressions of these vectors are reconstructed below.
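The component-wise expressions referred to above appear only as images in the source and are not reproduced here; a plausible reconstruction, consistent with the indices p (number of strongly constrained resource types) and q (number of weakly constrained resource types) used later, is the following and should be read as an assumption:

```latex
\begin{gather*}
E = \{e_1, e_2, \dots, e_M\}, \qquad T = \{t_1, t_2, \dots, t_N\},\\[4pt]
Cs_m = \bigl(Cs_m^1, \dots, Cs_m^p\bigr), \qquad Cw_m = \bigl(Cw_m^1, \dots, Cw_m^q\bigr),\\[4pt]
Rs_n = \bigl(Rs_n^1, \dots, Rs_n^p\bigr), \qquad Rw_n = \bigl(Rw_n^1, \dots, Rw_n^q\bigr).
\end{gather*}
```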
further, the step S2 specifically includes:
s21: when the RPA actuator is started, the registration center automatically senses immediately, the RPA actuator is registered to the unified management center, and the running state of the RPA actuator is maintained in the whole process;
specifically, for better automatic operation and maintenance management, an actuator automatic registration is provided; when the executor is automatically registered, the registration center of the executor immediately and automatically senses when the RPA executor is started and registers the executor to the unified management center, so that the global non-sensing dynamic capacity expansion and contraction of an administrator are realized; the whole process maintains various running states of the executor, and realizes high availability, high expandability and high operability of the executor in the cluster;
s22: when the RPA executor starts a background execution flow, automatically and regularly reporting hardware resources to a registration center, and acquiring hardware information of the RPA executor in real time;
specifically, aiming at better measuring the current execution capacity and the maximum execution capacity of the executor, the method is used for more accurate and efficient scheduling, and provides real-time dynamic collection of hardware-level resource information of the executor; the executor starts background execution flow, automatically reports hardware resources of the executor machine to a registration center at regular time, and achieves the purpose of collecting the maximum memory capacity, the memory utilization rate, the maximum CPU core number, the CPU utilization rate, the network bandwidth and the disk IO rate of the executor machine in real time; collecting computing hardware resources, and collecting common external equipment hardware resources such as display resolution, display quantity, camera quantity, microphone quantity and the like so as to support more fine-grained, accurate and reasonable RPA task scheduling and distribution;
the CPU utilization rate calculation method comprises the following steps:
wherein kernel is the difference between the current operating system kernel clock and the operating system kernel clock before the specific time, user is the difference between the current user process clock and the user process clock before the specific time, and idle is the interval length between the starting time point and the ending time point which need to be counted when the utilization rate is calculated;
the memory usage calculation formula is:
wherein total refers to the total physical memory capacity of the actuator machine, buffered refers to the buffer size of the operating system for the block device, cached refers to the buffer size of the operating system for the file system, and free refers to the free memory capacity of the physical memory of the current actuator machine;
the IO rate is expressed in terms of IO Time IO_Time once:
seek_time refers to the average addressing time of the disk, rotation_speed refers to the average rotation delay of the disk, IO_chunk_size refers to the size of single IO data amount of the disk, and transfer_rate refers to the maximum read-write rate of the disk;
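The three formulas referenced above are shown only as images in the source. A reconstruction that matches the variable descriptions and the usual definitions of these quantities is sketched below as an assumption; note that the source labels the average rotational delay "rotation_speed", written here as rotation_delay:

```latex
\begin{gather*}
\mathrm{CPU\_usage} = \frac{\mathrm{kernel} + \mathrm{user}}{\mathrm{kernel} + \mathrm{user} + \mathrm{idle}},\\[4pt]
\mathrm{Mem\_usage} = \frac{\mathrm{total} - \mathrm{free} - \mathrm{buffered} - \mathrm{cached}}{\mathrm{total}},\\[4pt]
\mathrm{IO\_Time} = \mathrm{seek\_time} + \mathrm{rotation\_delay} + \frac{\mathrm{IO\_chunk\_size}}{\mathrm{transfer\_rate}}.
\end{gather*}
```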
s23: the background execution flow of the RPA executor automatically reports the software environment of the RPA executor to a registration center;
specifically, for determining the execution environment of the executor software, the ecological detection of the executor software is provided, and the background execution flow of the executor is automatically reported to an executor registry; the specific detected software environment comprises a browser kernel version, a browser release version, an excel version, a word version and the like;
s24: in the running process of the RPA actuator, each assigned task audit log is recorded at any time, the running state of the actuator is judged through the task audit log, an actuator resource calculation model based on the audit log is constructed, and resource fluctuation is obtained through real-time calculation of the actuator resource calculation model and is uploaded to a resource management center;
specifically, aiming at the conditions of task load and resource use of an executor, statistics and automatic collection of the executor task are provided. During the operation of the executor, each assigned task audit log is recorded at any time, and the method comprises the following steps: recording when the tasks are distributed, recording when the tasks are run, and recording when the tasks are finished. Judging the running state of the actuator through the audit log, constructing an actuator resource calculation method based on the audit log, calculating the condition that the actuator occupies calculation resources in real time and reporting the condition to a resource management center;
s25: and acquiring the actuator cluster state of the clustered machine through the resource fluctuation and the hardware information.
Further, the step S3 specifically includes:
s31: allocating corresponding executors in the executor cluster state for each task in the task feature set with the aim of minimizing the RPA job running time and the resource consumption of the cluster machine, and constructing a scheduling decision sequence of an RPA flow scheduling model;
specifically, the RPA flow scheduling is modeled into a variable-size vector boxing problem, and under the resource limit meeting a strong constraint rule, the corresponding executor is allocated to each RPA task with the aim of minimizing the running time of the RPA job and the resource consumption of the executor cluster state;
s32: setting a system parameter beta for the multi-objective optimization problem to specify the optimization priority of the RPA flow scheduling model, and simultaneously reducing the resource consumption of the executor cluster state and the average execution time of all tasks;
specifically, the calculation formula of the average execution time is:
s33: setting key components of the deep reinforcement learning of an RPA flow scheduling model, wherein the key components comprise an intelligent agent, a curtain, a state space and an action space;
specifically, defining an agent as a job scheduler in the cluster, during each time step, the agent observing the cluster state and taking action, receiving rewards and the next observable state from the environment based on the action; wherein the time steps are discrete, event driven;
defining a curtain as the time required by the agent to finish scheduling all tasks in a task list, wherein if any scheduling action violates a strong constraint rule, the current curtain is terminated in advance;
setting a state space as a state of a current actuator cluster state, wherein the state space comprises occupation conditions of CPU and memory resources of all actuators and unit price of each actuator;
setting an action space, wherein actions represent allocation of an actuator for a task and also comprise waiting actions due to insufficient resources of the cluster;
s34: setting a reward function of the RPA flow scheduling model, wherein the reward function comprises: instant rewards and curtain rewards;
specifically, the intelligent agent obtains instant rewards after each scheduling action, specifically: if the agent successfully schedules a task, a positive instant reward is obtained, and if waiting is selected, a negative instant reward is obtained; at the end of each curtain, the intelligence will get a curtain prize: a positive curtain reward is obtained if all tasks are successfully scheduled; a negative curtain prize is obtained if the curtain ends prematurely due to violation of strong constraint rules;
s35: constructing a loss function, and calculating to obtain a state-action cost function of the RPA flow scheduling model through the loss function and the DQN algorithm;
specifically, the input of the RPA flow scheduling model is the resource consumption condition in the current executor cluster state, including the occupancy rate of the CPU and the memory, and the information of the RPA model; outputting all possible scheduling actions and corresponding values thereof under the current state; for all returned dispatch actions, adoptIs scheduled to act; after the scheduling action is completed, updating parameters of the RPA flow scheduling model according to the rewarding function;
s36: optimizing the RPA flow scheduling model through steps S31-S35 to obtain an optimized RPA flow scheduling model;
specifically, for an RPA task sequence, each new taskThe arrival will trigger a new scheduling decision +.>Wherein the scheduling decision is considered an action, the scheduler uses a state-action cost function in combination with the current state of the executor cluster state to select the scheduling decision; after completing the scheduling decisions of all RPA tasks, a scheduling decision sequence X is returned that minimizes task execution time and cluster resource consumption.
Further, the task feature set is expressed as T = {t_1, t_2, ..., t_N}, where N is the total number of tasks;
the executor cluster state is expressed as E = {e_1, e_2, ..., e_M}, where M is the total number of executors;
executor e_m has a strongly constrained resource capacity Cs_m, a weakly constrained resource capacity Cw_m and a unit price P_m; task t_n has a strongly constrained resource requirement Rs_n and a weakly constrained resource requirement Rw_n; the execution time of task t_n on executor e_m is T_nm; n is the index of the task, m is the index of the executor, p is the number of strongly constrained resource types, and q is the number of weakly constrained resource types.
Further, the scheduling decision sequence is expressed as X = [x_1, x_2, ..., x_N];
where x_n = e(t_n) indicates that task t_n is dispatched to its corresponding executor e(t_n) for execution;
x_n = ∅ indicates that task t_n is not assigned to any executor; K(e_m) denotes the set of tasks assigned to executor e_m within its run length Q_m, and the sum of the strongly constrained resource requirements of all tasks in K(e_m) cannot exceed the remaining strongly constrained capacity of e_m;
the resource consumption of the executor cluster state is Cost_T;
the average running time of the tasks is Avg_T.
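The concrete formulas are shown only as images in the source; a reconstruction consistent with the surrounding definitions (run length Q_m, unit price P_m, execution time T_nm) is given below and should be treated as an assumption, in particular the form of Cost_T:

```latex
\begin{gather*}
K(e_m) = \{\, t_n \mid x_n = e_m \,\}, \qquad
\sum_{t_n \in K(e_m)} Rs_n \;\preceq\; \widehat{Cs}_m \;\;\text{(remaining strong capacity of } e_m\text{)},\\[4pt]
Cost_T = \sum_{m=1}^{M} P_m \cdot Q_m, \qquad
Avg_T = \frac{1}{N} \sum_{n=1}^{N} T_{n,\,e(t_n)}.
\end{gather*}
```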
further, the curtain rewards are set up as follows:
s341: and calculating the maximum value of the state consumption of the executor cluster generated by all tasks in one curtain, wherein the calculation formula is as follows:
wherein i is the number of the task, j and k are the numbers of the executor;
s342: carrying out normalization operation on actual resource consumption in the actuator cluster state, wherein the calculation formula is as follows:
Cost T resource consumption for the executor cluster state;
s343: calculating the maximum average running time and the minimum average running time of all tasks, wherein the calculation formula is as follows:
normalizing the average running time of the actual operation in the actuator cluster state, wherein the calculation formula is as follows:
wherein ,AvgT The average running time of the task is;
s344: the calculation formula of the curtain rewards is as follows:
wherein ,Rfixed Is an adjustable system parameter for controlling the size of the curtain rewards.
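The formulas of steps S341-S344 are not reproduced in the source either; a reconstruction consistent with the prose (normalize the actual consumption by the worst case, normalize the average running time to [0, 1], and scale by R_fixed) is sketched below as an assumption:

```latex
\begin{gather*}
Cost_{norm} = \frac{Cost_T}{Cost_{max}}, \qquad
Avg_{norm} = \frac{Avg_T - Avg_{min}}{Avg_{max} - Avg_{min}},\\[4pt]
R_{episode} = R_{fixed}\,\bigl(1 - \beta\, Cost_{norm} - (1 - \beta)\, Avg_{norm}\bigr).
\end{gather*}
```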
Further, the step S35 specifically includes:
S351: constructing the loss function L_i(θ_i) over the data (s, a, r, s') sampled from the experience pool of the DQN algorithm, where s denotes the current state, a the action, r the reward and s' the next state; y_i denotes the target function, Q the state-action value function and θ_i the parameters of the neural network, i being the index of the task; E denotes the mathematical expectation, i.e. the loss is the expected squared difference between y_i and Q(s, a; θ_i);
the target function y_i is the reward r plus the discounted maximum of the state-action value function Q in the next state s', where γ denotes the discount factor and a' denotes the action with the maximum value in state s';
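The loss and target expressions are not reproduced in the source text; the standard DQN forms that match the description (experience-replay tuples (s, a, r, s'), discount factor γ, and target-network parameters θ_i^-, the latter being an assumption since the source only mentions θ_i) are:

```latex
\begin{gather*}
L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim D}\Bigl[\bigl(y_i - Q(s, a; \theta_i)\bigr)^{2}\Bigr],\\[4pt]
y_i = r + \gamma \max_{a'} Q\bigl(s', a'; \theta_i^{-}\bigr).
\end{gather*}
```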
s352: and adjusting parameters of the RPA flow scheduling model to enable the value of the loss function to be smaller than a preset value, and obtaining a state-action cost function Q (s, a).
The invention provides an RPA flow scheduling system based on deep reinforcement learning, which comprises the following modules:
the task feature set acquisition module is used for acquiring a task feature set through the RPA task demand list uploaded by the user;
the executor cluster state acquisition module is used for acquiring the executor cluster state of the cluster machine through the resource fluctuation and the hardware information when the RPA executor executes;
the scheduling model optimization module is used for constructing an RPA flow scheduling model, optimizing the RPA flow scheduling model through the task feature set and the executor cluster state, and obtaining an optimized RPA flow scheduling model;
and the scheduling module is used for performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for description and do not indicate that one embodiment is better or worse than another. In unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The terms first, second, third and the like do not denote any order; they are used only as labels.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (9)

1. The RPA flow scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
S1: acquiring a task feature set through an RPA task demand list uploaded by a user;
S2: acquiring the executor cluster state of the cluster machines through the resource fluctuation and hardware information of the RPA executors during execution;
s3: constructing an RPA flow scheduling model, and optimizing the RPA flow scheduling model through a task feature set and an executor cluster state to obtain an optimized RPA flow scheduling model;
s4: and performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
2. The RPA process scheduling method based on deep reinforcement learning according to claim 1, wherein step S1 specifically comprises:
acquiring the constraint rules and task characteristics of the RPA task demand list, wherein the task characteristics indicate whether a task is network/IO intensive, CPU intensive or memory intensive, and converting the constraint rules and the task characteristics into the task feature set.
3. The RPA process scheduling method based on deep reinforcement learning according to claim 1, wherein step S2 specifically comprises:
s21: when the RPA actuator is started, the registration center automatically senses immediately, the RPA actuator is registered to the unified management center, and the running state of the RPA actuator is maintained in the whole process;
s22: when the RPA executor starts a background execution flow, automatically and regularly reporting hardware resources to a registration center, and acquiring hardware information of the RPA executor in real time;
s23: the background execution flow of the RPA executor automatically reports the software environment of the RPA executor to a registration center;
s24: in the running process of the RPA actuator, each assigned task audit log is recorded at any time, the running state of the actuator is judged through the task audit log, an actuator resource calculation model based on the audit log is constructed, and resource fluctuation is obtained through real-time calculation of the actuator resource calculation model and is uploaded to a resource management center;
s25: and acquiring the actuator cluster state of the clustered machine through the resource fluctuation and the hardware information.
4. The RPA process scheduling method based on deep reinforcement learning according to claim 1, wherein step S3 specifically comprises:
S31: allocating corresponding executors in the executor cluster state for each task in the task feature set with the aim of minimizing the RPA job running time and the resource consumption of the cluster machine, and constructing a scheduling decision sequence of an RPA flow scheduling model;
S32: setting a system parameter beta for the multi-objective optimization problem to specify the optimization priority of the RPA flow scheduling model, and simultaneously reducing the resource consumption of the executor cluster state and the average execution time of all tasks;
S33: setting the key components of deep reinforcement learning for the RPA flow scheduling model, wherein the key components comprise the agent, the episode, the state space and the action space;
S34: setting the reward function of the RPA flow scheduling model, wherein the reward function comprises an immediate reward and an episode reward;
s35: constructing a loss function, and calculating to obtain a state-action cost function of the RPA flow scheduling model through the loss function and the DQN algorithm;
s36: and optimizing the RPA flow scheduling model through steps S31-S35 to obtain an optimized RPA flow scheduling model.
5. The RPA process scheduling method based on deep reinforcement learning of claim 4, wherein the task feature set is expressed as T = {t_1, t_2, ..., t_N}, N being the total number of tasks;
the executor cluster state is expressed as E = {e_1, e_2, ..., e_M}, M being the total number of executors;
executor e_m has a strongly constrained resource capacity Cs_m, a weakly constrained resource capacity Cw_m and a unit price P_m; task t_n has a strongly constrained resource requirement Rs_n and a weakly constrained resource requirement Rw_n; the execution time of task t_n on executor e_m is T_nm; n is the index of the task, m is the index of the executor, p is the number of strongly constrained resource types, and q is the number of weakly constrained resource types.
6. The deep reinforcement learning-based RPA process scheduling method according to claim 5, wherein the scheduling decision sequence is expressed as X = [x_1, x_2, ..., x_N];
wherein x_n = e(t_n) indicates that task t_n is dispatched to its corresponding executor e(t_n) for execution;
x_n = ∅ indicates that task t_n is not assigned to any executor; K(e_m) denotes the set of tasks assigned to executor e_m within its run length Q_m, and the sum of the strongly constrained resource requirements of all tasks in K(e_m) cannot exceed the remaining strongly constrained capacity of e_m;
the resource consumption of the executor cluster state is Cost_T;
the average running time of the tasks is Avg_T.
7. The RPA process scheduling method based on deep reinforcement learning according to claim 5, wherein the episode reward is set by the following steps:
S341: calculating the maximum executor cluster consumption that all the tasks in one episode can generate, wherein i is the index of the task and j and k are indices of executors;
S342: normalizing the actual resource consumption of the executor cluster state, wherein Cost_T is the resource consumption of the executor cluster state;
S343: calculating the maximum and minimum average running times of all tasks, and normalizing the actual average running time in the executor cluster state, wherein Avg_T is the average running time of the tasks;
S344: calculating the episode reward, wherein R_fixed is an adjustable system parameter that controls the magnitude of the episode reward.
8. The RPA process scheduling method based on deep reinforcement learning according to claim 5, wherein step S35 specifically comprises:
S351: constructing the loss function L_i(θ_i) over the data (s, a, r, s') sampled from the experience pool of the DQN algorithm, wherein s denotes the current state, a the action, r the reward and s' the next state; y_i denotes the target function, Q the state-action value function and θ_i the parameters of the neural network, i being the index of the task; E denotes the mathematical expectation, i.e. the loss is the expected squared difference between y_i and Q(s, a; θ_i);
the target function y_i is the reward r plus the discounted maximum of the state-action value function Q in the next state s', wherein γ denotes the discount factor and a' denotes the action with the maximum value in state s';
s352: and adjusting parameters of the RPA flow scheduling model to enable the value of the loss function to be smaller than a preset value, and obtaining a state-action cost function Q (s, a).
9. An RPA process scheduling system based on deep reinforcement learning is characterized by comprising the following modules:
the task feature set acquisition module is used for acquiring a task feature set through the RPA task demand list uploaded by the user;
the executor cluster state acquisition module is used for acquiring the executor cluster state of the cluster machine through the resource fluctuation and the hardware information when the RPA executor executes;
the scheduling model optimization module is used for constructing an RPA flow scheduling model, optimizing the RPA flow scheduling model through the task feature set and the executor cluster state, and obtaining an optimized RPA flow scheduling model;
and the scheduling module is used for performing RPA flow scheduling tasks through the optimized RPA flow scheduling model.
CN202310834496.1A 2023-07-10 2023-07-10 RPA flow scheduling method and system based on deep reinforcement learning Pending CN116578403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310834496.1A CN116578403A (en) 2023-07-10 2023-07-10 RPA flow scheduling method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310834496.1A CN116578403A (en) 2023-07-10 2023-07-10 RPA flow scheduling method and system based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116578403A true CN116578403A (en) 2023-08-11

Family

ID=87536118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310834496.1A Pending CN116578403A (en) 2023-07-10 2023-07-10 RPA flow scheduling method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116578403A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388484A (en) * 2018-08-16 2019-02-26 广东石油化工学院 A kind of more resource cloud job scheduling methods based on Deep Q-network algorithm
CN110489223A (en) * 2019-08-26 2019-11-22 北京邮电大学 Method for scheduling task, device and electronic equipment in a kind of isomeric group
CN111722910A (en) * 2020-06-19 2020-09-29 广东石油化工学院 Cloud job scheduling and resource allocation method
CN114281528A (en) * 2021-12-10 2022-04-05 重庆邮电大学 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster
CN114842507A (en) * 2022-05-20 2022-08-02 中国电子科技集团公司第五十四研究所 Reinforced pedestrian attribute identification method based on group optimization reward
CN115408136A (en) * 2022-11-01 2022-11-29 安徽思高智能科技有限公司 RPA flow scheduling method based on genetic algorithm

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350681A (en) * 2023-12-04 2024-01-05 尚恰实业有限公司 Automatic approval method and system based on RPA robot sharing center
CN117350681B (en) * 2023-12-04 2024-03-08 尚恰实业有限公司 Automatic approval method and system based on RPA robot sharing center
CN117634867A (en) * 2024-01-26 2024-03-01 杭州实在智能科技有限公司 RPA flow automatic construction method and system combining large language model and reinforcement learning
CN117634867B (en) * 2024-01-26 2024-05-24 杭州实在智能科技有限公司 RPA flow automatic construction method and system combining large language model and reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230811