CN112101773B

CN112101773B - Multi-agent system task scheduling method and system for process industry

Info

Publication number: CN112101773B
Application number: CN202010948695.1A
Authority: CN
Inventors: 尉秀梅; 胡大鹏; 姜雪松; 朱庆存; 孟超
Original assignee: Qilu University of Technology
Current assignee: Qilu University of Technology
Priority date: 2020-09-10
Filing date: 2020-09-10
Publication date: 2024-06-07
Anticipated expiration: 2040-09-10
Also published as: CN112101773A

Abstract

In this scheme, a task scheduling model integrating a plurality of production units is built on multi-agent system (MAS) technology according to the characteristics of process-industry manufacturing, and a TS_QLEARNING algorithm is applied to the model to form a task control system for the process industry.

Description

Multi-agent system task scheduling method and system for process industry

Technical Field

The disclosure relates to the technical field of control of process industry, in particular to a multi-agent system task scheduling method and system for the process industry.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Modern industry is increasingly dependent on data, and the volume of data in industrial production is beginning to reach the petabyte (PB) level, a qualitative change compared with past production data. In recent years, research on multi-Agent artificial intelligence has shown that multi-Agent system theory provides feasible technical support for realizing intelligent manufacturing systems, and it has become one of the research hotspots in the manufacturing field.

On the one hand, multi-Agent control models for process-industry manufacturing currently fall into three categories: centralized, hierarchical, and distributed. The inventors have found that the centralized approach has low fault tolerance and is prone to security problems, since the whole system breaks down once the central control computer fails; in the hierarchical approach the upper and lower layers are in a subordinate relationship and depend strongly on each other; in the distributed approach the subsystems are relatively independent and each can achieve its own local optimum, but overall optimization of the entire system is difficult to achieve and places higher demands on network and computing capacity.

On the other hand, task scheduling is an important part of a multi-Agent system, and a reasonable production task scheduling scheme plays an important role in improving the production efficiency of enterprises. Job-shop scheduling, a typical production task scheduling problem, is strongly NP-hard. The inventors have found that many researchers apply heuristic algorithms to this problem, but these methods have defects; for example, the Q learning algorithm easily falls into local optima and has low computational efficiency when solving large-scale task scheduling.

Disclosure of Invention

To solve the above problems, the present disclosure provides a multi-agent system task scheduling method and system for the process industry; in this scheme, an improved Q learning algorithm is applied to task scheduling of the multi-agent system in the process industry, so that better job sequences can be obtained, the resources of the multi-agent system are scheduled more reasonably, and the idle time of the multi-agent system is reduced.

According to a first aspect of the disclosed embodiments, there is provided a multi-agent system task scheduling method for the process industry, including:

constructing an intelligent cooperative control model oriented to the whole process, the model being composed of a system Agent connected through a bus to the Agents of each production stage;

acquiring an initial job sequence of a task, the field Agents required to complete each job, and the processing time each field Agent requires to execute each job;

solving for the job sequence with the shortest total idle time of the field Agents by using the TS_QLEARNING algorithm;

and the intelligent cooperative control model performs task scheduling according to the job sequence.

Furthermore, the intelligent cooperative control model has a layered structure: the upper-layer system Agent is used for unified resource scheduling and task allocation, and each lower-layer workshop Agent comprises a workshop control Agent and a plurality of field Agents; the system Agent issues tasks through the bus, task decomposition is achieved through interaction among workshops, the workshop control Agents allocate the tasks to the field Agents, and the field Agents cooperate with each other to complete the tasks.

Further, the task scheduling method searches for an optimal job sequence by minimizing the sum of all the on-site Agent idle times.

Further, the scheduling method needs to follow the following constraints:

each field Agent can execute only one operation at a time; each operation of a task can be executed by only one field Agent at a time; once an operation has started on a machine, it cannot be interrupted; other task operations cannot be performed until the previous operation is completed; task operations can only be performed by machines of the same type, and the processing time on each field Agent and the number of available field Agents are known.

Further, the task includes a number of jobs that require processing with a number of field agents.

Further, the TS_QLEARNING algorithm is a combination of a tabu search algorithm and a Q learning algorithm; initial solutions for a preset number of job sequences are obtained through the tabu search algorithm and stored in a tabu list; based on the initial solutions in the tabu list, the QLEARNING algorithm performs the optimization to obtain the optimal job sequence.

Furthermore, in the optimization process of the TS_QLEARNING algorithm, the idle time is used as the feedback signal, and the complete job sequence and the corresponding total idle time are obtained through iterative computation.

According to a second aspect of the disclosed embodiments, there is provided a multi-agent system task scheduling system for the process industry, comprising:

a model building module, used for building an intelligent cooperative control model oriented to the whole process, the model being composed of a system Agent connected through a bus to the Agents of each production stage;

a data acquisition module, used for acquiring the Agents required by different tasks and the processing time data required by each Agent;

and an optimal job sequence acquisition module, used for solving an optimal job sequence by using the TS_QLEARNING algorithm, the intelligent cooperative control model performing task scheduling according to the job sequence.

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, where the processor implements the multi-agent system task scheduling method for the process industry when executing the program.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the multi-agent system task scheduling method for the process industry.

Compared with the prior art, the beneficial effects of the present disclosure are:

(1) According to the results of simulation experiments, the TS_QLEARNING algorithm has clear advantages over the QLEARNING algorithm in task scheduling and obtains better job sequences, so that the resources of the multi-agent system are scheduled more reasonably.

(2) Because the tabu search algorithm converges quickly and can be completed before QLEARNING training without consuming much time, the tabu list can be reused as a memory table of initial solutions. This solves the problem of poor results caused by the unfamiliar environment in the early training stage of the QLEARNING algorithm.

(3) In actual production there are always some urgent tasks, which the QLEARNING algorithm handles poorly. By setting a special amnesty (aspiration) criterion, TS_QLEARNING can adjust the length of the tabu list and thereby handle urgent tasks quickly.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.

FIG. 1 is a process-industry multi-Agent hierarchical control model as described in the first embodiment of the present disclosure;

FIG. 2 is an example of a task scheduling Gantt chart according to the first embodiment of the present disclosure;

FIG. 3 is a flowchart of the TS_QLEARNING algorithm described in the first embodiment of the present disclosure.

Detailed Description

The disclosure is further described below with reference to the drawings and examples.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

The process industry is an important support for the development of national economy. Modern process industry integrated manufacturing systems are one of the important competing technologies that increase the competitiveness of process enterprises. The selection of a proper model to realize intelligent optimization control of the production process is always the key point of research.

The rapid development of artificial intelligence has also raised the level of intelligence of the process industry in the Industry 4.0 era. In Industry 4.0, the cyber-physical production system of the smart factory is the core of this transformation, from smart materials entering the smart factory to smart products leaving it. It is a dynamically configured mode of production: workstations access the network in real time, automatically switch production modes and replace production materials according to the information received, and thereby adjust to the best-matching mode of production operation.

Given the characteristics of Industry 4.0, current multi-agent control models have difficulty achieving global optimization. Finding a suitable algorithm for collaboration between agents is also a major problem. The disclosure provides an intelligent cooperative control model for the whole process, consisting of a system Agent and the Agents of each production stage. The model is layered: the upper system Agent is mainly responsible for unified resource scheduling and task allocation; each workshop, whose production steps use multiple Agents, acts as a small control system in which the Agents cooperate to complete tasks; the Agents can communicate with each other (as shown in FIG. 1); the tasks scheduled by the scheduling control Agent realize global cooperative control of the intelligent manufacturing process; for such a global optimization model, how to implement task scheduling for the multi-agent system is a problem that must be addressed.
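The layered structure described above can be illustrated with a minimal Python sketch. The class names, the dictionary standing in for the bus, and the rule that every job visits every field Agent in order are assumptions made for illustration only; the disclosure does not prescribe an implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldAgent:
    """Lower-layer agent standing in for one machine (hypothetical class)."""
    name: str
    executed: List[str] = field(default_factory=list)

    def execute(self, job: str) -> None:
        self.executed.append(job)


@dataclass
class WorkshopControlAgent:
    """Allocates the jobs of a task to the field agents of one workshop."""
    name: str
    field_agents: List[FieldAgent]

    def assign(self, job_sequence: List[str]) -> None:
        for job in job_sequence:
            for agent in self.field_agents:  # assumed: each job visits every machine in order
                agent.execute(job)


@dataclass
class SystemAgent:
    """Upper-layer agent; the dict is an assumed stand-in for the bus."""
    bus: Dict[str, WorkshopControlAgent] = field(default_factory=dict)

    def register(self, workshop: WorkshopControlAgent) -> None:
        self.bus[workshop.name] = workshop

    def issue(self, workshop_name: str, job_sequence: List[str]) -> None:
        self.bus[workshop_name].assign(job_sequence)


system = SystemAgent()
system.register(WorkshopControlAgent("workshop_1", [FieldAgent("m1"), FieldAgent("m2")]))
system.issue("workshop_1", ["J_A", "J_B"])
```

In this sketch the system Agent only routes a job sequence; deciding which sequence to issue is the scheduling problem addressed in the first embodiment.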

Embodiment one:

An object of the present embodiment is to provide a multi-agent system task scheduling method for the process industry.

Task scheduling in the manufacturing process refers to planning, scheduling, and arranging production tasks in space, time, and resources while meeting the process requirements and the constraints of the existing production equipment; because different products, or multiple processes of the same product, must share resources and equipment in the process industry, production must be planned by an algorithm; the aim of production task scheduling is to plan and allocate resources reasonably, determine the processing time and order of products on the different equipment, and improve production efficiency; task scheduling for a process-industry manufacturing process can be described as n jobs being processed on m machines; each job contains several production operations that must be performed on different machines. All jobs pass through the machines in the same processing order; no precedence constraint exists between the operations of different jobs; an operation cannot be interrupted, and each machine processes one operation at a time; every job follows the same machining path through all machines; the order of the jobs is arbitrary; the aim is to find a job sequence that minimizes the sum of the machine idle times, subject to the following constraints and assumptions:

(1) Each machine can perform only one operation at a time.

(2) Each operation of a job is performed by only one machine at a time.

(3) Once an operation has started on a machine, it cannot be interrupted.

(4) Other job operations cannot be performed until the previous operation is completed.

(5) There is no backup route, i.e., job operations can only be performed by one type of machine, and the operation processing times and the number of available machines are known in advance.
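As an illustration of constraints (1) to (4), the following Python sketch checks whether a given schedule is feasible. The tuple layout `(job, machine, start, end)` and the function names are assumptions introduced here; representing each operation as one uninterrupted interval is how the sketch encodes constraint (3).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical operation record: (job, machine, start, end)
Op = Tuple[int, int, float, float]


def no_overlap(intervals: List[Tuple[float, float]]) -> bool:
    """True if no two (start, end) intervals overlap."""
    ordered = sorted(intervals)
    return all(prev_end <= start
               for (_, prev_end), (start, _) in zip(ordered, ordered[1:]))


def is_feasible(ops: List[Op]) -> bool:
    """Check that no machine and no job has two operations running at once."""
    by_machine: Dict[int, list] = defaultdict(list)
    by_job: Dict[int, list] = defaultdict(list)
    for job, machine, start, end in ops:
        by_machine[machine].append((start, end))  # constraints (1) and (4)
        by_job[job].append((start, end))          # constraint (2)
    return (all(no_overlap(v) for v in by_machine.values())
            and all(no_overlap(v) for v in by_job.values()))


ok_schedule = [(0, 0, 0, 2), (0, 1, 2, 3), (1, 0, 2, 5), (1, 1, 5, 7)]
bad_schedule = [(0, 0, 0, 2), (1, 0, 1, 4)]  # two jobs overlap on machine 0
```

Here `is_feasible(ok_schedule)` holds, while `bad_schedule` violates constraint (1).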

Based on the above constraints, the present embodiment proposes a multi-agent system task scheduling method for the process industry, including:

Further, the field Agent in this embodiment represents a machine that executes a job; assuming that there are 4 jobs, each of which must be processed on 3 unrelated machines, the resulting job sequence is assumed to be $\{J_A, J_B, J_C, J_D\}$.

The time that job $i$ needs to spend on machine $j$ is denoted $J_i m_j$; FIG. 2 shows the Gantt chart for this job sequence.

In FIG. 2, $x_i$ ($i \in \{1,2,3,4\}$) represents the idle time of the different machines during the jobs, and $X = \sum_{i=1}^{4} x_i$ represents the total idle time of the schedule; the task scheduling optimization objective is to find a job sequence that minimizes $X$. In this disclosure, the idle time outside the Gantt chart is defined as "external machine idle time", e.g., $x_1, x_2$; the rest is defined as "internal machine idle time", e.g., $x_3, x_4$.
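The split into external and internal idle time can be made concrete with a short Python sketch. It assumes a permutation flow shop in which every job visits the machines in the same order, counts idle time only up to each machine's last operation, and uses a hypothetical 4 x 3 processing-time matrix `p`; none of these numbers come from the disclosure.

```python
from typing import Sequence, Tuple


def idle_times(p: Sequence[Sequence[float]], seq: Sequence[int]) -> Tuple[float, float]:
    """Return (external, internal) machine idle time for job order `seq`.

    p[i][j] is the processing time of job i on machine j.
    """
    m = len(p[0])
    machine_free = [0.0] * m  # time at which each machine finished its last operation
    external = internal = 0.0
    for k, job in enumerate(seq):
        job_ready = 0.0  # time at which the job leaves the previous machine
        for j in range(m):
            start = max(machine_free[j], job_ready)
            gap = start - machine_free[j]  # machine j waits this long for the job
            if k == 0:
                external += gap  # waiting before the first job arrives
            else:
                internal += gap  # waiting between consecutive jobs
            machine_free[j] = job_ready = start + p[job][j]
    return external, internal


# Hypothetical data: 4 jobs (rows) on 3 machines (columns)
p = [[2, 1, 2], [3, 2, 1], [1, 3, 2], [2, 2, 3]]
external, internal = idle_times(p, [0, 1, 2, 3])
total_idle = external + internal
```

For this data the order `[0, 1, 2, 3]` gives an external idle time of 5 and an internal idle time of 6, so the total idle time is 11.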

In addition, to reflect the time the sequence actually needs to complete all jobs, the maximum completion time (makespan, or $C_{\max}$) must be calculated to check that the result is reasonable; if the end result reduces the machine idle time but in practice needs more time to complete the task sequence, the result is clearly unreasonable; the general task scheduling problem is written $n/m/C_{\max}$ and involves n jobs, each of which must be processed on m machines; in Python we define two-dimensional matrices p and C with n rows and m columns; the processing time $p(J_i, m_j)$ of job $J_i$ on machine $m_j$ is obtained from the dataset, and for the job sequence $\{J_1, J_2, \dots, J_n\}$ the completion time $C(J_i, m_j)$ is calculated as follows:

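$$C(J_1, m_1) = p(J_1, m_1)$$

$$C(J_i, m_1) = C(J_{i-1}, m_1) + p(J_i, m_1)$$

$$C(J_1, m_j) = C(J_1, m_{j-1}) + p(J_1, m_j)$$

$$C(J_i, m_j) = \max\{C(J_{i-1}, m_j),\ C(J_i, m_{j-1})\} + p(J_i, m_j)$$

$$C_{\max} = C(J_n, m_m)$$

where $i = 2, \dots, n$ and $j = 2, \dots, m$.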
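The recurrence above translates directly into a few lines of Python. The sketch below is one possible rendering; the function names and the 4 x 3 processing-time matrix are hypothetical, and jobs and machines are indexed from 0 rather than 1.

```python
from typing import List, Sequence


def completion_times(p: Sequence[Sequence[float]], seq: Sequence[int]) -> List[List[float]]:
    """Return C, where C[k][j] is the completion time of the k-th job of `seq` on machine j."""
    n, m = len(seq), len(p[0])
    C = [[0.0] * m for _ in range(n)]
    for k, job in enumerate(seq):
        for j in range(m):
            prev_job = C[k - 1][j] if k > 0 else 0.0      # machine j finishes the previous job
            prev_machine = C[k][j - 1] if j > 0 else 0.0  # this job leaves machine j-1
            C[k][j] = max(prev_job, prev_machine) + p[job][j]
    return C


def makespan(p: Sequence[Sequence[float]], seq: Sequence[int]) -> float:
    """C_max: completion time of the last job on the last machine."""
    return completion_times(p, seq)[-1][-1]


# Hypothetical data: 4 jobs (rows) on 3 machines (columns)
p = [[2, 1, 2], [3, 2, 1], [1, 3, 2], [2, 2, 3]]
c_max = makespan(p, [0, 1, 2, 3])
```

With this data the job order `[0, 1, 2, 3]` yields a makespan of 15; the boundary cases of the recurrence are handled by treating a missing predecessor as completion time 0.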

Thus, when the job permutation is $\{J_1, J_2, \dots, J_n\}$, $C_{\max}$ is the time at which the last operation of job $J_n$ is completed; since the task scheduling problem is NP-hard, solving it consumes a large amount of resources; we therefore simply substitute the final job sequence trained by the QLEARNING algorithm or the TS_QLEARNING algorithm into the above formulas to obtain $C_{\max}$.

Assume that n jobs need to be executed on m unrelated machines, that an optimal job sequence has been obtained by training the algorithm, and that $J_k$ is the first job in that sequence; then $J_k m_l$ denotes the time that job $J_k$ needs to spend on machine $m_l$; in fact, $J_k$ is the initial solution obtained by the algorithm; the "external idle time" can be obtained from this initial solution alone, while calculating the "internal idle time" requires the complete solution.

We define the total idle time as $T$, the external idle time as $T_e$, and the internal idle time as $T_i$. The total idle time is $T = T_e + T_i$, and $T_e$ is obtained from the following formula:

$$T_e = (m-1)\,J_k m_1 + (m-2)\,J_k m_2 + \dots + J_k m_{m-1}$$

From this formula we can conclude that the external idle time depends only on the initial solution, and that the larger the number of machines, the greater its impact on the total idle time.
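The formula for the external idle time can be checked numerically with a short Python sketch. The function name and the sample processing times are hypothetical; the list passed in holds the times of the first job $J_k$ on machines $m_1, \dots, m_m$, and the last machine contributes with coefficient 0.

```python
from typing import Sequence


def external_idle_time(first_job_times: Sequence[float]) -> float:
    """T_e = (m-1)*t_1 + (m-2)*t_2 + ... + 1*t_(m-1) for the first job's times t_1..t_m."""
    m = len(first_job_times)
    return sum((m - 1 - l) * t for l, t in enumerate(first_job_times))


# Hypothetical processing times of four candidate first jobs on 3 machines
candidates = {"J_1": [2, 1, 2], "J_2": [3, 2, 1], "J_3": [1, 3, 2], "J_4": [2, 2, 3]}
t_e = {name: external_idle_time(times) for name, times in candidates.items()}
```

For these numbers the values of $T_e$ are 5, 8, 5, and 6, which shows how strongly the time on the first machine weighs on the result.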

A Q-learning-based algorithm has the disadvantage that it does not know what action to take in an unseen state; in other words, the Q-learning agent cannot evaluate unknown states; this is likely to occur in the early stages of training. To solve this problem, the present disclosure proposes a novel TS_QLEARNING algorithm that combines a tabu search (TS) algorithm with a Q learning algorithm; in the algorithm, some of the better initial solutions are recorded in a tabu list; it should be emphasized that the TS_QLEARNING algorithm does not use the TS algorithm to obtain the best initial solution, but instead treats the tabu list as a memory table in order to exclude very poor initial solutions, which tend to produce very large external idle times.

Tabu search is a metaheuristic proposed by Glover (1986); in each iteration, tabu search moves from the current solution to an improved solution in its neighborhood, and a tabu list prevents solutions with certain recently visited features from being accepted again in later iterations, so the TS algorithm converges very quickly.

Since $J_k m_1$ has the largest coefficient in the external idle time function, it has the greatest impact on the external idle time; first, the candidate set is defined as $\{J_1 m_1, J_2 m_1, \dots, J_k m_1, \dots, J_n m_1\}$, and the tasks in the candidate set with smaller external idle time are placed into the tabu list until the list is full; the length of the tabu list is set to 1/3 of the number of tasks and can be adjusted according to actual requirements so as to control the range of initial solutions.
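The following Python sketch shows one simplified way to fill such a tabu list. Instead of running an iterative tabu search, it ranks every job by the external idle time it would cause as the first job and keeps the best third; this shortcut, the function name, and the 6 x 3 data are assumptions for illustration only.

```python
from typing import List, Optional, Sequence


def build_tabu_list(p: Sequence[Sequence[float]], length: Optional[int] = None) -> List[int]:
    """Return the indices of the jobs with the smallest external idle time as first job.

    p[i][j] is the processing time of job i on machine j.
    `length` defaults to one third of the number of jobs (at least 1).
    """
    n, m = len(p), len(p[0])
    if length is None:
        length = max(1, n // 3)

    def external_idle(job: int) -> float:
        # T_e if `job` is scheduled first: machine l contributes with weight (m-1-l)
        return sum((m - 1 - l) * p[job][l] for l in range(m))

    # sorted() is stable, so ties keep the original job order
    return sorted(range(n), key=external_idle)[:length]


# Hypothetical data: 6 jobs (rows) on 3 machines (columns)
p = [[2, 1, 2], [3, 2, 1], [1, 3, 2], [2, 2, 3], [1, 1, 4], [4, 1, 1]]
tabu = build_tabu_list(p)
```

With six jobs the list has length 2 and contains jobs 4 and 0, whose external idle times (3 and 5) are the smallest; passing a larger `length` widens the range of admissible initial solutions.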

Also, in job-shop scheduling training, machine time and tooling costs are used as input parameters, and the job sequence is the variable parameter. The goal is to find a suitable job order that minimizes idle time.

To suit reinforcement learning, a state can reasonably be defined as a job sequence, or more precisely as a set of job priority relationships. A state change (an action) is defined as a change in the job priority relationships. Unlike in Q learning, the initial solution of TS_QLEARNING is randomly drawn from the tabu list. Notably, selecting an initial solution also counts as performing an action in TS_QLEARNING; likewise, after the action is performed, a reward is obtained together with the next state, and the Q table is updated; different feedback signals can be used when solving scheduling problems, and this disclosure adopts idle time as the reward signal, the underlying idea being that the shorter the idle time, the better the action.

Furthermore, the TS_QLEARNING algorithm obtains the required tabu list before training, and the preferences are updated continuously as training proceeds, so that the action selection policy converges toward the quasi-optimal job sequence found; after training is completed, the final job sequence and total idle time are obtained, and $C_{\max}$ is then computed with the formulas given above.

Further, to demonstrate the superiority of the solution of the present disclosure, in this embodiment, the task scheduling results of the method of the present disclosure and the existing Q learning algorithm are verified using the basic scheduling reference examples available in OR-Library.

OR-Library is a collection of test datasets for various operations research (OR) problems; in the instances used, n jobs need to be performed on m unrelated machines; each job consists of m non-preemptive operations, each operation of a job uses a different machine for a given time and may wait before being processed; the benchmark provides three types of instances, and the datasets describe the machines required for each job and the processing time of every job on each machine.

To evaluate the quality of the different algorithms, different instances were randomly selected, and the method of the present disclosure and the Q learning algorithm were each run 10 times to obtain average values. The Taillard dataset contains many instances; the instances used have sizes (jobs x machines) of 20x5, 20x10, 20x15, and 20x20. Both the Q learning algorithm and the TS_QLEARNING algorithm were implemented in Python and run on a device with an i7 CPU and 16 GB of RAM.

To make the comparison fair, the Q learning algorithm in this embodiment uses the same settings as the TS_QLEARNING algorithm of this disclosure: number of episodes (max_episodes = 10,000), learning rate (α = 0.1), and discount factor (γ = 0.8), so that both algorithms run under the same conditions; for the TS_QLEARNING algorithm, the length of the tabu list is set to one third of the number of jobs, and the initial solution is obtained from the tabu list; each of the two methods is trained to obtain a final sequence, and the value of $C_{\max}$ is calculated from that sequence.

In the experiment, the Q learning algorithm and the TS_QLEARNING algorithm were each run for 10,000 iterations on every Taillard instance, and the average over 10 runs was recorded in Table 1. Table 1 shows the experimental results for 16 Taillard instances selected to cover problems of different complexity.

Table 1: Experimental results of the Q learning and TS_QLEARNING algorithms

Overall, the results show that the idle time obtained by the TS_QLEARNING algorithm is better than that obtained by the Q learning algorithm on every dataset. The TS_QLEARNING algorithm also obtains better values of $C_{\max}$ than the Q learning algorithm. Therefore, the TS_QLEARNING algorithm is more advantageous than the Q learning algorithm in solving the task scheduling problem.

Further, FIG. 3 shows a flowchart of the TS_QLEARNING algorithm, whose specific steps are as follows:

Step 1: initializing a tabu table, a Q table and an optimal idle time best.

Step 2: set the length of the tabu list, the maximum number of tabu search iterations, and the special amnesty (aspiration) criterion. Store the better candidate solutions in the tabu list through tabu search until the maximum number of tabu search iterations is reached.

Step 3: for each training episode, start iterating while there are still tasks to be scheduled. Initialize the state s and the task sequence job_seq.

Step 4: judge whether an initial solution has been obtained; if not, randomly select one from the tabu list as action a, execute it, and observe the resulting state s' and reward r; if an initial solution has already been obtained, select action a according to the ε-greedy policy derived from Q, execute it, and observe s' and r. In both cases, update the Q table, the state s, and the task sequence job_seq according to $Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a')\,]$, $s \leftarrow s'$, job_seq ← job_seq + s'.

Step 5: repeat step 4 until all tasks are scheduled.

Step 6: if the idle time of s is less than best, update best accordingly (best ← idle time of s).

Step 7: repeat steps 3 to 6 until the maximum number of QLEARNING iterations is reached.

Finally, output the complete job sequence and the corresponding total idle time.
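Steps 1 to 7 can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: it assumes the state is the last scheduled job, the action is the next unscheduled job, and the reward is the negative idle time that the action adds; the tabu list is passed in ready-made; `epsilon`, `seed`, and all names and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def added_idle(p: Sequence[Sequence[float]], machine_free: List[float], job: int) -> float:
    """Schedule `job` next, update `machine_free` in place, return the idle time added."""
    ready, gaps = 0.0, 0.0
    for j, t in enumerate(p[job]):
        start = max(machine_free[j], ready)
        gaps += start - machine_free[j]
        machine_free[j] = ready = start + t
    return gaps


def ts_qlearning(p: Sequence[Sequence[float]], tabu: Sequence[int], episodes: int = 10_000,
                 alpha: float = 0.1, gamma: float = 0.8, epsilon: float = 0.1,
                 seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n, m = len(p), len(p[0])
    start_state = n                              # extra row: the empty sequence
    Q = [[0.0] * n for _ in range(n + 1)]        # step 1: Q table and best
    best_seq, best_idle = [], float("inf")
    for _ in range(episodes):                    # step 7: repeat for every episode
        s, seq, machine_free, total = start_state, [], [0.0] * m, 0.0   # step 3
        remaining = set(range(n))
        while remaining:                         # step 5: until all jobs are scheduled
            options = sorted(remaining)
            if not seq:
                a = rng.choice(list(tabu))       # step 4: initial solution from tabu list
            elif rng.random() < epsilon:
                a = rng.choice(options)          # explore
            else:
                a = max(options, key=lambda x: Q[s][x])  # exploit (epsilon-greedy)
            idle = added_idle(p, machine_free, a)
            remaining.discard(a)
            future = max((Q[a][x] for x in remaining), default=0.0)
            Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (-idle + gamma * future)
            s = a
            seq.append(a)
            total += idle
        if total < best_idle:                    # step 6: keep the best sequence so far
            best_seq, best_idle = seq, total
    return best_seq, best_idle


# Hypothetical data: 6 jobs on 3 machines, tabu list of length 6 // 3 = 2
p = [[2, 1, 2], [3, 2, 1], [1, 3, 2], [2, 2, 3], [1, 1, 4], [4, 1, 1]]
best_seq, best_idle = ts_qlearning(p, tabu=[4, 0], episodes=500)
```

The returned sequence always starts with a job from the tabu list, and `best_idle` is the total idle time of that sequence; the makespan can then be computed separately from the completion-time formulas of this embodiment.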

Embodiment two:

An object of the present embodiment is to provide a multi-agent system task scheduling system for the process industry.

A multi-agent system task scheduling system for the process industry, comprising:

Embodiment three:

An object of the present embodiment is to provide an electronic device.

An electronic device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, the processor implementing, when executing the program, the steps of:

Embodiment four:

An object of the present embodiment is to provide a computer-readable storage medium.

A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps comprising:

The multi-agent system task scheduling method and system for the process industry provided by the embodiment can be completely realized, and have wide application prospects.

The foregoing is only a description of the preferred embodiments of the present disclosure and is not intended to limit the disclosure; various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

While the specific embodiments of the present disclosure have been described above with reference to the drawings, it should be understood that the present disclosure is not limited to the embodiments, and that various modifications and changes can be made by one skilled in the art without inventive effort on the basis of the technical solutions of the present disclosure while remaining within the scope of the present disclosure.

Claims

1. A multi-agent system task scheduling method for the process industry, comprising:

acquiring an initial job sequence of a task, the field Agents required to complete each job, and the processing time each field Agent requires to execute each job;

the intelligent cooperative control model performs task scheduling according to the job sequence;

The task scheduling method searches for an optimal job sequence by minimizing the sum of idle time of all on-site agents;

the TS_QLEARNING algorithm is a combination of a tabu search algorithm and a Q learning algorithm, and initial solutions for a preset number of job sequences are obtained through the tabu search algorithm and stored in a tabu list; based on the initial solutions in the tabu list, the QLEARNING algorithm performs the optimization to obtain an optimal job sequence;

in the optimization process of the TS_QLEARNING algorithm, the idle time is used as the feedback signal, and a complete job sequence and the corresponding total idle time are obtained through iterative calculation; wherein the initial solution of TS_QLEARNING is randomly obtained from the tabu list.

2. The multi-agent system task scheduling method for the process industry according to claim 1, wherein the intelligent cooperative control model has a layered structure, an upper-layer system Agent is used for unified resource scheduling and task allocation, each lower-layer workshop Agent comprises a workshop control Agent and a plurality of field Agents, the system Agent issues tasks through the bus, task decomposition is achieved through cooperation among workshops, the workshop control Agents allocate the tasks to the field Agents, and the field Agents cooperate with each other to complete the tasks.

3. The multi-agent system task scheduling method for the process industry according to claim 1, wherein the scheduling method is subject to the following constraints:

4. The multi-agent system task scheduling method for the process industry according to claim 1, wherein the task comprises a number of jobs that require processing by a plurality of field Agents.

5. A multi-agent system task scheduling system for the process industry, comprising:

an optimal job sequence acquisition module, used for solving an optimal job sequence by using the TS_QLEARNING algorithm, the intelligent cooperative control model performing task scheduling according to the job sequence;

6. An electronic device comprising a memory, a processor and a computer program stored for execution on the memory, wherein the processor, when executing the program, implements a multi-agent system task scheduling method for a process industry as claimed in any one of claims 1-4.

7. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a multi-agent system task scheduling method for a process industry according to any of claims 1-4.