CN116128334A - Quality inspection task scheduling method, equipment and medium - Google Patents

Quality inspection task scheduling method, equipment and medium Download PDF

Info

Publication number
CN116128334A
CN116128334A (application CN202211572850.XA)
Authority
CN
China
Prior art keywords
sample
task
scheduling
quality inspection
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211572850.XA
Other languages
Chinese (zh)
Inventor
杨依睿
杨思洁
徐韬
陈欢军
徐开
章江铭
袁健
佘清顺
黄俊杰
姜伟昊
谢泽楠
刘思
周佑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Zhejiang University ZJU
Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd filed Critical Zhejiang University ZJU
Priority to CN202211572850.XA priority Critical patent/CN116128334A/en
Publication of CN116128334A publication Critical patent/CN116128334A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning


Abstract

The invention discloses a quality inspection task scheduling method in the technical field of quantity-value transmission quality inspection, addressing the lack of a suitable scheduling algorithm in the prior art. The method comprises the following steps: S1, initializing model training parameters, wherein the model is a reinforcement learning model; S2, constructing scheduling state features; S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain the sample and device corresponding to the action; S4, calculating a reward value and updating the training parameters; S5, judging whether the scheduling task is completed: when the scheduling task is completed and the number of training steps is reached, training ends, otherwise return to step S2; when the scheduling task is not completed, enter the next scheduling state and return to step S2. The invention also discloses a quality inspection task scheduling electronic device and a computer storage medium. The invention models the problem with reinforcement learning and thereby obtains an effective scheduling model.

Description

Quality inspection task scheduling method, equipment and medium
Technical Field
The invention relates to the technical field of quantity-value transmission quality inspection, and in particular to a quality inspection task scheduling method, device and medium based on reinforcement learning.
Background
Quality inspection for quantity-value transmission is key work in electric power metering, and automatic quality inspection task scheduling is a natural choice for improving the detection efficiency and accuracy of meter devices of all kinds. However, unlike the existing flexible job shop scheduling problem, in which the operations required by each workpiece are fixed, in the quality inspection task scheduling problem each sample has no fixed quality inspection items: a batch of quality inspection tasks is completed on a batch of samples, so the optimization space is larger. Meanwhile, quality inspection tasks have nonlinear relations such as serial, parallel and mutual exclusion, making the constraint conditions more complex.
In existing reinforcement learning methods for the flexible job shop scheduling problem, the state features cannot fully describe the scheduling degrees of freedom and the nonlinear task relations of the quality inspection task scheduling problem, and the reward function cannot reflect the combined influence of task order, sample scheduling, equipment scheduling and the like, so existing scheduling algorithms cannot be applied directly.
Disclosure of Invention
In order to overcome the defects of the prior art, one of the purposes of the invention is to provide a quality inspection task scheduling method, which constructs a quality inspection task scheduling model based on reinforcement learning, so as to improve the sample detection efficiency in the quality inspection task scheduling process.
One of the purposes of the invention is realized by adopting the following technical scheme:
the quality inspection task scheduling method is characterized by comprising the following steps of:
s1, initializing model training parameters, wherein the model is a reinforcement learning model;
s2, constructing scheduling state characteristics, wherein the scheduling state characteristics are obtained through splicing of task processing time channels, sample-equipment occupancy rate and sample-equipment available time channels;
s3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain a sample and equipment corresponding to the action;
s4, calculating a reward value and updating the training parameters according to the action and the decoding result;
s5, judging whether the scheduling task is completed or not:
when the scheduling task is completed and the training step number is reached, training is finished, otherwise, returning to the step S2;
and when the scheduling task is not completed, entering the next scheduling state and returning to step S2.
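As an illustrative aside, the S1-S5 loop can be sketched as a toy episode loop. Everything below (the class and function names, the trivial one-task-per-step environment) is an assumption for illustration, not the patent's implementation:

```python
class ToyQCEnv:
    """Toy stand-in for the scheduling environment (an assumption for
    illustration): each step schedules one pending quality inspection task."""
    def __init__(self, j=5):
        self.j = j                            # number of tasks in the batch

    def reset(self):
        self.pending = list(range(self.j))    # queue of tasks to be processed
        return tuple(self.pending)

    def step(self, action_index):
        task = self.pending.pop(action_index % len(self.pending))
        done = not self.pending               # S5: done when the queue is empty
        return tuple(self.pending), 1.0, done, task

def run_episode(env, policy):
    """S2-S5 in miniature: observe the state, choose an action,
    transition, and accumulate reward until the schedule is complete."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(state)                # S3: action from the current state
        state, reward, done, _ = env.step(action)
        total += reward                       # S4: reward accumulation
    return total

total = run_episode(ToyQCEnv(j=5), policy=lambda s: 0)
print(total)  # one unit of reward per scheduled task -> 5.0
```

In the patent's method the policy is the learned action-selection strategy and the reward is R = αU - βE; here both are reduced to trivial stand-ins so only the control flow of S1-S5 is shown.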
Further, the training parameters comprise the batch size, the number of training steps, the replay buffer size, the replay interval and the empirical hyperparameters.
Further, after calculating the reward value and updating the training parameter according to the action and the decoding result, the method further comprises:
and storing the scheduling state, the action, the decoding result and the rewarding value into a cache pool in a one-to-one correspondence mode, wherein the cache pool is used for experience playback during training.
Further, when the scheduling task is not completed, entering a next scheduling state and returning to step S2, and further including:
judging whether experience playback is needed, if so, performing experience playback, otherwise, entering a next scheduling state and returning to the step S2.
Further, the task processing time channel is a three-dimensional matrix of (n+1) × (m+1) × j, where n is the number of samples, m is the number of devices and j is the number of quality inspection items; the processing time channel contains matrix elements p_{a,b,c}, p_{a,m,c} and p_{n,b,c}, where p_{a,b,c} is the processing time required for quality inspection task c to be completed by sample a on device b, and p_{a,m,c} and p_{n,b,c} represent the feasibility of processing quality inspection task c on sample a and device b;
the sample-device occupancy channel is a two-dimensional matrix of (n+1) × (m+1) containing matrix elements u_{a,b}, u_{a,m} and u_{n,b}, where u_{a,b} is the cumulative time of quality inspection tasks executed by sample a on device b, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b respectively;
the sample-device available time channel is a two-dimensional matrix of (n+1) × (m+1) containing matrix elements l_{a,b}, l_{a,m} and l_{n,b}, where l_{a,b} is the end time of the task last executed by sample a on device b, and l_{a,m} and l_{n,b} are the times at which sample a and device b are last released from occupation respectively;
the spliced scheduling state feature is a scheduling state feature representation of dimension (n+1) × (m+1) × (j+2).
Further, a corresponding action is output according to the current scheduling state, and the scheduling state representation satisfies: a_i = π(S_i), a_i = S_{i+1} - S_i, r_i = R(a_i, S_i, S_{i+1}), wherein a_i is the current action, S_i is the current state, r_i is the reward of the current action, R is the reward function, and π is the action selection strategy.
Further, in S3, the direct output action is replaced by an action selection rule, where the action selection rule includes:
(1) Selecting the task with the shortest processing time;
(2) Selecting a task with the longest processing time;
(3) Selecting the task with the least available samples;
(4) Selecting serial tasks;
(5) Selecting parallel tasks;
(6) Selecting a preamble task in the mutually exclusive task pair;
(7) Selecting a subsequent task in the mutually exclusive task pair;
(8) Selecting an unconstrained task;
the decoded heuristic rules include:
Rule one: heuristic sample selection is performed on each individual's chromosome; samples are considered in test order from front to back, and if several samples meet the selection condition, the sample with the shortest test completion time is chosen;
Rule two: heuristic device selection is performed on each individual's chromosome; based on the selected sample, the device with the shortest completion time is chosen, and if several devices meet the selection condition, the device with the smallest load is chosen.
Further, the calculation of the prize value satisfies:
R=αU-βE,
wherein R is a reward value, alpha and beta are experience parameters, and U, E is the scheduling environment utilization rate and the hole time respectively;
the calculation of the scheduling environment utilization rate satisfies the following conditions:
U_N = (Σ_{a=1..n} u_{a,m}) / (n · C_max), U_M = (Σ_{b=1..m} u_{n,b}) / (m · C_max),
wherein U_N and U_M are the sample utilization and the device utilization, u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b in the sample-device occupancy channel, and C_max is the current longest processing time;
the calculation of the hole time satisfies:
E_N = Σ_{a=1..n} (l_{a,m} - u_{a,m}), E_M = Σ_{b=1..m} (l_{n,b} - u_{n,b}),
wherein E_N and E_M are the sample hole time and the device hole time, l_{a,m} and l_{n,b} are the times at which sample a and device b are last released from occupation in the sample-device available time channel, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b in the sample-device occupancy channel.
Another object of the present invention is to provide an electronic device comprising a processor, a storage medium and a computer program stored in the storage medium, wherein the computer program implements the quality inspection task scheduling method described above when executed by the processor.
A further object of the present invention is to provide a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the quality inspection task scheduling method described above when executed by a processor.
Compared with the prior art, the invention has the beneficial effects that:
the invention builds the quality inspection task scheduling model based on reinforcement learning, enhances the learning ability of an algorithm to a scheduling state, replaces an intelligent agent to directly learn action decisions through action selection rules, improves the algorithm convergence speed by utilizing heuristic rules, enhances the interpretability of model action selection, can completely describe the scheduling freedom degree and nonlinear task relation of quality inspection task scheduling problems, and can be applied to quality inspection of volume transmission.
Drawings
FIG. 1 is a flow chart of a quality inspection task scheduling method according to an embodiment;
FIG. 2 is a flow chart of an embodiment one quality inspection task scheduling method after adding experience playback;
fig. 3 is a block diagram of the electronic device of the third embodiment.
Detailed Description
The invention is now described in more detail with reference to the accompanying drawings. It should be noted that the description below is given by way of illustration only and not by way of limitation. Various embodiments may be combined with one another to form further embodiments not shown in the following description.
Example 1
Embodiment one provides a quality inspection task scheduling method that builds a scheduling model for quality inspection tasks with reinforcement learning, addresses the serial, parallel and mutually exclusive relations among quality inspection tasks, and uses an action selection rule mechanism in place of having the agent directly learn action decisions.
The quality inspection task scheduling algorithm needs to determine the processing order of quality inspection tasks and allocate a sample and a device to each quality inspection task so as to minimize the makespan (the longest processing time); this is essentially a sequential decision process with a finite set of choices. The core of applying reinforcement learning to a quality inspection task scheduling algorithm is converting the scheduling problem into a Markov process or semi-Markov process, i.e. defining states, actions, transition probabilities and reward functions.
For the characteristic of the quality inspection task scheduling problem that a batch of quality inspection tasks is completed on a batch of samples and machines, this embodiment proposes a scheduling state representation suited to reinforcement learning, strengthening the algorithm's ability to learn the scheduling state. Referring to fig. 1, the quality inspection task scheduling method includes the following steps:
s1, initializing model training parameters, wherein the model is a reinforcement learning model;
the training parameters comprise set batch, training step number, playback buffer, playback time and experience super parameters.
S2, constructing scheduling state characteristics, wherein the scheduling state characteristics are obtained through splicing of task processing time channels, sample-equipment occupancy rate and sample-equipment available time channels;
in order to enhance the model effect and improve the strategy diversity of training data, a cache pool is added during model training. Specifically, the scheduling state, the action, the decoding result and the rewarding value are stored in a buffer pool in a one-to-one correspondence mode, and the buffer pool is used for experience playback during training. And (3) playing back experience of the buffer pool, namely storing states, actions and feedback as a group of experience samples, and re-reading the experience samples as training data after a certain training step number so as to improve strategy diversity of the training data.
Referring to fig. 2, when the task is not completed, the process enters the next scheduling state and returns to step S2, and further includes:
judging whether experience playback is needed, if so, performing experience playback, otherwise, entering a next scheduling state and returning to the step S2.
The next scheduling state, i.e. the new state to which the current state transfers after action selection, includes the queue of tasks to be processed, the processed tasks, the sample and device loads, the idle time, and so on.
When the scheduling status feature is selected in S2, the following principle is followed:
(1) The state features should contain all the information needed for action decisions so as to fully describe the scheduling context, i.e. a_i = π(S_i), wherein a_i is the current action, S_i is the current state, and π is the action strategy that takes a scheduling state as input and outputs a target action;
(2) The action corresponding to the transition between adjacent scheduling states should be unique, i.e. a_i = S_{i+1} - S_i;
(3) The reward of an action is related only to the states before and after it, i.e. r_i = R(a_i, S_i, S_{i+1}), wherein r_i is the reward of the current action and R is the reward function.
The selection of the status features should be related to the scheduling objectives, reducing redundancy of feature information.
Specifically, the task processing time channel is a three-dimensional matrix of (n+1) × (m+1) × j, where n is the number of samples, m is the number of devices and j is the number of quality inspection items; the processing time channel contains matrix elements p_{a,b,c}, p_{a,m,c} and p_{n,b,c}, where p_{a,b,c} is the processing time required for quality inspection task c to be completed by sample a on device b, and p_{a,m,c} and p_{n,b,c} indicate the feasibility of processing quality inspection task c on sample a and device b: 0 indicates that the task cannot be processed on the sample (or device) or need not be executed again, and 1 indicates that it has not yet been executed and can be processed on the sample (or device);
the sample-device occupancy channel is a two-dimensional matrix of (n+1) × (m+1) containing matrix elements u_{a,b}, u_{a,m} and u_{n,b}, where u_{a,b} is the cumulative time of quality inspection tasks executed by sample a on device b, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b respectively;
the sample-device available time channel is a two-dimensional matrix of (n+1) × (m+1) containing matrix elements l_{a,b}, l_{a,m} and l_{n,b}, where l_{a,b} is the end time of the task last executed by sample a on device b, and l_{a,m} and l_{n,b} are the times at which sample a and device b are last released from occupation respectively;
and the task processing time channel, the sample-device occupancy channel and the sample-device available time channel are spliced to finally obtain a scheduling state feature representation of dimension (n+1) × (m+1) × (j+2).
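The channel splicing can be illustrated with plain nested lists; this is a sketch under the assumption of a list-of-lists layout, since the patent fixes only the dimensions, not the storage format:

```python
def build_state_features(p, u, l):
    """Splice the three S2 channels into the (n+1) x (m+1) x (j+2)
    scheduling state feature: for each (sample, device) cell, the j
    processing-time entries are extended with the occupancy and
    available-time scalars."""
    rows, cols = len(u), len(u[0])            # n + 1 rows, m + 1 columns
    return [[p[a][b] + [u[a][b], l[a][b]] for b in range(cols)]
            for a in range(rows)]

n, m, j = 2, 2, 3   # 2 samples, 2 devices, 3 quality inspection items
p = [[[0.0] * j for _ in range(m + 1)] for _ in range(n + 1)]  # processing time
u = [[0.0] * (m + 1) for _ in range(n + 1)]                    # occupancy
l = [[0.0] * (m + 1) for _ in range(n + 1)]                    # available time
s = build_state_features(p, u, l)
print(len(s), len(s[0]), len(s[0][0]))  # -> 3 3 5, i.e. (n+1) x (m+1) x (j+2)
```

The extra (n+1)-th row and (m+1)-th column carry the per-device and per-sample aggregates (feasibility, cumulative time, last release time) described above.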
S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain a sample and equipment corresponding to the action;
In order to reduce the difficulty of reinforcement learning training and improve the stability of the algorithm, step S3 outputs one of the following action selection rules instead of an action directly:
(1) Selecting the task with the shortest processing time;
(2) Selecting a task with the longest processing time;
(3) Selecting the task with the least available samples;
(4) Selecting serial tasks;
(5) Selecting parallel tasks;
(6) Selecting a preamble task in the mutually exclusive task pair;
(7) Selecting a subsequent task in the mutually exclusive task pair;
(8) Selecting an unconstrained task;
The action to be processed is obtained from the selected rule and decoded according to the following heuristic rules:
Rule one: heuristic sample selection is performed on each individual's chromosome; samples are considered in test order from front to back, and if several samples meet the selection condition, the sample with the shortest test completion time is chosen;
Rule two: heuristic device selection is performed on each individual's chromosome; based on the selected sample, the device with the shortest completion time is chosen, and if several devices meet the selection condition, the device with the smallest load is chosen.
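Two of these selection steps reduce to tie-breaking `min` reductions; in this sketch the dictionary field names are assumptions for illustration:

```python
def pick_task_shortest(tasks):
    """Action rule (1): the pending task with the shortest processing time."""
    return min(tasks, key=lambda t: t["proc_time"])

def pick_device(devices):
    """Decoding rule two: the device with the shortest completion time,
    with ties broken by the smallest accumulated load."""
    return min(devices, key=lambda d: (d["finish_time"], d["load"]))

tasks = [{"id": 7, "proc_time": 4.0}, {"id": 3, "proc_time": 1.5}]
devices = [{"id": 0, "finish_time": 10.0, "load": 6.0},
           {"id": 1, "finish_time": 10.0, "load": 2.0}]
print(pick_task_shortest(tasks)["id"])  # -> 3 (1.5 < 4.0)
print(pick_device(devices)["id"])       # -> 1 (equal finish time, lower load)
```

Encoding each rule as a key function keeps the agent's discrete action space small (one action per rule) while the tie-breaking stays deterministic and interpretable.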
S4, calculating a reward value and updating the training parameters according to the action and the decoding result;
the calculation of the prize value satisfies the following conditions:
R=αU-βE,
wherein R is a reward value, alpha and beta are experience parameters, and U, E is the scheduling environment utilization rate and the hole time respectively;
the calculation of the scheduling environment utilization rate satisfies the following conditions:
U_N = (Σ_{a=1..n} u_{a,m}) / (n · C_max), U_M = (Σ_{b=1..m} u_{n,b}) / (m · C_max),
where U_N and U_M are the sample utilization and the device utilization, u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b in the sample-device occupancy channel, and C_max is the current longest processing time;
the calculation of the hole time satisfies:
E_N = Σ_{a=1..n} (l_{a,m} - u_{a,m}), E_M = Σ_{b=1..m} (l_{n,b} - u_{n,b}),
where E_N and E_M are the sample hole time and the device hole time, l_{a,m} and l_{n,b} are the times at which sample a and device b are last released from occupation in the sample-device available time channel, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b in the sample-device occupancy channel; the difference between the two is the hole time.
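A numeric sketch of the S4 reward follows. The published utilization and hole-time equations are rendered as images, so the exact normalization below is a hedged reconstruction from the surrounding definitions; α = 0.8 and β = 1.0 follow the experiment section:

```python
def utilization(u_sample, u_device, c_max):
    """Assumed form: accumulated processing time averaged over samples and
    over devices, each normalized by the current longest processing time."""
    U_N = sum(u_sample) / (len(u_sample) * c_max)   # sample utilization
    U_M = sum(u_device) / (len(u_device) * c_max)   # device utilization
    return U_N + U_M

def hole_time(l_sample, l_device, u_sample, u_device):
    """Hole time: last-release time minus accumulated processing time,
    summed over samples and over devices."""
    E_N = sum(li - ui for li, ui in zip(l_sample, u_sample))
    E_M = sum(li - ui for li, ui in zip(l_device, u_device))
    return E_N + E_M

def reward(U, E, alpha=0.8, beta=1.0):
    """R = alpha * U - beta * E, with the experiment section's alpha, beta."""
    return alpha * U - beta * E

u_s, u_d = [4.0, 6.0], [5.0, 5.0]   # accumulated processing times
l_s, l_d = [5.0, 8.0], [6.0, 7.0]   # last-occupied release times
U = utilization(u_s, u_d, c_max=10.0)
E = hole_time(l_s, l_d, u_s, u_d)
print(U, E)  # -> 1.0 6.0
```

High utilization is rewarded and idle "holes" on samples and devices are penalized, which is what makes the otherwise sparse scheduling reward informative at every step.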
S5, judging whether the scheduling task is completed or not:
when the scheduling task is completed and the training step number is reached, training is finished, otherwise, returning to the step S2;
and when the scheduling task is not completed, entering the next scheduling state and returning to step S2.
In step S5, the scheduling task is complete when the queue of tasks to be processed is empty.
In summary, this embodiment proposes a reward function combining sample utilization and device hole time for the sparse-reward problem of quality inspection task scheduling; for the serial, parallel and mutually exclusive relations among quality inspection tasks, a set of action selection rules replaces the agent's direct learning of action decisions, with heuristic rules improving the algorithm's convergence speed and strengthening the interpretability of the model's action selection; and to improve the strategy diversity of model training samples, scheduling states, action decisions and related results are added to a replay buffer pool, and experience is replayed according to the number of training steps.
When the trained model is used for quality inspection task scheduling, the numbers of samples, devices and test items are input, and the model outputs the processing order of the test items together with the sample and device assigned to each item.
Example two
Embodiment two presents experimental results for the quality inspection task scheduling method of embodiment one, demonstrating the effectiveness of the method.
This embodiment uses real data from the quantity-value transmission laboratory of an electric power grid company for experimental verification, comprising 56 items of test data and 21 (26) items of equipment and sample data. Given the confidentiality requirements of the data, tests and devices are identified by serial number, as shown in Table 1.
Table 1 test-time-device information table
The nonlinear relations of seriality, parallelism, task mutual exclusion and device mutual exclusion among the 56 test items are likewise represented by serial numbers, as shown in Table 2:
TABLE 2 nonlinear relationship table
Nonlinear relation Experimental item
Task serialization [53,60]
Task parallelism [[9],[3]]
Task mutex [[33,34],[35,36,37]]
Device mutual exclusion [6,15],[16,17]
Wherein task 53 and task 60 must be performed sequentially on the same sample; task 9 must be performed simultaneously on 3 samples; any sample, after performing tasks 33 and 34, can no longer perform tasks 35, 36, 37, but is otherwise not limited; devices 6 and 15, and devices 16 and 17 cannot be operated simultaneously.
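The Table 2 relations can be held in simple data structures; the encoding below is an assumption for illustration, with a feasibility check for the task-mutex rule:

```python
# Assumed encoding of Table 2's nonlinear relations, for illustration only.
SERIAL = [(53, 60)]                       # 53 then 60, on the same sample
PARALLEL = [(9, 3)]                       # task 9 needs 3 samples at once
TASK_MUTEX = [({33, 34}, {35, 36, 37})]   # after 33 and 34, 35-37 are barred
DEVICE_MUTEX = [{6, 15}, {16, 17}]        # device pairs that never run together

def mutex_violated(done_tasks, candidate):
    """True if scheduling `candidate` on a sample whose completed task set
    is `done_tasks` would break a task-mutex pair."""
    return any(pre <= done_tasks and candidate in post
               for pre, post in TASK_MUTEX)

print(mutex_violated({33, 34}, 35))  # -> True  (both preconditions done)
print(mutex_violated({33}, 36))      # -> False (only one of the pair done)
```

A scheduler can apply such checks to mask infeasible candidates before the action selection rules choose among the remaining tasks.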
The experimental parameters were set as follows:
the processors Intel (R), xeon (R) Silver 4110, CPU 2.10GHz, memory 128GB, and display card GTX1080Ti are adopted to adapt to Ubuntu operating system.
The parameters of the reinforcement-learning-based quality inspection task scheduling algorithm are shown in Table 3: the training scale is 8000, the experience replay buffer pool holds 100000 entries, the target network parameters are updated every 200 steps, and the reward function parameters are set to α = 0.8 and β = 1.0.
Analysis of experimental results
The difficulty of the quality inspection task scheduling problem is directly related to the number of quality inspection tasks. Based on the available data, data sets of different scales were divided for testing: the number of test items is set to 10, 20, 30, 40 and 50 respectively, the number of samples is kept at 5, and the equipment data are determined by the tests used. To verify the effectiveness of the algorithm, this embodiment compares against the classic greedy rule MWKR (most work remaining, which selects the operation with the longest remaining processing time), the classic genetic algorithm (GA) and the particle swarm optimization algorithm (PSO); the results are shown in Table 3 below.
Table 3 single batch algorithm validation table
Table 3 shows the quality inspection task scheduling results for single-batch samples completing 10, 20, 30, 40 and 50 test items respectively. Ten experiments were run for each instance, and the best makespan (shortest longest completion time) and the algorithm time, excluding model loading time, were recorded as averages of the objective function and runtime to measure algorithm performance. In objective value, the OURS algorithm improves on MWKR by 12.10% on average, on GA by 2.07% and on PSO by 3.40%. In runtime, OURS and MWKR are more than 99% faster than GA and PSO; randomly generating chromosomes and populations is clearly far less efficient than a single pass of model inference. The experimental results show that OURS outperforms the existing algorithms in both scheduling quality and solving time, fully verifying its effectiveness on the quality inspection scheduling problem.
Example III
Fig. 3 is a schematic structural diagram of an electronic device according to the third embodiment of the present invention. As shown in fig. 3, the electronic device includes a processor 210, a memory 220, an input device 230 and an output device 240; there may be one or more processors 210 in the device, one processor 210 being taken as an example in fig. 3; the processor 210, memory 220, input device 230 and output device 240 in the electronic device may be connected by a bus or other means, a bus connection being taken as an example in fig. 3.
The memory 220 is a computer-readable storage medium that can be used to store software programs, computer-executable programs, and modules. The processor 210 executes the software programs, instructions and modules stored in the memory 220 to perform various functional applications and data processing of the electronic device, i.e., implement the quality inspection task scheduling methods of the first to second embodiments.
The memory 220 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for functions; the storage data area may store data created according to the use of the terminal, etc. In addition, memory 220 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, memory 220 may further include memory remotely located relative to processor 210, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 230 may be used to receive input user identity information, sample data, training parameters, and the like. The output means 240 may comprise a display device such as a display screen.
Example IV
The fourth embodiment of the present invention also provides a storage medium containing computer-executable instructions which, when executed by a computer, perform a quality inspection task scheduling method comprising:
S1, initializing model training parameters, wherein the model is a reinforcement learning model;
S2, constructing a scheduling state feature, wherein the scheduling state feature is obtained by concatenating a task processing time channel, a sample-device occupancy channel, and a sample-device available time channel;
S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain the sample and device corresponding to the action;
S4, calculating a reward value and updating the training parameters according to the action and the decoding result;
S5, judging whether the scheduling task is completed:
when the scheduling task is completed and the number of training steps is reached, training is finished; otherwise, return to step S2;
when the scheduling task is not completed, enter the next scheduling state and return to step S2.
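As a rough illustration, the S1-S5 loop described above can be sketched as a minimal reinforcement-learning-style training loop. This is a toy only: the environment, the scalar state, and all names (`ToyScheduleEnv`, `train`, and so on) are hypothetical stand-ins, not the patent's model.

```python
import random

class ToyScheduleEnv:
    """Hypothetical stand-in for the quality inspection scheduling environment."""
    def __init__(self, n_tasks=5):
        self.n_tasks = n_tasks
        self.remaining = n_tasks

    def reset(self):
        self.remaining = self.n_tasks
        return self.remaining          # S2: the scheduling "state" (a toy scalar here)

    def step(self, action):
        self.remaining -= 1            # S3/S4: apply the decoded (sample, device) action
        reward = 1.0 if action == 0 else 0.5
        done = self.remaining == 0     # S5: is the scheduling task complete?
        return self.remaining, reward, done

def train(env, episodes=3, seed=0):
    """S1: initialize parameters, then loop S2-S5 until all tasks are scheduled."""
    rng = random.Random(seed)
    q = {}                             # toy table of training parameters
    total = 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = rng.choice([0, 1])             # S3: pick an action for this state
            next_state, reward, done = env.step(action)
            q[(state, action)] = q.get((state, action), 0.0) + 0.1 * reward  # S4: update
            total += reward
            state = next_state                      # S5: advance to the next state
    return q, total

q_table, ret = train(ToyScheduleEnv())
```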
Of course, the storage medium containing computer-executable instructions provided in the embodiments of the present invention is not limited to the method operations described above; it may also perform related operations in the quality inspection task scheduling method provided in any embodiment of the present invention.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention may be implemented by software together with necessary general-purpose hardware, or by dedicated hardware, although in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk, and includes several instructions for causing an electronic device (which may be a mobile phone, a personal computer, a server, a network device, or the like) to execute the methods of the embodiments of the present invention.
It will be apparent to those skilled in the art from this disclosure that various other changes and modifications can be made which are within the scope of the invention as defined in the appended claims.

Claims (10)

1. A quality inspection task scheduling method, characterized by comprising the following steps:
S1, initializing model training parameters, wherein the model is a reinforcement learning model;
S2, constructing a scheduling state feature, wherein the scheduling state feature is obtained by concatenating a task processing time channel, a sample-device occupancy channel, and a sample-device available time channel;
S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain the sample and device corresponding to the action;
S4, calculating a reward value and updating the training parameters according to the action and the decoding result;
S5, judging whether the scheduling task is completed:
when the scheduling task is completed and the number of training steps is reached, training is finished; otherwise, return to step S2;
when the scheduling task is not completed, enter the next scheduling state and return to step S2.
2. The quality inspection task scheduling method of claim 1, wherein the training parameters include the batch size, the number of training steps, the replay buffer, the replay time, and experience hyperparameters.
3. The quality inspection task scheduling method according to claim 2, further comprising, after calculating a reward value and updating the training parameters according to the action and the decoding result:
storing the scheduling state, the action, the decoding result, and the reward value into a cache pool in one-to-one correspondence, wherein the cache pool is used for experience replay during training.
4. The quality inspection task scheduling method according to claim 3, wherein, when the scheduling task is not completed, entering the next scheduling state and returning to step S2 further comprises:
judging whether experience replay is needed; if so, performing experience replay; otherwise, entering the next scheduling state and returning to step S2.
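The cache pool referred to in claims 3 and 4 can be sketched as a bounded buffer that stores (state, action, decoding result, reward) tuples in one-to-one correspondence and samples mini-batches for replay. The class and field names below (`ReplayPool`, `push`, `sample`) are hypothetical, for illustration only:

```python
import random
from collections import deque

class ReplayPool:
    """Toy cache pool: stores (state, action, decode_result, reward) tuples
    and samples mini-batches for experience replay."""
    def __init__(self, capacity=1000):
        self.buf = deque(maxlen=capacity)   # oldest entries are evicted when full

    def push(self, state, action, decode_result, reward):
        self.buf.append((state, action, decode_result, reward))

    def sample(self, batch_size, seed=None):
        rng = random.Random(seed)
        k = min(batch_size, len(self.buf))
        return rng.sample(list(self.buf), k)

pool = ReplayPool(capacity=8)
for i in range(10):                         # capacity 8: the 2 oldest entries drop out
    pool.push(i, i % 2, ("sample", "device"), float(i))
batch = pool.sample(4, seed=1)
```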
5. The quality inspection task scheduling method of claim 1, wherein the task processing time channel is a three-dimensional matrix of size (n+1) × (m+1) × j, where n is the number of samples, m is the number of devices, and j is the number of quality inspection items; the processing time channel matrix includes elements p_{a,b,c}, p_{a,m,c} and p_{n,b,c}, where p_{a,b,c} represents the processing time required for quality inspection task c to be completed by sample a on device b, and p_{a,m,c} and p_{n,b,c} represent the feasibility of processing quality inspection task c on sample a and on device b, respectively;
the sample-device occupancy channel is a two-dimensional matrix of size (n+1) × (m+1) that includes elements u_{a,b}, u_{a,m} and u_{n,b}, where u_{a,b} is the accumulated execution time of the quality inspection tasks of sample a on device b, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and of device b, respectively;
the sample-device available time channel is a two-dimensional matrix of size (n+1) × (m+1) that includes elements l_{a,b}, l_{a,m} and l_{n,b}, where l_{a,b} is the end time of the task last executed by sample a on device b, and l_{a,m} and l_{n,b} are the times at which sample a and device b are last occupied and released, respectively;
the concatenated scheduling state feature is a scheduling state feature representation of dimension (n+1) × (m+1) × (j+2).
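The channel concatenation of claim 5 can be illustrated with NumPy. Only the shapes follow the claim; the array contents below are random placeholders, and the function name `build_state` is hypothetical:

```python
import numpy as np

def build_state(proc_time, occupancy, avail_time):
    """Concatenate the three channels along the last axis.

    proc_time  : (n+1, m+1, j)  task processing time channel
    occupancy  : (n+1, m+1)     sample-device occupancy channel
    avail_time : (n+1, m+1)     sample-device available time channel
    returns    : (n+1, m+1, j+2) scheduling state feature
    """
    return np.concatenate(
        [proc_time, occupancy[..., None], avail_time[..., None]], axis=-1
    )

n, m, j = 3, 2, 4                       # illustrative: 3 samples, 2 devices, 4 items
p = np.random.rand(n + 1, m + 1, j)
u = np.random.rand(n + 1, m + 1)
l = np.random.rand(n + 1, m + 1)
state = build_state(p, u, l)            # shape (4, 3, 6), i.e. (n+1, m+1, j+2)
```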
6. The quality inspection task scheduling method of claim 1, wherein the corresponding action is output according to the current scheduling state, and the scheduling state representation satisfies: a_i = π(S_i), a_i = S_{i+1} - S_i, r_i = R(a_i, S_i, S_{i+1}), where a_i is the current action, S_i is the current state, r_i is the reward of the current action, R is the reward function, and π is the action selection policy.
7. The quality inspection task scheduling method of claim 1 or 6, wherein in S3 the directly output action is replaced by an action selection rule, the action selection rules comprising:
(1) Selecting the task with the shortest processing time;
(2) Selecting the task with the longest processing time;
(3) Selecting the task with the least available samples;
(4) Selecting serial tasks;
(5) Selecting parallel tasks;
(6) Selecting a preamble task in the mutually exclusive task pair;
(7) Selecting a subsequent task in the mutually exclusive task pair;
(8) Selecting an unconstrained task;
the heuristic rules for decoding include:
Rule I: heuristic sample selection is performed on the chromosome of each individual; samples are examined in test order from front to back, and the sample with the shortest test completion time is selected;
Rule II: heuristic device selection is performed on the chromosome of each individual; based on the selected sample, the device with the shortest completion time is selected, and if a plurality of devices meet the selection condition, the device with the smallest load is selected.
8. The quality inspection task scheduling method of claim 1, wherein the calculation of the reward value satisfies:
R=αU-βE,
wherein R is the reward value, α and β are empirical parameters, and U and E are the scheduling environment utilization and the hole time, respectively;
the calculation of the scheduling environment utilization satisfies:
U_N = Σ_{n=1}^{N} u_{n,M} / (N · C_max), U_M = Σ_{m=1}^{M} u_{N,m} / (M · C_max),
wherein U_N and U_M are the sample utilization and the device utilization, u_{n,M} and u_{N,m} are the accumulated processing times of sample n and device m in the sample-device occupancy channel, and C_max is the current longest processing time;
the calculation of the hole time satisfies:
E_N = Σ_{n=1}^{N} (l_{n,M} - u_{n,M}), E_M = Σ_{m=1}^{M} (l_{N,m} - u_{N,m}),
wherein E_N and E_M are the sample hole time and the device hole time, l_{n,M} and l_{N,m} are the final occupied release times of sample n and device m in the sample-device available time channel, and u_{n,M} and u_{N,m} are the accumulated processing times of sample n and device m in the sample-device occupancy channel.
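The reward R = αU - βE of claim 8 can be sketched as follows, under the assumption that utilization is the mean accumulated processing time divided by the makespan C_max, that hole time is the summed difference between last release time and accumulated processing time, and that sample and device terms are averaged (for U) and summed (for E); the aggregation and all function names are assumptions, not the patent's exact formulas:

```python
def utilization(acc_times, c_max):
    """Assumed form of U_N / U_M: mean accumulated processing time over C_max."""
    return sum(acc_times) / (len(acc_times) * c_max)

def hole_time(release_times, acc_times):
    """Assumed form of E_N / E_M: summed idle gaps, last release minus processing."""
    return sum(l - u for l, u in zip(release_times, acc_times))

def reward(u_samples, u_devices, l_samples, l_devices, c_max, alpha=1.0, beta=0.1):
    """R = alpha * U - beta * E, with an assumed aggregation across samples/devices."""
    U = 0.5 * (utilization(u_samples, c_max) + utilization(u_devices, c_max))
    E = hole_time(l_samples, u_samples) + hole_time(l_devices, u_devices)
    return alpha * U - beta * E

# toy numbers: 2 samples, 2 devices, makespan 10
r = reward(u_samples=[6.0, 8.0], u_devices=[7.0, 7.0],
           l_samples=[8.0, 9.0], l_devices=[9.0, 8.0], c_max=10.0)
```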
9. An electronic device comprising a processor, a storage medium and a computer program stored in the storage medium, characterized in that the computer program, when executed by the processor, implements the quality inspection task scheduling method of any one of claims 1 to 8.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the quality inspection task scheduling method of any one of claims 1 to 8.
CN202211572850.XA 2022-12-08 2022-12-08 Quality inspection task scheduling method, equipment and medium Pending CN116128334A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211572850.XA CN116128334A (en) 2022-12-08 2022-12-08 Quality inspection task scheduling method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211572850.XA CN116128334A (en) 2022-12-08 2022-12-08 Quality inspection task scheduling method, equipment and medium

Publications (1)

Publication Number Publication Date
CN116128334A true CN116128334A (en) 2023-05-16

Family

ID=86303551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211572850.XA Pending CN116128334A (en) 2022-12-08 2022-12-08 Quality inspection task scheduling method, equipment and medium

Country Status (1)

Country Link
CN (1) CN116128334A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237241A (en) * 2023-11-15 2023-12-15 湖南自兴智慧医疗科技有限公司 Chromosome enhancement parameter adjustment method and device
CN117237241B (en) * 2023-11-15 2024-02-06 湖南自兴智慧医疗科技有限公司 Chromosome enhancement parameter adjustment method and device

Similar Documents

Publication Publication Date Title
Shen et al. Mathematical modeling and multi-objective evolutionary algorithms applied to dynamic flexible job shop scheduling problems
Li et al. Mathematical model and metaheuristics for simultaneous balancing and sequencing of a robotic mixed-model assembly line
Xu et al. SATzilla-07: The design and analysis of an algorithm portfolio for SAT
Lei Simplified multi-objective genetic algorithms for stochastic job shop scheduling
Sun et al. Automatically evolving cnn architectures based on blocks
CN113792924A (en) Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network
CN112016691B (en) Quantum circuit construction method and device
Wei et al. Constrained differential evolution with multiobjective sorting mutation operators for constrained optimization
CN115357554B (en) Graph neural network compression method and device, electronic equipment and storage medium
Kallestad et al. A general deep reinforcement learning hyperheuristic framework for solving combinatorial optimization problems
CN109685204A (en) Pattern search method and device, image processing method and device
CN116128334A (en) Quality inspection task scheduling method, equipment and medium
CN114580678A (en) Product maintenance resource scheduling method and system
CN115758761A (en) Quality inspection task scheduling method, equipment and medium based on genetic algorithm
CN114895773A (en) Energy consumption optimization method, system and device of heterogeneous multi-core processor and storage medium
Wang et al. Large-scale inventory optimization: A recurrent neural networks–inspired simulation approach
CN114881301A (en) Simulation scheduling method and system for production line, terminal device and storage medium
Ozsoydan et al. A reinforcement learning based computational intelligence approach for binary optimization problems: The case of the set-union knapsack problem
CN114297934A (en) Model parameter parallel simulation optimization method and device based on proxy model
Khan et al. Optimization of constrained function using genetic algorithm
CN116823468A (en) SAC-based high-frequency quantitative transaction control method, system and storage medium
Zhang et al. MRLM: A meta-reinforcement learning-based metaheuristic for hybrid flow-shop scheduling problem with learning and forgetting effects
Mencia et al. Efficient repairs of infeasible job shop problems by evolutionary algorithms
Sun A genetic algorithm for a re-entrant job-shop scheduling problem with sequence-dependent setup times
Kołodziej et al. Control sharing analysis and simulation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination