CN116128334A - Quality inspection task scheduling method, equipment and medium - Google Patents

Quality inspection task scheduling method, equipment and medium Download PDF

Info

Publication number
CN116128334A
CN116128334A (application CN202211572850.XA)
Authority
CN
China
Prior art keywords
sample
task
scheduling
quality inspection
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211572850.XA
Other languages
Chinese (zh)
Inventor
杨依睿
杨思洁
徐韬
陈欢军
徐开
章江铭
袁健
佘清顺
黄俊杰
姜伟昊
谢泽楠
刘思
周佑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Zhejiang University ZJU
Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd filed Critical Zhejiang University ZJU
Priority to CN202211572850.XA priority Critical patent/CN116128334A/en
Publication of CN116128334A publication Critical patent/CN116128334A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning


Abstract

The invention discloses a quality inspection task scheduling method in the technical field of quantity-value transmission quality inspection, addressing the lack of a suitable scheduling algorithm in the prior art. The method comprises the following steps: S1, initializing model training parameters, wherein the model is a reinforcement learning model; S2, constructing scheduling state features; S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain the sample and device corresponding to the action; S4, calculating a reward value and updating the training parameters; S5, judging whether the scheduling task is completed: when the scheduling task is completed and the number of training steps is reached, training ends, otherwise return to step S2; when the scheduling task is not completed, enter the next scheduling state and return to step S2. The invention also discloses a quality inspection task scheduling electronic device and a computer storage medium. The invention models the problem with reinforcement learning and thereby obtains an effective scheduling model.

Description

Quality inspection task scheduling method, equipment and medium
Technical Field
The invention relates to the technical field of quantity-value transmission quality inspection, and in particular to a quality inspection task scheduling method, device and medium based on reinforcement learning.
Background
Quality inspection for quantity-value transmission is key work in electric power metering, and automatic quality inspection task scheduling is a natural choice for improving the detection efficiency and accuracy of meter devices of all kinds. However, unlike the existing flexible job shop scheduling problem, in which the operations required by each workpiece are fixed, in the quality inspection task scheduling problem each sample has no fixed quality inspection items: a batch of quality inspection tasks is completed on a batch of samples, so the optimization space is larger. Meanwhile, quality inspection tasks have nonlinear relations such as serial, parallel and mutual exclusion, making the constraint conditions more complex.
In existing reinforcement learning methods for the flexible job shop scheduling problem, the state features cannot fully describe the scheduling degrees of freedom and the nonlinear task relations of the quality inspection task scheduling problem, and the reward function cannot reflect the combined influence of task order, sample scheduling, equipment scheduling and the like, so existing scheduling algorithms cannot be applied directly.
Disclosure of Invention
In order to overcome the defects of the prior art, one of the purposes of the invention is to provide a quality inspection task scheduling method, which constructs a quality inspection task scheduling model based on reinforcement learning, so as to improve the sample detection efficiency in the quality inspection task scheduling process.
One of the purposes of the invention is realized by adopting the following technical scheme:
the quality inspection task scheduling method is characterized by comprising the following steps of:
s1, initializing model training parameters, wherein the model is a reinforcement learning model;
s2, constructing scheduling state characteristics, wherein the scheduling state characteristics are obtained through splicing of task processing time channels, sample-equipment occupancy rate and sample-equipment available time channels;
s3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain a sample and equipment corresponding to the action;
s4, calculating a reward value and updating the training parameters according to the action and the decoding result;
s5, judging whether the scheduling task is completed or not:
when the scheduling task is completed and the training step number is reached, training is finished, otherwise, returning to the step S2;
and when the scheduling task is not completed, entering the next scheduling state and returning to step S2.
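As an illustrative aside, the S1-S5 loop can be sketched as a toy episode loop. Everything below (the class and function names, the trivial one-task-per-step environment) is an assumption for illustration, not the patent's implementation:

```python
class ToyQCEnv:
    """Toy stand-in for the scheduling environment (an assumption for
    illustration): each step schedules one pending quality inspection task."""
    def __init__(self, j=5):
        self.j = j                            # number of tasks in the batch

    def reset(self):
        self.pending = list(range(self.j))    # queue of tasks to be processed
        return tuple(self.pending)

    def step(self, action_index):
        task = self.pending.pop(action_index % len(self.pending))
        done = not self.pending               # S5: done when the queue is empty
        return tuple(self.pending), 1.0, done, task

def run_episode(env, policy):
    """S2-S5 in miniature: observe the state, choose an action,
    transition, and accumulate reward until the schedule is complete."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(state)                # S3: action from the current state
        state, reward, done, _ = env.step(action)
        total += reward                       # S4: reward accumulation
    return total

total = run_episode(ToyQCEnv(j=5), policy=lambda s: 0)
print(total)  # one unit of reward per scheduled task -> 5.0
```

In the patent's method the policy is the learned action-selection strategy and the reward is R = αU - βE; here both are reduced to trivial stand-ins so only the control flow of S1-S5 is shown.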
Further, the training parameters comprise the batch size, the number of training steps, the replay buffer size, the replay interval and the empirical hyperparameters.
Further, after calculating the reward value and updating the training parameter according to the action and the decoding result, the method further comprises:
and storing the scheduling state, the action, the decoding result and the rewarding value into a cache pool in a one-to-one correspondence mode, wherein the cache pool is used for experience playback during training.
Further, when the scheduling task is not completed, entering a next scheduling state and returning to step S2, and further including:
judging whether experience playback is needed, if so, performing experience playback, otherwise, entering a next scheduling state and returning to the step S2.
Further, the task processing time channel is a three-dimensional matrix of (n+1) × (m+1) × j, where n is the number of samples, m is the number of devices and j is the number of quality inspection items; the processing time channel contains matrix elements p_{a,b,c}, p_{a,m,c} and p_{n,b,c}, where p_{a,b,c} is the processing time required for quality inspection task c to be completed by sample a on device b, and p_{a,m,c} and p_{n,b,c} represent the feasibility of processing quality inspection task c on sample a and device b;
the sample-device occupancy channel is a two-dimensional matrix of (n+1) × (m+1) containing matrix elements u_{a,b}, u_{a,m} and u_{n,b}, where u_{a,b} is the cumulative time of quality inspection tasks executed by sample a on device b, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b respectively;
the sample-device available time channel is a two-dimensional matrix of (n+1) × (m+1) containing matrix elements l_{a,b}, l_{a,m} and l_{n,b}, where l_{a,b} is the end time of the task last executed by sample a on device b, and l_{a,m} and l_{n,b} are the times at which sample a and device b are last released from occupation respectively;
the spliced scheduling state feature is a scheduling state feature representation of dimension (n+1) × (m+1) × (j+2).
Further, a corresponding action is output according to the current scheduling state, and the scheduling state representation satisfies: a_i = π(S_i), a_i = S_{i+1} - S_i, r_i = R(a_i, S_i, S_{i+1}), wherein a_i is the current action, S_i is the current state, r_i is the reward of the current action, R is the reward function, and π is the action selection strategy.
Further, in S3, the direct output action is replaced by an action selection rule, where the action selection rule includes:
(1) Selecting the task with the shortest processing time;
(2) Selecting a task with the longest processing time;
(3) Selecting the task with the least available samples;
(4) Selecting serial tasks;
(5) Selecting parallel tasks;
(6) Selecting a preamble task in the mutually exclusive task pair;
(7) Selecting a subsequent task in the mutually exclusive task pair;
(8) Selecting an unconstrained task;
the decoded heuristic rules include:
Rule one: heuristic sample selection is performed on each individual's chromosome; samples are considered in test order from front to back, and if several samples meet the selection condition, the sample with the shortest test completion time is chosen;
Rule two: heuristic device selection is performed on each individual's chromosome; based on the selected sample, the device with the shortest completion time is chosen, and if several devices meet the selection condition, the device with the smallest load is chosen.
Further, the calculation of the prize value satisfies:
R=αU-βE,
wherein R is a reward value, alpha and beta are experience parameters, and U, E is the scheduling environment utilization rate and the hole time respectively;
the calculation of the scheduling environment utilization rate satisfies the following conditions:
U_N = (Σ_{a=1..n} u_{a,m}) / (n · C_max), U_M = (Σ_{b=1..m} u_{n,b}) / (m · C_max),
wherein U_N and U_M are the sample utilization and the device utilization, u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b in the sample-device occupancy channel, and C_max is the current longest processing time;
the calculation of the hole time satisfies:
E_N = Σ_{a=1..n} (l_{a,m} - u_{a,m}), E_M = Σ_{b=1..m} (l_{n,b} - u_{n,b}),
wherein E_N and E_M are the sample hole time and the device hole time, l_{a,m} and l_{n,b} are the times at which sample a and device b are last released from occupation in the sample-device available time channel, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b in the sample-device occupancy channel.
Another object of the present invention is to provide an electronic device comprising a processor, a storage medium and a computer program stored in the storage medium, wherein the computer program implements the quality inspection task scheduling method described above when executed by the processor.
A further object of the present invention is to provide a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the quality inspection task scheduling method described above when executed by a processor.
Compared with the prior art, the invention has the beneficial effects that:
the invention builds the quality inspection task scheduling model based on reinforcement learning, enhances the learning ability of an algorithm to a scheduling state, replaces an intelligent agent to directly learn action decisions through action selection rules, improves the algorithm convergence speed by utilizing heuristic rules, enhances the interpretability of model action selection, can completely describe the scheduling freedom degree and nonlinear task relation of quality inspection task scheduling problems, and can be applied to quality inspection of volume transmission.
Drawings
FIG. 1 is a flow chart of a quality inspection task scheduling method according to an embodiment;
FIG. 2 is a flow chart of an embodiment one quality inspection task scheduling method after adding experience playback;
fig. 3 is a block diagram of the electronic device of the third embodiment.
Detailed Description
The invention is now described in more detail with reference to the accompanying drawings. It should be noted that the description below is given by way of illustration only and not by way of limitation. Various embodiments may be combined with one another to form further embodiments not shown in the following description.
Example 1
Embodiment one provides a quality inspection task scheduling method that builds a scheduling model for quality inspection tasks with reinforcement learning, addresses the serial, parallel and mutually exclusive relations among quality inspection tasks, and uses an action selection rule mechanism in place of having the agent directly learn action decisions.
The quality inspection task scheduling algorithm needs to determine the processing order of quality inspection tasks and allocate a sample and a device to each quality inspection task so as to minimize the makespan (the longest processing time); this is essentially a sequential decision process with a finite set of choices. The core of applying reinforcement learning to a quality inspection task scheduling algorithm is converting the scheduling problem into a Markov process or semi-Markov process, i.e. defining states, actions, transition probabilities and reward functions.
For the characteristic of the quality inspection task scheduling problem that a batch of quality inspection tasks is completed on a batch of samples and machines, this embodiment proposes a scheduling state representation suited to reinforcement learning, strengthening the algorithm's ability to learn the scheduling state. Referring to fig. 1, the quality inspection task scheduling method includes the following steps:
s1, initializing model training parameters, wherein the model is a reinforcement learning model;
the training parameters comprise set batch, training step number, playback buffer, playback time and experience super parameters.
S2, constructing scheduling state characteristics, wherein the scheduling state characteristics are obtained through splicing of task processing time channels, sample-equipment occupancy rate and sample-equipment available time channels;
in order to enhance the model effect and improve the strategy diversity of training data, a cache pool is added during model training. Specifically, the scheduling state, the action, the decoding result and the rewarding value are stored in a buffer pool in a one-to-one correspondence mode, and the buffer pool is used for experience playback during training. And (3) playing back experience of the buffer pool, namely storing states, actions and feedback as a group of experience samples, and re-reading the experience samples as training data after a certain training step number so as to improve strategy diversity of the training data.
Referring to fig. 2, when the task is not completed, the process enters the next scheduling state and returns to step S2, and further includes:
judging whether experience playback is needed, if so, performing experience playback, otherwise, entering a next scheduling state and returning to the step S2.
The next scheduling state, i.e. the new state to which the current state transfers after action selection, includes the queue of tasks to be processed, the processed tasks, the sample and device loads, the idle time, and so on.
When the scheduling status feature is selected in S2, the following principle is followed:
(1) The state features should contain all the information needed for action decisions so as to fully describe the scheduling context, i.e. a_i = π(S_i), wherein a_i is the current action, S_i is the current state, and π is the action strategy that takes a scheduling state as input and outputs a target action;
(2) The action corresponding to the transition between adjacent scheduling states should be unique, i.e. a_i = S_{i+1} - S_i;
(3) The reward of an action is related only to the states before and after it, i.e. r_i = R(a_i, S_i, S_{i+1}), wherein r_i is the reward of the current action and R is the reward function.
The selection of the status features should be related to the scheduling objectives, reducing redundancy of feature information.
Specifically, the task processing time channel is a three-dimensional matrix of (n+1) × (m+1) × j, where n is the number of samples, m is the number of devices and j is the number of quality inspection items; the processing time channel contains matrix elements p_{a,b,c}, p_{a,m,c} and p_{n,b,c}, where p_{a,b,c} is the processing time required for quality inspection task c to be completed by sample a on device b, and p_{a,m,c} and p_{n,b,c} indicate the feasibility of processing quality inspection task c on sample a and device b: 0 indicates that the task cannot be processed on the sample (or device) or need not be executed again, and 1 indicates that it has not yet been executed and can be processed on the sample (or device);
the sample-device occupancy channel is a two-dimensional matrix of (n+1) × (m+1) containing matrix elements u_{a,b}, u_{a,m} and u_{n,b}, where u_{a,b} is the cumulative time of quality inspection tasks executed by sample a on device b, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b respectively;
the sample-device available time channel is a two-dimensional matrix of (n+1) × (m+1) containing matrix elements l_{a,b}, l_{a,m} and l_{n,b}, where l_{a,b} is the end time of the task last executed by sample a on device b, and l_{a,m} and l_{n,b} are the times at which sample a and device b are last released from occupation respectively;
and the task processing time channel, the sample-device occupancy channel and the sample-device available time channel are spliced to finally obtain a scheduling state feature representation of dimension (n+1) × (m+1) × (j+2).
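The channel splicing can be illustrated with plain nested lists; this is a sketch under the assumption of a list-of-lists layout, since the patent fixes only the dimensions, not the storage format:

```python
def build_state_features(p, u, l):
    """Splice the three S2 channels into the (n+1) x (m+1) x (j+2)
    scheduling state feature: for each (sample, device) cell, the j
    processing-time entries are extended with the occupancy and
    available-time scalars."""
    rows, cols = len(u), len(u[0])            # n + 1 rows, m + 1 columns
    return [[p[a][b] + [u[a][b], l[a][b]] for b in range(cols)]
            for a in range(rows)]

n, m, j = 2, 2, 3   # 2 samples, 2 devices, 3 quality inspection items
p = [[[0.0] * j for _ in range(m + 1)] for _ in range(n + 1)]  # processing time
u = [[0.0] * (m + 1) for _ in range(n + 1)]                    # occupancy
l = [[0.0] * (m + 1) for _ in range(n + 1)]                    # available time
s = build_state_features(p, u, l)
print(len(s), len(s[0]), len(s[0][0]))  # -> 3 3 5, i.e. (n+1) x (m+1) x (j+2)
```

The extra (n+1)-th row and (m+1)-th column carry the per-device and per-sample aggregates (feasibility, cumulative time, last release time) described above.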
S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain a sample and equipment corresponding to the action;
In order to reduce the difficulty of reinforcement learning training and improve the stability of the algorithm, step S3 outputs one of the following action selection rules instead of an action directly:
(1) Selecting the task with the shortest processing time;
(2) Selecting a task with the longest processing time;
(3) Selecting the task with the least available samples;
(4) Selecting serial tasks;
(5) Selecting parallel tasks;
(6) Selecting a preamble task in the mutually exclusive task pair;
(7) Selecting a subsequent task in the mutually exclusive task pair;
(8) Selecting an unconstrained task;
The action to be processed is obtained from the selected rule and decoded according to the following heuristic rules:
Rule one: heuristic sample selection is performed on each individual's chromosome; samples are considered in test order from front to back, and if several samples meet the selection condition, the sample with the shortest test completion time is chosen;
Rule two: heuristic device selection is performed on each individual's chromosome; based on the selected sample, the device with the shortest completion time is chosen, and if several devices meet the selection condition, the device with the smallest load is chosen.
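Two of these selection steps reduce to tie-breaking `min` reductions; in this sketch the dictionary field names are assumptions for illustration:

```python
def pick_task_shortest(tasks):
    """Action rule (1): the pending task with the shortest processing time."""
    return min(tasks, key=lambda t: t["proc_time"])

def pick_device(devices):
    """Decoding rule two: the device with the shortest completion time,
    with ties broken by the smallest accumulated load."""
    return min(devices, key=lambda d: (d["finish_time"], d["load"]))

tasks = [{"id": 7, "proc_time": 4.0}, {"id": 3, "proc_time": 1.5}]
devices = [{"id": 0, "finish_time": 10.0, "load": 6.0},
           {"id": 1, "finish_time": 10.0, "load": 2.0}]
print(pick_task_shortest(tasks)["id"])  # -> 3 (1.5 < 4.0)
print(pick_device(devices)["id"])       # -> 1 (equal finish time, lower load)
```

Encoding each rule as a key function keeps the agent's discrete action space small (one action per rule) while the tie-breaking stays deterministic and interpretable.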
S4, calculating a reward value and updating the training parameters according to the action and the decoding result;
the calculation of the prize value satisfies the following conditions:
R=αU-βE,
wherein R is a reward value, alpha and beta are experience parameters, and U, E is the scheduling environment utilization rate and the hole time respectively;
the calculation of the scheduling environment utilization rate satisfies the following conditions:
U_N = (Σ_{a=1..n} u_{a,m}) / (n · C_max), U_M = (Σ_{b=1..m} u_{n,b}) / (m · C_max),
where U_N and U_M are the sample utilization and the device utilization, u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b in the sample-device occupancy channel, and C_max is the current longest processing time;
the calculation of the hole time satisfies:
E_N = Σ_{a=1..n} (l_{a,m} - u_{a,m}), E_M = Σ_{b=1..m} (l_{n,b} - u_{n,b}),
where E_N and E_M are the sample hole time and the device hole time, l_{a,m} and l_{n,b} are the times at which sample a and device b are last released from occupation in the sample-device available time channel, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b in the sample-device occupancy channel; the difference between the two is the hole time.
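A numeric sketch of the S4 reward follows. The published utilization and hole-time equations are rendered as images, so the exact normalization below is a hedged reconstruction from the surrounding definitions; α = 0.8 and β = 1.0 follow the experiment section:

```python
def utilization(u_sample, u_device, c_max):
    """Assumed form: accumulated processing time averaged over samples and
    over devices, each normalized by the current longest processing time."""
    U_N = sum(u_sample) / (len(u_sample) * c_max)   # sample utilization
    U_M = sum(u_device) / (len(u_device) * c_max)   # device utilization
    return U_N + U_M

def hole_time(l_sample, l_device, u_sample, u_device):
    """Hole time: last-release time minus accumulated processing time,
    summed over samples and over devices."""
    E_N = sum(li - ui for li, ui in zip(l_sample, u_sample))
    E_M = sum(li - ui for li, ui in zip(l_device, u_device))
    return E_N + E_M

def reward(U, E, alpha=0.8, beta=1.0):
    """R = alpha * U - beta * E, with the experiment section's alpha, beta."""
    return alpha * U - beta * E

u_s, u_d = [4.0, 6.0], [5.0, 5.0]   # accumulated processing times
l_s, l_d = [5.0, 8.0], [6.0, 7.0]   # last-occupied release times
U = utilization(u_s, u_d, c_max=10.0)
E = hole_time(l_s, l_d, u_s, u_d)
print(U, E)  # -> 1.0 6.0
```

High utilization is rewarded and idle "holes" on samples and devices are penalized, which is what makes the otherwise sparse scheduling reward informative at every step.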
S5, judging whether the scheduling task is completed or not:
when the scheduling task is completed and the training step number is reached, training is finished, otherwise, returning to the step S2;
and when the scheduling task is not completed, entering the next scheduling state and returning to step S2.
In step S5, the scheduling task is complete when the queue of tasks to be processed is empty.
In summary, this embodiment proposes a reward function combining sample utilization and device hole time for the sparse-reward problem of quality inspection task scheduling; for the serial, parallel and mutually exclusive relations among quality inspection tasks, a set of action selection rules replaces the agent's direct learning of action decisions, with heuristic rules improving the algorithm's convergence speed and strengthening the interpretability of the model's action selection; and to improve the strategy diversity of model training samples, scheduling states, action decisions and related results are added to a replay buffer pool, and experience is replayed according to the number of training steps.
When the trained model is used for quality inspection task scheduling, the numbers of samples, devices and test items are input, and the model outputs the processing order of the test items together with the sample and device assigned to each item.
Example two
Embodiment two presents experimental results for the quality inspection task scheduling method of embodiment one, demonstrating the effectiveness of the method.
This embodiment uses real data from the quantity-value transmission laboratory of an electric power grid company for experimental verification, comprising 56 items of test data and 21 (26) items of equipment and sample data. Given the confidentiality requirements of the data, tests and devices are identified by serial number, as shown in Table 1.
Table 1 test-time-device information table
The nonlinear relations of seriality, parallelism, task mutual exclusion and device mutual exclusion among the 56 test items are likewise represented by serial numbers, as shown in Table 2:
TABLE 2 nonlinear relationship table
Nonlinear relation Experimental item
Task serialization [53,60]
Task parallelism [[9],[3]]
Task mutex [[33,34],[35,36,37]]
Device mutual exclusion [6,15],[16,17]
Wherein task 53 and task 60 must be performed sequentially on the same sample; task 9 must be performed simultaneously on 3 samples; any sample, after performing tasks 33 and 34, can no longer perform tasks 35, 36, 37, but is otherwise not limited; devices 6 and 15, and devices 16 and 17 cannot be operated simultaneously.
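The Table 2 relations can be held in simple data structures; the encoding below is an assumption for illustration, with a feasibility check for the task-mutex rule:

```python
# Assumed encoding of Table 2's nonlinear relations, for illustration only.
SERIAL = [(53, 60)]                       # 53 then 60, on the same sample
PARALLEL = [(9, 3)]                       # task 9 needs 3 samples at once
TASK_MUTEX = [({33, 34}, {35, 36, 37})]   # after 33 and 34, 35-37 are barred
DEVICE_MUTEX = [{6, 15}, {16, 17}]        # device pairs that never run together

def mutex_violated(done_tasks, candidate):
    """True if scheduling `candidate` on a sample whose completed task set
    is `done_tasks` would break a task-mutex pair."""
    return any(pre <= done_tasks and candidate in post
               for pre, post in TASK_MUTEX)

print(mutex_violated({33, 34}, 35))  # -> True  (both preconditions done)
print(mutex_violated({33}, 36))      # -> False (only one of the pair done)
```

A scheduler can apply such checks to mask infeasible candidates before the action selection rules choose among the remaining tasks.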
The experimental parameters were set as follows:
the processors Intel (R), xeon (R) Silver 4110, CPU 2.10GHz, memory 128GB, and display card GTX1080Ti are adopted to adapt to Ubuntu operating system.
The parameters of the reinforcement-learning-based quality inspection task scheduling algorithm are shown in Table 3: the training scale is 8000, the experience replay buffer pool holds 100000 entries, the target network parameters are updated every 200 steps, and the reward function parameters are set to α = 0.8 and β = 1.0.
Analysis of experimental results
The difficulty of the quality inspection task scheduling problem is directly related to the number of quality inspection tasks. Based on the available data, data sets of different scales were divided for testing: the number of test items is set to 10, 20, 30, 40 and 50 respectively, the number of samples is kept at 5, and the equipment data are determined by the tests used. To verify the effectiveness of the algorithm, this embodiment compares against the classic greedy rule MWKR (most work remaining, which selects the operation with the longest remaining processing time), the classic genetic algorithm (GA) and the particle swarm optimization algorithm (PSO); the results are shown in Table 3 below.
Table 3 single batch algorithm validation table
Table 3 shows the quality inspection task scheduling results for single-batch samples completing 10, 20, 30, 40 and 50 test items respectively. Ten experiments were run for each instance, and the best makespan (shortest longest completion time) and the algorithm time, excluding model loading time, were recorded as averages of the objective function and runtime to measure algorithm performance. In objective value, the OURS algorithm improves on MWKR by 12.10% on average, on GA by 2.07% and on PSO by 3.40%. In runtime, OURS and MWKR are more than 99% faster than GA and PSO; randomly generating chromosomes and populations is clearly far less efficient than a single pass of model inference. The experimental results show that OURS outperforms the existing algorithms in both scheduling quality and solving time, fully verifying its effectiveness on the quality inspection scheduling problem.
Example III
Fig. 3 is a schematic structural diagram of an electronic device according to the third embodiment of the present invention. As shown in fig. 3, the electronic device includes a processor 210, a memory 220, an input device 230 and an output device 240; there may be one or more processors 210 in the device, one processor 210 being taken as an example in fig. 3; the processor 210, memory 220, input device 230 and output device 240 in the electronic device may be connected by a bus or other means, a bus connection being taken as an example in fig. 3.
The memory 220 is a computer-readable storage medium that can be used to store software programs, computer-executable programs, and modules. The processor 210 executes the software programs, instructions and modules stored in the memory 220 to perform various functional applications and data processing of the electronic device, i.e., implement the quality inspection task scheduling methods of the first to second embodiments.
The memory 220 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for functions; the storage data area may store data created according to the use of the terminal, etc. In addition, memory 220 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, memory 220 may further include memory remotely located relative to processor 210, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 230 may be used to receive input user identity information, sample data, training parameters, and the like. The output means 240 may comprise a display device such as a display screen.
Example IV
The fourth embodiment of the present invention also provides a storage medium containing computer-executable instructions which, when executed by a computer, perform a quality inspection task scheduling method comprising:
S1, initializing model training parameters, wherein the model is a reinforcement learning model;
S2, constructing a scheduling state feature, wherein the scheduling state feature is obtained by concatenating a task processing time channel, a sample-device occupancy channel, and a sample-device available time channel;
S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain the sample and device corresponding to the action;
S4, calculating a reward value and updating the training parameters according to the action and the decoding result;
S5, judging whether the scheduling task is completed:
when the scheduling task is completed and the number of training steps is reached, training is finished; otherwise, return to step S2;
when the scheduling task is not completed, enter the next scheduling state and return to step S2.
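As a rough illustration, the S1-S5 loop described above can be sketched as a minimal reinforcement-learning-style training loop. This is a toy only: the environment, the scalar state, and all names (`ToyScheduleEnv`, `train`, and so on) are hypothetical stand-ins, not the patent's model.

```python
import random

class ToyScheduleEnv:
    """Hypothetical stand-in for the quality inspection scheduling environment."""
    def __init__(self, n_tasks=5):
        self.n_tasks = n_tasks
        self.remaining = n_tasks

    def reset(self):
        self.remaining = self.n_tasks
        return self.remaining          # S2: the scheduling "state" (a toy scalar here)

    def step(self, action):
        self.remaining -= 1            # S3/S4: apply the decoded (sample, device) action
        reward = 1.0 if action == 0 else 0.5
        done = self.remaining == 0     # S5: is the scheduling task complete?
        return self.remaining, reward, done

def train(env, episodes=3, seed=0):
    """S1: initialize parameters, then loop S2-S5 until all tasks are scheduled."""
    rng = random.Random(seed)
    q = {}                             # toy table of training parameters
    total = 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = rng.choice([0, 1])             # S3: pick an action for this state
            next_state, reward, done = env.step(action)
            q[(state, action)] = q.get((state, action), 0.0) + 0.1 * reward  # S4: update
            total += reward
            state = next_state                      # S5: advance to the next state
    return q, total

q_table, ret = train(ToyScheduleEnv())
```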
Of course, the storage medium containing computer-executable instructions provided in the embodiments of the present invention is not limited to the method operations described above; it may also perform related operations in the quality inspection task scheduling method provided in any embodiment of the present invention.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention may be implemented by software together with necessary general-purpose hardware, or by dedicated hardware, although in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk, and includes several instructions for causing an electronic device (which may be a mobile phone, a personal computer, a server, a network device, or the like) to execute the methods of the embodiments of the present invention.
It will be apparent to those skilled in the art from this disclosure that various other changes and modifications can be made which are within the scope of the invention as defined in the appended claims.

Claims (10)

1. A quality inspection task scheduling method, characterized by comprising the following steps:
S1, initializing model training parameters, wherein the model is a reinforcement learning model;
S2, constructing a scheduling state feature, wherein the scheduling state feature is obtained by concatenating a task processing time channel, a sample-device occupancy channel, and a sample-device available time channel;
S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain the sample and device corresponding to the action;
S4, calculating a reward value and updating the training parameters according to the action and the decoding result;
S5, judging whether the scheduling task is completed:
when the scheduling task is completed and the number of training steps is reached, training is finished; otherwise, return to step S2;
when the scheduling task is not completed, enter the next scheduling state and return to step S2.
2. The quality inspection task scheduling method of claim 1, wherein the training parameters include the batch size, the number of training steps, the replay buffer, the replay time, and experience hyperparameters.
3. The quality inspection task scheduling method according to claim 2, further comprising, after calculating a reward value and updating the training parameters according to the action and the decoding result:
storing the scheduling state, the action, the decoding result, and the reward value into a cache pool in one-to-one correspondence, wherein the cache pool is used for experience replay during training.
4. The quality inspection task scheduling method according to claim 3, wherein, when the scheduling task is not completed, entering the next scheduling state and returning to step S2 further comprises:
judging whether experience replay is needed; if so, performing experience replay; otherwise, entering the next scheduling state and returning to step S2.
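The cache pool referred to in claims 3 and 4 can be sketched as a bounded buffer that stores (state, action, decoding result, reward) tuples in one-to-one correspondence and samples mini-batches for replay. The class and field names below (`ReplayPool`, `push`, `sample`) are hypothetical, for illustration only:

```python
import random
from collections import deque

class ReplayPool:
    """Toy cache pool: stores (state, action, decode_result, reward) tuples
    and samples mini-batches for experience replay."""
    def __init__(self, capacity=1000):
        self.buf = deque(maxlen=capacity)   # oldest entries are evicted when full

    def push(self, state, action, decode_result, reward):
        self.buf.append((state, action, decode_result, reward))

    def sample(self, batch_size, seed=None):
        rng = random.Random(seed)
        k = min(batch_size, len(self.buf))
        return rng.sample(list(self.buf), k)

pool = ReplayPool(capacity=8)
for i in range(10):                         # capacity 8: the 2 oldest entries drop out
    pool.push(i, i % 2, ("sample", "device"), float(i))
batch = pool.sample(4, seed=1)
```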
5. The quality inspection task scheduling method of claim 1, wherein the task processing time channel is a three-dimensional matrix of size (n+1) × (m+1) × j, where n is the number of samples, m is the number of devices, and j is the number of quality inspection items; the processing time channel matrix includes elements p_{a,b,c}, p_{a,m,c} and p_{n,b,c}, where p_{a,b,c} represents the processing time required for quality inspection task c to be completed by sample a on device b, and p_{a,m,c} and p_{n,b,c} represent the feasibility of processing quality inspection task c on sample a and on device b, respectively;
the sample-device occupancy channel is a two-dimensional matrix of size (n+1) × (m+1) that includes elements u_{a,b}, u_{a,m} and u_{n,b}, where u_{a,b} is the accumulated execution time of the quality inspection tasks of sample a on device b, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and of device b, respectively;
the sample-device available time channel is a two-dimensional matrix of size (n+1) × (m+1) that includes elements l_{a,b}, l_{a,m} and l_{n,b}, where l_{a,b} is the end time of the task last executed by sample a on device b, and l_{a,m} and l_{n,b} are the times at which sample a and device b are last occupied and released, respectively;
the concatenated scheduling state feature is a scheduling state feature representation of dimension (n+1) × (m+1) × (j+2).
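The channel concatenation of claim 5 can be illustrated with NumPy. Only the shapes follow the claim; the array contents below are random placeholders, and the function name `build_state` is hypothetical:

```python
import numpy as np

def build_state(proc_time, occupancy, avail_time):
    """Concatenate the three channels along the last axis.

    proc_time  : (n+1, m+1, j)  task processing time channel
    occupancy  : (n+1, m+1)     sample-device occupancy channel
    avail_time : (n+1, m+1)     sample-device available time channel
    returns    : (n+1, m+1, j+2) scheduling state feature
    """
    return np.concatenate(
        [proc_time, occupancy[..., None], avail_time[..., None]], axis=-1
    )

n, m, j = 3, 2, 4                       # illustrative: 3 samples, 2 devices, 4 items
p = np.random.rand(n + 1, m + 1, j)
u = np.random.rand(n + 1, m + 1)
l = np.random.rand(n + 1, m + 1)
state = build_state(p, u, l)            # shape (4, 3, 6), i.e. (n+1, m+1, j+2)
```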
6. The quality inspection task scheduling method of claim 1, wherein the corresponding action is output according to the current scheduling state, and the scheduling state representation satisfies: a_i = π(S_i), a_i = S_{i+1} - S_i, r_i = R(a_i, S_i, S_{i+1}), where a_i is the current action, S_i is the current state, r_i is the reward of the current action, R is the reward function, and π is the action selection policy.
7. The quality inspection task scheduling method of claim 1 or 6, wherein in S3 the directly output action is replaced by an action selection rule, the action selection rules comprising:
(1) Selecting the task with the shortest processing time;
(2) Selecting the task with the longest processing time;
(3) Selecting the task with the least available samples;
(4) Selecting serial tasks;
(5) Selecting parallel tasks;
(6) Selecting a preamble task in the mutually exclusive task pair;
(7) Selecting a subsequent task in the mutually exclusive task pair;
(8) Selecting an unconstrained task;
the heuristic rules for decoding include:
Rule I: heuristic sample selection is performed on the chromosome of each individual; samples are examined in test order from front to back, and the sample with the shortest test completion time is selected;
Rule II: heuristic device selection is performed on the chromosome of each individual; based on the selected sample, the device with the shortest completion time is selected, and if a plurality of devices meet the selection condition, the device with the smallest load is selected.
8. The quality inspection task scheduling method of claim 1, wherein the calculation of the reward value satisfies:
R=αU-βE,
wherein R is the reward value, α and β are empirical parameters, and U and E are the scheduling environment utilization and the hole time, respectively;
the calculation of the scheduling environment utilization satisfies:
U_N = Σ_{n=1}^{N} u_{n,M} / (N · C_max), U_M = Σ_{m=1}^{M} u_{N,m} / (M · C_max),
wherein U_N and U_M are the sample utilization and the device utilization, u_{n,M} and u_{N,m} are the accumulated processing times of sample n and device m in the sample-device occupancy channel, and C_max is the current longest processing time;
the calculation of the hole time satisfies:
E_N = Σ_{n=1}^{N} (l_{n,M} - u_{n,M}), E_M = Σ_{m=1}^{M} (l_{N,m} - u_{N,m}),
wherein E_N and E_M are the sample hole time and the device hole time, l_{n,M} and l_{N,m} are the final occupied release times of sample n and device m in the sample-device available time channel, and u_{n,M} and u_{N,m} are the accumulated processing times of sample n and device m in the sample-device occupancy channel.
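The reward R = αU - βE of claim 8 can be sketched as follows, under the assumption that utilization is the mean accumulated processing time divided by the makespan C_max, that hole time is the summed difference between last release time and accumulated processing time, and that sample and device terms are averaged (for U) and summed (for E); the aggregation and all function names are assumptions, not the patent's exact formulas:

```python
def utilization(acc_times, c_max):
    """Assumed form of U_N / U_M: mean accumulated processing time over C_max."""
    return sum(acc_times) / (len(acc_times) * c_max)

def hole_time(release_times, acc_times):
    """Assumed form of E_N / E_M: summed idle gaps, last release minus processing."""
    return sum(l - u for l, u in zip(release_times, acc_times))

def reward(u_samples, u_devices, l_samples, l_devices, c_max, alpha=1.0, beta=0.1):
    """R = alpha * U - beta * E, with an assumed aggregation across samples/devices."""
    U = 0.5 * (utilization(u_samples, c_max) + utilization(u_devices, c_max))
    E = hole_time(l_samples, u_samples) + hole_time(l_devices, u_devices)
    return alpha * U - beta * E

# toy numbers: 2 samples, 2 devices, makespan 10
r = reward(u_samples=[6.0, 8.0], u_devices=[7.0, 7.0],
           l_samples=[8.0, 9.0], l_devices=[9.0, 8.0], c_max=10.0)
```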
9. An electronic device comprising a processor, a storage medium and a computer program stored in the storage medium, characterized in that the computer program, when executed by the processor, implements the quality inspection task scheduling method of any one of claims 1 to 8.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the quality inspection task scheduling method of any one of claims 1 to 8.
CN202211572850.XA 2022-12-08 2022-12-08 Quality inspection task scheduling method, equipment and medium Pending CN116128334A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211572850.XA CN116128334A (en) 2022-12-08 2022-12-08 Quality inspection task scheduling method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211572850.XA CN116128334A (en) 2022-12-08 2022-12-08 Quality inspection task scheduling method, equipment and medium

Publications (1)

Publication Number Publication Date
CN116128334A true CN116128334A (en) 2023-05-16

Family

ID=86303551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211572850.XA Pending CN116128334A (en) 2022-12-08 2022-12-08 Quality inspection task scheduling method, equipment and medium

Country Status (1)

Country Link
CN (1) CN116128334A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237241A (en) * 2023-11-15 2023-12-15 湖南自兴智慧医疗科技有限公司 Chromosome enhancement parameter adjustment method and device
CN117237241B (en) * 2023-11-15 2024-02-06 湖南自兴智慧医疗科技有限公司 Chromosome enhancement parameter adjustment method and device

Similar Documents

Publication Publication Date Title
Shen et al. Mathematical modeling and multi-objective evolutionary algorithms applied to dynamic flexible job shop scheduling problems
Li et al. Mathematical model and metaheuristics for simultaneous balancing and sequencing of a robotic mixed-model assembly line
Xu et al. SATzilla-07: The design and analysis of an algorithm portfolio for SAT
Lei Simplified multi-objective genetic algorithms for stochastic job shop scheduling
Sun et al. Automatically evolving cnn architectures based on blocks
CN113792924A (en) Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network
CN112016691B (en) Quantum circuit construction method and device
Wei et al. Constrained differential evolution with multiobjective sorting mutation operators for constrained optimization
CN115357554B (en) Graph neural network compression method and device, electronic equipment and storage medium
Kallestad et al. A general deep reinforcement learning hyperheuristic framework for solving combinatorial optimization problems
CN109685204A (en) Pattern search method and device, image processing method and device
CN116128334A (en) Quality inspection task scheduling method, equipment and medium
CN114580678A (en) Product maintenance resource scheduling method and system
CN115758761A (en) Quality inspection task scheduling method, equipment and medium based on genetic algorithm
CN114895773A (en) Energy consumption optimization method, system and device of heterogeneous multi-core processor and storage medium
Wang et al. Large-scale inventory optimization: A recurrent neural networks–inspired simulation approach
CN114881301A (en) Simulation scheduling method and system for production line, terminal device and storage medium
Ozsoydan et al. A reinforcement learning based computational intelligence approach for binary optimization problems: The case of the set-union knapsack problem
CN114297934A (en) Model parameter parallel simulation optimization method and device based on proxy model
Khan et al. Optimization of constrained function using genetic algorithm
CN116823468A (en) SAC-based high-frequency quantitative transaction control method, system and storage medium
Zhang et al. MRLM: A meta-reinforcement learning-based metaheuristic for hybrid flow-shop scheduling problem with learning and forgetting effects
Mencia et al. Efficient repairs of infeasible job shop problems by evolutionary algorithms
Sun A genetic algorithm for a re-entrant job-shop scheduling problem with sequence-dependent setup times
Kołodziej et al. Control sharing analysis and simulation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination