CN116128334A - Quality inspection task scheduling method, equipment and medium - Google Patents
- Publication number
- CN116128334A (application CN202211572850.XA)
- Authority
- CN
- China
- Prior art keywords
- sample
- task
- scheduling
- quality inspection
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Development Economics (AREA)
- Educational Administration (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Software Systems (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Data Mining & Analysis (AREA)
- General Business, Economics & Management (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Quality & Reliability (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Game Theory and Decision Science (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a quality inspection task scheduling method, which relates to the technical field of value-transmission quality inspection and addresses the lack of a suitable scheduling algorithm in the prior art. The method comprises the following steps: S1, initializing model training parameters, wherein the model is a reinforcement learning model; S2, constructing scheduling state features; S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain the sample and device corresponding to the action; S4, calculating a reward value and updating the training parameters; S5, judging whether the scheduling task is completed: when the scheduling task is completed and the training step count is reached, training ends, otherwise the method returns to step S2; when the scheduling task is not completed, the method enters the next scheduling state and returns to step S2. The invention also discloses a quality inspection task scheduling electronic device and a computer storage medium. The invention models the problem with reinforcement learning and thereby obtains an effective scheduling model.
Description
Technical Field
The invention relates to the technical field of value-transmission quality inspection, in particular to a reinforcement-learning-based quality inspection task scheduling method, equipment and medium.
Background
Value-transmission quality inspection is a key task in electric power metering, and automated quality inspection task scheduling is a natural choice for improving the detection efficiency and accuracy for the various metering devices. However, unlike the classical flexible job-shop scheduling problem, in which the sequence of operations required by each workpiece is fixed, each sample in the quality inspection task scheduling problem has no fixed quality inspection items: a batch of quality inspection tasks is completed on a batch of samples, so the optimization space is larger. Moreover, quality inspection tasks exhibit nonlinear relations such as serial, parallel and mutually exclusive dependencies, making the constraints more complex.
In existing reinforcement learning methods for the flexible job-shop scheduling problem, the state features cannot fully describe the scheduling degrees of freedom and the nonlinear task relations of the quality inspection task scheduling problem, and the reward function cannot reflect the combined influence of task ordering, sample scheduling, equipment scheduling and the like; the existing scheduling algorithms therefore cannot be applied directly.
Disclosure of Invention
In order to overcome the defects of the prior art, one of the purposes of the invention is to provide a quality inspection task scheduling method, which constructs a quality inspection task scheduling model based on reinforcement learning, so as to improve the sample detection efficiency in the quality inspection task scheduling process.
One of the purposes of the invention is realized by adopting the following technical scheme:
the quality inspection task scheduling method is characterized by comprising the following steps of:
s1, initializing model training parameters, wherein the model is a reinforcement learning model;
s2, constructing scheduling state characteristics, wherein the scheduling state characteristics are obtained through splicing of task processing time channels, sample-equipment occupancy rate and sample-equipment available time channels;
s3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain a sample and equipment corresponding to the action;
s4, calculating a reward value and updating the training parameters according to the action and the decoding result;
s5, judging whether the scheduling task is completed or not:
when the scheduling task is completed and the training step number is reached, training is finished, otherwise, returning to the step S2;
and when the dispatching task is not completed, entering a next dispatching state and returning to the step S2.
Further, the training parameters comprise the set batch size, the number of training steps, the playback buffer, the playback time and empirical hyperparameters.
Further, after calculating the reward value and updating the training parameter according to the action and the decoding result, the method further comprises:
and storing the scheduling state, the action, the decoding result and the rewarding value into a cache pool in a one-to-one correspondence mode, wherein the cache pool is used for experience playback during training.
Further, when the scheduling task is not completed, entering a next scheduling state and returning to step S2, and further including:
judging whether experience playback is needed, if so, performing experience playback, otherwise, entering a next scheduling state and returning to the step S2.
Further, the task processing time channel is a three-dimensional matrix of size (n+1)×(m+1)×j, where n is the number of samples, m is the number of devices and j is the number of quality inspection items; the processing time channel contains matrix elements p_{a,b,c}, p_{a,m,c} and p_{n,b,c}, where p_{a,b,c} is the processing time required when quality inspection task c is completed by sample a on device b, and p_{a,m,c} and p_{n,b,c} indicate the feasibility of processing quality inspection task c on sample a and device b respectively;
the sample-device occupancy channel is a two-dimensional matrix of size (n+1)×(m+1) containing matrix elements u_{a,b}, u_{a,m} and u_{n,b}, where u_{a,b} is the cumulative time for which sample a has executed quality inspection tasks on device b, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b respectively;
the sample-device available time channel is a two-dimensional matrix of size (n+1)×(m+1) containing matrix elements l_{a,b}, l_{a,m} and l_{n,b}, where l_{a,b} is the end time of the task last executed by sample a on device b, and l_{a,m} and l_{n,b} are the times at which sample a and device b are last released from occupation, respectively;
the spliced scheduling state features form a scheduling state representation of dimension (n+1)×(m+1)×(j+2).
Further, corresponding actions are output according to the current scheduling state, and the scheduling state representation satisfies: a_i = π(S_i), a_i = S_{i+1} - S_i and r_i = R(a_i, S_i, S_{i+1}), where a_i is the current action, S_i is the current state, r_i is the reward of the current action, R is the reward function and π is the action selection strategy.
Further, in S3, the direct output action is replaced by an action selection rule, where the action selection rule includes:
(1) Selecting the task with the shortest processing time;
(2) Selecting a task with the longest processing time;
(3) Selecting the task with the least available samples;
(4) Selecting serial tasks;
(5) Selecting parallel tasks;
(6) Selecting a preamble task in the mutually exclusive task pair;
(7) Selecting a subsequent task in the mutually exclusive task pair;
(8) Selecting an unconstrained task;
the decoded heuristic rules include:
rule one: heuristic sample selection is performed on the chromosome of each individual; in test order from front to back, the sample with the shortest test completion time is selected, and if several samples satisfy the selection condition, the sample with the shortest test completion time among them is selected;
rule two: heuristic device selection is performed on the chromosome of each individual; based on the selected sample, the device with the shortest completion time is selected, and if several devices satisfy the selection condition, the device with the smallest load is selected.
Further, the calculation of the prize value satisfies:
R=αU-βE,
wherein R is the reward value, α and β are empirical parameters, and U and E are the scheduling environment utilization rate and the hole time respectively;
the calculation of the scheduling environment utilization rate satisfies the following conditions:
wherein U_N and U_M are the sample and device utilization rates respectively, u_{n,M} and u_{N,m} are the accumulated processing times of sample n and device m in the sample-device occupancy channel, and C_max is the current longest processing time;
the calculation of the cavity time satisfies the following conditions:
wherein E_N and E_M are the sample and device hole times respectively, l_{n,M} and l_{N,m} are the times at which sample n and device m are last released from occupation in the sample-device available time channel, and u_{n,M} and u_{N,m} are the accumulated processing times of sample n and device m in the sample-device occupancy channel.
Another object of the present invention is to provide an electronic device for carrying out one of the objects of the present invention, comprising a processor, a storage medium, and a computer program stored in the storage medium, wherein the computer program, when executed by the processor, implements the quality inspection task scheduling method described above.
It is a further object of the present invention to provide a computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the quality inspection task scheduling method described above.
Compared with the prior art, the invention has the beneficial effects that:
the invention builds the quality inspection task scheduling model based on reinforcement learning, enhances the learning ability of an algorithm to a scheduling state, replaces an intelligent agent to directly learn action decisions through action selection rules, improves the algorithm convergence speed by utilizing heuristic rules, enhances the interpretability of model action selection, can completely describe the scheduling freedom degree and nonlinear task relation of quality inspection task scheduling problems, and can be applied to quality inspection of volume transmission.
Drawings
FIG. 1 is a flow chart of a quality inspection task scheduling method according to an embodiment;
FIG. 2 is a flow chart of an embodiment one quality inspection task scheduling method after adding experience playback;
fig. 3 is a block diagram of the electronic device of the third embodiment.
Detailed Description
The invention will now be described in more detail with reference to the accompanying drawings. It should be noted that the following description is given by way of illustration only and not by way of limitation; the various embodiments may be combined with one another to form further embodiments not shown below.
Example 1
Embodiment one provides a quality inspection task scheduling method that constructs a scheduling model for quality inspection tasks using reinforcement learning. It addresses the serial, parallel and mutually exclusive relations among quality inspection tasks and uses a reinforcement learning action selection rule mechanism in place of direct learning of action decisions by the agent.
The quality inspection task scheduling algorithm must determine the processing order of the quality inspection tasks and allocate a sample and a device to each task so as to minimize the makespan (the longest completion time); this is essentially a sequential decision process with a finite set of choices. The core of applying reinforcement learning to a quality inspection task scheduling algorithm is converting the scheduling problem into a Markov process or semi-Markov process, i.e. defining the states, actions, transition probabilities and reward function.
Aiming at the characteristic that a batch of quality inspection tasks are completed on a batch of samples and machines in the quality inspection task scheduling problem, the embodiment provides a scheduling state representation method suitable for reinforcement learning, and the learning capacity of an algorithm on the scheduling state is enhanced. Referring to fig. 1, a quality inspection task scheduling method includes the following steps:
s1, initializing model training parameters, wherein the model is a reinforcement learning model;
the training parameters comprise set batch, training step number, playback buffer, playback time and experience super parameters.
S2, constructing scheduling state characteristics, wherein the scheduling state characteristics are obtained through splicing of task processing time channels, sample-equipment occupancy rate and sample-equipment available time channels;
in order to enhance the model effect and improve the strategy diversity of training data, a cache pool is added during model training. Specifically, the scheduling state, the action, the decoding result and the rewarding value are stored in a buffer pool in a one-to-one correspondence mode, and the buffer pool is used for experience playback during training. And (3) playing back experience of the buffer pool, namely storing states, actions and feedback as a group of experience samples, and re-reading the experience samples as training data after a certain training step number so as to improve strategy diversity of the training data.
Referring to fig. 2, when the task is not completed, the process enters the next scheduling state and returns to step S2, and further includes:
judging whether experience playback is needed, if so, performing experience playback, otherwise, entering a next scheduling state and returning to the step S2.
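As a concrete illustration, the cache pool and its playback behavior can be sketched as follows. This is a minimal Python sketch, not the patent's actual implementation; the names ReplayBuffer, push and sample are assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal cache pool: stores (state, action, decode_result, reward,
    next_state) tuples one-to-one and replays random batches during training."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences evicted first

    def push(self, state, action, decode_result, reward, next_state):
        self.buffer.append((state, action, decode_result, reward, next_state))

    def sample(self, batch_size):
        # Random sampling improves the strategy diversity of the training data.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

A bounded deque keeps memory constant while still letting older experiences be replayed until they are evicted.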
The next scheduling state, i.e. the new state to which the current state is transferred after the action selection, includes: a queue of tasks to be processed, processed tasks, sample device load, idle time, etc.
When the scheduling status feature is selected in S2, the following principle is followed:
(1) The state features should contain all the information needed for action decisions, fully describing the scheduling environment, i.e. they satisfy a_i = π(S_i), where a_i is the current action, S_i is the current state, and π denotes the action strategy, which takes a scheduling state as input and outputs a target action;
(2) The action corresponding to the transition between adjacent scheduling states should be unique, i.e. a_i = S_{i+1} - S_i;
(3) The reward of an action is related only to the states before and after it, i.e. r_i = R(a_i, S_i, S_{i+1}), where r_i is the reward of the current action and R is the reward function.
The selection of the status features should be related to the scheduling objectives, reducing redundancy of feature information.
Specifically, the task processing time channel is a three-dimensional matrix of size (n+1)×(m+1)×j, where n is the number of samples, m is the number of devices and j is the number of quality inspection items; the processing time channel contains matrix elements p_{a,b,c}, p_{a,m,c} and p_{n,b,c}, where p_{a,b,c} is the processing time required when quality inspection task c is completed by sample a on device b, and p_{a,m,c} and p_{n,b,c} indicate the feasibility of processing quality inspection task c on sample a and device b: 0 indicates that task c cannot be processed on the sample (device) or need not be executed again, and 1 indicates that task c has not been executed and can be processed on the sample (device);
the sample-device occupancy channel is a two-dimensional matrix of size (n+1)×(m+1) containing matrix elements u_{a,b}, u_{a,m} and u_{n,b}, where u_{a,b} is the cumulative time for which sample a has executed quality inspection tasks on device b, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b respectively;
the sample-device available time channel is a two-dimensional matrix of size (n+1)×(m+1) containing matrix elements l_{a,b}, l_{a,m} and l_{n,b}, where l_{a,b} is the end time of the task last executed by sample a on device b, and l_{a,m} and l_{n,b} are the times at which sample a and device b are last released from occupation, respectively;
the task processing time channel, the sample-device occupancy channel and the sample-device available time channel are spliced to obtain the final scheduling state feature representation of dimension (n+1)×(m+1)×(j+2).
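The channel splicing described above can be illustrated with the following sketch. The build_state function name and the use of NumPy are assumptions, and the random processing times are placeholders for real data.

```python
import numpy as np

def build_state(p, u, l):
    """Splice the three channels into one scheduling-state tensor.

    p: (n+1, m+1, j) task processing time channel
    u: (n+1, m+1)    sample-device occupancy channel
    l: (n+1, m+1)    sample-device available time channel
    Returns a state feature tensor of shape (n+1, m+1, j+2).
    """
    return np.concatenate([p, u[..., None], l[..., None]], axis=-1)

n, m, j = 5, 3, 4                    # samples, devices, quality inspection items
p = np.random.rand(n + 1, m + 1, j)  # placeholder processing times/feasibility
u = np.zeros((n + 1, m + 1))         # no task has executed yet
l = np.zeros((n + 1, m + 1))         # nothing occupied yet
state = build_state(p, u, l)
assert state.shape == (n + 1, m + 1, j + 2)
```

Stacking the two 2-D channels as extra trailing slices is what yields the (n+1)×(m+1)×(j+2) dimensionality stated above.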
S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain a sample and equipment corresponding to the action;
To reduce the difficulty of reinforcement learning training and improve the stability of the algorithm, step S3 outputs one of the following action selection rules instead of a direct action:
(1) Selecting the task with the shortest processing time;
(2) Selecting a task with the longest processing time;
(3) Selecting the task with the least available samples;
(4) Selecting serial tasks;
(5) Selecting parallel tasks;
(6) Selecting a preamble task in the mutually exclusive task pair;
(7) Selecting a subsequent task in the mutually exclusive task pair;
(8) Selecting an unconstrained task;
obtaining an action to be processed based on the selected rule, and decoding the action according to the following heuristic rule:
rule one: heuristic sample selection is performed on the chromosome of each individual; in test order from front to back, the sample with the shortest test completion time is selected, and if several samples satisfy the selection condition, the sample with the shortest test completion time among them is selected;
rule two: heuristic device selection is performed on the chromosome of each individual; based on the selected sample, the device with the shortest completion time is selected, and if several devices satisfy the selection condition, the device with the smallest load is selected.
S4, calculating a reward value and updating the training parameters according to the action and the decoding result;
the calculation of the prize value satisfies the following conditions:
R=αU-βE,
wherein R is the reward value, α and β are empirical parameters, and U and E are the scheduling environment utilization rate and the hole time respectively;
the calculation of the scheduling environment utilization rate satisfies the following conditions:
wherein U_N and U_M are the sample and device utilization rates respectively, u_{n,M} and u_{N,m} are the accumulated processing times of sample n and device m in the sample-device occupancy channel, and C_max is the current longest processing time;
the calculation of the cavity time satisfies the following conditions:
wherein E_N and E_M are the sample and device hole times respectively, l_{n,M} and l_{N,m} are the times at which sample n and device m are last released from occupation in the sample-device available time channel, and u_{n,M} and u_{N,m} are the accumulated processing times of sample n and device m in the sample-device occupancy channel; the difference between the two is the hole time.
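The reward R = αU − βE could be computed as in the following sketch. The aggregation choices (averaging sample and device utilization, normalizing by C_max, summing hole times) are illustrative assumptions, since the original formulas are given only symbolically; the hole time of each sample/device is its last release time minus its accumulated processing time, as stated above.

```python
def reward(u_sample, u_device, l_sample, l_device, c_max, alpha=0.8, beta=1.0):
    """Reward R = alpha * U - beta * E for one scheduling step.

    u_sample[a] / u_device[b]: accumulated processing time per sample/device
    l_sample[a] / l_device[b]: time each sample/device is last released
    c_max: current longest processing time (makespan so far)
    """
    n, m = len(u_sample), len(u_device)
    U_N = sum(u_sample) / (n * c_max)  # sample utilization (assumed aggregation)
    U_M = sum(u_device) / (m * c_max)  # device utilization (assumed aggregation)
    E_N = sum(l - u for l, u in zip(l_sample, u_sample))  # sample hole time
    E_M = sum(l - u for l, u in zip(l_device, u_device))  # device hole time
    U = (U_N + U_M) / 2
    E = E_N + E_M
    return alpha * U - beta * E
```

The defaults α = 0.8 and β = 1.0 follow the parameter settings reported in embodiment two.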
S5, judging whether the scheduling task is completed or not:
when the scheduling task is completed and the training step number is reached, training is finished, otherwise, returning to the step S2;
and when the dispatching task is not completed, entering a next dispatching state and returning to the step S2.
And S5, when the task queue to be processed is empty, the task is scheduled to be completed.
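Steps S1 to S5 can be tied together in a skeleton of the following form. Here env and agent are hypothetical objects standing in for the scheduling environment and the reinforcement learning agent; all method names are assumptions, not the patent's implementation.

```python
def train(env, agent, n_steps, replay_every, batch_size, buffer):
    """Skeleton of steps S1-S5 with experience playback.

    The agent outputs one of the eight action selection rules; the
    environment decodes it into a (sample, device) pair, returns the
    reward, and signals completion of the scheduling task.
    """
    state = env.reset()                    # S1/S2: initial scheduling state
    for step in range(n_steps):
        rule = agent.select_rule(state)    # S3: output an action rule
        sample, device, r, next_state, done = env.step(rule)
        buffer.push(state, rule, (sample, device), r, next_state)  # cache pool
        agent.update(buffer, batch_size)   # S4: update training parameters
        if done:                           # S5: scheduling task completed
            state = env.reset()            # start the next episode
        else:
            state = next_state             # enter the next scheduling state
        if (step + 1) % replay_every == 0:
            agent.update(buffer, batch_size)  # periodic experience playback
```

The loop runs until the configured number of training steps is reached, matching the termination condition of S5.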
In summary, this embodiment provides a reward function combining sample utilization and device hole time to address the sparse-reward problem of quality inspection task scheduling. For the serial, parallel and mutually exclusive relations among quality inspection tasks, it proposes a set of action selection rules to replace direct learning of action decisions by the agent, and uses heuristic rules to improve the convergence speed of the algorithm and enhance the interpretability of the model's action selection. To improve the strategy diversity of the model's training samples, the scheduling states, action decisions and related results are added to a playback buffer pool and experience playback is performed at fixed training-step intervals.
When used for quality inspection task scheduling, the trained model takes the samples, devices and number of test items as input and outputs the processing order of the test items together with the sample and device assigned to each.
Example two
Embodiment two presents experimental results for the quality inspection task scheduling method of embodiment one, demonstrating its effectiveness.
This embodiment carries out experimental verification on real data from the value-transmission laboratory of an electric grid company, comprising data for 56 test items, equipment data for 21 device types (26 units) and sample data. In view of the confidentiality requirements on the data, tests and devices are denoted by serial numbers, as shown in Table 1.
Table 1 test-time-device information table
The nonlinear relationships of serial, parallel, mutual exclusion and device mutual exclusion among 56 experimental terms are also represented by symbols, as shown in table 2:
TABLE 2 nonlinear relationship table
Nonlinear relation | Experimental item |
Task serialization | [53,60] |
Task parallelism | [[9],[3]] |
Task mutex | [[33,34],[35,36,37]] |
Device mutual exclusion | [6,15],[16,17] |
Wherein task 53 and task 60 must be performed sequentially on the same sample; task 9 must be performed simultaneously on 3 samples; any sample, after performing tasks 33 and 34, can no longer perform tasks 35, 36, 37, but is otherwise not limited; devices 6 and 15, and devices 16 and 17 cannot be operated simultaneously.
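For illustration, the relations of Table 2 could be encoded as plain data structures; the names below and the mutex_blocked helper are purely hypothetical.

```python
# Hypothetical encoding of Table 2's nonlinear relations.
SERIAL = [(53, 60)]                 # task 53 then 60, on the same sample
PARALLEL = {9: 3}                   # task 9 must run on 3 samples at once
MUTEX = [([33, 34], [35, 36, 37])]  # after 33 and 34, tasks 35-37 are barred
DEVICE_MUTEX = [(6, 15), (16, 17)]  # device pairs that cannot run together

def mutex_blocked(done_tasks, task):
    """True if `task` may no longer run on a sample that has already
    executed every preamble task of a mutually exclusive pair."""
    return any(task in follow and all(t in done_tasks for t in pre)
               for pre, follow in MUTEX)
```

A scheduler would consult such a table when decoding an action, skipping sample-device pairs for which the chosen task is blocked.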
The experimental parameters were set as follows:
the processors Intel (R), xeon (R) Silver 4110, CPU 2.10GHz, memory 128GB, and display card GTX1080Ti are adopted to adapt to Ubuntu operating system.
The quality inspection task scheduling algorithm parameters based on reinforcement learning are shown in table 3, the training scale is 8000, the experience playback buffer pool is 100000, the target network parameter updating step number is 200, the reward function parameter alpha is set to 0.8, and the parameter beta is set to 1.0.
Analysis of experimental results
The difficulty of the quality inspection task scheduling problem is directly related to the number of quality inspection tasks. Based on the available data, data sets of different scales are constructed for testing: the number of test items is set to 10, 20, 30, 40 and 50 respectively, the number of samples is kept at 5, and the equipment data are determined by the tests used. To verify the effectiveness of the algorithm, this embodiment selects the classic greedy rule MWKR (most work remaining), which chooses the operation with the longest remaining processing time, and compares against the classic genetic algorithm (GA) and particle swarm optimization (PSO); the results are shown in Table 3 below.
Table 3 single batch algorithm validation table
Table 3 shows the quality inspection task scheduling results for single-batch samples completing 10, 20, 30, 40 and 50 test items respectively. Ten experiments were run for each instance, and the shortest makespan (objective function) and the algorithm time, which excludes model loading time, were averaged to measure algorithm performance. In terms of objective value, the OURS algorithm improves on the MWKR algorithm by 12.10% on average, on the GA algorithm by 2.07% and on the PSO algorithm by 3.40%. In terms of algorithm time, the OURS and MWKR algorithms are more than 99% faster than the GA and PSO algorithms, since randomly generating chromosomes and populations is clearly far less efficient than a single pass of model inference. The experimental results show that the OURS algorithm outperforms the existing algorithms in both scheduling quality and solving time, fully verifying its effectiveness on the quality inspection scheduling problem.
Example III
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. As shown in fig. 3, the electronic device includes a processor 210, a memory 220, an input device 230 and an output device 240; the number of processors 210 in the device may be one or more, one processor 210 being taken as an example in fig. 3. The processor 210, memory 220, input device 230 and output device 240 in the electronic device may be connected by a bus or in other ways; connection by a bus is taken as the example in fig. 3.
The memory 220 is a computer-readable storage medium that can be used to store software programs, computer-executable programs, and modules. The processor 210 executes the software programs, instructions and modules stored in the memory 220 to perform various functional applications and data processing of the electronic device, i.e., implement the quality inspection task scheduling methods of the first to second embodiments.
The memory 220 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for functions; the storage data area may store data created according to the use of the terminal, etc. In addition, memory 220 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, memory 220 may further include memory remotely located relative to processor 210, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 230 may be used to receive input user identity information, sample data, training parameters, and the like. The output means 240 may comprise a display device such as a display screen.
Example IV
The fourth embodiment of the present invention also provides a storage medium containing computer executable instructions, where the storage medium may be used for a computer to execute a quality inspection task scheduling method, where the method includes:
s1, initializing model training parameters, wherein the model is a reinforcement learning model;
s2, constructing scheduling state characteristics, wherein the scheduling state characteristics are obtained through splicing of task processing time channels, sample-equipment occupancy rate and sample-equipment available time channels;
s3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain a sample and equipment corresponding to the action;
s4, calculating a reward value and updating the training parameters according to the action and the decoding result;
s5, judging whether the scheduling task is completed or not:
when the scheduling task is completed and the training step number is reached, training is finished, otherwise, returning to the step S2;
and when the dispatching task is not completed, entering a next dispatching state and returning to the step S2.
Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present invention is not limited to the above-described method operations, but may also perform the related operations in the quality inspection task scheduling method provided in any embodiment of the present invention.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, etc., and include several instructions for causing an electronic device (which may be a mobile phone, a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present invention.
It will be apparent to those skilled in the art from this disclosure that various other changes and modifications can be made which are within the scope of the invention as defined in the appended claims.
Claims (10)
1. A quality inspection task scheduling method, characterized by comprising the following steps:
S1, initializing model training parameters, wherein the model is a reinforcement learning model;
S2, constructing scheduling state features, wherein the scheduling state features are obtained by splicing a task processing time channel, a sample-device occupancy channel, and a sample-device available time channel;
S3, outputting a corresponding action according to the current scheduling state, and decoding the scheduling state to obtain the sample and device corresponding to the action;
S4, calculating a reward value and updating the training parameters according to the action and the decoding result;
S5, judging whether the scheduling task is completed:
when the scheduling task is completed and the number of training steps has been reached, training ends; otherwise, return to step S2;
when the scheduling task is not completed, enter the next scheduling state and return to step S2.
2. The quality inspection task scheduling method of claim 1, wherein the training parameters include the batch size, the number of training steps, the playback buffer, the playback timing, and empirical hyperparameters.
3. The quality inspection task scheduling method according to claim 2, further comprising, after calculating the reward value and updating the training parameters according to the action and the decoding result:
storing the scheduling state, the action, the decoding result, and the reward value, in one-to-one correspondence, into a cache pool, wherein the cache pool is used for experience playback during training.
4. The quality inspection task scheduling method according to claim 3, wherein when the scheduling task is not completed, entering the next scheduling state and returning to step S2 further comprises:
judging whether experience playback is needed; if so, performing experience playback; otherwise, entering the next scheduling state and returning to step S2.
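The cache pool and experience playback of claims 3-4 can be sketched with a bounded replay buffer. This is a generic construction under illustrative names, not the patent's implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Bounded cache pool storing (state, action, decode_result, reward) tuples."""
    def __init__(self, capacity=1000):
        self.pool = deque(maxlen=capacity)   # oldest entries are evicted first

    def store(self, state, action, decode_result, reward):
        """One-to-one storage of a transition into the cache pool."""
        self.pool.append((state, action, decode_result, reward))

    def replay(self, batch_size):
        """Sample a batch for experience playback during training."""
        k = min(batch_size, len(self.pool))
        return random.sample(self.pool, k)

buf = ReplayBuffer(capacity=2)
buf.store("s0", 0, ("sample0", "dev0"), 1.0)
buf.store("s1", 1, ("sample1", "dev1"), 0.5)
buf.store("s2", 2, ("sample2", "dev2"), 0.2)   # evicts the "s0" entry
batch = buf.replay(2)
```

A `deque` with `maxlen` gives the usual fixed-capacity behavior of a replay buffer: once full, new transitions displace the oldest ones.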
5. The quality inspection task scheduling method of claim 1, wherein the task processing time channel is a three-dimensional matrix of (n+1) × (m+1) × j, where n is the number of samples, m is the number of devices, and j is the number of quality inspection items; the processing time channel matrix includes elements p_{a,b,c}, p_{a,m,c} and p_{n,b,c}, where p_{a,b,c} represents the processing time required for quality inspection task c to be completed by sample a on device b, and p_{a,m,c} and p_{n,b,c} represent the feasibility of processing quality inspection task c on sample a and on device b, respectively;
the sample-device occupancy channel is a two-dimensional matrix of (n+1) × (m+1), including elements u_{a,b}, u_{a,m} and u_{n,b}, where u_{a,b} represents the accumulated execution time of the quality inspection tasks of sample a on device b, and u_{a,m} and u_{n,b} are the accumulated processing times of sample a and device b, respectively;
the sample-device available time channel is a two-dimensional matrix of (n+1) × (m+1), including elements l_{a,b}, l_{a,m} and l_{n,b}, where l_{a,b} represents the end time of the last task executed by sample a on device b, and l_{a,m} and l_{n,b} are the final occupation release times of sample a and device b, respectively;
the spliced scheduling state features form a scheduling state feature representation of dimension (n+1) × (m+1) × (j+2).
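The splicing of the three channels in claim 5 into an (n+1) × (m+1) × (j+2) feature can be sketched with NumPy. The array names and the zero-filled contents are illustrative only; the shape bookkeeping is what the claim describes:

```python
import numpy as np

n, m, j = 3, 2, 4                 # samples, devices, quality inspection items

# Task processing time channel: (n+1) x (m+1) x j
p = np.zeros((n + 1, m + 1, j))   # p[a, b, c]: time for task c on sample a, device b
# Sample-device occupancy channel: (n+1) x (m+1)
u = np.zeros((n + 1, m + 1))      # u[a, b]: accumulated task time of sample a on device b
# Sample-device available time channel: (n+1) x (m+1)
l = np.zeros((n + 1, m + 1))      # l[a, b]: end time of the last task of sample a on device b

# Splice along the last axis: j + 2 channels in total
state = np.concatenate([p, u[..., None], l[..., None]], axis=-1)
assert state.shape == (n + 1, m + 1, j + 2)
```

Appending the two 2-D channels as extra "slices" behind the j processing-time slices is what yields the j+2 in the final feature dimension.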
6. The quality inspection task scheduling method of claim 1, wherein the corresponding action is output according to the current scheduling state, and the scheduling state representation satisfies: a_i = π(S_i), a_i = S_{i+1} − S_i, r_i = R(a_i, S_i, S_{i+1}), where a_i is the current action, S_i is the current state, r_i is the reward for the current action, R is the reward function, and π is the action selection strategy.
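The relations in claim 6 can be illustrated with a trivial deterministic policy. Every function below is a toy stand-in (a scalar state and a constant policy), chosen only to make the three identities concrete:

```python
def pi(state):
    """Action selection strategy: a_i = pi(S_i). Toy policy: always act 1."""
    return 1

def step(state, action):
    """Environment transition S_{i+1} = S_i + a_i, so a_i = S_{i+1} - S_i."""
    return state + action

def R(action, state, next_state):
    """Reward function r_i = R(a_i, S_i, S_{i+1})."""
    return 1.0 if next_state - state == action else 0.0

s0 = 0
a0 = pi(s0)          # a_i = pi(S_i)
s1 = step(s0, a0)    # S_{i+1} - S_i = a_i
r0 = R(a0, s0, s1)   # r_i = R(a_i, S_i, S_{i+1})
```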
7. The quality inspection task scheduling method of claim 1 or 6, wherein in step S3 the direct output of an action is replaced by action selection rules, the action selection rules comprising:
(1) Selecting the task with the shortest processing time;
(2) Selecting the task with the longest processing time;
(3) Selecting the task with the least available samples;
(4) Selecting serial tasks;
(5) Selecting parallel tasks;
(6) Selecting a preamble task in the mutually exclusive task pair;
(7) Selecting a subsequent task in the mutually exclusive task pair;
(8) Selecting an unconstrained task;
the decoding heuristic rules include:
Rule I: heuristic sample selection is performed on the chromosome of each individual; following the test sequence from front to back, the sample with the shortest test completion time is selected, and if a plurality of samples meet the selection condition, the sample with the shortest test completion time among them is selected;
Rule II: heuristic device selection is performed on the chromosome of each individual; based on the selected sample, the device with the shortest completion time is selected, and if a plurality of devices meet the selection condition, the device with the smallest load is selected.
8. The quality inspection task scheduling method of claim 1, wherein the calculation of the reward value satisfies:
R = αU − βE,
where R is the reward value, α and β are empirical parameters, and U and E are the scheduling environment utilization rate and the hole time, respectively;
the calculation of the scheduling environment utilization rate satisfies:
where U_N and U_M are the sample and device utilization rates, u_{n,M} and u_{N,m} are the accumulated processing times of sample n and device m in the sample-device occupancy channel, and C_max is the current longest processing time;
the calculation of the hole time satisfies:
where E_N and E_M are the sample and device hole times, l_{n,M} and l_{N,m} are the final occupation release times of sample n and device m in the sample-device available time channel, and u_{n,M} and u_{N,m} are the accumulated processing times of sample n and device m in the sample-device occupancy channel.
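The reward R = αU − βE of claim 8 is a weighted difference between utilization and hole (idle) time. The utilization and hole-time formulas themselves are equation images not recoverable from the text, so the helper below only illustrates the outer combination, with the hole time taken as release time minus accumulated busy time per the channel definitions; all names are illustrative:

```python
def hole_time(release_time, busy_time):
    """Placeholder hole time: final occupation release time minus accumulated
    processing time (the patent's exact aggregation is not recoverable)."""
    return release_time - busy_time

def reward(utilization, hole, alpha=1.0, beta=0.5):
    """R = alpha * U - beta * E: reward utilization, penalize idle holes."""
    return alpha * utilization - beta * hole

e = hole_time(10.0, 8.0)   # 2.0 units of idle time
r = reward(0.8, e)         # 1.0 * 0.8 - 0.5 * 2.0 = -0.2
```

With α and β as empirical parameters, the agent is pushed toward schedules that keep samples and devices busy while minimizing gaps between their occupied intervals.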
9. An electronic device comprising a processor, a storage medium and a computer program stored in the storage medium, characterized in that the computer program, when executed by the processor, implements the quality inspection task scheduling method of any one of claims 1 to 8.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the quality inspection task scheduling method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211572850.XA CN116128334A (en) | 2022-12-08 | 2022-12-08 | Quality inspection task scheduling method, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116128334A true CN116128334A (en) | 2023-05-16 |
Family
ID=86303551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211572850.XA Pending CN116128334A (en) | 2022-12-08 | 2022-12-08 | Quality inspection task scheduling method, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116128334A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117237241A (en) * | 2023-11-15 | 2023-12-15 | 湖南自兴智慧医疗科技有限公司 | Chromosome enhancement parameter adjustment method and device |
CN117237241B (en) * | 2023-11-15 | 2024-02-06 | 湖南自兴智慧医疗科技有限公司 | Chromosome enhancement parameter adjustment method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shen et al. | Mathematical modeling and multi-objective evolutionary algorithms applied to dynamic flexible job shop scheduling problems | |
Li et al. | Mathematical model and metaheuristics for simultaneous balancing and sequencing of a robotic mixed-model assembly line | |
Xu et al. | SATzilla-07: The design and analysis of an algorithm portfolio for SAT | |
Lei | Simplified multi-objective genetic algorithms for stochastic job shop scheduling | |
Sun et al. | Automatically evolving cnn architectures based on blocks | |
CN113792924A (en) | Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network | |
CN112016691B (en) | Quantum circuit construction method and device | |
Wei et al. | Constrained differential evolution with multiobjective sorting mutation operators for constrained optimization | |
CN115357554B (en) | Graph neural network compression method and device, electronic equipment and storage medium | |
Kallestad et al. | A general deep reinforcement learning hyperheuristic framework for solving combinatorial optimization problems | |
CN109685204A (en) | Pattern search method and device, image processing method and device | |
CN116128334A (en) | Quality inspection task scheduling method, equipment and medium | |
CN114580678A (en) | Product maintenance resource scheduling method and system | |
CN115758761A (en) | Quality inspection task scheduling method, equipment and medium based on genetic algorithm | |
CN114895773A (en) | Energy consumption optimization method, system and device of heterogeneous multi-core processor and storage medium | |
Wang et al. | Large-scale inventory optimization: A recurrent neural networks–inspired simulation approach | |
CN114881301A (en) | Simulation scheduling method and system for production line, terminal device and storage medium | |
Ozsoydan et al. | A reinforcement learning based computational intelligence approach for binary optimization problems: The case of the set-union knapsack problem | |
CN114297934A (en) | Model parameter parallel simulation optimization method and device based on proxy model | |
Khan et al. | Optimization of constrained function using genetic algorithm | |
CN116823468A (en) | SAC-based high-frequency quantitative transaction control method, system and storage medium | |
Zhang et al. | MRLM: A meta-reinforcement learning-based metaheuristic for hybrid flow-shop scheduling problem with learning and forgetting effects | |
Mencia et al. | Efficient repairs of infeasible job shop problems by evolutionary algorithms | |
Sun | A genetic algorithm for a re-entrant job-shop scheduling problem with sequence-dependent setup times | |
Kołodziej et al. | Control sharing analysis and simulation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||