CN109902954B - Flexible job shop dynamic scheduling method based on industrial big data - Google Patents


Info

Publication number
CN109902954B
Authority
CN
China
Prior art keywords
scheduling
data
machine
rule
workpiece
Legal status
Active
Application number
CN201910144370.5A
Other languages
Chinese (zh)
Other versions
CN109902954A (en)
Inventor
汤洪涛
费永辉
闫伟杰
陈程
梁佳炯
程晓雅
王丹南
李晋青
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201910144370.5A
Publication of CN109902954A
Application granted
Publication of CN109902954B
Legal status: Active

Abstract

A flexible job shop dynamic scheduling method based on industrial big data comprises the following steps. Step one: use the data acquisition tools Sqoop and Flume to acquire scheduling data from a database or file system and store it in the HDFS file system. Step two: partition the scheduling data by scheduling scheme using the data warehouse tool Hive. Step three: convert the scheduling data into training instances using the Spark computing framework and store them in HBase, organized by scheduling scheme. Step four: screen on several indexes to obtain the scheduling data generated during the execution of well-performing scheduling schemes. Step five: cluster the scheduling-related historical data based on disturbance attributes. Step six: mine random forest scheduling rules with an improved random forest algorithm. Step seven: use the mined scheduling rules to guide the dynamic scheduling of the flexible job shop. The method offers high practical operability and computational efficiency, and can respond quickly to workshop disturbances in real time.

Description

Flexible job shop dynamic scheduling method based on industrial big data
Technical Field
The invention relates to a flexible job shop dynamic scheduling method based on industrial big data.
Background
Scheduling plays an important role in manufacturing systems, and scheduling quality affects the competitiveness of a manufacturing enterprise. A scientific and reasonable scheduling scheme for the workshop can improve production efficiency, reduce process costs, and shorten product cycle times, while ensuring on-time delivery and product quality. The flexible job shop, with its flexible process routes and rapid responsiveness to market demand, is well suited to multi-variety, small-batch production and has therefore become a widely used production mode. Dynamic scheduling of the flexible job shop builds on static scheduling by accounting for disturbances in the actual production environment; it is closer to real production conditions and therefore of greater research significance.
As product requirements become increasingly personalized, manufacturing processes diversify and practical scheduling problems grow more complex, which places higher demands on practical operability, computational efficiency, and real-time responsiveness to workshop disturbances when manufacturing enterprises solve shop scheduling problems. A priority scheduling rule is a simple heuristic: it is computationally efficient and easy to apply in practice, can be used for real-time scheduling, and suits complex, dynamic scheduling environments. However, the performance of a priority scheduling rule changes with the actual environment, and no single rule performs well under all disturbance conditions. To meet the requirements of actual job shop scheduling, one feasible approach is to mine scheduling knowledge about scheduling rules from scheduling-related historical data and use it to guide actual workshop scheduling. Research on solving scheduling problems through data mining falls mainly into two groups: methods that combine existing priority scheduling rules, and methods that mine new scheduling rules from scheduling-related historical data.
Among methods that combine existing priority scheduling rules, WANG Shuang-Xi et al. (A hybrid scheduling model using a decision tree and a neural network for selecting scheduling rules of a semiconductor final decision factor, 2005) combine a decision tree and a neural network to mine a priority scheduling rule selection mechanism from scheduling-related historical data; the mechanism selects the most suitable priority scheduling rule for the current environment. SHIUE Y.R. et al. (Data-based scheduling rule selection mechanism for a dispatch control system using a Support Vector Machine (SVM)) propose mining a priority dispatching rule selection mechanism from dispatch-related historical data and making real-time dispatching decisions with it. Mouelhi et al. (Training a neural network to select scheduling rules in real time, 2009) propose a rule selection method based on a neural network, which learns to select scheduling rules in real time from scheduling-related historical data generated by simulation.
Existing priority scheduling rules make decisions using only a small amount of information, which can lead to unsatisfactory schedules, so another approach is to extract new scheduling rules from scheduling-related historical data. LI X et al. (partitioning scheduling rules using marking, 2005) use a decision tree to derive entirely new scheduling rules from scheduling-related historical data and show experimentally that the extracted rules closely fit the original scheduling schemes. SIGURDUR OLAFSSON et al. (Learning effective new single machine scheduling from scheduling data, 2010) propose a two-stage scheduling knowledge learning method. Wangchangong et al. (research on mining methods of job shop scheduling rules, 2015) propose a scheduling rule mining method that combines a branch-and-bound algorithm based on Petri net modeling with a decision tree algorithm; the extracted rules can guide the scheduling of static job shops.
In summary, current methods for mining scheduling rules from scheduling-related historical data mainly target static workshop scheduling and are rarely applied to dynamic scheduling of flexible job shops. Moreover, the historical data these methods use tends to be theoretical. With the widespread use of intelligent sensing equipment in workshops, however, workshops are becoming increasingly intelligent, and their scheduling-related historical data now exhibits the characteristics of industrial big data: large volume, low value density, continuous sampling, and high dimensionality.
Disclosure of Invention
To address the low practical operability, low computational efficiency, and insufficient real-time responsiveness to workshop disturbances of existing flexible job shop dynamic scheduling methods, the invention provides a flexible job shop dynamic scheduling method that has high practical operability and computational efficiency and can respond to workshop disturbances in real time.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A flexible job shop dynamic scheduling method based on industrial big data comprises the following steps:
Step one, data acquisition: scheduling-related historical data is collected from existing information systems using data collection tools of the Hadoop ecosystem and stored in the HDFS file system.
Step two, data integration: using SQL statements in the data warehouse tool Hive, the scheduling data set D_h in the HDFS file system is partitioned by scheduling scheme, i.e., all scheduling-related historical data generated during the execution of one scheduling scheme is grouped together.
Step three, data conversion: the integrated data is converted into training instances using Spark, so that the data mining algorithm can readily mine scheduling rules from it.
Step four, data screening: historical scheduling schemes are evaluated on three indexes, namely maximum completion time (makespan), total tardiness, and total machine load, and the scheduling-related historical data generated during the execution of well-performing schemes is retained. This step specifically comprises:
Step 4.1: for the maximum completion time index, the screening standard is the maximum completion time of the scheduling scheme produced under the same conditions using only the SPT rule.
Step 4.2: for the total tardiness index, the screening standard is the total tardiness obtained when the scheduling task is completed under the same conditions using the EDD rule combined with the SPT rule.
Step 4.3: for the total machine load index, the screening standard is the total machine load obtained when the scheduling task is completed under the same conditions using the LMWT rule combined with the SPT rule. The scheduling data generated during the execution of schemes that satisfy all three indexes is used as the input to the scheduling rule mining algorithm.
Step five, clustering based on disturbance attributes: the screened scheduling-related historical data is clustered with the DBSCAN method according to its disturbance attributes, taking the scheduling scheme as the unit (i.e., the data generated by one scheduling scheme is one object). This step specifically comprises:
Step 5.1: standardize the disturbance data recorded during the execution of each scheme. If the values of a given disturbance attribute across the schemes are X_1, X_2, ..., X_n, they are transformed according to formula (1):
$$ Y_i = \frac{X_i - \bar{X}}{S}, \quad i = 1, 2, \dots, n \qquad (1) $$
In formula (1), $\bar{X}$ is the mean of the attribute, S is its standard deviation, and Y_1, Y_2, ..., Y_n are the standardized data.
Step 5.2: determine the DBSCAN parameters: the neighborhood radius Eps, and MinPts, the minimum number of objects that must lie within the Eps-neighborhood of a core object.
Step 5.3: pick an arbitrary core object p and create a new cluster with p as its core object. Repeatedly find the objects that are directly density-reachable from objects already in the cluster and add them to it.
Step 5.4: repeat step 5.3 until no new object can be added to any cluster; the process then ends.
Step six, mining random forest scheduling rules: using an improved random forest algorithm, mine from each cluster random forest scheduling rule 1, which solves the problem of selecting a machine for a workpiece, and random forest scheduling rule 2, which solves the problem of an idle machine selecting a workpiece to process. This step specifically comprises:
Step 6.1: for each cluster, draw training instances from the cluster with replacement to form k new training instance sets for building decision trees.
Step 6.2: randomly select m feature attributes, compute the optimal split, and train k decision trees, one per training instance set.
Step 6.3: test the classification performance of the decision trees using the training instances in the cluster that were not drawn.
Step 6.4: check whether similar decision trees exist; if so, retain only the one that performed better in the test. The remaining trees form the random forest.
Step 6.5: finally, compute the weights w and h of each decision tree by a Bayesian voting mechanism, obtaining random forest scheduling rule 1 and random forest scheduling rule 2.
Step seven, applying the scheduling rules: the mined random forest scheduling rules guide the dynamic scheduling of the flexible job shop. This step specifically comprises:
Step 7.1: depending on whether a machine must be selected for a workpiece or an idle machine must select a workpiece to process, take random forest scheduling rule 1 or random forest scheduling rule 2 of the cluster to which the current disturbance environment of the flexible job shop belongs.
Step 7.2: using the selected random forest scheduling rule, choose the most suitable machine from the candidate machine set M or the most suitable workpiece from the candidate workpiece set J through pairwise comparison.
The technical concept of the invention is as follows. Scheduling-related historical data in the workshop exhibits the characteristics of industrial big data (large volume, low value density, continuous sampling, high dimensionality), so big data technology is used to preprocess it. FIG. 2 shows the data preprocessing model incorporating big data technology. The dynamic scheduling problem of the flexible job shop consists of selecting a machine for a workpiece and selecting a workpiece for an idle machine in a disturbed environment, so the collected data set D_h comprises three parts: d1, the time-series information on system disturbances during the execution of a scheduling scheme; d2, the status information of each machine in the set of machines currently able to process an operation, recorded when a processing machine is selected for that operation of a workpiece; and d3, the status information of each workpiece in the waiting queue, recorded when an idle machine selects a workpiece to process. The data in D_h is unordered and cannot be used directly for the subsequent screening, clustering, and rule mining, so it must first be organized through data integration and conversion. D_h contains a large amount of useful information reflecting the characteristics of the actual scheduling environment and scheduling knowledge, but it also carries many useless or erroneous rules and patterns. Therefore, the multi-index data screening mechanism of FIG. 3 evaluates historical scheduling schemes on three criteria (maximum completion time, total tardiness, and total machine load) and retains only the data generated during the execution of historical schemes that satisfy all three.
The random forest algorithm is used to mine the scheduling rules, and the resulting scheduling rule is the random forest built by the algorithm, which is essentially a set of trained C4.5 decision trees. The scheduling performance of the rule therefore depends on the classification performance of its decision trees, while its computational efficiency and complexity depend on the number of branches in those trees. DBSCAN clustering partitions the screened high-quality scheduling data D_b, separating the data produced by scheduling decisions made under different disturbance environments, and a scheduling rule is mined from each region. This strengthens the classification performance of the decision trees in the resulting random forest scheduling rules and reduces their number of branches, so the rules are less complex, faster to evaluate, and schedule better.
A scheduling rule f is learned from the scheduling-related historical data by the random forest algorithm. f is an estimate of the true scheduling rule y, so there is an error between f and y. For an instance x, let $\bar{f}(x) = \mathbb{E}_D[f(x;D)]$ be the expected output of rules learned from different training sets D, and let $y_D$ be the (possibly noisy) label recorded in the data. The expected error then consists of three parts, noise $\varepsilon^2$, variance, and bias:
$$ E(f;D) = \mathbb{E}_D\big[(f(x;D) - y_D)^2\big] = \mathrm{bias}^2(x) + \mathrm{var}(x) + \varepsilon^2 $$
$$ \mathrm{var}(x) = \mathbb{E}_D\big[(f(x;D) - \bar{f}(x))^2\big], \qquad \mathrm{bias}^2(x) = \big(\bar{f}(x) - y\big)^2, \qquad \varepsilon^2 = \mathbb{E}_D\big[(y_D - y)^2\big] $$
The noise $\varepsilon^2$ is unavoidable, but the error of the algorithm can be reduced by reducing the variance var(x) or the bias bias²(x), which improves the performance of the random forest algorithm. The variance can in turn be reduced by lowering the correlation ρ between decision trees: if two decision trees are too similar, only the one with better test performance is retained. The traditional random forest algorithm uses majority voting, in which every decision tree carries the same weight no matter how well it classifies, so poorly performing trees influence the final result as much as well-performing ones. The invention therefore adopts a Bayesian voting mechanism, which assigns each decision tree a weight based on its classification performance in testing and then votes according to these weights.
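The claim that lowering the correlation ρ between trees reduces variance can be made concrete with the standard result for averaging k identically distributed predictors, each with variance σ² and pairwise correlation ρ: the variance of their average is ρσ² + (1 - ρ)σ²/k. The sketch below illustrates that textbook result and is not part of the patented method; it treats trees as plainly averaged numeric outputs (the method itself uses weighted voting), and the function names are hypothetical.

```python
import random
import statistics


def ensemble_variance(sigma2: float, rho: float, k: int) -> float:
    """Variance of the average of k identically distributed predictors,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / k


def simulated_variance(sigma2: float, rho: float, k: int,
                       trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: each 'tree output' is a shared Gaussian component
    plus independent noise, scaled so that every output has variance sigma2
    and every pair has correlation rho; return the variance of their mean."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    averages = []
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)  # common to all trees -> correlation rho
        outputs = [
            sigma * (rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0))
            for _ in range(k)
        ]
        averages.append(sum(outputs) / k)
    return statistics.pvariance(averages)


for rho in (0.6, 0.2):
    print(rho, ensemble_variance(1.0, rho, 50), simulated_variance(1.0, rho, 50, trials=5000))
```

With σ² = 1 and k = 50, lowering ρ from 0.6 to 0.2 lowers the variance of the average from 0.608 to 0.216, while adding more trees only shrinks the second term. This is the intuition behind discarding near-duplicate trees in step 6.4.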
The invention has the following beneficial effects. Taking the mining of scheduling rules from scheduling-related historical data with industrial big data characteristics as its main framework, the invention builds a data preprocessing model incorporating big data technology, which improves the speed and accuracy of data preprocessing; builds a clustering mechanism based on disturbance attributes, which lowers the complexity of the scheduling rules and improves their computational efficiency and scheduling performance; and builds a scheduling rule mining model based on an improved random forest algorithm, which improves the generalization ability and scheduling performance of the scheduling rules.
Drawings
FIG. 1 shows the overall architecture for scheduling rule mining of the present invention.
FIG. 2 shows the model of the present invention for preprocessing scheduling data with big data technology.
FIG. 3 shows the multi-index data screening mechanism of the present invention.
FIG. 4 is a flow chart of mining scheduling rules with the improved random forest algorithm of the present invention.
FIG. 5 shows a scheduling scheme obtained with the flexible job shop dynamic scheduling method based on industrial big data of the present invention.
Detailed Description
Referring to FIGS. 1 to 5, the overall framework of the flexible job shop dynamic scheduling method based on industrial big data is shown in FIG. 1 and consists of three parts: the first part is the scheduling data preprocessing model incorporating big data technology (FIG. 2), which comprises data acquisition, data integration, data conversion, and data screening; the second part is the clustering strategy based on disturbance attributes; and the third part is the scheduling rule mining model based on the improved random forest algorithm. The overall technical steps are as follows. Step one, data acquisition: scheduling-related historical data is collected from existing information systems such as MES, ERP, and SCADA using the data collection tools Sqoop and Flume of the Hadoop ecosystem and stored in the HDFS file system. The collected data D_h = {d1, d2, d3} has three parts: d1 is the time-series information on system disturbances during the execution of a scheduling scheme; d2 is the status information of each machine in the set of machines currently able to process an operation, recorded when a processing machine is selected for that operation of a workpiece; d3 is the status information of each workpiece in the waiting queue, recorded when an idle machine selects a workpiece to process.
Step two, data integration: using SQL statements in the data warehouse tool Hive, the scheduling data set D_h in the HDFS file system is partitioned by scheduling scheme, i.e., all scheduling-related historical data generated during the execution of one scheduling scheme is grouped together.
Step three, data conversion: parts d2 and d3 of the integrated data are converted into training instances using Spark, which makes it easier for the data mining algorithm to mine scheduling rules. This step specifically comprises:
Step 3.1: for part d2 of the collected scheduling data set D_h, the machine m1 actually selected in a historical scheduling scheme is regarded as the most suitable machine and is compared one by one with the machines in the alternative machine set {m2, m3, ...} to form training instances.
Step 3.2: for part d3 of D_h, the workpiece j1 actually selected in a historical scheduling scheme is regarded as the most suitable workpiece and is compared one by one with the other workpieces {j2, j3, ...} waiting to be processed to form training instances.
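Steps 3.1 and 3.2 turn each historical decision into pairwise training instances. The following is a minimal sketch under several assumptions: each candidate is described by a numeric feature vector, a pair is encoded by concatenating the two vectors, and the label is 1 when the first candidate of the pair was the one actually chosen and 2 otherwise (matching selection 1 and selection 2 in step 7.2.1). Emitting the mirrored pair as well is an assumption made here so that both labels occur; the patent does not say whether it does this. `pairwise_instances` and the example feature layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Instance = Tuple[List[float], int]  # (pair feature vector, label 1 or 2)


def pairwise_instances(chosen: Sequence[float],
                       alternatives: Sequence[Sequence[float]],
                       mirror: bool = True) -> List[Instance]:
    """Compare the actually chosen candidate with each alternative.

    Label 1: the first candidate of the pair was the one chosen.
    Label 2: the second candidate of the pair was the one chosen.
    """
    instances: List[Instance] = []
    for alt in alternatives:
        instances.append((list(chosen) + list(alt), 1))
        if mirror:
            # Assumed: also emit the swapped pair so both classes are present.
            instances.append((list(alt) + list(chosen), 2))
    return instances


# Hypothetical machine status vectors: [processing time, queue length]
m1 = [3.0, 1.0]
others = [[5.0, 0.0], [4.0, 2.0]]
for features, label in pairwise_instances(m1, others):
    print(label, features)
```

The same function applies unchanged to part d3, with workpiece status vectors in place of machine status vectors.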
Step four, data screening: historical scheduling schemes are evaluated on three indexes, namely maximum completion time (makespan), total tardiness, and total machine load, to obtain the scheduling-related historical data set D_b generated during the execution of well-performing schemes. This step specifically comprises:
Step 4.1: for the maximum completion time index, the screening standard is the maximum completion time of the scheduling scheme produced under the same conditions using only the SPT rule. Using only the SPT rule means that each workpiece selects the machine with the shortest processing time and each idle machine selects the workpiece with the shortest processing time. Schemes whose maximum completion time meets this standard proceed to step 4.2; the others are eliminated.
Step 4.2: for the total tardiness index, the screening standard is the total tardiness obtained when the scheduling task is completed under the same conditions using the EDD rule combined with the SPT rule. The SPT + EDD rule means that each workpiece selects the machine that processes it fastest and each idle machine selects the workpiece with the earliest due date. Schemes whose total tardiness meets this standard proceed to step 4.3; the others are eliminated.
Step 4.3: for the total machine load index, the screening standard is the total machine load obtained when the scheduling task is completed under the same conditions using the LMWT rule combined with the SPT rule. The LMWT + SPT rule means that each workpiece selects the machine that has been idle the longest and each idle machine selects the workpiece with the shortest processing time. The scheduling data generated during the execution of schemes that satisfy all three indexes is used as the input to the scheduling rule mining algorithm.
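The three-stage screening of steps 4.1 to 4.3 can be sketched as a filter over executed schedules. This sketch assumes that "meeting" a screening standard means being no worse than (not greater than) the value achieved by the baseline rule schedule, that total machine load is the busy time summed over all operations, and that the baseline values for the SPT, SPT + EDD, and LMWT + SPT schedules have already been computed under the same conditions (producing them requires a dispatching simulator that is not shown). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Operation:
    job: str
    machine: str
    start: float
    end: float


def makespan(ops: List[Operation]) -> float:
    return max(op.end for op in ops)


def total_tardiness(ops: List[Operation], due: Dict[str, float]) -> float:
    completion: Dict[str, float] = {}
    for op in ops:  # a job completes when its last operation ends
        completion[op.job] = max(completion.get(op.job, 0.0), op.end)
    return sum(max(0.0, c - due[job]) for job, c in completion.items())


def total_machine_load(ops: List[Operation]) -> float:
    # Assumed definition: total busy time summed over all machines
    return sum(op.end - op.start for op in ops)


def passes_screening(ops: List[Operation], due: Dict[str, float],
                     baseline: Dict[str, float]) -> bool:
    """Steps 4.1 -> 4.2 -> 4.3. `and` short-circuits, so a scheme that fails
    an earlier index is eliminated without evaluating the later ones."""
    return (makespan(ops) <= baseline["spt_makespan"]
            and total_tardiness(ops, due) <= baseline["spt_edd_tardiness"]
            and total_machine_load(ops) <= baseline["lmwt_spt_load"])


scheme = [Operation("J1", "M1", 0, 3), Operation("J1", "M2", 3, 5),
          Operation("J2", "M1", 3, 7)]
due = {"J1": 6.0, "J2": 6.0}
baseline = {"spt_makespan": 8.0, "spt_edd_tardiness": 2.0, "lmwt_spt_load": 9.0}
print(passes_screening(scheme, due, baseline))
```

Schemes for which `passes_screening` is true would contribute their d1, d2, and d3 records to D_b.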
Step five, clustering based on disturbance attributes: DBSCAN is applied to D_b with the scheduling scheme as the unit (i.e., the data generated by one scheduling scheme is one object), clustering the schemes according to the system disturbance attributes (part d1) recorded during their execution. This step specifically comprises:
Step 5.1: standardize the d1 data. If the values of a given disturbance attribute across the schemes are X_1, X_2, ..., X_n, they are transformed according to formula (1):
$$ Y_i = \frac{X_i - \bar{X}}{S}, \quad i = 1, 2, \dots, n \qquad (1) $$
In formula (1), $\bar{X}$ is the mean of the attribute, S is its standard deviation, and Y_1, Y_2, ..., Y_n are the standardized data.
Step 5.2: determine the DBSCAN parameters: the neighborhood radius Eps, and MinPts, the minimum number of objects that must lie within the Eps-neighborhood of a core object.
Step 5.3: pick an arbitrary unprocessed core object p (unprocessed means not yet assigned to a cluster or marked as noise; a core object is one whose Eps-neighborhood contains at least MinPts objects), create a new cluster C, and add all objects within the Eps-neighborhood of p to a candidate set N.
Step 5.4: pick an arbitrary unprocessed object q from the candidate set N. If q is a core object, add to N every object in the Eps-neighborhood of q that is unprocessed and not yet in N. If q does not belong to any cluster, add q to C.
Step 5.5: repeat step 5.4 until N is empty.
Step 5.6: repeat steps 5.3, 5.4, and 5.5 until no new object can be added to any cluster; the process then ends.
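The standardization of step 5.1 and the DBSCAN procedure of steps 5.2 to 5.6 can be sketched in plain Python. This is a textbook DBSCAN under a few assumptions: Euclidean distance on the standardized d1 vectors; an object counts itself when its Eps-neighborhood is compared with MinPts; S in formula (1) is the sample standard deviation; and objects are visited in index order rather than at random, which can change only which cluster a shared border point joins. Function names are hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence

NOISE = -1


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Formula (1): z-score each disturbance attribute (column) across schemes."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]  # sample std (assumed)
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all objects within Eps of object i, including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id per object (one object = one scheduling scheme)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unprocessed
    cluster = -1
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        neighbors = region_query(points, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = NOISE  # may later become a border point of a cluster
            continue
        cluster += 1  # step 5.3: p is an unprocessed core object -> new cluster C
        labels[p] = cluster
        candidates = [q for q in neighbors if q != p]  # candidate set N
        while candidates:  # steps 5.4 and 5.5
            q = candidates.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: joins C, is not expanded
                continue
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = region_query(points, q, eps)
            if len(q_neighbors) >= min_pts:  # q is a core object: grow N
                candidates.extend(r for r in q_neighbors
                                  if labels[r] is None or labels[r] == NOISE)
    return [label if label is not None else NOISE for label in labels]


points = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
          [5.0, 5.0], [5.0, 5.1], [5.1, 5.0], [10.0, 10.0]]
print(dbscan(points, eps=0.5, min_pts=3))
```

In the method's flow the input would be `standardize(d1_rows)`, one row per screened scheme. Schemes labeled -1 are noise; the text does not say how such schemes are handled.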
Step six, mining random forest scheduling rules: using an improved random forest algorithm, mine from each cluster random forest scheduling rule 1, which solves the problem of selecting a machine for a workpiece, and random forest scheduling rule 2, which solves the problem of an idle machine selecting a workpiece to process. This step specifically comprises:
Step 6.1: for each cluster, draw training instances with replacement from d2 (for mining random forest scheduling rule 1) and from d3 (for mining random forest scheduling rule 2), forming k new training instance sets P1 and P2, respectively, for building decision trees.
Step 6.2: for each training instance set in P1 and P2, randomly select m feature attributes of d2 and d3, respectively, compute the optimal split, and train k decision trees, giving the tree sets T1 and T2.
The construction process of each decision tree is as follows:
Step 6.2.1: create a root node N.
Step 6.2.2: check whether any training instances remain in the training instance set; if not, return node N; otherwise, proceed to the next step.
Step 6.2.3: check whether the scheduling decisions of all remaining training instances belong to the same class C; if so, return node N labeled as class C; otherwise, proceed to the next step.
Step 6.2.4: check whether the candidate attribute list is empty; if so, label N with the class that occurs most often among the samples; otherwise, proceed to the next step.
Step 6.2.5: check whether each attribute in the attribute list is continuous; for a continuous attribute, use binary splitting to find the split with the largest information gain G(D, A). Binary splitting divides the sorted values of the attribute into two parts: an attribute with n distinct values has n - 1 possible divisions, and the threshold of each is the average of the two adjacent values at the split point. The information gain is calculated by formulas (2), (3), and (4).
$$ G(D,A) = H(D) - H(D \mid A) \qquad (2) $$
$$ H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|} \qquad (3) $$
$$ H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|} \qquad (4) $$
In formula (2), G(D, A) is the information gain of attribute A; in formula (3), H(D) is the entropy of the class distribution of D; in formula (4), H(D | A) is the conditional entropy of D given A. D is the training instance set and |D| is the number of training instances in D. D has K classes C_k, k = 1, 2, and |C_k| is the number of training instances in class C_k. Attribute A divides D into n subsets D_1, D_2, ..., D_n, where |D_i| is the number of training instances in D_i; D_ik is the set of instances in D_i that belong to class C_k, and |D_ik| is the number of training instances in D_ik.
Step 6.2.6: label node N with the attribute that has the largest information gain ratio, computed by formulas (5) and (6), and return to step 6.2.2.
$$ GR(D,A) = \frac{G(D,A)}{H(A)} \qquad (5) $$
$$ H(A) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|} \qquad (6) $$
In formula (5), GR(D, A) is the information gain ratio and H(A) is the split information of attribute A; the other symbols have the same meanings as above.
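The split selection of steps 6.2.5 and 6.2.6 follows formulas (2) to (6). The following is a minimal sketch that assumes all attributes are continuous and uses log base 2. Candidate thresholds are midpoints between adjacent distinct sorted values; for each attribute the threshold with the largest information gain is kept, and the attribute with the largest gain ratio is then chosen. Full C4.5 adds refinements (such as comparing gain ratios only among attributes with at least average gain) that are omitted here. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """H(D), formula (3): entropy of the class distribution, log base 2."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_and_ratio(labels: Sequence[int],
                   groups: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """G(D,A) and GR(D,A), formulas (2), (4), (5), (6), for a partition of D."""
    n = len(labels)
    cond = sum(len(g) / n * entropy(g) for g in groups if g)  # H(D|A)
    gain = entropy(labels) - cond
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups if g)  # H(A)
    return gain, (gain / split_info if split_info > 0 else 0.0)


def best_threshold(values: Sequence[float],
                   labels: Sequence[int]) -> Optional[Tuple[float, float, float]]:
    """Step 6.2.5: binary split at midpoints of adjacent distinct values,
    keeping the largest gain. Returns (threshold, gain, gain ratio)."""
    pairs = sorted(zip(values, labels))
    best = None
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        gain, ratio = gain_and_ratio(labels, [left, right])
        if best is None or gain > best[1]:
            best = (t, gain, ratio)
    return best


def choose_attribute(rows: Sequence[Sequence[float]], labels: Sequence[int],
                     attrs: Sequence[int]) -> Optional[Tuple[int, float, float]]:
    """Step 6.2.6: pick the attribute whose best binary split has the largest
    gain ratio. Returns (attribute index, threshold, gain ratio)."""
    best = None
    for a in attrs:
        split = best_threshold([row[a] for row in rows], labels)
        if split is not None and (best is None or split[2] > best[2]):
            best = (a, split[0], split[2])
    return best


rows = [[1.0, 9.0], [2.0, 1.0], [3.0, 8.0], [4.0, 2.0]]
print(choose_attribute(rows, [1, 1, 2, 2], [0, 1]))
```

Here attribute 0 separates the two classes perfectly at threshold 2.5, so it is chosen over attribute 1.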
Step 6.3: using the training instances in d2 and d3 that were not drawn into the training sets, test the classification performance of the decision trees in T1 and T2, respectively.
Step 6.4: compute the similarity S between decision trees within T1 or within T2 by formula (7). If the similarity between two decision trees exceeds 60%, compare their test performance from step 6.3 and retain the better tree; the retained trees form the random forest.
$$ S(DT_1, DT_2) = \frac{K}{N_t} = \frac{1}{N_t} \sum_{n=1}^{N_t} I(r_{1n}.c,\ r_{2n}.c) \qquad (7) $$
In formula (7), DT_1 and DT_2 are the two decision trees being compared; K is the number of test cases that DT_1 and DT_2 classify identically; r_1n and r_2n are the n-th classification results of DT_1 and DT_2, and .c denotes the class of a result; I(r_1n.c, r_2n.c) = 1 when r_1n = r_2n, i.e., when DT_1 and DT_2 obtain the same classification result with the same feature attributes, and 0 otherwise; N_t is the number of test cases.
Step 6.5: compute the weights w and h of each decision tree in T1 and T2 by a Bayesian voting mechanism using formulas (8) and (9), which yields random forest scheduling rule 1 and random forest scheduling rule 2.
(8) [formula image not reproduced]
(9) [formula image not reproduced]
In formulas (8) and (9), V is the number of test cases the decision tree classifies correctly and m is the number of test cases it misclassifies.
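Bootstrap sampling (step 6.1), out-of-bag testing (step 6.3), and removal of near-duplicate trees (step 6.4) can be sketched with trees represented as plain functions from a feature vector to a class label 1 or 2. This sketch rests on several assumptions. Similarity follows formula (7) as described, i.e., the fraction of test cases both trees classify identically. All trees are compared on one common test set, whereas the patent tests each tree on its own undrawn instances. Trees are considered greedily from best to worst test accuracy, and a tree is kept only if it is at most 60% similar to every tree already kept; the patent does not specify a comparison order. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Tree = Callable[[Sequence[float]], int]  # feature vector -> class 1 or 2


def bootstrap_split(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Step 6.1: draw n indices with replacement; undrawn indices are the
    out-of-bag instances used for testing in step 6.3."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def random_features(n_features: int, m: int, rng: random.Random) -> List[int]:
    """Step 6.2: the m feature attributes a tree may split on."""
    return sorted(rng.sample(range(n_features), m))


def accuracy(tree: Tree, xs: Sequence[Sequence[float]], ys: Sequence[int]) -> float:
    return sum(tree(x) == y for x, y in zip(xs, ys)) / len(ys)


def similarity(a: Tree, b: Tree, xs: Sequence[Sequence[float]]) -> float:
    """Formula (7) as described: share of test cases both trees classify alike."""
    return sum(a(x) == b(x) for x in xs) / len(xs)


def prune_similar(trees: Sequence[Tree], xs: Sequence[Sequence[float]],
                  ys: Sequence[int], threshold: float = 0.6) -> List[Tree]:
    """Step 6.4: visit trees from best to worst accuracy; keep a tree only if
    its similarity to every kept tree is at most the threshold."""
    ranked = sorted(trees, key=lambda t: accuracy(t, xs, ys), reverse=True)
    kept: List[Tree] = []
    for tree in ranked:
        if all(similarity(tree, k, xs) <= threshold for k in kept):
            kept.append(tree)
    return kept


# Toy stand-ins for trained trees (hypothetical split rules)
def tree_a(x: Sequence[float]) -> int:
    return 1 if x[0] < 0.5 else 2


def tree_b(x: Sequence[float]) -> int:
    return 1 if x[0] < 0.6 else 2


def tree_c(x: Sequence[float]) -> int:
    return 1 if x[1] < 0.5 else 2


xs = [[0.1, 0.9], [0.55, 0.2], [0.9, 0.1], [0.3, 0.7]]
ys = [1, 1, 2, 1]
forest = prune_similar([tree_a, tree_b, tree_c], xs, ys)
print([t.__name__ for t in forest])
```

`tree_a` agrees with the more accurate `tree_b` on 75% of the test cases, above the 60% threshold, so it is dropped; `tree_c` is dissimilar enough to stay.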
step seven, the scheduling rule is used: and guiding the dynamic scheduling of the flexible job shop by the mined random forest scheduling rules. The method specifically comprises the following steps:
and 7.1, finding a random forest scheduling rule 1 or a random forest scheduling rule 2 corresponding to the cluster to which the disturbance environment of the current flexible job shop belongs according to the problem of selecting a workpiece machine for solving or selecting a workpiece for processing by an idle machine.
Step 7.2: Using the selected random forest scheduling rule, choose the most appropriate machine from the candidate machine set M, or the most appropriate workpiece from the candidate workpiece set J, by pairwise comparison.
Step 7.2.1: For the machine selection problem, let m1 and m2 be two machines in M. Using random forest scheduling rule 1 selected in step 7.1, compute the selection result of each decision tree in the rule; the result is either selection 1 (m1 is more suitable) or selection 2 (m2 is more suitable). For the problem of an idle machine selecting a workpiece, let j1 and j2 be two workpieces in J. Using random forest scheduling rule 2 selected in step 7.1, compute the selection result of each decision tree in the rule; the result is either decision 1 (j1 is more suitable) or decision 2 (j2 is more suitable).
Step 7.2.2: Obtain the weighted selection result WR of each decision tree through the Bayesian voting mechanism using formula (10), then take the average AWR of the weighted results. If AWR < 1.5, the former candidate (m1 or j1) is more suitable; if AWR > 1.5, the latter candidate (m2 or j2) is more suitable.
$$WR = wC + hR \qquad (10)$$
In formula (10), C is the classification result given by the decision tree, R is the mean of the classification results given by all decision trees, and w and h are calculated by formulas (8) and (9).
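Step 7.2 can be illustrated with the following sketch. Because formulas (8) and (9) are not reproduced here, the weights w and h are simply passed in per tree. The voter representation, the handling of the unspecified case AWR = 1.5 (resolved in favour of the latter candidate), and the sequential tournament used to reduce M or J to a single winner are assumptions; `prefer_first` and `select_best` are hypothetical names.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Assumed voter format: (classify, w, h). classify(a, b) returns 1 if the first
# candidate is more suitable and 2 if the second one is (step 7.2.1).
Voter = Tuple[Callable[[object, object], int], float, float]


def prefer_first(a: object, b: object, voters: Sequence[Voter]) -> bool:
    """Step 7.2.2: WR = w*C + h*R per tree (formula (10)); compare the mean AWR with 1.5."""
    results = [classify(a, b) for classify, _, _ in voters]
    r_mean = sum(results) / len(results)  # R: mean result of all trees
    weighted = [w * c + h * r_mean for (_, w, h), c in zip(voters, results)]
    awr = sum(weighted) / len(weighted)
    return awr < 1.5  # AWR < 1.5: the former candidate (m1 or j1) is more suitable


def select_best(candidates: Sequence[T], voters: Sequence[Voter]) -> T:
    """Assumed tournament: the current winner is compared pairwise with each remaining candidate."""
    best = candidates[0]
    for other in candidates[1:]:
        if not prefer_first(best, other, voters):  # ties at 1.5 go to the latter (assumption)
            best = other
    return best
```

For example, with three equally weighted trees (w = h = 0.5) that all prefer the shorter processing time, `select_best([5.0, 3.0, 8.0], voters)` returns `3.0`.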
Example: In one scheduling task, workpieces JT1, JT2, …, JT8 are processed, 100 pieces of each, i.e., 10 batches. The due dates of the workpieces are 20.0, 22.0, 14.0, 21.0, 19.0, 22.0, 18.0 and 23.0 units of processing time, respectively, and the processing time of each operation of each workpiece on each machine is given in Table 1. A machine failure occurred at time 4; after the first operation of JT1 was completed, a material shortage was found for its second operation; and at time 10, workpiece processing times increased by 10% overall.
Table 1: Processing time of each operation of each workpiece on each machine
The scheduling scheme obtained by the flexible job shop dynamic scheduling method based on industrial big data is shown in Fig. 5, a Gantt chart whose horizontal axis is time and whose vertical axis is the machine; in each label, the hundreds digit gives the workpiece type and the units digit gives the work order number. The final scheme has a maximum completion time of 21.8, a total delay time of 5.3, and a total machine load of 96.4 units of processing time.
The method of this patent solves the dynamic scheduling problem of the flexible job shop. Because scheduling is guided by the mined scheduling rules, the method is practically feasible and computationally efficient, requires no model of the scheduling problem, and can respond to shop-floor disturbances in real time.

Claims (1)

1. A flexible job shop dynamic scheduling method based on industrial big data, comprising the following steps:
step one, data acquisition: scheduling-related historical data are collected from the existing MES, ERP and SCADA information systems using the data collection tools Sqoop and Flume of the Hadoop ecosystem and stored in the HDFS file system; the collected data consist of three parts, D_h = {d1, d2, d3}: d1 is the time-series system disturbance information associated with each scheduling scheme; d2 is the state information of each machine in the set of machines that can currently process an operation, recorded when a processing machine is selected for that operation of a workpiece; d3 is the state information of each workpiece in the current queue, recorded when an idle machine must select a workpiece from its waiting queue for processing;
step two, data integration: using SQL statements in the data warehouse tool Hive, the scheduling data set D_h in the HDFS file system is divided by scheduling scheme, i.e., the scheduling-related historical data generated during the execution of one scheduling scheme are grouped together;
step three, data conversion: the d2 and d3 parts of the integrated data are converted into training examples with Spark so that the data mining algorithm can mine the scheduling rules; specifically:
step 3.1: for part d2 of the collected scheduling data set D_h, the machine m1 actually selected in a historical scheduling scheme is regarded as the most suitable machine and is compared one by one with each of the other candidate machines {m2, m3, ...} that can process the operation, forming training examples;
step 3.2: for part d3 of the collected scheduling data set D_h, the workpiece j1 actually selected in a historical scheduling scheme is regarded as the most suitable workpiece and is compared one by one with each of the other workpieces {j2, j3, ...} waiting to be processed, forming training examples;
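The per-record logic of steps 3.1 and 3.2 (what a Spark map over d2 or d3 would apply) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: the attribute names, the `a_`/`b_` prefixes used to concatenate the two items of a pair, and emitting both orderings so that classes 1 and 2 are balanced are all hypothetical choices, not taken from the patent. Class 1 means the first item of the pair is more suitable and class 2 the second, matching the labels used in step 7.2.1.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, float]
Example = Tuple[Features, int]  # (attribute values, class: 1 = first item better, 2 = second)


def _pair(first: Features, second: Features) -> Features:
    """Concatenate the two items' attributes under hypothetical a_/b_ prefixes."""
    row = {f"a_{k}": v for k, v in first.items()}
    row.update({f"b_{k}": v for k, v in second.items()})
    return row


def pairwise_examples(chosen: Features, alternatives: Sequence[Features]) -> List[Example]:
    """Steps 3.1/3.2 sketch: compare the machine (or workpiece) actually chosen in a
    historical scheme one by one with every alternative."""
    examples: List[Example] = []
    for alt in alternatives:
        examples.append((_pair(chosen, alt), 1))  # former (the chosen item) is more suitable
        examples.append((_pair(alt, chosen), 2))  # latter (the chosen item) is more suitable
    return examples
```

A machine compared with two alternatives thus yields four training examples.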
step four, data screening: the historical scheduling schemes are evaluated on three indicators, maximum completion time, total delay time and total machine load, and screened to obtain D_b, the scheduling-related historical data generated during the execution of well-performing scheduling schemes; specifically:
step 4.1: for the maximum completion time indicator, the screening standard is the maximum completion time of the scheduling scheme generated under the same conditions using only the SPT rule; using only the SPT rule means that each workpiece selects the machine that processes it fastest and each idle machine selects the workpiece with the shortest processing time; a scheduling scheme whose maximum completion time meets this standard proceeds to step 4.2, otherwise it is eliminated;
step 4.2: for the total delay time indicator, the screening standard is the total delay time of the same scheduling task completed under the same conditions using the SPT + EDD rules; the SPT + EDD rule means that each workpiece selects the machine that processes it fastest and each idle machine selects the workpiece with the earliest due date; a scheme whose total delay time meets this standard proceeds to step 4.3, otherwise it is eliminated;
step 4.3: for the total machine load indicator, the screening standard is the total machine load of the same scheduling task completed under the same conditions using the LMWT + SPT rules; the LMWT + SPT rule means that each workpiece selects the machine with the longest idle time and each idle machine selects the workpiece with the shortest processing time; the scheduling data generated during the execution of the schemes that meet all three indicators are used as the input of the scheduling rule mining algorithm;
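The three-stage screen of step four reduces to a simple filter once each historical scheme is stored together with the benchmark results obtained under the same conditions. In this minimal sketch the record fields are hypothetical, and "meets the index" is read as "no worse than the benchmark" (an assumption; the patent does not state the comparison explicitly).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SchemeRecord:
    """KPIs of one historical scheme and of the benchmark rules run under the
    same conditions (field names are hypothetical)."""
    scheme_id: str
    makespan: float       # maximum completion time
    total_delay: float    # total delay time
    total_load: float     # total machine load
    spt_makespan: float   # SPT-only benchmark (step 4.1)
    spt_edd_delay: float  # SPT + EDD benchmark (step 4.2)
    lmwt_spt_load: float  # LMWT + SPT benchmark (step 4.3)


def passes(r: SchemeRecord) -> bool:
    # Assumption: a scheme meets an indicator when it is no worse than the benchmark.
    return (r.makespan <= r.spt_makespan            # step 4.1
            and r.total_delay <= r.spt_edd_delay    # step 4.2
            and r.total_load <= r.lmwt_spt_load)    # step 4.3


def screen(records: Sequence[SchemeRecord]) -> List[str]:
    """Return the ids of the schemes whose data enter D_b."""
    return [r.scheme_id for r in records if passes(r)]
```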
step five, clustering based on disturbance attributes: DBSCAN is applied to D_b, treating the data generated by one scheduling scheme as one object, and the objects are clustered according to the system disturbance attributes in D_b, i.e., part d1; specifically:
step 5.1: the d1 data are normalized: if the values of a disturbance attribute across the schemes are X1, X2, X3, ..., Xn, they are transformed by formula (1);
$$Y_i=\frac{X_i-\bar{X}}{S},\qquad i=1,2,\ldots,n \qquad (1)$$
in formula (1), $\bar{X}$ is the mean of the attribute, S is its standard deviation, and Y1, Y2, Y3, ..., Yn are the normalized data;
step 5.2: the DBSCAN parameters are determined: the neighborhood radius Eps and MinPts, the minimum number of objects that the Eps-neighborhood of a core object must contain;
step 5.3: an unprocessed core object p is picked at random, i.e., an object that has not been assigned to a cluster or marked as noise and whose Eps-neighborhood contains at least MinPts objects; a new cluster C is created, and all objects within the Eps-neighborhood of p are added to the candidate set N;
step 5.4: an unprocessed object q in N is picked at random; if q is a core object, every unprocessed object in the Eps-neighborhood of q that is not yet in N is added to N; if q does not belong to any cluster, q is added to C;
step 5.5: step 5.4 is repeated until N is empty;
step 5.6: steps 5.3, 5.4 and 5.5 are repeated until no new object can be added to any cluster, and the process ends;
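Steps 5.1 to 5.6 can be sketched from scratch in Python as below: a z-score normalization of each disturbance attribute (formula (1)) followed by a standard DBSCAN. This is a minimal sketch under assumptions not stated in the patent: S is taken as the population standard deviation, distances are Euclidean on the normalized d1 vectors, and objects are visited in index order rather than at random (which can only affect which cluster a shared border object joins). `zscore_columns` and `dbscan` are hypothetical names.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional, Sequence

NOISE = -1


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Step 5.1: standardize each disturbance attribute (column) across schemes.
    A constant column is mapped to zeros to avoid division by zero."""
    scaled = []
    for col in zip(*rows):
        mu, s = mean(col), pstdev(col)
        scaled.append([(x - mu) / s if s > 0 else 0.0 for x in col])
    return [list(r) for r in zip(*scaled)]


def _neighbors(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Steps 5.2-5.6: one cluster label per scheme, NOISE (-1) for outliers."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unprocessed
    cluster = -1
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        seeds = _neighbors(points, p, eps)
        if len(seeds) < min_pts:      # not a core object
            labels[p] = NOISE         # may later become a border object
            continue
        cluster += 1                  # step 5.3: new cluster C for core object p
        labels[p] = cluster
        candidates = [q for q in seeds if q != p]  # candidate set N
        while candidates:             # step 5.5: until N is empty
            q = candidates.pop()
            if labels[q] == NOISE:
                labels[q] = cluster   # border object: joins C but is not expanded
                continue
            if labels[q] is not None:
                continue
            labels[q] = cluster       # step 5.4: q joins C
            q_seeds = _neighbors(points, q, eps)
            if len(q_seeds) >= min_pts:  # q is a core object: grow N
                candidates.extend(r for r in q_seeds if labels[r] is None or labels[r] == NOISE)
    return labels
```

In practice the two would be chained, e.g. `dbscan(zscore_columns(d1_rows), eps, min_pts)`, so that Eps is expressed in standard-deviation units of each disturbance attribute.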
step six, mining random forest scheduling rules: from each cluster, an improved random forest algorithm is used to mine random forest scheduling rule 1 for the problem of selecting a machine for a workpiece and random forest scheduling rule 2 for the problem of an idle machine selecting a workpiece to process; specifically:
step 6.1: for each cluster, training examples are drawn with replacement from d2 and d3 of the cluster to form k new training example sets from each, denoted P1 and P2 respectively, for constructing decision trees;
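The sampling of step 6.1, together with the random choice of m feature attributes in step 6.2 and the unselected examples reused for testing in step 6.3, can be sketched as below. This is a minimal sketch: drawing as many examples as the cluster contains and choosing the m attributes once per tree (following the wording of step 6.2, rather than per node as in the classic random forest) are assumptions, and `draw_samples` is a hypothetical name.

```python
import random
from typing import List, NamedTuple, Sequence


class TreeSample(NamedTuple):
    in_bag: List[int]      # indices drawn with replacement (step 6.1)
    out_of_bag: List[int]  # indices never drawn, usable as test cases (step 6.3)
    features: List[str]    # m randomly chosen feature attributes (step 6.2)


def draw_samples(n_examples: int, attributes: Sequence[str], k: int, m: int,
                 seed: int = 0) -> List[TreeSample]:
    """Build the k training sets (P1 from d2, or P2 from d3) for one cluster."""
    rng = random.Random(seed)
    samples = []
    for _ in range(k):
        in_bag = [rng.randrange(n_examples) for _ in range(n_examples)]
        drawn = set(in_bag)
        out_of_bag = [i for i in range(n_examples) if i not in drawn]
        samples.append(TreeSample(in_bag, out_of_bag, rng.sample(list(attributes), m)))
    return samples
```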
step 6.2: for each set in P1 and P2, m feature attributes are randomly selected from the attributes of d2 and d3, respectively, the optimal split is calculated, and k decision trees are trained from each, forming T1 and T2;
the construction process of each decision tree is as follows:
step 6.2.1: create a root node N;
step 6.2.2: check whether any training examples remain in the training example set; if none remain, return node N; otherwise go to the next step;
step 6.2.3: check whether the scheduling decisions of the remaining training examples all belong to the same class C; if so, return node N labeled as class C; otherwise go to the next step;
step 6.2.4: check whether the candidate attribute list is empty; if so, label node N with the majority class of the samples; otherwise go to the next step;
step 6.2.5: check whether each attribute in the attribute list is continuous; for a continuous attribute, the binary split with the largest information gain G(D, A) is found: the sorted attribute values are divided into two parts, so N distinct values give N-1 candidate splits, and the split threshold is the average of the two adjacent values at the chosen split point; the information gain is calculated by formulas (2), (3) and (4);
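$$G(D,A)=H(D)-H(D\mid A) \qquad (2)$$
$$H(D)=-\sum_{k=1}^{K}\frac{|C_k|}{|D|}\log_2\frac{|C_k|}{|D|} \qquad (3)$$
$$H(D\mid A)=\sum_{i=1}^{n}\frac{|D_i|}{|D|}H(D_i)=-\sum_{i=1}^{n}\frac{|D_i|}{|D|}\sum_{k=1}^{K}\frac{|D_{ik}|}{|D_i|}\log_2\frac{|D_{ik}|}{|D_i|} \qquad (4)$$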
in formula (2), G(D, A) is the information gain of attribute A; in formula (3), H(D) is the entropy of the class information; in formula (4), H(D|A) is the conditional entropy; D is the training example data set and |D| is the number of training examples in D; D has K classes C_k, k = 1, 2, and |C_k| is the number of training examples in class C_k; attribute A divides D into n subsets D_1, D_2, …, D_n, and |D_i| is the number of training examples in D_i; D_ik is the set of examples in D_i that belong to class C_k, and |D_ik| is its number of training examples;
step 6.2.6: node N is labeled with the attribute that has the largest information gain ratio, calculated by formulas (5) and (6); then return to step 6.2.2;
$$GR(D,A)=\frac{G(D,A)}{H(A)} \qquad (5)$$
$$H(A)=-\sum_{i=1}^{n}\frac{|D_i|}{|D|}\log_2\frac{|D_i|}{|D|} \qquad (6)$$
in formula (5), GR(D, A) is the information gain ratio and H(A) is the split information; the other symbols have the same meanings as above;
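Formulas (2) to (6) and the split search of steps 6.2.5 and 6.2.6 can be sketched as below. This is a minimal sketch, not the patent's improved algorithm: base-2 logarithms are assumed (the base does not change which split wins), and `choose_attribute` expects each attribute already mapped to subset assignments (for example, the boolean side of a threshold found by `best_threshold`). All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def entropy(items: Sequence[Hashable]) -> float:
    """Entropy of the value frequencies: formula (3) for class labels,
    formula (6) (split information H(A)) for subset assignments."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_and_ratio(partition: Sequence[Hashable], labels: Sequence[Hashable]) -> Tuple[float, float]:
    """partition[i] names the subset D_i that example i falls into.
    Returns (G(D, A), GR(D, A)) per formulas (2), (4) and (5)."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for part, y in zip(partition, labels):
        groups.setdefault(part, []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())  # H(D|A)
    gain = entropy(labels) - cond                                 # G(D, A)
    split_info = entropy(partition)                               # H(A)
    return gain, (gain / split_info if split_info > 0 else 0.0)


def best_threshold(values: Sequence[float], labels: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Step 6.2.5: try the midpoint of each pair of adjacent distinct values
    (N values give N-1 cuts) and keep the cut with the largest information gain."""
    distinct = sorted(set(values))
    best_t, best_gain = None, 0.0
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        gain, _ = gain_and_ratio([x <= t for x in values], labels)
        if best_t is None or gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


def choose_attribute(columns: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]) -> str:
    """Step 6.2.6: label node N with the attribute whose gain ratio is largest."""
    return max(columns, key=lambda a: gain_and_ratio(columns[a], labels)[1])
```

The gain ratio matters when gains tie: an attribute that splits four examples into two pure halves and one that puts every example in its own subset both have a gain of 1 bit, but their split information is 1 and 2 bits respectively, so the gain ratio prefers the two-way split.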
step 6.3: the classification performance of the decision trees in T1 and T2 is tested using the training examples in d2 and d3, respectively, that were not drawn during sampling;
step 6.4: the similarity S between decision trees within T1 (and likewise within T2) is calculated by formula (7); if the similarity between two decision trees exceeds 60%, their test performance from step 6.3 is compared and the better tree is kept; the retained trees form the random forest;
$$S=\frac{1}{N_t}\sum_{n=1}^{K} I\left(r_{1n}.c,\; r_{2n}.c\right) \qquad (7)$$
in formula (7), DT1 and DT2 are the two decision trees being compared; K is the number of test cases that DT1 and DT2 classify identically; r1n and r2n are the n-th identical classification results of DT1 and DT2, and c denotes the classification result; when r1n = r2n, i.e., DT1 and DT2 reach the same classification result using the same feature attributes, I(r1n.c, r2n.c) = 1, otherwise it is 0; Nt is the number of test cases;
step 6.5: the weights w and h of each decision tree in T1 and T2 are calculated through a Bayesian voting mechanism by formulas (8) and (9), yielding random forest scheduling rule 1 and random forest scheduling rule 2;
[Formula (8)]
[Formula (9)]
in formulas (8) and (9), v is the number of test cases that the decision tree classifies correctly and m is the number of test cases that it misclassifies;
step seven, applying the scheduling rules: the mined random forest scheduling rules guide the dynamic scheduling of the flexible job shop; specifically:
step 7.1: according to the problem to be solved, i.e., selecting a machine for a workpiece or selecting a workpiece for an idle machine to process, random forest scheduling rule 1 or random forest scheduling rule 2 of the cluster to which the current disturbance environment of the flexible job shop belongs is found;
step 7.2: using the selected random forest scheduling rule, the most appropriate machine or workpiece is selected from the candidate machine set M or the candidate workpiece set J by pairwise comparison;
step 7.2.1: for the machine selection problem, if m1 and m2 are two machines in M, the selection result of each decision tree in random forest scheduling rule 1 selected in step 7.1 is calculated; the result is either selection 1, meaning that the former, m1, is more suitable, or selection 2, meaning that the latter, m2, is more suitable; for the problem of an idle machine selecting a workpiece, if j1 and j2 are two workpieces in J, the selection result of each decision tree in random forest scheduling rule 2 selected in step 7.1 is calculated; the result is either decision 1, meaning that j1 is more suitable, or decision 2, meaning that j2 is more suitable;
step 7.2.2: the weighted selection result WR of each decision tree is obtained through the Bayesian voting mechanism by formula (10), and the average AWR of the weighted results is taken; if AWR < 1.5, the former candidate, m1 or j1, is more suitable, and if AWR > 1.5, the latter candidate, m2 or j2, is more suitable;
$$WR = wC + hR \qquad (10)$$
in formula (10), C is the classification result given by the decision tree, R is the mean of the classification results given by all decision trees, and w and h are calculated by formulas (8) and (9).
CN201910144370.5A 2019-02-27 2019-02-27 Flexible job shop dynamic scheduling method based on industrial big data Active CN109902954B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910144370.5A CN109902954B (en) 2019-02-27 2019-02-27 Flexible job shop dynamic scheduling method based on industrial big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910144370.5A CN109902954B (en) 2019-02-27 2019-02-27 Flexible job shop dynamic scheduling method based on industrial big data

Publications (2)

Publication Number Publication Date
CN109902954A CN109902954A (en) 2019-06-18
CN109902954B true CN109902954B (en) 2020-11-13

Family

ID=66945563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910144370.5A Active CN109902954B (en) 2019-02-27 2019-02-27 Flexible job shop dynamic scheduling method based on industrial big data

Country Status (1)

Country Link
CN (1) CN109902954B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111047215B (en) * 2019-12-09 2023-06-23 中国兵器科学研究院 Method for determining classification of field replaceable units based on random forest
CN111401427B (en) * 2020-03-12 2022-11-08 华中科技大学 Product cost evaluation method and system based on industrial big data
CN111476466B (en) * 2020-03-25 2022-06-03 重庆邮电大学 Digital workshop electric energy management research method based on context awareness
CN111766839B (en) * 2020-05-09 2023-08-29 同济大学 Computer-implemented system for self-adaptive update of intelligent workshop scheduling knowledge
CN112712289B (en) * 2021-01-18 2022-11-22 上海交通大学 Adaptive method, system, and medium based on temporal information entropy
CN112904818B (en) * 2021-01-19 2022-07-15 东华大学 Prediction-reaction type scheduling method for complex structural member processing workshop
CN112883640B (en) * 2021-02-04 2023-06-09 西南交通大学 Digital twin station system, job scheduling method based on system and application
CN115357570A (en) * 2022-08-24 2022-11-18 安徽维德工业自动化有限公司 Workshop optimization scheduling management method based on random forest algorithm
CN117010671B (en) * 2023-10-07 2023-12-05 中国信息通信研究院 Distributed flexible workshop scheduling method and device based on block chain

Citations (2)

Publication number Priority date Publication date Assignee Title
CN106611232A (en) * 2016-02-04 2017-05-03 四川用联信息技术有限公司 Layered optimization algorithm for solving multi-technical-route workshop scheduling
CN107862411A (en) * 2017-11-09 2018-03-30 西南交通大学 A kind of extensive flexible job shop scheduling optimization method

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN104914835A (en) * 2015-05-22 2015-09-16 齐鲁工业大学 Flexible job-shop scheduling multi-objective method
CN106094757B (en) * 2016-07-15 2018-12-21 郑州航空工业管理学院 A kind of dynamic flexible solving job shop scheduling problem control method based on data-driven
CN108733003B (en) * 2017-04-20 2020-11-13 南京理工大学 Method and system for predicting working hours of rotary part working procedures based on kmeans clustering algorithm

Also Published As

Publication number Publication date
CN109902954A (en) 2019-06-18

Similar Documents

Publication Publication Date Title
CN109902954B (en) Flexible job shop dynamic scheduling method based on industrial big data
CN101788819B (en) Dispatching method based on iterative decomposition and flow relaxation in large-scale production process
CN107451747B (en) Workshop scheduling system based on self-adaptive non-dominated genetic algorithm and working method thereof
CN103310285A (en) Performance prediction method applicable to dynamic scheduling for semiconductor production line
CN112446526B (en) Production scheduling system and method
CN108694502A (en) A kind of robot building unit self-adapting dispatching method based on XGBoost algorithms
Sun et al. Formulations, features of solution space, and algorithms for line-pure seru system conversion
Zahmani et al. Extraction of dispatching rules for single machine total weighted tardiness using a modified genetic algorithm and data mining
CN112308298B (en) Multi-scenario performance index prediction method and system for semiconductor production line
Shmeleva et al. Industrial management decision support system: From design to software
CN115185247A (en) Machine tool running state prediction and workshop scheduling method based on product processing quality
Maquee et al. Clustering and association rules in analyzing the efficiency of maintenance system of an urban bus network
CN109615115A (en) A kind of integrated production task dispatching method of oriented mission reliability
CN113570118A (en) Workshop scheduling and analyzing method based on scheduling rule
CN110597796B (en) Big data real-time modeling method and system based on full life cycle
Liu et al. Integrated optimization of mixed-model assembly line balancing and buffer allocation based on operation time complexity
Li et al. An efficient adaptive dispatching method for semiconductor wafer fabrication facility
Wenjing et al. Data mining based dynamic scheduling approach for semiconductor manufacturing system
Mosavi et al. Intelligent energy management using data mining techniques at Bosch Car Multimedia Portugal facilities
Illgen et al. Digital assistance system for target date planning in the initiation phase of large-scale projects
Park et al. A Generation and Repair Approach to Scheduling Semiconductor Packaging Facilities Using Case-Based Reasoning
Wang Flexible job shop scheduling rules mining based on random forest
Liu et al. Manufacturing capability match and evaluation for outsourcing decision-making in one-of-a-kind production
Job et al. Performance comparison of process and adaptive cellular layouts using simulation
Zahmani et al. A real time data mining rules selection model for the job shop scheduling problem

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant