CN115357570A

CN115357570A - Workshop optimization scheduling management method based on random forest algorithm

Info

Publication number: CN115357570A
Application number: CN202211018773.3A
Authority: CN
Inventors: 宋旭东; 汪春侠; 赵跃东; 郑哲; 郭警中; 吴小松; 罗毅
Original assignee: Anhui Wdt Industrial Automation Co ltd
Current assignee: Anhui Wdt Industrial Automation Co ltd
Priority date: 2022-08-24
Filing date: 2022-08-24
Publication date: 2022-11-18

Abstract

The invention discloses a workshop optimal scheduling management method based on a random forest algorithm, relating to the technical field of workshop optimal scheduling. A scheduling data set generated by historical scheduling schemes is obtained in advance, cleaning rules are set for the scheduling data set, and the data are integrated. The scheduling data set is then compared with traditional scheduling schemes, and the data generated by schemes that perform worse than the traditional schemes are eliminated. From the remaining scheduling data, a plurality of decision trees are constructed by sampling training examples and randomly selecting several feature attributes for each training set. Each decision tree is tested with the training examples that were not selected, and its classification performance is analyzed. Finally, similar trees are eliminated by similarity analysis, and the random forest scheduling rule with the highest weight is obtained through a Bayesian voting mechanism. The method solves the problem of optimally scheduling workshop machines and workpieces.

Description

Workshop optimization scheduling management method based on random forest algorithm

Technical Field

The invention belongs to the field of workshop scheduling management and relates to random forest technology. In particular, it relates to a workshop optimal scheduling management method based on a random forest algorithm.

Background

As product requirements shift increasingly toward personalization, manufacturing processes become more diversified and actual scheduling problems grow more complex. Solving workshop scheduling problems in manufacturing enterprises therefore places higher demands on practical operability, computational efficiency, and the ability to respond in real time to workshop disturbances. To meet the requirements of actual job shop scheduling, scheduling-rule knowledge can be mined from historical production scheduling data and combined with existing priority scheduling rules to reach a comprehensive decision;

in previous research on scheduling rule mining, the decision tree has been a widely adopted mining method. However, a single decision tree has weak generalization ability and is not suitable for mining scheduling rules from large-scale, high-dimensional, noisy historical scheduling data sets.

Therefore, a workshop optimization scheduling management method based on a random forest algorithm is provided.

Disclosure of Invention

The present invention aims to solve at least one of the problems in the prior art. The workshop optimization scheduling management method based on a random forest algorithm first acquires a scheduling data set generated by historical scheduling schemes, sets cleaning rules for it, and integrates it. It then compares the scheduling data set with traditional scheduling schemes and eliminates the data generated by schemes inferior to the traditional schemes. For the remaining scheduling data, it constructs a plurality of decision trees by sampling training examples and randomly selecting several feature attributes for each training set. It tests each decision tree with the training examples that were not selected and analyzes the tree's classification performance. Finally, it obtains the random forest scheduling rule with the highest weight through similarity-based elimination and a Bayesian voting mechanism, thereby solving the problem of optimally scheduling workshop machines and workpieces.

In order to achieve the above object, an embodiment according to a first aspect of the present invention provides a workshop optimization scheduling management method based on a random forest algorithm, including the following steps:

step one: acquiring a scheduling data set from a database or a file system;

step two: setting a filtering rule to clean data in the scheduling data set;

step three: data integration, namely dividing the cleaned scheduling data set by scheduling scheme and converting the data into a representation suitable for data mining;

step four: screening out, according to preset indices, the scheduling data sets generated by suitable historical scheduling schemes from all historical scheduling schemes;

step five: data mining of the scheduling rules, namely taking the screened scheduling data set as the input of an improved random forest algorithm to obtain the best-performing scheduling scheme for both the machine selection problem of a workpiece and the workpiece selection problem of an idle machine;

the scheduling data set contains information about machines, workpieces, personnel and the like at the time a scheduling scheme is executed. The dynamic scheduling problem of a flexible job shop consists of two sub-problems in a disturbed environment: selecting a machine for a workpiece, and selecting a workpiece for an idle machine. Accordingly, the historical data set Dh collected for each scheduling run can be expressed as Dh = {d1, d2, d3}, where Dh is the scheduling data set and:
d1 is the system disturbance information recorded for the scheduling scheme, such as the number of machine failures and the number of reworked workpieces in the workshop;
d2 is the state information of each machine that can currently perform a given process of a workpiece, recorded when a processing machine is being selected for that process, such as the number of products queued at the machine and the time the machine needs for the process;
d3 is the state information of each workpiece in the current waiting queue, recorded when an idle machine needs to select a workpiece to process, such as the workpiece's processing time on the machine and whether it is a reworked workpiece;
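The structure of Dh can be illustrated with a minimal Python sketch. The field names, the choice of attributes, and the boolean `selected` label are assumptions for illustration only; the method itself does not fix a schema. Here `selected` means whether the historical schedule actually chose that machine or workpiece.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DisturbanceInfo:
    """d1: disturbance information recorded for one scheduling scheme (hypothetical fields)."""
    machine_failures: int
    reworked_workpieces: int


@dataclass
class MachineState:
    """One d2 record: a candidate machine for a process of a workpiece."""
    machine_id: str
    products_queued: int      # products waiting at the machine
    processing_time: float    # time this machine needs for the process
    selected: bool            # assumed label: did the historical schedule choose it?


@dataclass
class WorkpieceState:
    """One d3 record: a workpiece in the queue of an idle machine."""
    workpiece_id: str
    processing_time: float    # processing time of the workpiece on this machine
    is_rework: bool
    selected: bool            # assumed label: did the historical schedule choose it?


@dataclass
class SchedulingRecord:
    """Dh = {d1, d2, d3} for one executed scheduling scheme."""
    scheme_id: str
    d1: DisturbanceInfo
    d2: List[MachineState] = field(default_factory=list)
    d3: List[WorkpieceState] = field(default_factory=list)
```

A record with two candidate machines and one queued workpiece would be built as `SchedulingRecord("S001", DisturbanceInfo(1, 2), [MachineState("M1", 3, 12.5, True), MachineState("M2", 1, 15.0, False)], [WorkpieceState("J7", 8.0, False, True)])`.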

for scheduling data cleaning, a filtering rule base and data cleaning rules are established from practical experience. These define the filtering rules required for each production attribute and the data processing logic within each filtering rule, and they are used to clean the collected multi-source historical scheduling data. For each collected piece of data, the applicable cleaning method is looked up in the data cleaning rules according to the type of scheduling data the piece belongs to. The corresponding filtering rules are then selected from the rule base and combined, and the big data processing system checks whether the data satisfy the combined rules, which completes the cleaning;
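A minimal in-memory sketch shows how a filtering rule base and per-type cleaning rules could be combined. The rule names, the record fields, and the duplicate-removal step are hypothetical. The method performs this work in a big data processing system, which the sketch does not model.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Filter = Callable[[Record], bool]

# Filtering rule base: named, reusable predicates (hypothetical examples).
FILTER_RULE_BASE: Dict[str, Filter] = {
    "has_machine_id": lambda r: bool(r.get("machine_id")),
    "has_workpiece_id": lambda r: bool(r.get("workpiece_id")),
    "non_negative_time": lambda r: (
        isinstance(r.get("processing_time"), (int, float)) and r["processing_time"] >= 0
    ),
    "non_negative_queue": lambda r: (
        isinstance(r.get("products_queued"), int) and r["products_queued"] >= 0
    ),
}

# Data cleaning rules: which filters from the rule base are combined
# for each type of record in the scheduling data set.
CLEANING_RULES: Dict[str, List[str]] = {
    "d2": ["has_machine_id", "non_negative_time", "non_negative_queue"],
    "d3": ["has_workpiece_id", "non_negative_time"],
}


def clean(records: List[Record], record_type: str) -> List[Record]:
    """Keep records that pass every combined filter, dropping exact duplicates."""
    filters = [FILTER_RULE_BASE[name] for name in CLEANING_RULES[record_type]]
    seen = set()
    cleaned: List[Record] = []
    for r in records:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the record
        if key in seen:
            continue  # repeated record
        if all(f(r) for f in filters):
            seen.add(key)
            cleaned.append(r)
    return cleaned
```

For example, passing two identical d2 rows plus one row with a negative processing time to `clean(rows, "d2")` keeps a single row.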

the data integration comprises the following steps:

step S1: using database query statements in a data warehouse tool, dividing the scheduling data set Dh stored in the HDFS file system by scheduling scheme, that is, grouping together the historical scheduling data generated by each single execution of a scheduling scheme; the data warehouse tool may be Hive;

step S2: using a big data processing system, converting the d2 and d3 parts of the divided data into training examples for scheduling rule mining;
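Steps S1 and S2 amount to a group-by on the scheduling scheme followed by a conversion into labelled feature vectors. The plain Python sketch below stands in for the Hive query and the big data processing job. The `scheme_id` and `selected` column names and both function names are assumed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, object]
Example = Tuple[List[float], int]  # (feature vector, class label)


def split_by_scheme(records: List[Record]) -> Dict[str, List[Record]]:
    """Step S1 (sketch): partition flat history rows by the scheme that produced them."""
    by_scheme: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        by_scheme[str(r["scheme_id"])].append(r)
    return dict(by_scheme)


def to_examples(rows: List[Record], features: List[str]) -> List[Example]:
    """Step S2 (sketch): turn d2 or d3 rows into (features, label) training examples."""
    return [([float(r[f]) for f in features], int(r["selected"])) for r in rows]
```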

screening the scheduling data sets generated by suitable historical scheduling schemes comprises the following steps:

step P1: for each historical scheduling scheme, judging whether its maximum completion time (makespan) is smaller than the makespan obtained using only the average flow rule; if not, eliminating the historical scheduling scheme; otherwise, executing step P2;

step P2: for each remaining historical scheduling scheme, judging whether its total tardiness is smaller than the total tardiness obtained with a rule combining the average flow rule and the rule that preferentially selects the workpiece with the tightest due date; if not, eliminating the historical scheduling scheme; otherwise, executing step P3;

step P3: for each remaining historical scheduling scheme, judging whether its total machine load is smaller than the total machine load obtained with a rule combining the equipment longest-waiting-time scheduling rule and the average flow rule; if not, eliminating the historical scheduling scheme; otherwise, taking the remaining historical scheduling schemes as the screened suitable historical scheduling schemes;
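The three screening checks can be sketched as successive filters. The sketch assumes the baseline values have already been computed by simulating the corresponding rule-based schedules: the makespan under the average flow rule, the total tardiness under the combined due-date rule, and the total machine load under the combined longest-waiting-time rule. The `Operation` record and all function names are hypothetical. Total tardiness uses the standard definition, the sum of max(0, completion time − due date) over workpieces. Total machine load is assumed to be the summed processing time of all operations.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Operation:
    workpiece: str
    machine: str
    start: float
    end: float


def makespan(ops: List[Operation]) -> float:
    return max(op.end for op in ops)


def total_tardiness(ops: List[Operation], due: Dict[str, float]) -> float:
    completion: Dict[str, float] = {}
    for op in ops:  # a workpiece completes when its last operation ends
        completion[op.workpiece] = max(completion.get(op.workpiece, 0.0), op.end)
    return sum(max(0.0, c - due[j]) for j, c in completion.items())


def total_load(ops: List[Operation]) -> float:
    return sum(op.end - op.start for op in ops)


def screen(schemes: Dict[str, List[Operation]], due: Dict[str, float],
           base_makespan: float, base_tardiness: float, base_load: float) -> List[str]:
    """Keep only schemes strictly better than every baseline (steps P1 to P3)."""
    kept = []
    for sid, ops in schemes.items():
        if makespan(ops) >= base_makespan:               # P1
            continue
        if total_tardiness(ops, due) >= base_tardiness:  # P2
            continue
        if total_load(ops) >= base_load:                 # P3
            continue
        kept.append(sid)
    return kept
```

For example, with baselines 8, 2 and 10, a scheme with makespan 5, total tardiness 1 and total load 9 is kept. A scheme with makespan 9 is eliminated at P1.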

the scheduling rule data mining comprises the following steps:

step Q1: the d2 data in the scheduling data set Dh are used as input for solving the machine selection problem of the workpiece, and the d3 data are used as input for solving the workpiece selection problem of the idle machine; random forests are constructed from the d2 data and the d3 data, generating random forest scheduling rule 1 and random forest scheduling rule 2 respectively;

the generation of the random forest scheduling rule 1 and the random forest scheduling rule 2 comprises the following steps:

step X1: drawing training examples with replacement from the d2/d3 data to form k new training example sets, which are used to construct k decision trees; k is the number of decision trees, set according to practical experience;

step X2: for each training example set, randomly selecting m feature attributes of the d2/d3 data and calculating the optimal split, yielding k decision trees; m is the number of feature attributes, set according to practical experience;

step X3: testing each decision tree in the random forest with the training examples of the d2/d3 data that were not selected for that tree, and recording its classification performance;

step X4: calculating the similarity between every pair of decision trees; if the similarity between two decision trees exceeds 60%, the trees are considered similar, and the one with the poorer test performance is eliminated;

step X5: calculating, with a Bayesian voting mechanism, the weight of each decision tree remaining in the random forest after elimination;

step X6: selecting the decision tree with the highest weight as random forest scheduling rule 1 / random forest scheduling rule 2;

step Q2: using random forest scheduling rule 1 to select a suitable machine for a workpiece, and random forest scheduling rule 2 to select a suitable workpiece for an idle machine.
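Steps X1 to X3 can be sketched with a small, self-contained decision tree. The method does not name a split criterion, so the sketch makes three assumptions:

- splits are binary threshold splits chosen by minimum weighted Gini impurity;
- trees stop growing at a hypothetical depth limit;
- following step X2 as written, each tree draws one random subset of m feature attributes and uses it at every node, whereas the classic random forest redraws features at each split.

All function names are hypothetical. Each returned tuple holds the tree, its feature subset, and its out-of-bag test result, which steps X4 to X6 then consume.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple

Example = Tuple[List[float], int]  # (feature vector, class label)


def gini(labels: List[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(data: List[Example], features: List[int]) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) with the lowest weighted Gini, or None if no split helps."""
    best, best_score = None, gini([y for _, y in data])
    for f in features:
        for t in sorted({x[f] for x, _ in data}):
            left = [y for x, y in data if x[f] <= t]
            right = [y for x, y in data if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(data: List[Example], features: List[int], depth: int = 0, max_depth: int = 5):
    """A node is a class label (leaf) or a tuple (feature, threshold, left, right)."""
    labels = [y for _, y in data]
    split = best_split(data, features) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    f, t = split
    left = [(x, y) for x, y in data if x[f] <= t]
    right = [(x, y) for x, y in data if x[f] > t]
    return (f, t, build_tree(left, features, depth + 1, max_depth),
            build_tree(right, features, depth + 1, max_depth))


def predict(tree, x: List[float]) -> int:
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def train_forest(data: List[Example], k: int, m: int, seed: int = 0):
    """X1-X3: k bootstrap samples, m random features per tree, out-of-bag test per tree."""
    rng = random.Random(seed)
    n, n_features = len(data), len(data[0][0])
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]         # X1: sample with replacement
        features = rng.sample(range(n_features), m)        # X2: m random feature attributes
        tree = build_tree([data[i] for i in idx], features)
        oob = [data[i] for i in set(range(n)) - set(idx)]  # X3: examples not selected
        correct = sum(predict(tree, x) == y for x, y in oob)
        forest.append((tree, features, correct, len(oob)))
    return forest
```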

Compared with the prior art, the invention has the beneficial effects that:

a scheduling data set generated by historical scheduling schemes is acquired in advance, cleaning rules are set for it, and the data are integrated. The scheduling data set is then compared with traditional scheduling schemes, and the data generated by schemes inferior to the traditional schemes are eliminated. For the remaining scheduling data, a plurality of decision trees are constructed by sampling training examples and randomly selecting several feature attributes for each training set. Each decision tree is tested with the training examples that were not selected, and its classification performance is analyzed. The random forest scheduling rule with the highest weight is obtained through similarity-based elimination and a Bayesian voting mechanism. In this way, the problem of optimally scheduling workshop machines and workpieces is solved.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

As shown in fig. 1, the workshop optimization scheduling management method based on the random forest algorithm includes the following steps:

step two: setting a filtering rule to clean data in the scheduling data set;

the scheduling data set contains information about machines, workpieces, personnel and the like at the time a scheduling scheme is executed. The dynamic scheduling problem of a flexible job shop consists of two sub-problems in a disturbed environment: selecting a machine for a workpiece, and selecting a workpiece for an idle machine. Accordingly, the historical data set Dh collected for each scheduling run can be expressed as Dh = {d1, d2, d3}, where Dh is the scheduling data set and:
d1 is the system disturbance information recorded for the scheduling scheme, such as the number of machine failures and the number of reworked workpieces in the workshop;
d2 is the state information of each machine that can currently perform a given process of a workpiece, recorded when a processing machine is being selected for that process, such as the number of products queued at the machine and the time the machine needs for the process;
d3 is the state information of each workpiece in the current waiting queue, recorded when an idle machine needs to select a workpiece to process, such as the workpiece's processing time on the machine and whether it is a reworked workpiece;

it can be understood that historical scheduling data inevitably contain repeated records, so the scheduling data set Dh acquired from multiple business systems suffers from data conflicts, duplicate records and similar problems. Only after the complex and varied scheduling data have been cleaned is there a sound foundation for subsequent scheduling rule mining;

in a preferred embodiment, a filtering rule base and data cleaning rules for scheduling data cleaning are established from practical experience. These define the filtering rules required for each production attribute and the data processing logic within each filtering rule, and they are used to clean the collected multi-source historical scheduling data. For each collected piece of data, the applicable cleaning method is looked up in the data cleaning rules according to the type of scheduling data the piece belongs to. The corresponding filtering rules are then selected from the rule base and combined, and the big data processing system checks whether the data satisfy them, which completes the cleaning; the big data processing system may be Spark;

it can be understood that the data in the scheduling data set are disorganized and cannot be used directly for the subsequent data screening, clustering and scheduling rule mining, so the scheduling data set must be organized through data integration;

in a preferred embodiment, the data integration comprises the steps of:

the scheduling data set Dh contains a large amount of valid information reflecting the characteristics of the actual scheduling environment and scheduling knowledge, but this information is mixed with many useless or incorrect rules and patterns. Moreover, the quality of a scheduling scheme determines how much valuable scheduling knowledge can be extracted from the data generated when the scheme is executed. Therefore, the scheduling data sets generated by suitable and effective scheduling schemes need to be screened out from all historical scheduling schemes;

in a preferred embodiment, the screening of the set of scheduling data generated by the suitable historical scheduling scheme comprises the steps of:

step P2: for each remaining historical scheduling scheme, judging whether its total tardiness is smaller than the total tardiness obtained with a rule combining the average flow rule and the rule that preferentially selects the workpiece with the tightest due date; if not, eliminating the historical scheduling scheme; otherwise, executing step P3;

step P3: for each remaining historical scheduling scheme, judging whether its total machine load is smaller than the total machine load obtained with a rule combining the equipment longest-waiting-time scheduling rule and the average flow rule; if not, eliminating the historical scheduling scheme; otherwise, taking the remaining historical scheduling schemes as the screened suitable historical scheduling schemes;
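The screening indices can be written out for a schedule in which workpiece j finishes at time C_j with due date d_j, and each operation o has processing time p_o. The definitions of total tardiness and total machine load below are the standard ones and are assumed here rather than stated by the method. The superscript labels on the baseline values are hypothetical names for the three rule-based reference schedules.

```latex
C_{\max} = \max_{j} C_j, \qquad
T = \sum_{j} \max\left(0,\; C_j - d_j\right), \qquad
L = \sum_{o} p_{o}

\text{scheme } h \text{ is retained} \iff
C_{\max}^{h} < C_{\max}^{\mathrm{flow}} \;\wedge\;
T^{h} < T^{\mathrm{flow+due}} \;\wedge\;
L^{h} < L^{\mathrm{flow+wait}}
```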

in a preferred embodiment, the scheduling rule data mining comprises the steps of:

step Q1: the d2 data in the scheduling data set Dh are used as input for solving the machine selection problem of the workpiece, and the d3 data are used as input for solving the workpiece selection problem of the idle machine; random forests are constructed from the d2 data and the d3 data, generating random forest scheduling rule 1 and random forest scheduling rule 2 respectively;

step X2: for each training example set, randomly selecting m feature attributes of the d2/d3 data and calculating the optimal split, yielding k decision trees; m is the number of feature attributes, set according to practical experience;

step X4: calculating the similarity between every pair of decision trees; if the similarity between two decision trees exceeds 60%, the trees are considered similar, and the one with the poorer test performance is eliminated;
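The method does not define how tree similarity or the Bayesian weights are computed, so the sketch below makes two labelled assumptions:

- similarity is the share of a reference set of inputs on which two trees predict the same class;
- each tree's weight is its posterior mean accuracy under a uniform Beta(1, 1) prior, (correct + 1) / (total + 2), normalized over the surviving trees.

Trees are visited from best to worst test performance, a greedy reading of step X4: whenever two trees are more than 60% similar, the poorer one is dropped. `ScoredTree` and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], int]


@dataclass
class ScoredTree:
    name: str
    predict: Predictor
    correct: int  # out-of-bag examples classified correctly (step X3)
    total: int    # number of out-of-bag examples (step X3)


def similarity(a: ScoredTree, b: ScoredTree, xs: List[Sequence[float]]) -> float:
    """Assumed similarity: fraction of reference inputs on which both trees agree."""
    return sum(a.predict(x) == b.predict(x) for x in xs) / len(xs)


def posterior_accuracy(t: ScoredTree) -> float:
    """Posterior mean accuracy under a uniform Beta(1, 1) prior."""
    return (t.correct + 1) / (t.total + 2)


def prune_similar(trees: List[ScoredTree], xs, threshold: float = 0.6) -> List[ScoredTree]:
    """X4: keep a tree only if it is at most `threshold` similar to every better tree kept."""
    kept: List[ScoredTree] = []
    for t in sorted(trees, key=posterior_accuracy, reverse=True):
        if all(similarity(t, k, xs) <= threshold for k in kept):
            kept.append(t)
    return kept


def bayes_weights(trees: List[ScoredTree]) -> List[float]:
    """X5 (assumed form): weights proportional to posterior accuracy, summing to 1."""
    scores = [posterior_accuracy(t) for t in trees]
    total = sum(scores)
    return [s / total for s in scores]


def select_rule(trees: List[ScoredTree], xs, threshold: float = 0.6) -> Tuple[ScoredTree, float]:
    """X4-X6: prune similar trees, weight the survivors, return the top-weighted tree."""
    kept = prune_similar(trees, xs, threshold)
    weights = bayes_weights(kept)
    best = max(range(len(kept)), key=lambda i: weights[i])
    return kept[best], weights[best]
```

Under these assumptions, the tree with the best smoothed out-of-bag accuracy always survives pruning and receives the highest weight. A different Bayesian weighting would change only `bayes_weights`.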

Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit and scope of the present invention.

Claims

1. A workshop optimization scheduling management method based on a random forest algorithm, characterized by comprising the following steps:

step two: setting a filtering rule to clean data in the scheduling data set;

step five: data mining of the scheduling rules, namely taking the screened scheduling data set as the input of an improved random forest algorithm to obtain the scheduling scheme with the best final performance for the machine selection problem of the workpiece and the workpiece selection problem of the idle machine.

2. The random forest algorithm based workshop optimized scheduling management method according to claim 1, wherein the content of the scheduling data set comprises machine, workpiece and personnel information at the time a scheduling scheme is executed.

3. The method for managing optimized workshop scheduling based on the random forest algorithm according to claim 1, wherein the historical data set Dh collected for each scheduling run is expressed as Dh = {d1, d2, d3}, where Dh is the scheduling data set; d1 is the system disturbance information recorded for a scheduling scheme; d2 is the state information of each machine in the set of machines currently able to perform a given process of the workpiece, recorded when a processing machine is selected for that process; and d3 is the state information of each workpiece in the current queue when an idle machine needs to select a workpiece from its waiting queue for processing.

4. The workshop optimization scheduling management method based on the random forest algorithm, characterized in that, in scheduling data cleaning, a filtering rule base and data cleaning rules are established according to practical experience, the filtering rules required for each production attribute and the data processing logic within each filtering rule are defined, and the collected multi-source historical scheduling data are cleaned; based on the filtering rule base and the data cleaning rules, the cleaning method for each collected piece of data is looked up in the data cleaning rules according to the type of scheduling data the piece belongs to, the corresponding filtering rules are selected from the rule base and combined, and the big data processing system judges whether the data satisfy them, thereby completing the cleaning.

5. The random forest algorithm based workshop optimization scheduling management method according to claim 1, wherein the data integration comprises the following steps:

step S1: dividing a scheduling data set Dh in the HDFS file system by taking a scheduling scheme as a unit through a data warehouse tool by using a database query statement, namely dividing scheduling related historical data generated in the execution of a scheduling scheme at one time;

step S2: and converting d2 and d3 data in the divided data into training examples by using a big data processing system for mining the scheduling rules.

6. The method for optimizing and scheduling workshops based on the random forest algorithm according to claim 1, wherein screening the scheduling data sets generated by suitable historical scheduling schemes comprises the following steps:

step P1: for each historical scheduling scheme, judging whether its maximum completion time (makespan) is smaller than the makespan obtained using only the average flow rule; if not, eliminating the historical scheduling scheme; otherwise, executing step P2;

step P3: for each remaining historical scheduling scheme, judging whether its total machine load is smaller than the total machine load obtained with a rule combining the equipment longest-waiting-time scheduling rule and the average flow rule; if not, eliminating the historical scheduling scheme; otherwise, the remaining historical scheduling schemes are used as the screened suitable historical scheduling schemes.

7. The random forest algorithm based workshop optimization scheduling management method according to claim 1, wherein the scheduling rule data mining comprises the following steps:

8. The random forest algorithm based workshop optimization scheduling management method according to claim 7, wherein the generation of the random forest scheduling rules 1 and 2 comprises the following steps:

step X6: and selecting the decision tree with the highest weight value as a random forest scheduling rule 1/a random forest scheduling rule 2.