CN113268936A - Key quality characteristic identification method based on multi-target evolution random forest characteristic selection - Google Patents
- Publication number: CN113268936A
- Application number: CN202110752786.2A
- Authority: CN (China)
- Prior art keywords: quality, algorithm, random forest, characteristic, data set
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24323—Tree-organised classifiers
- G06N20/00—Machine learning
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
- G06F2111/06—Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
- G06F2111/08—Probabilistic or stochastic CAD
Abstract
The invention discloses a key quality characteristic identification method based on multi-objective evolutionary random forest feature selection, comprising the following steps. First, multivariate quality characteristic data are acquired through digital inspection on the shop floor to form a product quality characteristic data set. Next, the ReliefF algorithm pre-selects the quality characteristics used for classification, and the pre-selected data set is divided into two parts: a product quality characteristic training data set and a product quality characteristic testing data set. The training data set is then fed into the multi-objective evolutionary random forest feature selection algorithm to obtain a set of key quality characteristics. Finally, the identified key quality characteristic set is verified with the testing data set. The method accounts for the complex influence of multivariate quality characteristics on final product quality, accurately identifies the key quality characteristics in the product, provides a reference for key quality characteristic identification, supports quality control, and improves product quality prediction.
Description
Technical Field
The invention provides a key quality characteristic identification method based on multi-objective evolutionary random forest feature selection, belonging to the field of quality management.
Background
In modern industrial production, a large amount of process data and quality characteristic data are generated during the production of a product, including production environment data, product characteristic data, assembly characteristic data, customer demand characteristic data, and so on. Some of these quality characteristics have a very important influence on product quality while others have little, so identifying the key quality characteristics closely related to product quality is of great significance for continuous product improvement, product quality prediction, and product quality control.
Traditional key quality characteristic identification methods include critical characteristic flow-down and quality function deployment (QFD). The flow-down method decomposes the whole product layer by layer, expanding it in terms of product characteristics, part characteristics, process characteristics, and so on, and then applies qualitative and quantitative analysis to identify the key quality characteristics. QFD is driven by customer demand and focuses on the quality characteristics customers care most about, but relying purely on customer demand easily overlooks potential key quality characteristics. A complex product contains a large number of component-level quality characteristics whose influence relationships are intricate, and their mutual influence and degree of criticality are difficult to determine with traditional qualitative and quantitative methods.
The invention combines a multi-objective optimisation algorithm with a machine learning model to identify key quality characteristics from the quality characteristic data collected in industrial production. The method supports continuous product improvement in the production process and strengthens capabilities such as product quality control and product quality prediction.
Disclosure of Invention
(1) Objects of the invention
The invention aims to provide a key quality characteristic identification method based on multi-objective evolutionary random forest feature selection, so as to solve the problem that key quality characteristics are difficult to identify and determine with conventional qualitative and quantitative methods.
(2) Technical scheme
To solve the above problems, the invention provides a key quality characteristic identification method based on multi-objective evolutionary random forest feature selection. As shown in fig. 1, the method comprises the following steps:
Step 1: Acquire multivariate quality characteristic data in the production process through digital inspection on the shop floor, including process parameters, product size parameters, product grade classifications, and other quality characteristics that are important factors influencing the overall quality level of the product, thereby forming a product quality characteristic data set;
Step 2: Pre-select the quality characteristics used for classification with the ReliefF algorithm, and divide the pre-selected data set into two parts: a product quality characteristic training data set and a product quality characteristic testing data set;
Step 3: Divide the training data set into an internal training set and an internal test set; the internal training set trains the random forest classifier, and the internal test set evaluates part of the objective function values of the selected quality characteristic sets s generated by the algorithm. Then input the data into the multi-objective evolutionary random forest feature selection algorithm, establish the corresponding algorithm targets, generate an initial population, and set the number of iteration generations to obtain the dominant key quality characteristic set;
Step 4: Verify and evaluate the obtained key quality characteristic set using the test data set and the random forest classifier trained on the key quality characteristic set.
The term "product quality characteristic data set" in step 1 refers to a data set that, for the same object (product), has a certain number of quality characteristics (characteristic attributes), a certain number of samples (sampled products), and a definite class for each sample, as shown in fig. 2.
In step 2, the ReliefF algorithm pre-selects the quality characteristics used for classification; the algorithm flow is shown in fig. 3, and the specific method is as follows:
2-1: Extract an individual E from a sample of a certain class, then search the same-class and different-class samples to find the k nearest-neighbour samples of each, forming the same-class neighbour sample set F and the different-class neighbour sample set G;
2-2: Use the differences between E and the average per-feature differences of the samples in F and G to define the feature weight W. For any feature m, the feature weight W_m after n rounds of sampling is computed as:
W_m = W_m − Σ_{j=1..k} |E[m] − F_j[m]| / (n·k) + Σ_{c ≠ class(E)} [ p(c) / (1 − P(class(E))) ] · Σ_{j=1..k} |E[m] − G(c)_j[m]| / (n·k)
where:
c denotes a class of the different-class samples;
E[m] denotes the value of feature m of individual E;
F_j[m] denotes the value of feature m of the j-th nearest same-class sample;
p(c) denotes the probability that a different-class sample belongs to class c;
class(E) denotes the class of individual E;
P(class(E)) denotes the probability that a sample belongs to the same class as E;
G(c)_j[m] denotes the value of feature m of the j-th nearest sample of class c.
The larger a feature's weight, the larger the between-class distance and the smaller the within-class distance it induces among the samples, and the stronger its discriminative effect on the classes;
2-3: Eliminate the quality characteristics whose between-class distance is smaller than their within-class distance, and divide the pre-selected data set into two parts: a product quality characteristic training data set and a product quality characteristic testing data set. The training and testing sets use the k-fold cross-validation method, in which the data are divided evenly by sample size into k parts; k−1 parts form the training set and the remaining part forms the testing set.
The flow of the multi-objective evolutionary random forest feature selection algorithm in step 3 is shown in fig. 4. The algorithm consists of two parts: first, the multi-objective evolutionary algorithm, for which NSGA-II is selected and implemented with Matlab software; second, the random forest classifier, implemented with Python. The whole algorithm runs through Matlab-Python interaction.
In the multi-objective evolutionary random forest feature selection algorithm of step 3, the NSGA-II algorithm is selected as the multi-objective evolutionary algorithm, and individuals in the population are encoded in binary. A solution s is encoded as C = (c_1, c_2, c_3, c_4, c_5, …, c_N), a vector of length N, where N is the total number of quality characteristics. Each element c_i ∈ {0, 1} (i = 1, 2, …, N) indicates whether the i-th feature is selected: '1' means selected and '0' means not selected. Each code corresponds to a solution, i.e. a subset of quality characteristics.
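A minimal sketch of this encoding and its decoding in Python; the feature names used here are hypothetical, purely for illustration:

```python
def decode(C, feature_names):
    """Decode a binary chromosome C = (c_1, ..., c_N): feature i belongs
    to the selected quality-characteristic subset iff c_i == 1."""
    return [name for name, bit in zip(feature_names, C) if bit == 1]

# Hypothetical quality characteristics of a machined part
names = ["temperature", "pressure", "diameter", "roughness", "humidity"]
subset = decode([1, 0, 1, 1, 0], names)  # -> ["temperature", "diameter", "roughness"]
```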
In the multi-objective evolutionary random forest feature selection algorithm of step 3, selection in the population uses binary tournament selection: two individuals are drawn from the parent population each time, compared using the crowded-comparison operator, and the better individual is added to the offspring population.
In the multi-objective evolutionary random forest feature selection algorithm of step 3, single-point crossover is used between individuals in the population. Individuals C_1 = (c_11, c_12, c_13, c_14, c_15, …, c_1N) and C_2 = (c_21, c_22, c_23, c_24, c_25, …, c_2N) undergo a crossover operation with probability p_c at a cut point e, producing two new individuals: C_1' = (c_11, c_12, c_13, …, c_1(e−1), c_2e, …, c_2N) and C_2' = (c_21, c_22, c_23, …, c_2(e−1), c_1e, …, c_1N).
In the multi-objective evolutionary random forest feature selection algorithm of step 3, individuals in the population mutate by multi-point bit-flip mutation: for an individual C = (c_1, c_2, c_3, c_4, c_5, …, c_N), each gene undergoes a mutation operation with probability p_m to generate a new individual; a '0' at the original position mutates to '1', and a '1' mutates to '0'.
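The two variation operators can be sketched as follows; this is a hedged illustration, with parameter values such as p_c and p_m left to the caller:

```python
import random

def single_point_crossover(C1, C2, p_c, rng):
    """With probability p_c, pick a cut point e and exchange the tails
    of the two parents, yielding two new individuals."""
    if rng.random() < p_c:
        e = rng.randrange(1, len(C1))
        return C1[:e] + C2[e:], C2[:e] + C1[e:]
    return C1[:], C2[:]

def multi_point_mutation(C, p_m, rng):
    """Each gene flips independently with probability p_m:
    '0' mutates to '1' and '1' mutates to '0'."""
    return [1 - c if rng.random() < p_m else c for c in C]
```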
In the multi-objective evolutionary random forest feature selection algorithm of step 3, the NSGA-II algorithm is selected as the multi-objective evolutionary algorithm, and the optimisation targets are set by the actual production requirements, including but not limited to:
Min F(s) = {f_1(s), f_2(s), f_3(s)}, where f_1 is the classification error rate, f_2 is the inverse of the sum of the ReliefF weights of the features in s, and f_3 is the size of the quality characteristic subset.
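A sketch of the three objective values for one chromosome. The classifier's error rate is abstracted behind `error_rate_fn`, a stand-in (an assumption of this sketch) for evaluating the trained random forest on the selected subset:

```python
def objectives(C, relieff_w, error_rate_fn):
    """Compute Min F(s) = (f1, f2, f3) for a binary chromosome C:
    f1 = classification error rate on the selected subset,
    f2 = inverse of the sum of the subset's ReliefF weights,
    f3 = subset size.
    Assumes at least one feature is selected with positive weight sum."""
    selected = [i for i, bit in enumerate(C) if bit == 1]
    f1 = error_rate_fn(selected)
    f2 = 1.0 / sum(relieff_w[i] for i in selected)
    f3 = len(selected)
    return f1, f2, f3
```

All three components are minimised together: low error, high total ReliefF weight, and few features.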
In the multi-objective evolutionary random forest feature selection algorithm of step 3, the non-dominated sorting of individuals in the population is based on the following: for a minimisation multi-objective problem with n objective components f_i(s) (i = 1, 2, …, n), given any two decision variables X_a and X_b, X_a is said to dominate X_b if both of the following conditions hold:
1. for every i ∈ {1, 2, …, n}, f_i(X_a) ≤ f_i(X_b);
2. there exists i ∈ {1, 2, …, n} such that f_i(X_a) < f_i(X_b).
If no other decision variable dominates a given decision variable, it is called a non-dominated solution. Within a set of solutions, the non-dominated solutions are assigned Pareto rank 1 and deleted from the set; the non-dominated solutions among the remainder are assigned rank 2; and so on until every solution in the set has a Pareto rank. The resulting ranking is shown in fig. 5.
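The dominance test and the rank-peeling procedure described above can be sketched directly (minimisation throughout); this is an illustrative implementation, not the patented code:

```python
def dominates(fa, fb):
    """fa dominates fb (minimisation): no worse in every objective
    and strictly better in at least one."""
    return (all(a <= b for a, b in zip(fa, fb))
            and any(a < b for a, b in zip(fa, fb)))

def pareto_ranks(F):
    """Peel off successive non-dominated fronts: rank-1 solutions are
    dominated by nobody; remove them, rank the rest as 2, and so on."""
    ranks = [0] * len(F)
    remaining = set(range(len(F)))
    r = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = r
        remaining -= front
        r += 1
    return ranks
```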
In the multi-objective evolutionary random forest feature selection algorithm of step 3, individuals at the same non-dominated level in the population are ranked by crowding degree. The crowding degree represents the density of individuals surrounding a given point in the population and is denoted i_d; intuitively, it corresponds to the largest rectangle around individual i that contains i but no other individual, as shown in fig. 6.
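The crowding measure within a single front can be sketched per the standard NSGA-II definition, where boundary individuals receive infinite distance so that they are always preferred:

```python
def crowding_distance(front):
    """NSGA-II crowding distance within one non-dominated front: for
    each objective, sort the front, give the two boundary individuals
    infinite distance, and add each interior individual's normalised
    gap between its two neighbours."""
    n = len(front)
    if n <= 2:
        return [float("inf")] * n
    m = len(front[0])
    d = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        d[order[0]] = d[order[-1]] = float("inf")
        span = (front[order[-1]][k] - front[order[0]][k]) or 1.0
        for p in range(1, n - 1):
            d[order[p]] += (front[order[p + 1]][k]
                            - front[order[p - 1]][k]) / span
    return d
```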
The specific method of step 3 is as follows:
3-1: Initialise and randomly generate a population P_t; each individual of P_t is a selected quality characteristic set s;
3-2: Apply selection, crossover, and mutation to population P_t to obtain population P_t';
3-3: Evaluate the fitness of each individual in the combined population R_t = P_t + P_t' against the objective functions set by the algorithm; each individual obtains the objective values {f_1(s), f_2(s), f_3(s)};
3-4: Apply fast non-dominated sorting to rank every individual in R_t;
3-5: Add the individuals with the lowest current non-dominated rank to the selected population P_{t+1} until P_{t+1} cannot accommodate the next rank;
3-6: Rank the individuals of the next non-dominated level by crowding distance;
3-7: Add the individuals with the largest crowding distance to the selected population P_{t+1} until P_{t+1} is full;
3-8: Repeat steps 3-2 to 3-7 until the algorithm termination condition is reached. Output the population individuals at termination and decode them to obtain the identified key quality characteristic set.
Specifically, the "fitness evaluation" in step 3-3 comprises the following steps:
3-3-1: Decode each individual of R_t into its corresponding quality characteristic set;
3-3-2: The number of quality characteristics in the decoded set gives the value of fitness function f_3(s);
3-3-3: The inverse of the sum of the ReliefF weights of the decoded set s gives the value of f_2(s);
3-3-4: Extract the quality characteristic columns of the internal training set corresponding to each decoded set and train the random forest classifier;
3-3-5: Extract the corresponding columns of the internal test set, verify the prediction accuracy of each trained random forest classifier, and obtain the value of fitness function f_1(s).
In step 4, the obtained key quality characteristic set is verified and evaluated using the test data set: the columns of the test set corresponding to the key quality characteristic set are extracted, and the prediction accuracy of the trained random forest classifier is verified, again using the k-fold cross-validation method.
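The k-fold split used in steps 2-3 and 4 can be sketched as follows; this dependency-free illustration mirrors what `sklearn.model_selection.KFold` provides:

```python
def k_fold_splits(n_samples, k):
    """Divide n_samples indices evenly into k parts; each part in turn
    serves as the test set while the other k-1 parts form the
    training set."""
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for j in indices if j not in set(test)]
        yield train, test
```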
Drawings
FIG. 1 is a block flow diagram of the method of the present invention.
Fig. 2 is a block diagram of a product quality characteristic data set.
Fig. 3 is a flowchart of the ReliefF algorithm.
FIG. 4 is a flow chart of a multi-objective evolutionary random forest feature selection algorithm.
FIG. 5 is a graph of Pareto rank after non-dominated sorting.
Fig. 6 is a schematic diagram of the crowding degree ranking.
FIG. 7 is a population evolution mode diagram of a multi-objective evolution random forest feature selection algorithm.
Detailed Description
The invention provides a key quality characteristic identification method based on multi-objective evolution random forest characteristic selection, and the invention is further described in detail with reference to the attached drawings.
The term "product quality characteristic data set", as shown in fig. 2, refers to a data set that, for the same research object (product), has a certain number of quality characteristics (characteristic attributes), a certain number of samples (sampled products), and a definite class for each sample.
The "ReliefF algorithm pre-selects the quality characteristics used for classification"; the algorithm flow is shown in fig. 3, and the specific method is as follows:
2-1: Extract an individual E from a sample of a certain class, then find the k nearest-neighbour samples among the same-class and different-class samples respectively, forming the same-class neighbour sample set F and the different-class neighbour sample set G;
2-2: Use the differences between E and the average per-feature differences of the samples in F and G to define the feature weight W. For any feature m, the feature weight W_m after n rounds of sampling is computed as:
W_m = W_m − Σ_{j=1..k} |E[m] − F_j[m]| / (n·k) + Σ_{c ≠ class(E)} [ p(c) / (1 − P(class(E))) ] · Σ_{j=1..k} |E[m] − G(c)_j[m]| / (n·k)
where:
c denotes a class of the different-class samples;
E[m] denotes the value of feature m of individual E;
F_j[m] denotes the value of feature m of the j-th nearest same-class sample;
p(c) denotes the probability that a different-class sample belongs to class c;
class(E) denotes the class of individual E;
P(class(E)) denotes the probability that a sample belongs to the same class as E;
G(c)_j[m] denotes the value of feature m of the j-th nearest sample of class c.
The larger a feature's weight, the larger the between-class distance and the smaller the within-class distance it induces among the samples, and the stronger its discriminative effect on the classes;
2-3: Eliminate the quality characteristics whose between-class distance is smaller than their within-class distance, and divide the pre-selected data set into two parts: a product quality characteristic training data set and a product quality characteristic testing data set. The training and testing sets use the k-fold cross-validation method, in which the data are divided evenly by sample size into k parts; k−1 parts form the training set and the remaining part forms the testing set.
The flow of the multi-objective evolutionary random forest feature selection algorithm is shown in fig. 4. It consists of two parts: first, the multi-objective evolutionary algorithm, for which NSGA-II is selected and implemented with Matlab software; second, the random forest classifier, implemented with Python. The whole algorithm runs through Matlab-Python interaction. The specific method of step 3 is as follows:
3-1: Initialise and randomly generate a population P_t; each individual of P_t is a selected quality characteristic set s. Individuals in the population are encoded in binary: a solution s is encoded as C = (c_1, c_2, c_3, c_4, c_5, …, c_N), a vector of length N, where N is the total number of quality characteristics. Each element c_i ∈ {0, 1} (i = 1, 2, 3, …, N) indicates whether the i-th feature is selected ('1' selected, '0' not selected). Each code corresponds to a solution, i.e. a subset of quality characteristics;
3-2: Apply selection, crossover, and mutation to population P_t to obtain population P_t'. The specific steps are as follows:
3-2-1: Selection in the population uses binary tournament: two individuals are drawn from the parent population each time, compared using the crowded-comparison operator, and the better individual is added to the offspring population;
3-2-2: Crossover between individuals uses single-point crossover: individuals C_1 = (c_11, c_12, c_13, c_14, c_15, …, c_1N) and C_2 = (c_21, c_22, c_23, c_24, c_25, …, c_2N) undergo a crossover operation with probability p_c at a cut point e, producing two new individuals C_1' = (c_11, c_12, c_13, …, c_1(e−1), c_2e, …, c_2N) and C_2' = (c_21, c_22, c_23, …, c_2(e−1), c_1e, …, c_1N);
3-2-3: Mutation of individuals uses multi-point bit-flip mutation: for an individual C = (c_1, c_2, c_3, c_4, c_5, …, c_N), each gene undergoes a mutation operation with probability p_m to generate a new individual; a '0' at the original position mutates to '1' and a '1' mutates to '0';
3-3: Evaluate the fitness of each individual in the combined population R_t = P_t + P_t' against the objective functions set by the algorithm. The targets are set by actual production requirements, including but not limited to Min F(s) = {f_1(s), f_2(s), f_3(s)}, where f_1 is the classification error rate, f_2 is the inverse of the sum of the ReliefF weights of s, and f_3 is the size of the quality characteristic subset. Each individual obtains the objective values {f_1(s), f_2(s), f_3(s)}. The specific steps are as follows:
3-3-1: will be provided withREach individual is decoded into a corresponding set of quality characteristics;
3-3-2: set of corresponding quality characteristics after decoding, number of quality characteristicsQuantity as a fitness functionf 3 (s)A value of (d);
3-3-3: corresponding quality characteristic set after decoding and corresponding to a Relief F algorithmsIs a function of the inverse of the sum of the weights off 2 (s) A value of (d);
3-3-4: extracting quality characteristic data sets corresponding to the internal training set to train the random forest classifier respectively;
3-3-5: extracting quality characteristic data sets corresponding to the internal test set, respectively verifying and predicting the precision of the trained random forest classifier, and obtaining a fitness functionf 1 (s) A value of (d);
3-4: using fast non-dominated sorting method pairsR t Each individual is subjected to non-dominated ranking, and the specific steps are as follows:
the non-dominant ranking of individuals in the population is based on: for minimizing the multiobjective optimization problem, fornA target componentf i (s),(i=1,2,…,n) Any given two decision variablesX a ,X b If the following two conditions are satisfied, it is calledX a DominatingX b :
1, for anyi∈1,2,…,nAll are provided withf i (X a )≤f i (X b ) If true;
2, existence ofi∈1,2,…,nSo thatf i (X a )≤f i (X b ) If true;
if one decision variable does not have other decision variables capable of dominating the decision variable, the decision variable is called as a non-dominated solution, in a group of solutions, the Pareto level of the non-dominated solution is defined as 1, the non-dominated solution is deleted from the solution set, the Pareto level of the rest solutions is defined as 2, and by analogy, the Pareto levels of all solutions in the solution set can be obtained, and the Pareto levels are ranked as shown in fig. 5;
3-5: selecting the individuals with the smallest current non-dominant gradeEntry populationP t+1 Up toP t+1 Until the population cannot accommodate the next level;
3-6: and carrying out congestion distance sequencing on the next non-dominated level individual by using a congestion distance distribution method, wherein the specific method comprises the following steps:
the crowding degree sequencing basis of the individuals with the same non-dominant grade in the population is as follows: the crowdedness represents the density of individuals around a given point in a population, and is defined asi d Representing, visually, the individualiThe surroundings include the individualiBut does not include the length of the largest rectangle of the rest of the individuals, as shown in fig. 6;
3-7: selecting the individual with the largest crowding distance to enter the selected populationP t+1 Until population is completedP t+1 . FIG. 7 shows the evolution process of the population, including steps 3-4 to 3-7;
3-8: and repeating the steps 3-2 to 3-7 until the algorithm termination condition is reached. And outputting the population individuals after the algorithm is terminated, and decoding to obtain the identified key quality characteristic set.
In step 4, the obtained key quality characteristic set is verified and evaluated using the test data set and the random forest classifier trained on the key quality characteristic set. The specific verification method is to extract the columns of the test set corresponding to the key quality characteristic set and verify the prediction accuracy of the trained random forest classifier, again using the k-fold cross-validation method.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the applicant has described the invention in detail, those skilled in the art will understand that modifications or equivalent substitutions may be made to the technical solutions without departing from their spirit and scope, and all such modifications shall be covered by the claims of the present invention.
Claims (8)
1. A key quality characteristic identification method based on multi-objective evolutionary random forest feature selection, characterised by comprising the following steps: step 1: acquiring multivariate quality characteristic data in the production process through digital inspection on the shop floor, the data comprising process parameters, product size parameters, product grade classifications, and other quality characteristics that are important factors influencing the overall quality level of the product, thereby forming a product quality characteristic data set; step 2: pre-selecting the quality characteristics used for classification with the ReliefF algorithm to obtain the algorithm weight of each quality characteristic, eliminating the quality characteristics whose between-class distance is smaller than their within-class distance, and dividing the pre-selected data set into two parts: a product quality characteristic training data set and a product quality characteristic testing data set; step 3: inputting the training data set into the multi-objective evolutionary random forest feature selection algorithm, establishing a plurality of corresponding algorithm targets, generating an initial population, and setting the number of iteration generations to obtain the dominant key quality characteristic set; step 4: verifying and evaluating the obtained key quality characteristic set with the test data set.
2. The method for identifying key quality characteristics based on multi-objective evolutionary random forest feature selection as claimed in claim 1, characterized in that: the "product quality characteristic data set" in step 1 refers to a data set, for a single study object (product), having a certain number of quality characteristics (feature attributes), a certain number of samples (sampled products), and a definite classification for each sample.
3. The method for identifying key quality characteristics based on multi-objective evolutionary random forest feature selection as claimed in claim 1, characterized in that: the ReliefF algorithm in step 2 is an extension of the Relief algorithm, and the specific process is as follows: an individual E is drawn from the samples of a certain class, and its k nearest neighbour samples are found among the same-class samples and among the different-class samples, forming a same-class neighbour sample set F and a different-class neighbour sample set G; the feature weight W is then defined by the difference between the average per-feature differences of E from the samples in F and from the samples in G.
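The ReliefF process of claim 3 can be sketched as follows. This is a minimal sketch assuming numeric features, L1 distances, and class-prior weighting of the near-miss sets; the iteration count, neighbour count, and scaling choices are illustrative assumptions, not the patent's parameters.

```python
import numpy as np

def relieff_weights(X, y, k=5, n_iter=100, rng=None):
    """ReliefF feature weights (sketch): for a sampled instance E, find its
    k nearest same-class neighbours (set F) and k nearest neighbours in each
    other class (sets G); weights grow with the E-to-G differences and
    shrink with the E-to-F differences."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(d)
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    for _ in range(n_iter):
        i = rng.integers(n)
        E = Xs[i]
        dist = np.abs(Xs - E).sum(axis=1)
        dist[i] = np.inf  # exclude E itself from its own neighbours
        for c in classes:
            idx = np.where(y == c)[0]
            near = idx[np.argsort(dist[idx])[:k]]
            diff = np.abs(Xs[near] - E).mean(axis=0)
            if c == y[i]:
                w -= diff / n_iter                                   # set F
            else:
                w += prior[c] / (1 - prior[y[i]]) * diff / n_iter    # sets G
    return w
```

A feature whose between-class differences exceed its within-class differences ends up with a positive weight, which is the elimination criterion used in step 2 of claim 1.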
4. The method for identifying key quality characteristics based on multi-objective evolutionary random forest feature selection as claimed in claim 1, characterized in that: the training set and the test set of step 2 are obtained by the k-fold cross-validation method, in which the data is divided equally by sample size into k parts; k-1 parts are taken as the training set and the remaining 1 part as the test set.
5. The method for identifying key quality characteristics based on multi-objective evolutionary random forest feature selection as claimed in claim 1, characterized in that: the multi-objective evolutionary random forest feature selection algorithm comprises two parts: a multi-objective evolutionary algorithm, for which NSGA-II is selected and implemented in Matlab; and a random forest classifier, implemented in Python. The overall algorithm is realized through Matlab and Python interaction.
6. The method for identifying key quality characteristics based on multi-objective evolutionary random forest feature selection as claimed in claim 1, characterized in that: in the multi-objective evolutionary random forest feature selection algorithm of step 3, each population individual encodes a selected quality characteristic set s, and the objective functions are given by production practice requirements, including but not limited to:
Min F(s) = {f1(s), f2(s), f3(s)}, where f1 is the classification error rate, f2 is the reciprocal of the sum of the ReliefF weights of the characteristics in s, and f3 is the size of the characteristic subset s.
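The three minimisation targets of claim 6 can be sketched for one candidate subset as follows. This uses scikit-learn's `RandomForestClassifier` as a plausible Python random forest; the model settings and the boolean-mask encoding of s are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def objectives(mask, X_tr, y_tr, X_te, y_te, relief_w):
    """Evaluate claim 6's objectives for a candidate feature subset s,
    encoded as a boolean mask over all features (sketch; model settings
    are assumptions).
    f1: classification error rate of a random forest trained on s;
    f2: reciprocal of the summed ReliefF weights of the features in s;
    f3: size of s."""
    cols = np.flatnonzero(mask)
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    rf.fit(X_tr[:, cols], y_tr)
    f1 = 1.0 - rf.score(X_te[:, cols], y_te)
    f2 = 1.0 / relief_w[cols].sum()
    f3 = cols.size
    return f1, f2, f3
```

NSGA-II would call such an evaluation for every individual in the population, seeking subsets that are simultaneously accurate, heavily weighted by ReliefF, and small.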
7. The method for identifying key quality characteristics based on multi-objective evolutionary random forest feature selection as claimed in claim 1, characterized in that: in step 3, the training set within the multi-objective evolutionary random forest feature selection algorithm is divided into an internal training set and an internal test set, wherein the internal training set is used to train the random forest classifier, and the internal test set is used to evaluate part of the objective function values of the selected quality characteristic set s generated by the algorithm.
8. The method for identifying key quality characteristics based on multi-objective evolutionary random forest feature selection as claimed in claim 1, characterized in that: the verification and evaluation method of step 4 adopts the k-fold cross-validation method, performing k verifications in total over the k test data sets and taking the average of the objective values over the k verifications.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110752786.2A CN113268936B (en) | 2021-07-03 | 2021-07-03 | Key quality characteristic identification method based on multi-objective evolution random forest characteristic selection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110752786.2A CN113268936B (en) | 2021-07-03 | 2021-07-03 | Key quality characteristic identification method based on multi-objective evolution random forest characteristic selection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113268936A true CN113268936A (en) | 2021-08-17 |
CN113268936B CN113268936B (en) | 2022-07-19 |
Family
ID=77236356
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110752786.2A Expired - Fee Related CN113268936B (en) | 2021-07-03 | 2021-07-03 | Key quality characteristic identification method based on multi-objective evolution random forest characteristic selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113268936B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845796A (en) * | 2016-12-28 | 2017-06-13 | 中南大学 | One kind is hydrocracked flow product quality on-line prediction method |
JP2017161991A (en) * | 2016-03-07 | 2017-09-14 | 三菱重工業株式会社 | Quality evaluation system, quality evaluation method and program |
CN109523086A (en) * | 2018-11-26 | 2019-03-26 | 浙江蓝卓工业互联网信息技术有限公司 | The qualitative forecasting method and system of chemical products based on random forest |
CN110288199A (en) * | 2019-05-29 | 2019-09-27 | 北京航空航天大学 | The method of product quality forecast |
CN110456756A (en) * | 2019-03-25 | 2019-11-15 | 中南大学 | A method of suitable for continuous production process overall situation operation conditions online evaluation |
CN110582091A (en) * | 2018-06-11 | 2019-12-17 | 中国移动通信集团浙江有限公司 | method and apparatus for locating wireless quality problems |
CN112418538A (en) * | 2020-11-30 | 2021-02-26 | 武汉科技大学 | Continuous casting billet inclusion prediction method based on random forest classification |
Worldwide Applications
2021: 2021-07-03 CN CN202110752786.2A patent/CN113268936B/en not_active Expired - Fee Related
Non-Patent Citations (3)
Title |
---|
QIAO SHI等: "The Application of Tobacco Product Quality Prediction Using Ensemble Learning Method", 《2019 IEEE 4TH ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC)》 * |
QIAO Peirui (乔佩蕊): "Research on Complex Product Quality Prediction Based on Improved LASSO-RF", 《CNKI Outstanding Master's Dissertations Full-text Database, Economics and Management Sciences》 *
WU Xinye (伍薪烨): "Research on Multi-Characteristic Decision-Making and Operation Optimization Scheduling Methods for Assembly Quality of Complex Equipment", 《CNKI Outstanding Master's Dissertations Full-text Database, Engineering Science and Technology I》 *
Also Published As
Publication number | Publication date |
---|---|
CN113268936B (en) | 2022-07-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110213222B (en) | Network intrusion detection method based on machine learning | |
CN108921604B (en) | Advertisement click rate prediction method based on cost-sensitive classifier integration | |
CN108898479B (en) | Credit evaluation model construction method and device | |
CN111414849B (en) | Face recognition method based on evolution convolutional neural network | |
CN112232413B (en) | High-dimensional data feature selection method based on graph neural network and spectral clustering | |
CN108681742B (en) | Analysis method for analyzing sensitivity of driver driving behavior to vehicle energy consumption | |
CN106446602A (en) | Prediction method and system for RNA binding sites in protein molecules | |
CN107016416B (en) | Data classification prediction method based on neighborhood rough set and PCA fusion | |
CN110222838B (en) | Document sorting method and device, electronic equipment and storage medium | |
CN112633337A (en) | Unbalanced data processing method based on clustering and boundary points | |
CN112906890A (en) | User attribute feature selection method based on mutual information and improved genetic algorithm | |
CN101923604A (en) | Classification method for weighted KNN oncogene expression profiles based on neighborhood rough set | |
CN106951728B (en) | Tumor key gene identification method based on particle swarm optimization and scoring criterion | |
CN115481841A (en) | Material demand prediction method based on feature extraction and improved random forest | |
CN113268936B (en) | Key quality characteristic identification method based on multi-objective evolution random forest characteristic selection | |
CN115481844A (en) | Distribution network material demand prediction system based on feature extraction and improved SVR model | |
CN117272025A (en) | High-dimensional data feature selection method based on fuzzy competition particle swarm multi-objective optimization | |
CN108305174B (en) | Resource processing method, device, storage medium and computer equipment | |
CN113657441A (en) | Classification algorithm based on weighted Pearson correlation coefficient and combined with feature screening | |
KR100727555B1 (en) | Creating method for decision tree using time-weighted entropy and recording medium thereof | |
Mahfuz et al. | Clustering heterogeneous categorical data using enhanced mini batch K-means with entropy distance measure | |
CN113269217A (en) | Radar target classification method based on Fisher criterion | |
CN112801197A (en) | K-means method based on user data distribution | |
CN115017125B (en) | Data processing method and device for improving KNN method | |
Irawan et al. | Accounts Receivable Seamless Prediction for Companies by Using Multiclass Data Mining Model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20220719 |