CN113268936A - Key quality characteristic identification method based on multi-target evolution random forest characteristic selection - Google Patents


Info

Publication number
CN113268936A
CN113268936A (application CN202110752786.2A; granted publication CN113268936B)
Authority
CN
China
Prior art keywords
quality
algorithm
random forest
characteristic
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110752786.2A
Other languages
Chinese (zh)
Other versions
CN113268936B (en)
Inventor
赵永满
潘荣顺
余佳昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shihezi University
Original Assignee
Shihezi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shihezi University
Priority to CN202110752786.2A
Publication of CN113268936A
Application granted
Publication of CN113268936B
Legal status: Expired - Fee Related

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F 18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/24323 — Classification techniques: tree-organised classifiers
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 — Machine learning
    • G06N 3/126 — Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G06F 2111/06 — Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
    • G06F 2111/08 — Probabilistic or stochastic CAD


Abstract

The invention discloses a key quality characteristic identification method based on multi-objective evolutionary random forest feature selection, which comprises the following steps. First, multivariate quality characteristic data are acquired from the production process through digital inspection on the shop floor, forming a product quality characteristic data set. The quality characteristics participating in classification are then initially selected with the ReliefF algorithm, and the initially selected data set is divided into two parts: a product quality characteristic training data set and a product quality characteristic test data set. The training data set is input into the multi-objective optimization random forest feature selection algorithm to obtain the key quality characteristic set. Finally, the obtained key quality characteristic set is verified with the test data set. The method accounts for the complex influence of the multivariate quality characteristics on the final quality of the product, accurately identifies the key quality characteristics in the product, provides a reference for key quality characteristic identification, supports quality control, and improves the product quality prediction capability.

Description

Key quality characteristic identification method based on multi-target evolution random forest characteristic selection
Technical Field
The invention provides a key quality characteristic identification method based on multi-objective evolutionary random forest feature selection, and belongs to the field of quality management.
Background
In modern industrial production, a large amount of process data and quality characteristic data are generated in the production process of a product, including production environment data, product characteristic data, assembly characteristic data, customer demand characteristic data, and the like. Some of the quality characteristics have very important influence on the quality of the product, and some have little influence on the quality of the product, so that the identification of the key quality characteristics closely related to the quality of the product has very important significance on the continuous improvement of the product, the quality prediction of the product and the quality control of the product.
Traditional key quality characteristic identification methods include key characteristic flow-down and quality function deployment. The key characteristic flow-down method decomposes the whole product layer by layer, expanding from product characteristics to part characteristics, process characteristics, and so on, and then applies qualitative and quantitative analysis to identify the key quality characteristics. The quality function deployment method is driven by customer demand and focuses on the quality characteristics customers care about most, but relying purely on customer demand easily overlooks latent key quality characteristics. A complex product contains a large number of component-level quality characteristics with complex mutual influences, and traditional qualitative and quantitative methods struggle to determine their interactions and degree of criticality.
On this basis, the invention combines a multi-objective optimization algorithm with a machine learning model to identify key quality characteristics from the quality characteristic data generated in industrial production. The method serves continuous product improvement in the production process and strengthens capabilities such as product quality control and product quality prediction.
Disclosure of Invention
(1) Objects of the invention
The invention aims to provide a key quality characteristic identification method based on multi-objective evolutionary random forest feature selection, so as to solve the problem that conventional qualitative and quantitative methods have difficulty identifying and determining key quality characteristics.
(2) Technical scheme
To solve the above problem, the invention provides a key quality characteristic identification method based on multi-objective evolutionary random forest feature selection; as shown in fig. 1, the method comprises the following steps:
step 1: acquiring multivariate quality characteristic data information in a production process through digital detection of a workshop, wherein the multivariate quality characteristic data information comprises a plurality of process parameters, product size parameters, product grade classification and other quality characteristics which are important factors influencing the overall quality level of a product, so that a product quality characteristic data set is formed;
step 2: the quality characteristics participating in classification are initially selected with the ReliefF algorithm, and the initially selected data set is divided into two parts: a product quality characteristic training data set and a product quality characteristic test data set;
and step 3: the training data set is divided into an internal training set and an internal test set; the internal training set is used to train the random forest model classifier, and the internal test set is used to evaluate the partial objective function values of the selected quality characteristic set s generated by the algorithm. The training data set is then input into the multi-objective evolutionary random forest feature selection algorithm: the corresponding algorithm objectives are established, an initial population is generated, and the number of iteration generations is set, yielding the dominant set of key quality characteristics;
and 4, step 4: and verifying and evaluating the obtained key quality characteristic set by using the random forest classifier trained by the test data set and the key quality characteristic set.
The term "product quality characteristic data set" in step 1 refers to a data set that, for one and the same object (product), has a certain number of quality characteristics (characteristic attributes), a certain number of samples (sampled products), and a definite classification for each sample (sampled product), as shown in fig. 2.
In step 2, the ReliefF algorithm performs the initial selection of the quality characteristics participating in classification; the specific algorithm flow is shown in fig. 3, and the specific method is as follows:
2-1: extract an individual E from the samples of a certain class, then search the same-class samples and the other-class samples separately for the k nearest neighbour samples, forming the same-class neighbour sample set F and the other-class neighbour sample sets G;
2-2: define the feature weight W from the difference between the average per-feature distances of E to the samples in F and to the samples in G. For any feature m, the feature weight W_m after n sampling rounds is updated as (the standard ReliefF update):

W_m = W_m - Σ_{j=1..k} diff(m, E, F_j)/(n·k) + Σ_{c ≠ class(E)} [ p(c)/(1 - P(class(E))) · Σ_{j=1..k} diff(m, E, G(c)_j) ]/(n·k)

where diff(m, X, Y) is the (normalized) distance between samples X and Y on feature m, and:
c denotes a sample class different from that of E;
E[m] denotes the value of feature m for the individual E;
F_j[m] denotes the value of feature m for the j-th nearest same-class sample;
p(c) denotes the probability that a sample belongs to class c;
class(E) denotes the class of the individual E;
P(class(E)) denotes the probability that a sample belongs to the same class as E;
G(c)_j[m] denotes the value of feature m for the j-th nearest sample of class c.
The larger a feature's weight, the larger the between-class distance and the smaller the within-class distance it induces, and hence the stronger its discriminating effect on the classes;
2-3: eliminate the quality characteristics whose between-class distance is smaller than their within-class distance, and divide the initially selected data set into two parts: a product quality characteristic training data set and a product quality characteristic test data set. The training and test sets are obtained with the k-fold cross validation method: the data are divided evenly by sample size into k parts, of which k-1 parts are taken as the training set and 1 part as the test set.
The flow of the multi-objective evolutionary random forest feature selection algorithm in step 3 is shown in fig. 4. The algorithm consists of two parts: a multi-objective evolutionary algorithm, for which NSGA-II is selected and implemented with Matlab software; and a random forest classifier, implemented with Python. The whole algorithm is realized through Matlab-Python interaction.
In the multi-objective evolutionary random forest feature selection algorithm of step 3, NSGA-II serves as the multi-objective evolutionary algorithm, and individuals in the population are gene-encoded in binary: a solution s is encoded as C = (c1, c2, c3, c4, c5, ..., cN), a vector of length N, where N is the total number of quality characteristics and each element ci ∈ {0, 1} (i = 1, 2, ..., N) indicates whether the i-th characteristic is selected ('1') or not ('0'). Each code corresponds to one solution, i.e. one subset of quality characteristics.
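A minimal sketch of this binary encoding; the function names `encode`/`decode` and the use of feature names are illustrative assumptions, not part of the patent:

```python
def encode(subset, feature_names):
    """Encode a quality characteristic subset as C = (c1, ..., cN):
    gene i is 1 if the i-th characteristic is selected, else 0."""
    chosen = set(subset)
    return [1 if name in chosen else 0 for name in feature_names]

def decode(C, feature_names):
    """Decode an individual back into its quality characteristic subset."""
    return [name for name, c in zip(feature_names, C) if c == 1]
```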
In the multi-objective evolutionary random forest feature selection algorithm of step 3, the population selection scheme is binary tournament: each time, two individuals are drawn from the parent population, compared with the crowded-comparison operator, and the better one is added to the offspring population.
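The binary tournament with the crowded-comparison operator can be sketched as follows; representing individuals as (rank, crowding distance, genes) triples is an assumption made here for illustration:

```python
import random

def crowded_winner(a, b):
    """Crowded-comparison operator: the lower Pareto rank wins; on a
    rank tie, the larger crowding distance wins."""
    if a[0] != b[0]:
        return a if a[0] < b[0] else b
    return a if a[1] >= b[1] else b

def binary_tournament(population, n_parents):
    """Draw two individuals at random per slot and keep the winner."""
    return [crowded_winner(*random.sample(population, 2))
            for _ in range(n_parents)]
```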
In the multi-objective evolutionary random forest feature selection algorithm of step 3, crossover between individuals in the population uses the single-point method: with crossover probability p_c, the individuals C1 = (c11, c12, ..., c1N) and C2 = (c21, c22, ..., c2N) exchange tails at a randomly chosen point e, producing two new individuals C1' = (c11, c12, ..., c1,e-1, c2e, ..., c2N) and C2' = (c21, c22, ..., c2,e-1, c1e, ..., c1N).
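A sketch of the single-point crossover just described (function name illustrative); note that swapping tails preserves the combined multiset of genes of the two parents:

```python
import random

def single_point_crossover(C1, C2, p_c):
    """With probability p_c, pick a cut point e in 1..N-1 and swap the
    tails of the two parents; otherwise return copies of the parents."""
    if random.random() >= p_c or len(C1) < 2:
        return list(C1), list(C2)
    e = random.randrange(1, len(C1))
    return C1[:e] + C2[e:], C2[:e] + C1[e:]
```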
In the multi-objective evolutionary random forest feature selection algorithm of step 3, mutation of individuals in the population uses the multi-point (bit-flip) method: for an individual C = (c1, c2, c3, c4, c5, ..., cN), each gene mutates with probability p_m, a '0' in the original position flipping to '1' and a '1' flipping to '0', yielding a new individual.
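A sketch of the per-gene bit-flip mutation (function name illustrative):

```python
import random

def bit_flip_mutation(C, p_m):
    """Flip each gene independently with probability p_m: a selected
    characteristic ('1') becomes unselected ('0') and vice versa."""
    return [1 - c if random.random() < p_m else c for c in C]
```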
In the multi-objective evolutionary random forest feature selection algorithm of step 3, the algorithm objectives of the NSGA-II algorithm are set by actual production requirements, including but not limited to Min F(s) = {f1(s), f2(s), f3(s)}, where f1 is the classification error rate, f2 is the inverse of the sum of the ReliefF weights of s, and f3 is the size of the quality characteristic subset.
In the multi-objective evolutionary random forest feature selection algorithm of step 3, the non-dominated sorting of individuals in the population is based on Pareto dominance: for a minimization multi-objective problem with n objective components f_i(s) (i = 1, 2, ..., n) and any two given decision variables X_a and X_b, X_a is said to dominate X_b if both of the following conditions hold:
for every i ∈ {1, 2, ..., n}, f_i(X_a) ≤ f_i(X_b); and
there exists some i ∈ {1, 2, ..., n} such that f_i(X_a) < f_i(X_b).
A decision variable that no other decision variable dominates is called a non-dominated solution. Within a set of solutions, the non-dominated solutions are assigned Pareto rank 1 and deleted from the solution set; the non-dominated solutions of the remaining set are assigned Pareto rank 2, and so on, until every solution in the set has a Pareto rank, as sorted in fig. 5.
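The dominance test and the level-by-level Pareto ranking described above can be sketched as follows; this is a straightforward quadratic-time illustration, not the bookkeeping-optimized fast non-dominated sort of NSGA-II:

```python
def dominates(fa, fb):
    """Pareto dominance for minimization: fa is no worse in every
    objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(fa, fb))
            and any(a < b for a, b in zip(fa, fb)))

def pareto_ranks(objs):
    """Peel off successive non-dominated fronts: rank 1 first, then
    rank 2 from what remains, and so on."""
    ranks = [0] * len(objs)
    rank, remaining = 1, set(range(len(objs)))
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks
```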
In the multi-objective evolutionary random forest feature selection algorithm of step 3, individuals of the same non-dominated rank are ordered by crowding distance: the crowding distance represents the density of individuals around a given point in the population and is denoted i_d; intuitively, it is the size of the largest rectangle that encloses individual i without enclosing any other individual, as shown in fig. 6.
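Crowding distance is conventionally computed per objective as the normalized gap between a point's two neighbours in the sorted front, with boundary points set to infinity. A sketch under that standard NSGA-II convention (an assumption; the patent only gives the geometric intuition):

```python
def crowding_distances(objs):
    """Crowding distance of each point in one non-dominated front:
    boundary points get infinity; interior points sum, over each
    objective, the normalized gap between their two neighbours."""
    n = len(objs)
    if n <= 2:
        return [float('inf')] * n
    dist = [0.0] * n
    for m in range(len(objs[0])):
        order = sorted(range(n), key=lambda i: objs[i][m])
        span = (objs[order[-1]][m] - objs[order[0]][m]) or 1.0
        dist[order[0]] = dist[order[-1]] = float('inf')
        for left, mid, right in zip(order, order[1:], order[2:]):
            dist[mid] += (objs[right][m] - objs[left][m]) / span
    return dist
```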
The specific method of step 3 is as follows:
3-1: initialization: randomly generate a population P_t, each individual of which is a selected quality characteristic set s;
3-2: apply selection, crossover, and mutation to P_t to obtain the offspring population P_t';
3-3: evaluate the fitness of every individual of the combined population R_t = P_t + P_t' against the algorithm objectives, each individual obtaining a target value {f1(s), f2(s), f3(s)};
3-4: apply the fast non-dominated sorting method to rank every individual of R_t;
3-5: fill the next population P_{t+1} with the individuals of the lowest (best) current non-dominated rank, level by level, until P_{t+1} cannot accommodate the next whole level;
3-6: sort the individuals of that next non-dominated level by crowding distance, using the crowding-distance assignment method;
3-7: admit the individuals with the largest crowding distances into P_{t+1} until P_{t+1} is full;
3-8: repeat steps 3-2 to 3-7 until the algorithm termination condition is reached; then output the final population individuals and decode them to obtain the identified key quality characteristic set.
Specifically, the "fitness evaluation" in step 3-3 comprises the following steps:
3-3-1: decode each individual of R_t into its corresponding quality characteristic set;
3-3-2: the number of quality characteristics in the decoded set gives the fitness value f3(s);
3-3-3: the inverse of the sum of the ReliefF weights of the decoded set gives the fitness value f2(s);
3-3-4: extract the quality characteristic data corresponding to the decoded set from the internal training set and train the random forest classifier;
3-3-5: extract the corresponding quality characteristic data from the internal test set, verify the prediction accuracy of the trained random forest classifier, and obtain the fitness value f1(s).
In step 4, the obtained key quality characteristic set is verified and evaluated with the test data set: the quality characteristic data corresponding to the test set are extracted and the prediction accuracy of the trained random forest classifier is verified, likewise with the k-fold cross validation method.
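The k-fold verification described here can be sketched as follows; the `train_and_score` callback is an illustrative stand-in for training the random forest on k-1 folds and scoring it on the held-out fold:

```python
import random

def k_fold_score(X, y, k, train_and_score):
    """k-fold cross validation: shuffle the indices, split them into k
    near-equal parts, train on k-1 parts, score on the held-out part,
    and return the average of the k scores."""
    idx = list(range(len(X)))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        scores.append(train_and_score(
            [X[j] for j in train_idx], [y[j] for j in train_idx],
            [X[j] for j in test_idx], [y[j] for j in test_idx]))
    return sum(scores) / k
```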
Drawings
FIG. 1 is a block flow diagram of the method of the present invention.
Fig. 2 is a block diagram of a product quality characteristic data set.
Fig. 3 is a flowchart of the ReliefF algorithm.
FIG. 4 is a flow chart of a multi-objective evolutionary random forest feature selection algorithm.
FIG. 5 is a graph of Pareto rank after non-dominated sorting.
Fig. 6 is a schematic diagram of the congestion degree ranking.
FIG. 7 is a population evolution mode diagram of a multi-objective evolution random forest feature selection algorithm.
Detailed Description
The invention provides a key quality characteristic identification method based on multi-objective evolution random forest characteristic selection, and the invention is further described in detail with reference to the attached drawings.
Step 1, acquiring multivariate quality characteristic data information in a production process through digital detection of a workshop, wherein the multivariate quality characteristic data information comprises a plurality of process parameters, product size parameters, product grade classification and other quality characteristics, and the quality characteristics are important factors influencing the overall quality level of a product, so that a product quality characteristic data set is formed.
The term "product quality characteristic data set", as shown in fig. 2, refers to a data set that, for one and the same research object (product), has a certain number of quality characteristics (characteristic attributes), a certain number of samples (sampled products), and a definite classification for each sample (sampled product).
Step 2, the quality characteristics participating in classification are initially selected with the ReliefF algorithm, and the initially selected data set is divided into two parts: a product quality characteristic training data set and a product quality characteristic test data set.
The "ReliefF algorithm performs initial selection of the quality characteristics participating in classification"; the specific algorithm flow is shown in fig. 3, and the specific method is as follows:
2-1: extract an individual E from the samples of a certain class, then search the same-class samples and the other-class samples separately for the k nearest neighbour samples, forming the same-class neighbour sample set F and the other-class neighbour sample sets G;
2-2: define the feature weight W from the difference between the average per-feature distances of E to the samples in F and to the samples in G. For any feature m, the feature weight W_m after n sampling rounds is updated as (the standard ReliefF update):

W_m = W_m - Σ_{j=1..k} diff(m, E, F_j)/(n·k) + Σ_{c ≠ class(E)} [ p(c)/(1 - P(class(E))) · Σ_{j=1..k} diff(m, E, G(c)_j) ]/(n·k)

where diff(m, X, Y) is the (normalized) distance between samples X and Y on feature m, and:
c denotes a sample class different from that of E;
E[m] denotes the value of feature m for the individual E;
F_j[m] denotes the value of feature m for the j-th nearest same-class sample;
p(c) denotes the probability that a sample belongs to class c;
class(E) denotes the class of the individual E;
P(class(E)) denotes the probability that a sample belongs to the same class as E;
G(c)_j[m] denotes the value of feature m for the j-th nearest sample of class c.
The larger a feature's weight, the larger the between-class distance and the smaller the within-class distance it induces, and hence the stronger its discriminating effect on the classes;
2-3: eliminate the quality characteristics whose between-class distance is smaller than their within-class distance, and divide the initially selected data set into two parts: a product quality characteristic training data set and a product quality characteristic test data set. The training and test sets are obtained with the k-fold cross validation method: the data are divided evenly by sample size into k parts, of which k-1 parts are taken as the training set and 1 part as the test set.
Step 3, the training data set is divided into an internal training set and an internal test set; the internal training set is used to train the random forest model classifier, and the internal test set is used to evaluate the partial objective function values of the selected quality characteristic set s generated by the algorithm. The training data set is then input into the multi-objective evolutionary random forest feature selection algorithm: the corresponding algorithm objectives are established, an initial population is generated, and the number of iteration generations is set, yielding the dominant set of key quality characteristics.
The algorithm flow of the multi-objective evolutionary random forest feature selection algorithm is shown in fig. 4. The algorithm consists of two parts: a multi-objective evolutionary algorithm, for which NSGA-II is selected and implemented with Matlab software; and a random forest classifier, implemented with Python. The whole algorithm is realized through Matlab-Python interaction. The specific method of step 3 is as follows:
3-1: initializing the population and randomly generating a population Pt,PtEach individual being a selected set of quality characteristicss. The individual gene coding mode in the population adopts a binary coding method to solvesIs coded asCThen, thenC=(c 1 ,c 2 ,c 3 , c 4 ,c 5 ,…,c N ) Is 1NThe vector of (2).NFor the total mass characteristic quantity, each elementc i ∈{0,1}(i=1,2,3,…,n) Represents the firstiOne feature is selected or not selected, if it is '1', and not selected if it is '0'. Each code corresponds to a solution, i.e. a subset of quality characteristics;
3-2: for population PtProceeding heredity, crossover and variation to obtain population Pt. The method comprises the following specific steps:
3-2-1: the genetic mode in the population is binary tournament selection, two individuals are selected from a parent generation population each time, the two individuals are compared (a crowding comparison operator is used), and more superior individuals are added into a child generation population;
3-2-2: the method for crossing individuals in the population adopts a single-point crossing method, and the individualsC 1 =(c 11 ,c 12 ,c 13 ,c 14 , c 15 ,…,c 1N ),C 2 =(c 21 ,c 22 ,c 23 ,c 24 ,c 25 ,…,c 2N ) By cross probabilityp c Two new individuals were generated by performing crossover operations:
C 1 =(c 11 ,c 12 ,c 13 ,…,c 1e-1 ,c 2e …,c 2N ),C 2 =(c 21 ,c 22 ,c 23 ,…,c 2e-1 ,c 1e …,c 1N );
3-2-3: the variation mode of individuals in the population adopts a multipoint variation method, and the individualsC=(c 1 ,c 2 ,c 3 ,c 4 ,c 5 ,…,c N ) Each gene has mutation probabilityp m Carrying out mutation operation to generate a new individual: if the original position is '0', the mutation is '1', and if the original position is '1', the mutation is '0';
3-3: target function pair population R set by algorithmt=Pt +Pt’The algorithm target is set by actual production requirements, including but not limited toMin F(s)={f 1 (s),f 2 (s),f 3 (s)},f 1 In order to classify the error rate of the data,f 2 under the algorithm of Relief FsThe inverse of the sum of the weights of (c),f 3 for the quality characteristic subset size, fitness evaluation is performed on each individual, and each individual obtains a target valuef 1 (s),f 2 (s),f 3 (s) The method comprises the following specific steps:
3-3-1: will be provided withREach individual is decoded into a corresponding set of quality characteristics;
3-3-2: set of corresponding quality characteristics after decoding, number of quality characteristicsQuantity as a fitness functionf 3 (s)A value of (d);
3-3-3: corresponding quality characteristic set after decoding and corresponding to a Relief F algorithmsIs a function of the inverse of the sum of the weights off 2 (s) A value of (d);
3-3-4: extracting quality characteristic data sets corresponding to the internal training set to train the random forest classifier respectively;
3-3-5: extracting quality characteristic data sets corresponding to the internal test set, respectively verifying and predicting the precision of the trained random forest classifier, and obtaining a fitness functionf 1 (s) A value of (d);
3-4: using fast non-dominated sorting method pairsR t Each individual is subjected to non-dominated ranking, and the specific steps are as follows:
the non-dominant ranking of individuals in the population is based on: for minimizing the multiobjective optimization problem, fornA target componentf i (s),(i=1,2,…,n) Any given two decision variablesX a ,X b If the following two conditions are satisfied, it is calledX a DominatingX b
1, for anyi∈1,2,…,nAll are provided withf i (X a )≤f i (X b ) If true;
2, existence ofi∈1,2,…,nSo thatf i (X a )≤f i (X b ) If true;
if one decision variable does not have other decision variables capable of dominating the decision variable, the decision variable is called as a non-dominated solution, in a group of solutions, the Pareto level of the non-dominated solution is defined as 1, the non-dominated solution is deleted from the solution set, the Pareto level of the rest solutions is defined as 2, and by analogy, the Pareto levels of all solutions in the solution set can be obtained, and the Pareto levels are ranked as shown in fig. 5;
3-5: selecting the individuals with the smallest current non-dominant gradeEntry populationP t+1 Up toP t+1 Until the population cannot accommodate the next level;
3-6: and carrying out congestion distance sequencing on the next non-dominated level individual by using a congestion distance distribution method, wherein the specific method comprises the following steps:
the crowding degree sequencing basis of the individuals with the same non-dominant grade in the population is as follows: the crowdedness represents the density of individuals around a given point in a population, and is defined asi d Representing, visually, the individualiThe surroundings include the individualiBut does not include the length of the largest rectangle of the rest of the individuals, as shown in fig. 6;
3-7: selecting the individual with the largest crowding distance to enter the selected populationP t+1 Until population is completedP t+1 . FIG. 7 shows the evolution process of the population, including steps 3-4 to 3-7;
3-8: and repeating the steps 3-2 to 3-7 until the algorithm termination condition is reached. And outputting the population individuals after the algorithm is terminated, and decoding to obtain the identified key quality characteristic set.
Step 4: the obtained key quality characteristic set is verified and evaluated with the random forest classifier trained on the test data set and the key quality characteristic set; specifically, the quality characteristic data corresponding to the test set are extracted and the prediction accuracy of the trained random forest classifier is verified, likewise with the k-fold cross validation method.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the invention. Although the applicant has described the invention in detail, those skilled in the art will understand that modifications or equivalent substitutions may still be made to the technical solution of the invention without departing from its spirit and scope, and all such modifications are covered by the claims of the invention.

Claims (8)

1. A key quality characteristic identification method based on multi-objective evolutionary random forest feature selection, characterized by comprising the following steps: step 1: acquiring multivariate quality characteristic data information in the production process through digital detection on the shop floor, the information comprising a plurality of process parameters, product size parameters, product grade classifications, and other quality characteristics that are important factors influencing the overall quality level of the product, so as to form a product quality characteristic data set; step 2: performing initial selection of the quality characteristics participating in classification with the ReliefF algorithm to obtain the algorithm weights of the quality characteristics, eliminating the quality characteristics whose between-class distance is smaller than their within-class distance, and dividing the initially selected data set into two parts: a product quality characteristic training data set and a product quality characteristic test data set; step 3: inputting the training data set into the multi-objective evolutionary random forest feature selection algorithm, establishing the corresponding algorithm objectives, generating an initial population, and setting the number of iteration generations to obtain the dominant key quality characteristic set; step 4: verifying and evaluating the obtained key quality characteristic set with the test data set.
2. The method for identifying key quality characteristics based on multi-objective evolution random forest characteristic selection as claimed in claim 1, characterized in that: the "product quality characteristic data set" in step 1 refers to a data set for the same study object (product) that has a certain number of quality characteristics (characteristic attributes), a certain number of samples (sampled products), and a definite classification for each sample.
3. The method for identifying key quality characteristics based on multi-objective evolution random forest characteristic selection as claimed in claim 1, characterized in that: the ReliefF algorithm in step 2 is an extension of the Relief algorithm, and the specific process is as follows: extract an individual E from the samples of a certain class, find its k nearest-neighbor samples within the same class and within the different classes, forming the homogeneous neighbor sample set F and the heterogeneous neighbor sample set G respectively, and then define the feature weight W from the differences between E and the per-feature averages of the samples in F and G.
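Claim 3's weighting idea can be sketched in a few lines of plain NumPy. This is an illustrative simplification, not the patent's implementation: the function name, the Manhattan distance, and the equal treatment of all heterogeneous classes are assumptions.

```python
import numpy as np

def relieff_weights(X, y, k=3, n_iters=50, rng=None):
    """ReliefF sketch: for a randomly drawn individual E, compare its
    per-feature distance to its k nearest same-class neighbors (set F)
    and its k nearest different-class neighbors (set G); a feature's
    weight W rises when it separates classes and falls when it varies
    within a class."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        e = X[i]
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        # k nearest hits (set F) and k nearest misses (set G), by L1 distance
        hits = same[np.argsort(np.abs(X[same] - e).sum(axis=1))[:k]]
        misses = diff[np.argsort(np.abs(X[diff] - e).sum(axis=1))[:k]]
        # inter-class average difference minus intra-class average difference
        w += np.abs(X[misses] - e).mean(axis=0) - np.abs(X[hits] - e).mean(axis=0)
    return w / n_iters
```

Under this scheme, a quality characteristic whose inter-class distance is smaller than its intra-class distance ends up with a negative weight, which matches the elimination rule of step 2.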
4. The method for identifying key quality characteristics based on multi-objective evolution random forest characteristic selection as claimed in claim 1, characterized in that: the training set and the test set in step 2 are obtained by the k-fold cross-validation method, in which the data are divided equally by sample size into k parts; k-1 parts are taken as the training set and the remaining 1 part as the test set.
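The k-fold split of claim 4 can be written as a small generator; this is a plain-NumPy sketch (the function name and the shuffling seed are illustrative), not the patent's code:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Divide n_samples equally by sample size into k parts; each part
    serves once as the test set while the other k-1 parts together
    form the training set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Iterating the generator yields k (train, test) index pairs, one per fold, whose test parts together cover every sample exactly once.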
5. The method for identifying key quality characteristics based on multi-objective evolution random forest characteristic selection as claimed in claim 1, characterized in that: the multi-objective evolution random forest feature selection algorithm comprises two parts: a multi-objective evolutionary algorithm, for which the NSGA-II algorithm is selected and implemented in Matlab software, and a random forest classifier implemented in Python; the whole algorithm is realized through interaction between Matlab and Python.
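The patent implements NSGA-II in Matlab; as a language-neutral illustration of the selection mechanism at NSGA-II's core, the first non-dominated (Pareto) front can be extracted as follows. This is a minimal sketch, assuming all objectives are minimized, and is not the patent's Matlab implementation:

```python
import numpy as np

def first_front(F):
    """Return the indices of the first non-dominated front of F
    (rows = individuals, columns = objective values, all minimized).
    Individual j dominates i when it is no worse in every objective
    and strictly better in at least one."""
    n = len(F)
    front = []
    for i in range(n):
        dominated = any(
            np.all(F[j] <= F[i]) and np.any(F[j] < F[i]) for j in range(n)
        )
        if not dominated:
            front.append(i)
    return front
```

NSGA-II repeats this ranking (plus a crowding-distance tie-break) to decide which feature subsets survive into the next generation.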
6. The method for identifying key quality characteristics based on multi-objective evolution random forest characteristic selection as claimed in claim 1, characterized in that: in step 3, for the quality feature set s selected by an individual of the population in the multi-objective evolution random forest feature selection algorithm, the objective functions are given by production practice requirements, including but not limited to:
Min F(s) = {f1(s), f2(s), f3(s)}, where f1 is the classification error rate of the data, f2 is the reciprocal of the sum of the ReliefF weights of s, and f3 is the feature subset size.
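The three example objectives of claim 6 can be evaluated for a candidate feature subset encoded as a boolean mask. In this sketch, `error_rate_fn` stands in for the random forest classifier's error estimate and `relieff_w` for the precomputed ReliefF weights; both names are illustrative assumptions:

```python
import numpy as np

def objectives(mask, error_rate_fn, relieff_w):
    """Evaluate F(s) = {f1, f2, f3} for a feature subset s given as a
    boolean mask over all quality characteristics (all minimized)."""
    f1 = error_rate_fn(mask)          # classification error rate of the data
    f2 = 1.0 / relieff_w[mask].sum()  # reciprocal of the sum of ReliefF weights of s
    f3 = int(mask.sum())              # feature subset size
    return f1, f2, f3
```

Minimizing f2 favors subsets whose total ReliefF weight is large, so the three objectives jointly push toward small, highly weighted subsets that classify well.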
7. The method for identifying key quality characteristics based on multi-objective evolution random forest characteristic selection as claimed in claim 1, characterized in that: in step 3, the training set in the multi-objective evolution random forest feature selection algorithm is divided into an internal training set and an internal test set; the internal training set is used to train the random forest classifier, and the internal test set is used to evaluate part of the objective function values of the quality feature set s selected by the algorithm.
8. The method for identifying key quality characteristics based on multi-objective evolution random forest characteristic selection as claimed in claim 1, characterized in that: the verification and evaluation method in step 4 adopts the k-fold cross-validation method: k verifications are performed in total on the k test data sets, and the average of the objective values over the k verifications is taken.
CN202110752786.2A 2021-07-03 2021-07-03 Key quality characteristic identification method based on multi-objective evolution random forest characteristic selection Expired - Fee Related CN113268936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110752786.2A CN113268936B (en) 2021-07-03 2021-07-03 Key quality characteristic identification method based on multi-objective evolution random forest characteristic selection


Publications (2)

Publication Number Publication Date
CN113268936A true CN113268936A (en) 2021-08-17
CN113268936B CN113268936B (en) 2022-07-19

Family

ID=77236356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110752786.2A Expired - Fee Related CN113268936B (en) 2021-07-03 2021-07-03 Key quality characteristic identification method based on multi-objective evolution random forest characteristic selection

Country Status (1)

Country Link
CN (1) CN113268936B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845796A (en) * 2016-12-28 2017-06-13 中南大学 One kind is hydrocracked flow product quality on-line prediction method
JP2017161991A (en) * 2016-03-07 2017-09-14 三菱重工業株式会社 Quality evaluation system, quality evaluation method and program
CN109523086A (en) * 2018-11-26 2019-03-26 浙江蓝卓工业互联网信息技术有限公司 The qualitative forecasting method and system of chemical products based on random forest
CN110288199A (en) * 2019-05-29 2019-09-27 北京航空航天大学 The method of product quality forecast
CN110456756A (en) * 2019-03-25 2019-11-15 中南大学 A method of suitable for continuous production process overall situation operation conditions online evaluation
CN110582091A (en) * 2018-06-11 2019-12-17 中国移动通信集团浙江有限公司 method and apparatus for locating wireless quality problems
CN112418538A (en) * 2020-11-30 2021-02-26 武汉科技大学 Continuous casting billet inclusion prediction method based on random forest classification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QIAO SHI et al.: "The Application of Tobacco Product Quality Prediction Using Ensemble Learning Method", 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) *
QIAO Peirui: "Research on Quality Prediction of Complex Products Based on Improved LASSO-RF", CNKI Outstanding Master's Theses Full-text Database, Economics and Management Science *
WU Xinye: "Research on Multi-characteristic Decision-making for Assembly Quality of Complex Equipment and Job Optimization Scheduling Methods", CNKI Outstanding Master's Theses Full-text Database, Engineering Science and Technology I *

Also Published As

Publication number Publication date
CN113268936B (en) 2022-07-19

Similar Documents

Publication Publication Date Title
CN110213222B (en) Network intrusion detection method based on machine learning
CN108921604B (en) Advertisement click rate prediction method based on cost-sensitive classifier integration
CN108898479B (en) Credit evaluation model construction method and device
CN111414849B (en) Face recognition method based on evolution convolutional neural network
CN112232413B (en) High-dimensional data feature selection method based on graph neural network and spectral clustering
CN108681742B (en) Analysis method for analyzing sensitivity of driver driving behavior to vehicle energy consumption
CN106446602A (en) Prediction method and system for RNA binding sites in protein molecules
CN107016416B (en) Data classification prediction method based on neighborhood rough set and PCA fusion
CN110222838B (en) Document sorting method and device, electronic equipment and storage medium
CN112633337A (en) Unbalanced data processing method based on clustering and boundary points
CN112906890A (en) User attribute feature selection method based on mutual information and improved genetic algorithm
CN101923604A (en) Classification method for weighted KNN oncogene expression profiles based on neighborhood rough set
CN106951728B (en) Tumor key gene identification method based on particle swarm optimization and scoring criterion
CN115481841A (en) Material demand prediction method based on feature extraction and improved random forest
CN113268936B (en) Key quality characteristic identification method based on multi-objective evolution random forest characteristic selection
CN115481844A (en) Distribution network material demand prediction system based on feature extraction and improved SVR model
CN117272025A (en) High-dimensional data feature selection method based on fuzzy competition particle swarm multi-objective optimization
CN108305174B (en) Resource processing method, device, storage medium and computer equipment
CN113657441A (en) Classification algorithm based on weighted Pearson correlation coefficient and combined with feature screening
KR100727555B1 (en) Creating method for decision tree using time-weighted entropy and recording medium thereof
Mahfuz et al. Clustering heterogeneous categorical data using enhanced mini batch K-means with entropy distance measure
CN113269217A (en) Radar target classification method based on Fisher criterion
CN112801197A (en) K-means method based on user data distribution
CN115017125B (en) Data processing method and device for improving KNN method
Irawan et al. Accounts Receivable Seamless Prediction for Companies by Using Multiclass Data Mining Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220719