CN112270614A - Design resource big data modeling method for manufacturing enterprise whole system optimization design - Google Patents

Design resource big data modeling method for manufacturing enterprise whole system optimization design Download PDF

Info

Publication number
CN112270614A
CN112270614A (application number CN202011049729.XA; granted as CN112270614B)
Authority
CN
China
Prior art keywords
data
design
value
logistic regression
model
Prior art date
Legal status
Granted
Application number
CN202011049729.XA
Other languages
Chinese (zh)
Other versions
CN112270614B (en)
Inventor
任鸿儒
肖毅
鲁仁全
徐雍
周琪
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202011049729.XA
Publication of CN112270614A
Application granted
Publication of CN112270614B
Legal status: Active

Classifications

    • G06Q50/04 Manufacturing (ICT specially adapted for business processes of specific sectors)
    • G06F16/182 Distributed file systems
    • G06F16/215 Improving data quality; data cleansing, e.g. de-duplication, removing invalid entries
    • G06F18/2135 Feature extraction based on approximation criteria, e.g. principal component analysis
    • G06F18/24143 Classification based on distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes
    • Y02P90/30 Computing systems specially adapted for manufacturing


Abstract

The invention discloses a design resource big data modeling method for whole-system optimization design of a manufacturing enterprise. After big data from subjects such as design, manufacturing, products and users in a manufacturing enterprise are collected, cleaned and feature-processed, an accurate and effective design resource big data model for whole-system optimization design is constructed with a combined KNN (K-nearest-neighbor)-logistic regression model algorithm. Related business in the enterprise can thereby be prejudged, and the data of the design, manufacturing, product and user subjects optimized. The method addresses two shortcomings of existing design resource data models: they consider only data from a single design department rather than integrating and summarizing all design department data, and a single-algorithm data model may fail to predict classification results accurately.

Description

Design resource big data modeling method for manufacturing enterprise whole system optimization design
Technical Field
The invention relates to the technical field of manufacturing industry and big data, in particular to a design resource big data modeling method for the whole system optimization design of a manufacturing enterprise.
Background
Industrial big data is an important strategic resource for the transformation and upgrading of China's manufacturing industry. To make full use of the massive data generated in the design, manufacturing, management and service processes of manufacturing enterprises, methods and technologies for constructing the manufacturing-enterprise data space have become important foundational frontier technologies. The manufacturing-enterprise data space is formed by whole-system, whole-value-chain data generated in business domains such as design, manufacturing, management and service. Besides the 4V characteristics of big data (large volume, high velocity, heterogeneous variety and low veracity), it is multi-modal, cross-scale, high-throughput, strongly correlated and mechanism-intensive, which makes manufacturing big data difficult to model.
Most current manufacturing big data modeling methods target a single business field and do not fully consider the correlated influence of data from other business fields during modeling. A modeling method that runs through multiple business fields and the whole product life cycle is lacking, so core problems in business fields such as design resources, management processes, manufacturing processes and product services cannot be described comprehensively and effectively from a whole-process, whole-system perspective.
Product design is the first link of the product life cycle. Existing design resource data models consider only single-design-department data and do not integrate and summarize all design department data; moreover, such models adopt a single algorithm, so the classification result may not be predicted accurately.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a design resource big data modeling method for whole-system optimization design of a manufacturing enterprise. The method presents the relations among design resource big data in a highly ordered way and, together with whole-process manufacturing, whole-through management and whole-period product service business models, realizes whole-system, whole-value-chain modeling of manufacturing big data, solving the problem that the traditional relational database model cannot reasonably and effectively model manufacturing enterprise big data.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
a design resource big data modeling method for manufacturing enterprise whole system optimization design comprises the following steps:
s1, collecting multi-source heterogeneous design resource big data, and converting the big data into a structured data source with a uniform format;
s2, cleaning the collected data, and removing the data which do not meet the requirements;
s3, performing feature processing on the data meeting the requirements;
s4, classifying and predicting the samples to be classified with a combined KNN (K-nearest-neighbor)-logistic regression model algorithm, judging whether the design of a new product in a manufacturing enterprise can be completed within the specified period, and optimizing the data of the design, manufacturing, product and user subjects according to the prediction result.
Further, the specific steps of step S1, collecting multi-source heterogeneous design resource big data and converting it into a structured data source with a uniform format, are as follows:
s1-1, identifying a data source related to the design resource main body of the manufacturing enterprise and a storage position of the data source;
s1-2, aiming at the relational database, adopting Sqoop technology to configure data connection between the relational database and the HDFS, and importing data in the relational database into the HDFS of Hadoop;
s1-3, analyzing the data file by adopting a MapReduce programming method aiming at the data in the file format and uploading the data file to the HDFS;
s1-4, integrating all the previously acquired main data in Hive based on the relational model;
and S1-5, establishing a structured subject data set.
Further, the data cleansing includes the steps of:
s2-1, preprocessing data;
s2-2, removing or complementing missing data;
s2-3, removing data with errors in content;
s2-4, removing the logically wrong data;
s2-5, removing unnecessary data;
and S2-6, performing data correlation verification.
Further, the feature processing includes the steps of:
s3-1, solving the problem of unbalance of positive and negative samples by adopting an information oversampling SMOTE method, and avoiding the problem of low prediction accuracy caused by sample unbalance existing in the subsequent KNN algorithm and logistic regression algorithm;
s3-2, selecting characteristics by a variance selection method;
and S3-3, performing dimension reduction processing on the feature matrix dimension after feature selection through a principal component analysis method.
Further, the specific process of step S3-1 is as follows:
3-1-1) for each sample x in the minority class, use the formula:

d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )

to compute the Euclidean distance d from sample x to every other minority-class sample y;
3-1-2) let the number of majority-class samples be m and the number of minority-class samples be n, and let:

k = ⌊m / n⌋ − 1

For each sample x, take the k other samples with the smallest Euclidean distance d as the neighbors x_k of x;
3-1-3) for each neighbor x_k, use random linear interpolation between x and x_k to generate a new sample x_n:

x_n = x + ε|x_k − x|

where ε is a random value between 0 and 1;
3-1-4) repeat step 3-1-3) until the numbers of minority-class and majority-class samples are equal or differ only slightly.
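The steps above can be sketched in numpy as follows; the neighbour count k = ⌊m/n⌋ − 1 is an assumption (the patent's formula for k is not reproduced legibly), chosen so that the minority class grows to roughly the majority size:

```python
import numpy as np

def smote(minority, majority, rng=None):
    """Information oversampling (steps 3-1-1 to 3-1-4): for each minority
    sample x, interpolate toward its k nearest minority neighbours x_k
    via x_n = x + eps * |x_k - x| with eps random in [0, 1)."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    m, n = len(majority), len(minority)
    k = max(1, m // n - 1)            # assumed neighbour count, see lead-in
    synthetic = []
    for x in minority:
        d = np.linalg.norm(minority - x, axis=1)       # Euclidean distances
        for xk in minority[np.argsort(d)[1:k + 1]]:    # skip x itself
            synthetic.append(x + rng.random() * np.abs(xk - x))
    if not synthetic:
        return minority
    return np.vstack([minority, np.array(synthetic)])
```

With n minority samples each generating k synthetic points, the class grows to n(1 + k) ≈ m, matching the stopping condition of step 3-1-4).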
Further, the specific process of step S3-3 is as follows:
3-3-1) carrying out normalization processing on the characteristics;
converting by using a linear function:
y=(x-MinValue)/(MaxValue-MinValue)
wherein, x and y are values before and after conversion respectively, and MaxValue and MinValue are the maximum value and the minimum value of the sample respectively;
3-3-2) calculating the average value of each column of features, and then subtracting the average value of the column of features from each dimension;
3-3-3) calculating a covariance matrix of the sample features;
3-3-4) calculating an eigenvalue and an eigenvector of the covariance matrix;
3-3-5) sorting the calculated characteristic values from large to small;
3-3-6) taking out the first K eigenvectors and eigenvalues, and multiplying the initial sample matrix by an eigenvector matrix formed by the K eigenvectors to obtain an eigenvector matrix after dimension reduction;
the value of K is chosen with reference to the following formula:

( Σ_{i=1}^{K} λ_i ) / ( Σ_{i=1}^{n} λ_i ) ≥ t

where λ_i are the eigenvalues of the covariance matrix sorted from large to small and t is the proportion of variance to retain (for example 99%); the minimum K satisfying the inequality is taken.
Further, the step S4 is specifically:
s4-1, dividing the feature-processed data into training set and test set data for training and testing the models;
S4-2, training the KNN model with the training set data, testing it with the test set data, and calculating its class I classification error rate ω1 (the probability of misclassifying a majority-class sample into the minority class);
S4-3, training the logistic regression model with the training set data, testing it with the test set data, and calculating its class I classification error rate ω2;
S4-4, constructing a Lagrange-based combined KNN-logistic regression model;
s4-5, predicting with the combined KNN-logistic regression model whether the design of a new product in a manufacturing enterprise can be completed within the specified period;
and S4-6, optimizing the data of the design, manufacturing, product and user subjects according to the prediction result.
Further, in order to judge whether the classification results of the KNN algorithm, the logistic regression algorithm and the combined KNN-logistic regression model algorithm are accurate, step S4-1 uses a cross-validation method: the feature-processed data is divided into three equal parts A, B and C, which are then cross-combined into three groups: group 1 (training set: A, B; test set: C), group 2 (training set: B, C; test set: A) and group 3 (training set: A, C; test set: B).
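The three-way cross split of step S4-1 can be sketched as follows (the random shuffle before splitting is an assumption; the text does not specify how A, B, C are drawn):

```python
import numpy as np

def three_fold_groups(data, rng=None):
    """Split the feature-processed data into three equal parts A, B, C
    and cross-combine them into the three (training set, test set)
    groups described in step S4-1."""
    rng = np.random.default_rng(rng)
    data = np.asarray(data)
    idx = rng.permutation(len(data))          # assumed random shuffle
    A, B, C = np.array_split(data[idx], 3)
    return [
        (np.vstack([A, B]), C),   # group 1: train A+B, test C
        (np.vstack([B, C]), A),   # group 2: train B+C, test A
        (np.vstack([A, C]), B),   # group 3: train A+C, test B
    ]
```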
Further, in step S4-2, the KNN model is trained with the first group's training set data and tested with the same group's test set data; the operation is then repeated with the second and third groups, and the average of the three class I classification error rates is taken as ω1 of the KNN model. The specific steps are:
4-2-1) according to the Euclidean distance formula:

d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )

calculate the Euclidean distance d between each first-group test set sample x and each first-group training set sample y;
4-2-2) sorting according to the calculated Euclidean distance d, and selecting the minimum k points, wherein the value of k is smaller than the square root of the number of samples in the training set and is an odd number;
4-2-3) determining the frequency of k points in two categories, namely 'design can be completed in a specified period' and 'design cannot be completed in the specified period', and taking the category with the highest frequency as the prediction classification of the data to be classified;
4-2-4) according to the classification results, calculate the class I classification error rate ω11 of the KNN model algorithm for the first group of data;
4-2-5) repeat steps 4-2-1) to 4-2-4) twice to obtain the class I classification error rates ω12 and ω13 for the other two groups of data, and finally take the average ω1 = (ω11 + ω12 + ω13) / 3 as the class I classification error rate of the KNN model algorithm;
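Steps 4-2-1) to 4-2-4) can be sketched as follows; treating label 0 as the majority class ("design cannot be completed in the specified period") is an illustrative assumption:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k):
    """Steps 4-2-1) to 4-2-3): majority vote among the k training
    points closest to x in Euclidean distance (labels in {0, 1})."""
    d = np.linalg.norm(train_X - x, axis=1)   # Euclidean distances
    votes = train_y[np.argsort(d)[:k]]        # labels of the k nearest
    return int(votes.sum() * 2 > k)           # most frequent class

def class1_error(train_X, train_y, test_X, test_y, k, majority_label=0):
    """Step 4-2-4): share of majority-class test samples that are
    misclassified into the minority class."""
    maj = test_y == majority_label
    preds = np.array([knn_predict(train_X, train_y, x, k)
                      for x in test_X[maj]])
    return float((preds != majority_label).mean())
```

Per step 4-2-2), k should be chosen odd and smaller than the square root of the training set size.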
In step S4-3, the logistic regression model is trained with the first group's training set data and tested with the same group's test set data; the operation is then repeated with the second and third groups, and the average of the three class I classification error rates is taken as ω2 of the logistic regression model. The specific steps are:
4-3-1) determining a prediction function:
based on the Sigmoid function:

g(z) = 1 / (1 + e^(−z))

set the weight vector θ = (θ0, θ1, θ2, ..., θn) and take the first-group training set data as the input vector x = (1, x1, x2, ..., xn); letting z(x) = θ^T x, the prediction function of the logistic regression algorithm is obtained:

h_θ(x) = g(θ^T x) = 1 / (1 + e^(−θ^T x))
Whether the product design is finished within the specified period is recorded as y: y = 1 when the design is finished on time, and y = 0 when it is not;
h_θ(x) then expresses the probability that y = 1 given the input x and the weight parameters θ;
4-3-2) determining a weight vector θ:
for a given data set, the maximum likelihood estimation method can be used to estimate the weight vector θ:

likelihood function:

L(θ) = Π_{i=1}^{m} h_θ(x_i)^{y_i} (1 − h_θ(x_i))^{1 − y_i}

its log-likelihood function:

l(θ) = Σ_{i=1}^{m} [ y_i log h_θ(x_i) + (1 − y_i) log(1 − h_θ(x_i)) ]
At this time introduce the cost function:

J(θ) = −(1/m) Σ_{i=1}^{m} [ y_i log h_θ(x_i) + (1 − y_i) log(1 − h_θ(x_i)) ] + (ξ/2m) Σ_{j=1}^{n} θ_j²

which converts the problem into a gradient descent task of finding the minimum value; the latter half is an added regularization term that relieves the overfitting problem of the model.
In the formula, ξ is the penalty-term strength. A group of penalty strengths with different values, such as [0.01, 0.1, 1, 10, 100], is selected; for each value, 5-fold cross-validation yields 5 recall rates, giving a recall rate for each penalty strength, and the ξ with the highest recall rate is selected as the penalty-term strength.
To solve for θ, first take the partial derivative of J(θ) with respect to each θ_j; then, starting from a given θ, repeatedly subtract the partial derivative multiplied by the step length to compute a new θ, until the difference in J(θ) between two iterations is small enough, i.e. the value of J(θ) essentially stops changing, indicating that J(θ) has reached a local minimum. Each θ value is then substituted into the logistic regression equation h_θ(x) to obtain the final prediction function.
wherein the partial derivative of J(θ) with respect to θ_j is:

∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^{m} (h_θ(x_i) − y_i) x_i^(j) + (ξ/m) θ_j

and the regularized iterative formula for θ_j is:

θ_j := θ_j − α [ (1/m) Σ_{i=1}^{m} (h_θ(x_i) − y_i) x_i^(j) + (ξ/m) θ_j ]

where α is the step length.
4-3-3) input the first group's test set data into the prediction function h_θ(x) of the logistic regression algorithm trained on the first group's training set data, and classify the test set data according to the obtained probability values;
4-3-4) according to the classification results, calculate the class I classification error rate ω21 of the logistic regression model algorithm for the first group of data;
4-3-5) repeat steps 4-3-1) to 4-3-4) twice to obtain the class I classification error rates ω22 and ω23 for the other two groups of data, and finally take the average ω2 = (ω21 + ω22 + ω23) / 3 as the class I classification error rate of the logistic regression model algorithm.
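A minimal numpy sketch of the training procedure of steps 4-3-1) to 4-3-3); the step length `alpha`, the iteration cap and leaving the bias term unregularised are assumptions not fixed by the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, xi=1.0, alpha=0.1, iters=2000, tol=1e-9):
    """Gradient descent on the regularised cost J(theta), iterating
    until J changes by less than `tol` between two iterations.
    xi is the penalty-term strength from the text."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])    # x = (1, x1, ..., xn)
    theta = np.zeros(n + 1)
    prev = np.inf
    for _ in range(iters):
        h = sigmoid(Xb @ theta)
        reg = theta.copy()
        reg[0] = 0.0                        # bias unregularised (assumption)
        J = -np.mean(y * np.log(h + 1e-12) + (1 - y) * np.log(1 - h + 1e-12)) \
            + xi / (2 * m) * np.sum(reg ** 2)
        if abs(prev - J) < tol:
            break
        prev = J
        grad = Xb.T @ (h - y) / m + xi / m * reg   # partial derivative of J
        theta -= alpha * grad
    return theta

def predict_prob(theta, X):
    """h_theta(x) = P(y = 1 | x): probability the design finishes on time."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return sigmoid(Xb @ theta)
```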
Further, the specific process of constructing the Lagrange-based combined KNN-logistic regression model in step S4-4 is as follows:
4-4-1) determination of the prediction function:
Let p_i denote the predicted value of the combined model for the i-th sample:

p_i = α1·k_i + α2·l_i

where k_i and l_i are the prediction probability values of the KNN and logistic regression models for the i-th sample respectively, and α1 and α2 are the weights of the KNN and logistic regression models, with α1 + α2 = 1;
4-4-2) construct the Lagrange loss function:

L(α1, α2, λ) = ω1·α1² + ω2·α2² + λ(α1 + α2 − 1)

where ω1 and ω2 are the class I classification error rates of the sub-models obtained in steps S4-2 and S4-3, used as penalty parameters of the sub-models, and λ is the Lagrange multiplier;
4-4-3) solve for the optimum values of α1 and α2:

since L(α1, α2, λ) is a convex function, it has a minimum, and the minimum point gives the optimum α1 and α2:

∂L/∂α1 = 0, ∂L/∂α2 = 0, ∂L/∂λ = 0

Solving the above equations with python yields the optimum values of α1 and α2.
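Assuming the Lagrange loss has the quadratic form L(α1, α2, λ) = ω1·α1² + ω2·α2² + λ(α1 + α2 − 1) (the patent's formula image is not legible, so this form is an assumption), the stationarity conditions give the weights in closed form rather than requiring a numerical solver:

```python
def combine_weights(omega1, omega2):
    """Setting dL/da1 = dL/da2 = dL/dlam = 0 for
    L = w1*a1^2 + w2*a2^2 + lam*(a1 + a2 - 1) gives
    a1 = w2/(w1+w2), a2 = w1/(w1+w2): the sub-model with the larger
    class I error rate receives the smaller weight."""
    a1 = omega2 / (omega1 + omega2)
    a2 = omega1 / (omega1 + omega2)
    return a1, a2

def combined_prediction(k_i, l_i, a1, a2):
    """p_i = a1*k_i + a2*l_i, the combined probability for sample i."""
    return a1 * k_i + a2 * l_i
```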
Compared with the prior art, the principle and the advantages of the scheme are as follows:
According to this scheme, after big data from subjects such as design, manufacturing, products and users in a manufacturing enterprise is collected, cleaned and feature-processed, an accurate and effective design resource big data model for whole-system optimization design is constructed with the combined KNN-logistic regression model algorithm. Related business in the enterprise can thereby be prejudged and the data of the design, manufacturing, product and user subjects optimized, solving the problems that existing design resource data models consider only single-design-department data without integrating and summarizing all of it, and that a single data model may not predict classification results accurately.
In addition, together with whole-process manufacturing, whole-through management and whole-period product service business models, the scheme realizes whole-system, whole-value-chain modeling of manufacturing big data, further solving the problem that the traditional relational database model cannot reasonably and effectively model manufacturing enterprise big data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a design resource big data modeling method for manufacturing enterprise whole system optimization design according to the present invention;
FIG. 2 is a flow chart of data cleaning in the design resource big data modeling method for manufacturing enterprise whole system optimization design according to the invention.
Detailed Description
The invention will be further illustrated with reference to specific examples:
as shown in fig. 1, the design resource big data modeling method for manufacturing enterprise system wide optimization design according to this embodiment includes the following steps:
s1, data acquisition:
s1-1, identifying a data source related to the design resource main body of the manufacturing enterprise and a storage position of the data source;
s1-2, aiming at the relational database, adopting Sqoop technology to configure data connection between the relational database and the HDFS, and importing data in the relational database into the HDFS of Hadoop;
s1-3, analyzing the data file by adopting a MapReduce programming method aiming at the data in the file format and uploading the data file to the HDFS;
s1-4, integrating all the previously acquired main data in Hive based on the relational model;
and S1-5, establishing a structured subject data set.
The collected multisource heterogeneous design resource big data can be converted into a structured data set with a uniform format through the steps.
As shown in fig. 2, the collected data is cleaned to remove data which does not meet the requirements; the method comprises the following specific steps:
s2-1, preprocessing the data: view the metadata, including all information describing the data such as field interpretations, data sources and code tables, so that the data itself can be understood intuitively, some problems can be found preliminarily, and preparations made for later processing;
s2-2, removing or completing missing data: determine the missing range of each data field; data missing key fields is discarded directly, while non-key fields are filled in, including inferring missing values from business knowledge or experience, filling them with calculation results of the same indicator (mean, median, mode, etc.), or filling them with calculation results of different indicators;
s2-3, removing data with errors in content and ensuring the correctness of the data;
s2-4, removing logically wrong data: discarding the data with logic errors according to the service rule to ensure that the data logic is correct;
s2-5, removing unnecessary data: removing data irrelevant to the business rules, and ensuring the relevance of the data;
s2-6, carrying out data correlation verification: for data from multiple sources, correlation verification is necessary; data that fails the verification needs to be cleaned.
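A minimal pandas sketch of steps S2-2 to S2-5; the column names, the mean-fill choice and the business-rule callback are illustrative assumptions:

```python
import pandas as pd

def clean(df, key_fields, business_filter=None):
    """S2-2: drop rows missing any key field, mean-fill non-key numeric
    gaps; S2-4/S2-5: optionally drop rows violating a business rule."""
    df = df.dropna(subset=key_fields).copy()       # discard missing key data
    num = df.select_dtypes("number").columns
    df[num] = df[num].fillna(df[num].mean())       # fill non-key gaps (mean)
    if business_filter is not None:
        df = df[df.apply(business_filter, axis=1)] # keep logically valid rows
    return df.reset_index(drop=True)
```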
And S3, performing feature processing on the data meeting the requirements:
s3-1, handling the class imbalance problem: when there is a serious class imbalance problem in the data, the predicted result is often biased to the class with a large number, which affects the accuracy of the model. A common method for dealing with the class imbalance problem is a random undersampling method, which reduces the size of the majority classes by randomly removing some majority class samples, but this may lose important data, and the sampled data cannot represent all the data, resulting in inaccurate classification results. There is also a random oversampling method in which the scale of the minority class is increased by randomly copying the minority class samples, and although this method does not cause information loss and performs better than the undersampling method, the probability of overfitting is increased.
In this embodiment, to avoid losing important data while relieving overfitting, the information oversampling SMOTE method is selected to solve the class imbalance problem. The specific analysis and calculation process is as follows:
3-1-1) for each sample x in the minority class, use the formula:

d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )

to compute the Euclidean distance d from sample x to every other minority-class sample y;
3-1-2) let the number of majority-class samples be m and the number of minority-class samples be n, and let:

k = ⌊m / n⌋ − 1

For each sample x, take the k other samples with the smallest Euclidean distance d as the neighbors x_k of x;
3-1-3) for each neighbor x_k, use random linear interpolation between x and x_k to generate a new sample x_n:

x_n = x + ε|x_k − x|

where ε is a random value between 0 and 1;
3-1-4) repeat step 3-1-3) until the numbers of minority-class and majority-class samples are equal or differ only slightly.
S3-2, selecting features through a variance selection method, firstly calculating the variance value of each feature, preferentially eliminating the feature with the variance value of 0, and then selecting the feature with the variance value larger than the threshold value according to the threshold value.
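Step S3-2 can be sketched in numpy as follows (zero-variance features are always removed because the comparison `var > threshold` is strict):

```python
import numpy as np

def variance_select(X, threshold=0.0):
    """Step S3-2: compute each feature's variance, eliminate
    zero-variance features, and keep features whose variance
    exceeds the given threshold."""
    var = X.var(axis=0)
    keep = var > threshold
    return X[:, keep], keep
```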
S3-3, after feature selection, an oversized feature matrix would still cause a large amount of calculation and a long model training time, so the dimensionality of the selected feature matrix is reduced through principal component analysis (PCA). The analysis and calculation process is as follows:
3-3-1) carrying out normalization processing on the characteristics;
converting by using a linear function:
y=(x-MinValue)/(MaxValue-MinValue)
wherein, x and y are values before and after conversion respectively, and MaxValue and MinValue are the maximum value and the minimum value of the sample respectively;
3-3-2) calculating the average value of each column of features, and then subtracting the average value of the column of features from each dimension;
3-3-3) calculating a covariance matrix of the sample features;
3-3-4) calculating an eigenvalue and an eigenvector of the covariance matrix;
3-3-5) sorting the calculated characteristic values from large to small;
3-3-6) taking out the first K eigenvectors and eigenvalues, and multiplying the initial sample matrix by an eigenvector matrix formed by the K eigenvectors to obtain an eigenvector matrix after dimension reduction;
the calculation of the value of K refers to the following formula:
Figure BDA0002709175470000121
the minimum value of K is found that satisfies the above equation, where λ is the eigenvalue of the covariance matrix.
S4, to avoid the situation in which a single algorithm model fails to predict the classification result accurately, this embodiment uses a combined KNN-logistic regression model algorithm to classify and predict the samples to be classified, judges whether the design of a new product in a manufacturing enterprise can be completed within the specified period, and optimizes the data of the subjects related to design, manufacturing, product and user according to the prediction result.
The method comprises the following specific steps:
s4-1, determining training set and test set data
In order to judge whether the classification results of the KNN algorithm, the logistic regression algorithm and the combined KNN-logistic regression model algorithm are accurate, a cross-validation method is used: the feature-processed data are divided into three equal parts A, B and C, which are then combined crosswise into three groups. The first group is training set A, B with test set C; the second group is training set B, C with test set A; the third group is training set A, C with test set B;
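The three-way split described above can be sketched as follows (the function name and toy data are illustrative):

```python
def three_fold_groups(data):
    """Split data into three equal parts A, B, C and form the three
    (training set, test set) groups used for cross-validation."""
    n = len(data) // 3
    A, B, C = data[:n], data[n:2 * n], data[2 * n:3 * n]
    return [(A + B, C),  # group 1: train on A, B; test on C
            (B + C, A),  # group 2: train on B, C; test on A
            (A + C, B)]  # group 3: train on A, C; test on B

groups = three_fold_groups(list(range(9)))
```

Each sample appears in exactly one test set across the three groups, so every model is evaluated on every sample exactly once.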
S4-2, after the KNN model is trained with the first group's training set data, it is tested with the same group's test set data; the operation is then repeated with the second and third groups of data, and the average of the three runs is taken as the class I classification error rate ω1 of the KNN model. The specific steps are as follows:
4-2-1) according to the euclidean distance formula:
d = √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)
to calculate the euclidean distance d between the first set of test set data x and the first set of training set data y;
4-2-2) sorting according to the calculated Euclidean distance d, and selecting the minimum k points, wherein the value of k is smaller than the square root of the number of samples in the training set and is an odd number;
4-2-3) determining the frequency of k points in two categories, namely 'design can be completed in a specified period' and 'design cannot be completed in the specified period', and taking the category with the highest frequency as the prediction classification of the data to be classified;
4-2-4) calculating, from the classification results, the class I classification error rate ω11 of the KNN model algorithm on the first group of data;
4-2-5) repeating steps 4-2-1) to 4-2-4) twice to obtain the class I classification error rates ω12 and ω13 of the KNN model algorithm on the other two groups of data, and finally taking the average ω1 = (ω11 + ω12 + ω13)/3 as the class I classification error rate of the KNN model algorithm;
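Steps 4-2-1) to 4-2-4) can be sketched in Python as follows; the names, the toy data and the reading of the "class I" error as the error rate on class-1 ("design finished on time") test samples are illustrative assumptions:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query` (4-2-1 to 4-2-3)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def class1_error_rate(train, test, k):
    """Share of class-1 test samples ('design finished on time') misclassified
    by the model -- one illustrative reading of the 'class I' error rate."""
    pos = [(x, y) for x, y in test if y == 1]
    wrong = sum(1 for x, y in pos if knn_predict(train, x, k) != y)
    return wrong / len(pos)

train = [((0.0, 0.0), 0), ((0.1, 0.2), 0),
         ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1)]
test = [((0.95, 1.0), 1), ((0.05, 0.1), 0)]
err = class1_error_rate(train, test, k=3)
```

Averaging `class1_error_rate` over the three cross-validation groups gives ω1 as described above.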
S4-3, after the logistic regression model is trained with the first group's training set data, it is tested with the same group's test set data; the operation is repeated with the second and third groups of data, and the average of the three runs is taken as the class I classification error rate ω2 of the logistic regression model. The steps are as follows:
4-3-1) determining a prediction function:
based on Sigmoid function:
g(z) = 1/(1 + e^(−z))
the weight vector is set as θ = (θ0, θ1, θ2, …, θn),
and the first group's training set data are used as the input vector x = (1, x1, x2, …, xn); letting z(x) = θᵀx gives the prediction function of the logistic regression algorithm:
hθ(x) = g(θᵀx) = 1/(1 + e^(−θᵀx))
whether the product design is finished within the specified period is recorded as y: y = 1 when the design is finished on time and y = 0 when it is not;
hθ(x) then represents the probability that y = 1 when the input is x and the weight parameter is θ;
4-3-2) determining a weight vector θ:
for a given data set, a maximum likelihood estimation method can be used to estimate the weight vector θ:
likelihood function:
L(θ) = ∏(i=1..m) hθ(x(i))^y(i) · (1 − hθ(x(i)))^(1 − y(i))
its log-likelihood function:
l(θ) = ln L(θ) = Σ(i=1..m) [y(i)·ln hθ(x(i)) + (1 − y(i))·ln(1 − hθ(x(i)))]
introducing the cost function
J(θ) = −(1/m)·l(θ) + (ξ/2m)·Σ(j=1..n) θj²
converts the maximization into a gradient descent task of finding a minimum; the second half is an added regularization item that suppresses overfitting of the model;
in the formula, ξ is the penalty-term strength. A set of candidate strengths ξ with different values, such as [0.01, 0.1, 1, 10, 100], is tried; for each value the recall (recall rate) after 5-fold cross-validation is computed, so that a recall corresponds to every penalty strength, and the ξ with the highest recall is selected as the penalty-term strength;
To solve for θ, the partial derivative of J(θ) with respect to each θj is first obtained; then, starting from some initial θ, the partial derivative multiplied by the step length is repeatedly subtracted to obtain a new θ, until the difference of J(θ) between two iterations is small enough, i.e. its value essentially stops changing, indicating that J(θ) has reached a local minimum. Each θ value is then substituted into the logistic regression equation hθ(x) to obtain the final prediction function.
Wherein the partial derivative of J (θ) with respect to θ is:
∂J(θ)/∂θj = (1/m)·Σ(i=1..m) (hθ(x(i)) − y(i))·xj(i) + (ξ/m)·θj
and the regularized iterative formula for θj is:
θj := θj − η·[(1/m)·Σ(i=1..m) (hθ(x(i)) − y(i))·xj(i) + (ξ/m)·θj]
where η is the step length.
4-3-3) the first group's test set data are input into the prediction function hθ(x) of the logistic regression algorithm trained on the first group's training set data, and the test samples are classified according to the resulting probability values;
4-3-4) the class I classification error rate ω21 of the logistic regression model algorithm on the first group of data is calculated from the classification results;
4-3-5) steps 4-3-1) to 4-3-4) are repeated twice to obtain the class I classification error rates ω22 and ω23 of the logistic regression model algorithm on the other two groups of data, and finally the average ω2 = (ω21 + ω22 + ω23)/3 is taken as the class I classification error rate of the logistic regression model algorithm;
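The training procedure of step 4-3-2) can be sketched with NumPy as follows; the step length, iteration count, toy data and all names are illustrative choices, with the regularization following the ξ penalty term described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, xi=0.1, step=0.5, iters=2000):
    """Gradient descent on the regularised cost J(theta).
    `xi` is the penalty-term strength; step size and iteration count
    are illustrative, not from the patent."""
    X = np.hstack([np.ones((len(X), 1)), np.asarray(X, float)])  # prepend x0 = 1
    y = np.asarray(y, float)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m   # data term of dJ/dtheta
        grad[1:] += (xi / m) * theta[1:]            # regularise all but the bias
        theta -= step * grad                        # theta_j := theta_j - step * gradient
    return theta

def predict_prob(theta, x):
    # h_theta(x): probability that the design is finished on time (y = 1)
    return sigmoid(theta[0] + np.dot(theta[1:], x))

X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
theta = train_logistic(X, y)
```

The fitted prediction function assigns probability above 0.5 to inputs resembling the positive samples and below 0.5 to the negative ones.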
s4-4, constructing a KNN proximity-logistic regression combination model:
4-4-1) determination of the prediction function:
letting pi denote the combined model's predicted value for the ith sample:
pi = α1·ki + α2·li
where ki and li are the prediction probability values of the KNN and logistic regression models for the ith sample, and α1 and α2 are the weight values of the KNN and logistic regression models, with α1 + α2 = 1;
4-4-2) construct the Lagrange loss function:
L(α1, α2, λ) = ω1·α1² + ω2·α2² + λ·(α1 + α2 − 1)
where ω1 and ω2 are the class I classification error rates of the submodels obtained in steps S4-2 and S4-3, used as the penalty parameters of the submodels, and λ is the Lagrange multiplier;
4-4-3) solving for the optimal values of α1 and α2:
due to L (alpha)1,α2λ) is a convex function, there is a minimum value, and the minimum point is α1,α2The optimum value of (d);
Figure BDA0002709175470000152
alpha can be obtained by solving the above formula by python1,α2The optimum value of (c).
S4-5, service prediction:
the data of the sample to be classified are input into the KNN model and the logistic regression model respectively to obtain the prediction probability values k and l; the formula p = α1·k + α2·l then gives the predicted value of the combined model, and whether the design of the new product can be completed within the specified period is judged from this value;
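Under one common reading of the Lagrangian above, in which the loss is quadratic in the weights with the constraint α1 + α2 = 1, the optimal weights have the closed form below; the function names and toy numbers are illustrative assumptions:

```python
def combo_weights(w1, w2):
    """Stationary point of L = w1*a1**2 + w2*a2**2 + lam*(a1 + a2 - 1).
    Setting the partial derivatives to zero gives this closed form
    (the quadratic form of the loss is an assumption consistent with the text)."""
    a1 = w2 / (w1 + w2)  # KNN weight: shrinks as the KNN error rate w1 grows
    a2 = w1 / (w1 + w2)  # logistic regression weight
    return a1, a2

def combined_predict(k, l, a1, a2):
    # step S4-5: p = a1*k + a2*l
    return a1 * k + a2 * l

a1, a2 = combo_weights(w1=0.2, w2=0.1)  # the model with the higher error gets the smaller weight
p = combined_predict(k=0.8, l=0.6, a1=a1, a2=a2)
```

Note how the weighting automatically favours whichever submodel had the lower class I error rate during cross-validation.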
S4-6, optimizing design resources: the data of the subjects related to design, manufacturing, product and user are optimized according to the prediction result, as follows:
4-6-1) when the prediction is that the design of the new product can be completed within the specified period, the subject data with smaller weights θ in the logistic regression algorithm may be appropriately downgraded; for example, when the weight θ of "designer qualification" is small, senior engineers can be replaced by junior and intermediate engineers to save labor cost;
4-6-2) when the prediction is that the design of the new product cannot be completed within the specified period, the subject data with larger weights θ in the logistic regression algorithm are appropriately upgraded; for example, when the weight θ of "processing equipment quality" is large, processing equipment of better quality can be selected to process the product.
The above embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; variations based on the shapes and principles of the present invention are likewise covered within the protection scope of the present invention.

Claims (10)

1. A design resource big data modeling method for manufacturing enterprise whole system optimization design is characterized by comprising the following steps:
s1, collecting multi-source heterogeneous design resource big data, and converting the big data into a structured data source with a uniform format;
s2, cleaning the collected data, and removing the data which do not meet the requirements;
s3, performing feature processing on the data meeting the requirements;
S4, classifying and predicting the samples to be classified by adopting a combined KNN (K-nearest-neighbor)-logistic regression model algorithm, judging whether the design of a new product in a manufacturing enterprise can be completed within the specified period, and optimizing the data of the subjects related to design, manufacturing, product and user according to the prediction result.
2. The design resource big data modeling method for manufacturing enterprise whole-system optimization design according to claim 1, wherein in step S1 the specific steps of collecting the multi-source heterogeneous design resource big data and converting it into a structured data source with a uniform format are as follows:
s1-1, identifying a data source related to the design resource main body of the manufacturing enterprise and a storage position of the data source;
s1-2, aiming at the relational database, adopting Sqoop technology to configure data connection between the relational database and the HDFS, and importing data in the relational database into the HDFS of Hadoop;
s1-3, analyzing the data file by adopting a MapReduce programming method aiming at the data in the file format and uploading the data file to the HDFS;
s1-4, integrating all the previously acquired main data in Hive based on the relational model;
and S1-5, establishing a structured subject data set.
3. The design resource big data modeling method for manufacturing enterprise system-wide optimization design according to claim 1, wherein the data cleaning comprises the steps of:
s2-1, preprocessing data;
s2-2, removing or complementing missing data;
s2-3, removing data with errors in content;
s2-4, removing the logically wrong data;
s2-5, removing unnecessary data;
and S2-6, performing data correlation verification.
4. The design resource big data modeling method for manufacturing enterprise system-wide optimization design according to claim 1, wherein the feature processing comprises the steps of:
s3-1, solving the problem of unbalance of positive and negative samples by adopting an information oversampling SMOTE method, and avoiding the problem of low prediction accuracy caused by sample unbalance existing in the subsequent KNN algorithm and logistic regression algorithm;
s3-2, selecting characteristics by a variance selection method;
and S3-3, performing dimension reduction processing on the feature matrix dimension after feature selection through a principal component analysis method.
5. The modeling method for design resource big data of manufacturing enterprise system-wide optimization design according to claim 4, wherein the specific process of step S3-1 is as follows:
3-1-1) for each sample x in the minority class, the formula is utilized:
d = √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)
calculate the Euclidean distance d from sample x to every other minority-class sample y;
3-1-2) letting m be the number of majority-class samples and n the number of minority-class samples, set:
k = ⌊m/n⌋
and take the k other minority samples with the smallest Euclidean distance d to each sample x as the neighbors χk of x;
3-1-3) for each neighbor χk, a new sample xn is generated between x and χk by random linear interpolation:
xn=x+ε|xk-x|
Wherein epsilon is a random value between 0 and 1;
3-1-4) repeating the steps 3-1-3) until the number of the minority class samples and the number of the majority class samples are equal or have small difference.
6. The modeling method for design resource big data of manufacturing enterprise system-wide optimization design according to claim 4, wherein the specific process of step S3-3 is as follows:
3-3-1) carrying out normalization processing on the characteristics;
converting by using a linear function:
y=(x-MinValue)/(MaxValue-MinValue)
wherein, x and y are values before and after conversion respectively, and MaxValue and MinValue are the maximum value and the minimum value of the sample respectively;
3-3-2) calculating the average value of each column of features, and then subtracting the average value of the column of features from each dimension;
3-3-3) calculating a covariance matrix of the sample features;
3-3-4) calculating an eigenvalue and an eigenvector of the covariance matrix;
3-3-5) sorting the calculated characteristic values from large to small;
3-3-6) taking out the first K eigenvectors and eigenvalues, and multiplying the initial sample matrix by an eigenvector matrix formed by the K eigenvectors to obtain an eigenvector matrix after dimension reduction;
the calculation of the value of K refers to the following formula:
(λ1 + λ2 + … + λK) / (λ1 + λ2 + … + λn) ≥ t
The minimum value of K satisfying the above inequality is taken, where λ1 ≥ λ2 ≥ … ≥ λn are the eigenvalues of the covariance matrix sorted from large to small and t is the proportion of variance to be retained (for example 0.99).
7. The design resource big data modeling method for manufacturing enterprise system-wide optimization design according to claim 1, wherein the step S4 specifically includes:
s4-1, dividing the data after feature processing into a training set and a test set data for training and testing the model;
S4-2, after the KNN model is trained with the training set data, it is tested with the test set data, and the class I classification error rate ω1 of the KNN model is calculated;
S4-3, training the logistic regression model with the training set data, testing it with the test set data, and calculating the class I classification error rate ω2;
S4-4, constructing a KNN proximity-logistic regression combination model based on Lagrange;
s4-5, predicting whether the design of a new product in a manufacturing enterprise can be completed within a specified period by utilizing a KNN proximity-logistic regression combination model;
and S4-6, optimizing data of the subject related to design, manufacture, product and user according to the predicted result.
8. The design resource big data modeling method for manufacturing enterprise whole-system optimization design according to claim 7, wherein in step S4-1, in order to judge whether the classification results of the KNN algorithm, the logistic regression algorithm and the combined KNN-logistic regression model algorithm are accurate, a cross-validation method is selected: the feature-processed data are divided into three equal parts A, B and C, which are then combined crosswise into three groups; the first group is training set A, B with test set C; the second group is training set B, C with test set A; the third group is training set A, C with test set B.
9. The design resource big data modeling method for manufacturing enterprise whole-system optimization design according to claim 8, wherein in step S4-2 the KNN model is trained with the first group's training set data and tested with the same group's test set data; the operation is repeated with the second and third groups of data, and the average of the three runs is taken as the class I classification error rate ω1 of the KNN model; the specific steps are as follows:
4-2-1) according to the euclidean distance formula:
d = √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)
to calculate the euclidean distance d between the first set of test set data x and the first set of training set data y;
4-2-2) sorting according to the calculated Euclidean distance d, and selecting the minimum k points, wherein the value of k is smaller than the square root of the number of samples in the training set and is an odd number;
4-2-3) determining the frequency of k points in two categories, namely 'design can be completed in a specified period' and 'design cannot be completed in the specified period', and taking the category with the highest frequency as the prediction classification of the data to be classified;
4-2-4) calculating, from the classification results, the class I classification error rate ω11 of the KNN model algorithm on the first group of data;
4-2-5) repeating steps 4-2-1) to 4-2-4) twice to obtain the class I classification error rates ω12 and ω13 of the KNN model algorithm on the other two groups of data, and finally taking the average ω1 = (ω11 + ω12 + ω13)/3 as the class I classification error rate of the KNN model algorithm;
and in step S4-3, after the logistic regression model is trained with the first group's training set data, it is tested with the same group's test set data; the operation is repeated with the second and third groups of data, and the average of the three runs is taken as the class I classification error rate ω2 of the logistic regression model; the method comprises the following steps:
4-3-1) determining a prediction function:
based on Sigmoid function:
g(z) = 1/(1 + e^(−z))
the weight vector is set as θ = (θ0, θ1, θ2, …, θn),
and the first group's training set data are used as the input vector x = (1, x1, x2, …, xn); letting z(x) = θᵀx gives the prediction function of the logistic regression algorithm:
hθ(x) = g(θᵀx) = 1/(1 + e^(−θᵀx))
whether the product design is finished within the specified period is recorded as y: y = 1 when the design is finished on time and y = 0 when it is not;
hθ(x) then represents the probability that y = 1 when the input is x and the weight parameter is θ;
4-3-2) determining a weight vector θ:
for a given data set, a maximum likelihood estimation method can be used to estimate the weight vector θ:
likelihood function:
L(θ) = ∏(i=1..m) hθ(x(i))^y(i) · (1 − hθ(x(i)))^(1 − y(i))
its log-likelihood function:
l(θ) = ln L(θ) = Σ(i=1..m) [y(i)·ln hθ(x(i)) + (1 − y(i))·ln(1 − hθ(x(i)))]
introducing the cost function
J(θ) = −(1/m)·l(θ) + (ξ/2m)·Σ(j=1..n) θj²
converts the maximization into a gradient descent task of finding a minimum; the second half is an added regularization item that suppresses overfitting of the model;
in the formula, ξ is the penalty-term strength; a set of candidate strengths ξ with different values, such as [0.01, 0.1, 1, 10, 100], is tried; for each value the recall (recall rate) after 5-fold cross-validation is computed, so that a recall corresponds to every penalty strength, and the ξ with the highest recall is selected as the penalty-term strength;
for solving the θ values, the partial derivative of J(θ) with respect to each θj is first obtained; then, starting from some initial θ, the partial derivative multiplied by the step length is repeatedly subtracted to obtain a new θ, until the difference of J(θ) between two iterations is small enough, i.e. its value essentially stops changing, indicating that J(θ) has reached a local minimum; each θ value is then substituted into the logistic regression equation hθ(x), and finally the prediction function is obtained;
wherein the partial derivative of J (θ) with respect to θ is:
∂J(θ)/∂θj = (1/m)·Σ(i=1..m) (hθ(x(i)) − y(i))·xj(i) + (ξ/m)·θj
and the regularized iterative formula for θj is:
θj := θj − η·[(1/m)·Σ(i=1..m) (hθ(x(i)) − y(i))·xj(i) + (ξ/m)·θj]
where η is the step length.
4-3-3) the first group's test set data are input into the prediction function hθ(x) of the logistic regression algorithm trained on the first group's training set data, and the test samples are classified according to the resulting probability values;
4-3-4) the class I classification error rate ω21 of the logistic regression model algorithm on the first group of data is calculated from the classification results;
4-3-5) steps 4-3-1) to 4-3-4) are repeated twice to obtain the class I classification error rates ω22 and ω23 of the logistic regression model algorithm on the other two groups of data, and finally the average ω2 = (ω21 + ω22 + ω23)/3 is taken as the class I classification error rate of the logistic regression model algorithm.
10. The design resource big data modeling method for manufacturing enterprise whole system optimization design according to claim 9, wherein the specific process of step S4-4 based on Lagrange to construct the KNN neighborhood-logistic regression combination model is as follows:
4-4-1) determination of the prediction function:
letting pi denote the combined model's predicted value for the ith sample:
pi = α1·ki + α2·li
where ki and li are the prediction probability values of the KNN and logistic regression models for the ith sample, and α1 and α2 are the weight values of the KNN and logistic regression models, with α1 + α2 = 1;
4-4-2) construct the Lagrange loss function:
L(α1, α2, λ) = ω1·α1² + ω2·α2² + λ·(α1 + α2 − 1)
where ω1 and ω2 are the class I classification error rates of the submodels obtained in steps S4-2 and S4-3, used as the penalty parameters of the submodels, and λ is the Lagrange multiplier;
4-4-3) solving for the optimal values of α1 and α2:
due to L (alpha)1,α2λ) is a convex function, there is a minimum value, and the minimum point is α1,α2The optimum value of (d);
Figure FDA0002709175460000072
alpha can be obtained by solving the above formula by python1,α2The optimum value of (c).
CN202011049729.XA 2020-09-29 2020-09-29 Design resource big data modeling method for manufacturing enterprise full-system optimization design Active CN112270614B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011049729.XA CN112270614B (en) 2020-09-29 2020-09-29 Design resource big data modeling method for manufacturing enterprise full-system optimization design


Publications (2)

Publication Number Publication Date
CN112270614A true CN112270614A (en) 2021-01-26
CN112270614B CN112270614B (en) 2024-05-10

Family

ID=74349345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011049729.XA Active CN112270614B (en) 2020-09-29 2020-09-29 Design resource big data modeling method for manufacturing enterprise full-system optimization design

Country Status (1)

Country Link
CN (1) CN112270614B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115344830A (en) * 2022-08-02 2022-11-15 无锡致为数字科技有限公司 Event probability estimation method based on big data

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779079A (en) * 2016-11-23 2017-05-31 北京师范大学 A kind of forecasting system and method that state is grasped based on the knowledge point that multimodal data drives
KR20170060603A (en) * 2015-11-24 2017-06-01 윤정호 Method and system on generating predicted information of companies in demand for patent license
CN107203492A (en) * 2017-05-31 2017-09-26 西北工业大学 Product design cloud service platform modularization task replanning and distribution optimization method
US20180173847A1 (en) * 2016-12-16 2018-06-21 Jang-Jih Lu Establishing a machine learning model for cancer anticipation and a method of detecting cancer by using multiple tumor markers in the machine learning model for cancer anticipation
KR20180096834A (en) * 2017-02-09 2018-08-30 충북대학교 산학협력단 Method and system for predicting optimal environmental condition in manufacturing process
EP3474196A1 (en) * 2017-10-23 2019-04-24 OneSpin Solutions GmbH Method of selecting a prover
US20190216368A1 (en) * 2018-01-13 2019-07-18 Chang Gung Memorial Hospital, Linkou Method of predicting daily activities performance of a person with disabilities
CN110147400A (en) * 2019-05-10 2019-08-20 青岛建邦供应链股份有限公司 Inter-trade data resource integrated system based on big data
CN111507507A (en) * 2020-03-24 2020-08-07 重庆森鑫炬科技有限公司 Big data-based monthly water consumption prediction method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WAN HUIFANG; DU YANPU: "A Comparative Study of K-Nearest Neighbor and Logistic Regression Classification Algorithms", Journal of Luoyang Institute of Science and Technology (Natural Science Edition), no. 03, 25 September 2016 (2016-09-25), pages 83 - 86 *
LIN YONGCHANG; ZHU XIAOSHU: "A SMOTE-based KNN Classification Method for Imbalanced Samples", Guangxi Sciences, no. 03, 8 July 2020 (2020-07-08), pages 276 - 283 *
DOU WENZHANG; LYU XIULEI: "Design of an After-sales Service Resource Planning System Based on a Regression Time-series Model", Statistics & Decision, no. 13, 10 July 2009 (2009-07-10), pages 23 - 25 *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant