CN112270614B - Design resource big data modeling method for manufacturing enterprise full-system optimization design


Info

Publication number
CN112270614B
CN112270614B (application CN202011049729.XA)
Authority
CN
China
Prior art keywords
data
value
design
model
logistic regression
Prior art date
Legal status
Active
Application number
CN202011049729.XA
Other languages
Chinese (zh)
Other versions
CN112270614A
Inventor
任鸿儒
肖毅
鲁仁全
徐雍
周琪
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202011049729.XA
Publication of CN112270614A
Application granted
Publication of CN112270614B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04Manufacturing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24143Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing


Abstract

The invention discloses a design resource big data modeling method for the full-system optimization design of a manufacturing enterprise. After big data concerning subjects such as design, manufacturing, products and users in a manufacturing enterprise are collected, cleaned and feature-processed, an accurate and effective design resource big data model for the full-system optimization design of the manufacturing enterprise is constructed with a KNN-logistic regression combination model algorithm, so that related business in the enterprise can be predicted in advance and the data of the design, manufacturing, product and user subjects can be optimized. This solves two problems of existing design resource data models: they consider only the data of a single design department rather than integrating and summarizing the data of all design departments, and a single data model may fail to predict classification results accurately.

Description

Design resource big data modeling method for manufacturing enterprise full-system optimization design
Technical Field
The invention relates to the technical field of manufacturing industry and big data, in particular to a design resource big data modeling method for the whole system optimization design of manufacturing enterprises.
Background
Industrial big data is an important strategic resource for the transformation and upgrading of China's manufacturing industry; to make full use of the massive data generated in the design, manufacturing, management and service processes of manufacturing enterprises, methods and technologies for constructing manufacturing enterprise data spaces have become an important foundational frontier technology. The manufacturing enterprise data space is formed by the full-system, full-value-chain data generated in business domains such as design, manufacturing, management and service; it has the 4V characteristics of big data (large volume, high velocity, heterogeneous variety, low veracity) as well as multi-modal, cross-scale, high-throughput, strongly correlated and mechanism-heavy characteristics, which make manufacturing big data difficult to model.
Most current modeling methods for manufacturing big data target a single business field and do not fully consider the associated influence of data in other business fields during modeling; methods that span multiple business fields and the whole product life cycle are lacking, so the core problems of business fields such as design resources, management flows, manufacturing processes and product services cannot be characterized comprehensively and effectively from a whole-flow, whole-system view.
Product design is the first link of the product life cycle. Existing design resource data models, on the one hand, only consider the data of a single design department rather than integrating and summarizing the data of all design departments; on the other hand, the algorithm adopted by the data model is a single one, so the classification result may not be predicted accurately.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a design resource big data modeling method for the full-system optimization design of a manufacturing enterprise. The method realizes a highly ordered display of the relations in design resource big data and, together with business models of the full-process manufacturing flow, fully connected management processes and full-period product service, realizes full-system, full-value-chain modeling of manufacturing big data, thereby solving the problem that the traditional relational database model cannot model manufacturing enterprise big data reasonably and effectively.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
a design resource big data modeling method for manufacturing enterprise full system optimization design comprises the following steps:
S1, acquiring multi-source heterogeneous design resource big data and converting it into a structured data source with a uniform format;
S2, cleaning the collected data to remove data that does not meet the requirements;
S3, carrying out feature processing on the data that meets the requirements;
S4, carrying out classification prediction on the samples to be classified with the KNN-logistic regression combination model algorithm, so as to judge whether the design of a new product in the manufacturing enterprise can be completed within the specified period, and optimizing the data of the design, manufacturing, product and user subjects according to the prediction result.
Further, the specific steps by which step S1 collects the multi-source heterogeneous design resource big data and converts it into a structured data source with a uniform format are as follows:
S1-1, identifying the data sources related to the manufacturing enterprise design resource subjects and their storage locations;
S1-2, for relational databases, configuring a data connection between the relational database and HDFS with Sqoop and importing the data from the relational database into the Hadoop HDFS;
S1-3, for data in file format, parsing the data files with a MapReduce program and uploading them to HDFS;
S1-4, integrating all the subject data acquired above in Hive based on a relational model;
S1-5, building a structured subject data set.
Further, the data cleansing includes the steps of:
S2-1, preprocessing the data;
S2-2, removing or completing missing data;
S2-3, removing data with content errors;
S2-4, removing data with logic errors;
S2-5, removing unnecessary data;
S2-6, verifying data relevance.
Further, the feature processing includes the steps of:
S3-1, solving the problem of unbalanced positive and negative samples with the synthetic minority oversampling (SMOTE) method, avoiding the low prediction accuracy that unbalanced samples would cause in the subsequent KNN and logistic regression algorithms;
S3-2, performing feature selection through a variance selection method;
S3-3, performing dimension reduction on the feature matrix after feature selection through principal component analysis.
Further, the specific process of the step S3-1 is as follows:
3-1-1) for each sample x in the minority class, use the formula
d = √(Σ_i (x_i − y_i)^2)
to obtain the Euclidean distance d from sample x to every other minority-class sample y;
3-1-2) denote the number of majority-class samples by m and the number of minority-class samples by n, and let k = ⌊m/n⌋; for each minority sample x, take the k other samples with the smallest Euclidean distance d as the neighbours x_k of x;
3-1-3) for each neighbour x_k, generate a new sample x_new between x and x_k by random linear interpolation:
x_new = x + ε|x_k − x|
wherein epsilon is a random value between 0 and 1;
3-1-4) repeating steps 3-1-3) until the minority class samples and the majority class samples are equal or have no difference.
Further, the specific process of the step S3-3 is as follows:
3-3-1) carrying out normalization treatment on the characteristics;
Conversion using a linear function:
y=(x-MinValue)/(MaxValue-MinValue)
wherein x and y are the values before and after conversion, and MaxValue and MinValue are the maximum and minimum of the sample;
3-3-2) calculating the average value of the features of each column, and then subtracting the feature average value of the column from each dimension;
3-3-3) calculating a covariance matrix of the sample features;
3-3-4) calculating eigenvalues and eigenvectors of the covariance matrix;
3-3-5) sorting the calculated characteristic values from large to small;
3-3-6) taking out the first K eigenvectors and eigenvalues, multiplying the initial sample matrix by an eigenvector matrix formed by the K eigenvectors, and obtaining a feature matrix after dimension reduction;
The value of K is chosen by reference to the retained-variance criterion: find the minimum K satisfying
(Σ_{i=1}^{K} λ_i) / (Σ_{i=1}^{n} λ_i) ≥ t,
where λ_i are the eigenvalues of the covariance matrix sorted in descending order and t is the retained-variance threshold (e.g. 0.99).
Further, the step S4 specifically includes:
S4-1, dividing the feature-processed data into training set and test set data for training and testing the models;
S4-2, after training the KNN model with the training set data, test it with the test set data and obtain its class I classification error rate ω1 (the probability of misclassifying majority-class samples as the minority class);
S4-3, after training the logistic regression model with the training set data, test it with the test set data and obtain its class I classification error rate ω2;
S4-4, constructing the KNN-logistic regression combination model based on the Lagrange method;
S4-5, predicting with the KNN-logistic regression combination model whether the design of a new product in the manufacturing enterprise can be completed within the specified period;
S4-6, optimizing the data of the design, manufacturing, product and user subjects according to the prediction result.
Further, in step S4-1, in order to determine whether the classification results of the KNN algorithm, the logistic regression algorithm and the KNN-logistic regression combination model algorithm are accurate, a cross-validation method is chosen: the feature-processed data are divided into three parts A, B and C, which are then combined crosswise into three groups: group 1 (training set: A, B; test set: C); group 2 (training set: B, C; test set: A); group 3 (training set: A, C; test set: B).
Further, in step S4-2 the KNN model is trained with the first group of training set data and tested with the corresponding test set data, and the operation is then repeated with the second and third groups of data to obtain the average class I classification error rate ω1 of the KNN model over the three runs; the specific steps are as follows:
4-2-1) according to the Euclidean distance formula
d = √(Σ_i (x_i − y_i)^2),
calculate the Euclidean distance d between each first-group test sample x and each first-group training sample y;
4-2-2) sort the calculated Euclidean distances d and select the k smallest points, where k must be smaller than the square root of the number of training samples and must be odd;
4-2-3) determining the frequencies of occurrence of k points in two categories, namely that the design can be completed in a specified period and that the design cannot be completed in the specified period, and taking the category with the highest frequency as the prediction classification of the data to be classified;
4-2-4) according to the classification results, obtain the class I classification error rate ω11 of the KNN model algorithm on the first group of data;
4-2-5) repeat steps 4-2-1) to 4-2-4) twice to obtain the class I classification error rates ω12 and ω13 on the other two groups of data, and finally take the average ω1 = (ω11 + ω12 + ω13)/3 as the class I classification error rate of the KNN model algorithm;
In step S4-3, the logistic regression model is trained with the first group of training set data and tested with the corresponding test set data, and the operation is then repeated with the second and third groups of data to obtain the average class I classification error rate ω2 of the logistic regression model over the three runs; the steps are as follows:
4-3-1) determine the prediction function:
Based on the Sigmoid function g(z) = 1/(1 + e^(−z)), set the weight vector θ = (θ_0, θ_1, θ_2, ..., θ_n) and take the first group of training set data as the input vector x = (1, x_1, x_2, ..., x_n); letting z(x) = θ^T x, the prediction function of the logistic regression algorithm is
h_θ(x) = g(θ^T x) = 1/(1 + e^(−θ^T x));
marking whether the product design is finished within a specified period as y, marking y as 1 when the product design is finished on time, and marking y as 0 when the product design is not finished on time;
h_θ(x) represents the probability that y = 1 given the input value x and the weight parameter θ;
4-3-2) determining a weight vector θ:
For a given data set, the weight vector θ can be estimated by the maximum likelihood method:
Likelihood function: L(θ) = Π_{i=1}^{m} h_θ(x_i)^{y_i} (1 − h_θ(x_i))^{1 − y_i};
its log-likelihood function: l(θ) = Σ_{i=1}^{m} [y_i log h_θ(x_i) + (1 − y_i) log(1 − h_θ(x_i))];
At this point introduce the cost function
J(θ) = −(1/m) Σ_{i=1}^{m} [y_i log h_θ(x_i) + (1 − y_i) log(1 − h_θ(x_i))] + (ξ/2m) Σ_{j=1}^{n} θ_j^2,
converting the problem into a gradient descent task of finding the minimum; the second half is an added regularization term that addresses over-fitting of the model;
in the above formula, ξ is the penalty-term strength. A set of candidate penalty strengths ξ with different values, for example [0.01, 0.1, 1, 10, 100], is tried; for each value, 5-fold cross-validation yields 5 recall scores, giving a recall per penalty strength, and the ξ corresponding to the highest recall is selected as the penalty-term strength;
To solve for θ, first take the partial derivative of J(θ) with respect to each θ_j; then, starting from some initial θ, repeatedly subtract the partial derivative multiplied by the step size and recompute θ, until the change in θ makes the difference of J(θ) between two iterations small enough, i.e. the J(θ) values computed in two consecutive iterations are essentially unchanged, indicating that J(θ) has reached a local minimum; the resulting θ values are then substituted into the logistic regression equation h_θ(x) to obtain the final prediction function;
wherein the partial derivative of J(θ) with respect to θ_j is
∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^{m} (h_θ(x_i) − y_i) x_i^{(j)} + (ξ/m) θ_j,
and the iterative formula for θ_j after regularization, with step size α, is
θ_j := θ_j (1 − αξ/m) − (α/m) Σ_{i=1}^{m} (h_θ(x_i) − y_i) x_i^{(j)};
4-3-3) input the first group of test set data into the prediction function h_θ(x) trained on the first group of training set data, and classify the test samples according to the resulting probability values;
4-3-4) according to the classification results, obtain the class I classification error rate ω21 of the logistic regression model algorithm on the first group of data;
4-3-5) repeat steps 4-3-1) to 4-3-4) twice to obtain the class I classification error rates ω22 and ω23 on the other two groups of data, and finally take the average ω2 = (ω21 + ω22 + ω23)/3 as the class I classification error rate of the logistic regression model algorithm.
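Steps 4-3-1) to 4-3-3) can be sketched as a minimal regularized logistic regression trained by gradient descent. This is an illustrative implementation, not the patent's exact code: the hyper-parameter defaults (xi, alpha, n_iter) and function names are assumptions.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, xi=0.1, alpha=0.1, n_iter=5000):
    """Gradient descent on the regularized cost J(theta).
    xi is the penalty-term strength, alpha the step size (both illustrative)."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])     # prepend the constant term x0 = 1
    theta = np.zeros(n + 1)
    for _ in range(n_iter):
        h = sigmoid(Xb @ theta)              # h_theta(x) for every sample
        grad = Xb.T @ (h - y) / m            # (1/m) * sum((h - y) * x_j)
        grad[1:] += (xi / m) * theta[1:]     # regularize all weights but the intercept
        theta -= alpha * grad
    return theta

def predict(theta, X):
    # classify by thresholding the predicted probability at 0.5
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (sigmoid(Xb @ theta) >= 0.5).astype(int)
```

On a toy linearly separable set (e.g. on-time completions labelled 1), the trained function separates the two classes.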
Further, the specific process of constructing the Lagrange-based KNN-logistic regression combination model in step S4-4 is as follows:
4-4-1) determination of the prediction function:
The predicted value of the combination model for the i-th sample is denoted p_i:
p_i = α_1 k_i + α_2 l_i
wherein k_i and l_i are the predicted probability values of the KNN and logistic regression models for the i-th sample, α_1 and α_2 are the weights of the two models, and α_1 + α_2 = 1;
4-4-2) construct the Lagrange loss function
L(α_1, α_2, λ) = ω_1 α_1^2 + ω_2 α_2^2 + λ(1 − α_1 − α_2),
wherein ω_1 and ω_2 are the class I classification error rates of the sub-models obtained in steps S4-2 and S4-3, regarded as penalty parameters of the sub-models, and λ is the Lagrange multiplier;
4-4-3) solve for the optimal values of α_1 and α_2:
Since L(α_1, α_2, λ) is a convex function it has a minimum, and the minimum point gives the optimal values of α_1 and α_2;
setting the partial derivatives ∂L/∂α_1, ∂L/∂α_2 and ∂L/∂λ to zero and solving the resulting equations with python yields the optimal α_1 and α_2.
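Under the quadratic Lagrangian assumed above (the patent's rendered formula is not shown), the stationary point has a closed form, so a sketch needs no numerical solver; the derivation in the comments is for that assumed loss only.

```python
def combine_weights(omega1, omega2):
    """Closed-form stationary point of the assumed Lagrangian
    L = w1*a1^2 + w2*a2^2 + lam*(1 - a1 - a2):
      dL/da1 = 2*w1*a1 - lam = 0
      dL/da2 = 2*w2*a2 - lam = 0
      a1 + a2 = 1
    => a1 = w2/(w1 + w2), a2 = w1/(w1 + w2),
    so the sub-model with the larger error rate gets the smaller weight."""
    a1 = omega2 / (omega1 + omega2)
    a2 = omega1 / (omega1 + omega2)
    return a1, a2

def combined_prediction(k_i, l_i, a1, a2):
    # p_i = a1 * k_i + a2 * l_i
    return a1 * k_i + a2 * l_i
```

For example, with ω1 = 0.2 (KNN) and ω2 = 0.1 (logistic regression), the logistic regression model receives the larger weight.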
Compared with the prior art, the scheme has the following principle and advantages:
According to this scheme, after the big data of the design, manufacturing, product and user subjects in a manufacturing enterprise are collected, cleaned and feature-processed, an accurate and effective design resource big data model for the full-system optimization design of the manufacturing enterprise is constructed with the KNN-logistic regression combination model algorithm, so that related business in the enterprise can be predicted in advance and the data of these subjects optimized; this solves the problems that existing design resource data models only consider single design department data without integrating and summarizing all design department data, and that a single data model may fail to predict classification results accurately.
In addition, together with business models of the full-process manufacturing flow, fully connected management processes and full-period product service, the scheme realizes full-system, full-value-chain modeling of manufacturing big data, further addressing the inability of traditional relational database models to model manufacturing enterprise big data reasonably and effectively.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required in the embodiments or in the description of the prior art are briefly introduced below; obviously, the figures described below are only some embodiments of the present invention, and a person skilled in the art can obtain other figures from them without inventive effort.
FIG. 1 is a schematic flow chart of a design resource big data modeling method for manufacturing enterprise full system optimization design;
FIG. 2 is a flow chart of data cleaning in a design resource big data modeling method for manufacturing enterprise full system optimization design.
Detailed Description
The invention is further illustrated by the following examples:
As shown in fig. 1, the method for modeling design resource big data for optimizing design of a whole system of a manufacturing enterprise according to the embodiment includes the following steps:
S1, data acquisition:
S1-1, identifying the data sources related to the manufacturing enterprise design resource subjects and their storage locations;
S1-2, for relational databases, configuring a data connection between the relational database and HDFS with Sqoop and importing the data from the relational database into the Hadoop HDFS;
S1-3, for data in file format, parsing the data files with a MapReduce program and uploading them to HDFS;
S1-4, integrating all the subject data acquired above in Hive based on a relational model;
S1-5, building a structured subject data set.
Through the steps, the collected multi-source heterogeneous design resource big data can be converted into a structured data set with a uniform format.
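The patent performs this unification at scale with Sqoop, MapReduce and Hive; the idea of mapping heterogeneous sources onto one structured schema can be sketched in miniature with the standard library. All field names and the mapping dictionaries here are hypothetical, chosen only to illustrate the step.

```python
import csv
import io
import json

# Hypothetical unified schema; in the patent's pipeline this role is
# played by the relational model integrated in Hive.
UNIFIED_FIELDS = ["part_id", "designer", "hours"]

def from_csv(text, mapping):
    # mapping: unified field name -> this source's column name
    rows = csv.DictReader(io.StringIO(text))
    return [{uf: r[mapping[uf]] for uf in UNIFIED_FIELDS} for r in rows]

def from_json(text, mapping):
    # normalize every value to a string so both sources share one format
    return [{uf: str(r[mapping[uf]]) for uf in UNIFIED_FIELDS}
            for r in json.loads(text)]

csv_src = "PartNo,Owner,Hours\nP1,Li,12\n"
json_src = '[{"id": "P2", "eng": "Zhao", "h": 8}]'
dataset = (
    from_csv(csv_src, {"part_id": "PartNo", "designer": "Owner", "hours": "Hours"})
    + from_json(json_src, {"part_id": "id", "designer": "eng", "hours": "h"})
)
```

Two differently shaped sources end up as one list of records with identical keys, which is the structured, uniform-format data set the steps above describe.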
As shown in fig. 2, the collected data is subjected to cleaning treatment to remove data which does not meet the requirements; the method comprises the following specific steps:
S2-1, data preprocessing: view the metadata, including field interpretations, data sources, code tables and other descriptions of the data, to gain an intuitive understanding of the data itself and discover problems preliminarily in preparation for later processing;
S2-2, removing or completing missing data: determine the missing range of each data field; directly discard records missing key fields, and fill in non-key data, for example by inferring missing values from business knowledge or experience, filling them with computed statistics of the same indicator (mean, median, mode, etc.), or filling them with computed results of different indicators;
S2-3, removing data with errors in the content, and ensuring the correctness of the data;
S2-4, removing logically wrong data: discarding the data with logic errors according to the business rules to ensure the logic correctness of the data;
S2-5, removing unnecessary data: removing data irrelevant to the business rule, and ensuring the relativity of the data;
S2-6, verifying data relevance: data from multiple sources must undergo relevance verification, and data that fail it need to be cleaned.
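Steps S2-2 to S2-5 above can be sketched as a single pass over record dictionaries. The field names (`part_id`, `hours`, `start_day`, `finish_day`) and the logic rule (finish day must not precede start day) are illustrative assumptions, not fields from the patent.

```python
def clean(records, key_fields, fill_stats):
    """Sketch of data cleaning: drop records missing a key field,
    fill non-key gaps with a precomputed per-field statistic
    (mean, median, mode, ...), and drop records failing a logic rule."""
    cleaned = []
    for r in records:
        if any(r.get(f) in (None, "") for f in key_fields):
            continue                       # S2-2: missing key field -> discard
        r = dict(r)
        for f, stat in fill_stats.items():
            if r.get(f) in (None, ""):
                r[f] = stat                # S2-2: fill non-key gaps
        if r["finish_day"] < r["start_day"]:
            continue                       # S2-4: logic error -> discard
        cleaned.append(r)
    return cleaned
```

Records with an empty key field or an impossible date range are removed, and the remaining gap is filled with the supplied statistic.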
S3, carrying out feature processing on the data meeting the requirements:
S3-1, class imbalance problem processing: when the data have a serious class imbalance problem, the predicted results tend to lean toward the larger class, affecting the accuracy of the model. A common remedy is random undersampling, which reduces the scale of the majority class by randomly removing majority-class samples; however, important data may be lost this way, and the sampled data cannot represent all the data, making the classification result inaccurate. There is also random oversampling, which enlarges the minority class by randomly copying minority-class samples; although this method causes no information loss and performs better than undersampling, it increases the possibility of overfitting.
In this embodiment, the synthetic minority oversampling (SMOTE) method is adopted to solve the class imbalance problem without losing important data and while mitigating over-fitting. The specific analysis and calculation flow is as follows:
3-1-1) for each sample x in the minority class, use the formula
d = √(Σ_i (x_i − y_i)^2)
to obtain the Euclidean distance d from sample x to every other minority-class sample y;
3-1-2) denote the number of majority-class samples by m and the number of minority-class samples by n, and let k = ⌊m/n⌋; for each minority sample x, take the k other samples with the smallest Euclidean distance d as the neighbours x_k of x;
3-1-3) for each neighbour x_k, generate a new sample x_new between x and x_k by random linear interpolation:
x_new = x + ε|x_k − x|
wherein epsilon is a random value between 0 and 1;
3-1-4) repeating steps 3-1-3) until the minority class samples and the majority class samples are equal or have no difference.
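The SMOTE steps above can be sketched as follows. Two details are assumptions where the rendered formulas are missing: the neighbour count k = ⌊m/n⌋, and the interpolation is written with the signed difference (x_k − x), the usual SMOTE form, rather than the absolute value shown in the text.

```python
import math
import random

def smote(minority, majority, seed=0):
    """Generate synthetic minority samples: for each minority sample,
    find its k nearest minority neighbours by Euclidean distance and
    interpolate x_new = x + eps * (x_k - x), eps random in [0, 1)."""
    rng = random.Random(seed)
    m, n = len(majority), len(minority)
    k = max(1, m // n)                     # assumed neighbour count
    synthetic = []
    for x in minority:
        # Euclidean distances to every other minority sample
        others = sorted((s for s in minority if s is not x),
                        key=lambda s: math.dist(x, s))
        for xk in others[:k]:
            eps = rng.random()
            synthetic.append(tuple(xi + eps * (ki - xi)
                                   for xi, ki in zip(x, xk)))
    return synthetic
```

With n minority samples, each contributes k synthetic points, so roughly n*k new samples are produced per pass, which is why repeating the step balances the classes.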
S3-2, select features through the variance selection method: first calculate the variance of each feature, eliminating features with a variance of 0, and then select the features whose variance is larger than a chosen threshold.
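A minimal sketch of this variance selection step; the threshold value is whatever the modeller chooses, so the default here is only illustrative.

```python
def variance_select(X, threshold=0.0):
    """Return the indices of columns whose variance exceeds the threshold
    (columns with variance 0 are always eliminated)."""
    n = len(X)
    keep = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n   # population variance
        if var > threshold:
            keep.append(j)
    return keep
```

A constant column carries no information for classification, so it is dropped regardless of the threshold.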
S3-3, after feature selection is completed, to avoid the large computation load and long model training time that an oversized feature matrix may cause, the feature matrix is reduced in dimension through principal component analysis (PCA). The analysis and calculation flow is as follows:
3-3-1) carrying out normalization treatment on the characteristics;
Conversion using a linear function:
y=(x-MinValue)/(MaxValue-MinValue)
wherein x and y are the values before and after conversion, and MaxValue and MinValue are the maximum and minimum of the sample;
3-3-2) calculating the average value of the features of each column, and then subtracting the feature average value of the column from each dimension;
3-3-3) calculating a covariance matrix of the sample features;
3-3-4) calculating eigenvalues and eigenvectors of the covariance matrix;
3-3-5) sorting the calculated characteristic values from large to small;
3-3-6) taking out the first K eigenvectors and eigenvalues, multiplying the initial sample matrix by an eigenvector matrix formed by the K eigenvectors, and obtaining a feature matrix after dimension reduction;
The value of K is chosen by reference to the retained-variance criterion: find the minimum K satisfying
(Σ_{i=1}^{K} λ_i) / (Σ_{i=1}^{n} λ_i) ≥ t,
where λ_i are the eigenvalues of the covariance matrix sorted in descending order and t is the retained-variance threshold (e.g. 0.99).
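Steps 3-3-1) to 3-3-6) can be sketched as below. The retained-variance ratio 0.99 stands in for the patent's unrendered K-selection formula and is an assumption.

```python
import numpy as np

def pca_reduce(X, retain=0.99):
    """PCA dimension reduction following the steps in the text."""
    # 3-3-1) min-max normalization: y = (x - MinValue) / (MaxValue - MinValue)
    Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # 3-3-2) subtract each column's mean from that column
    Xc = Xn - Xn.mean(axis=0)
    # 3-3-3) / 3-3-4) covariance matrix and its eigendecomposition
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    # 3-3-5) sort eigenvalues from large to small
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # 3-3-6) smallest K whose leading eigenvalues reach the retained ratio,
    # then project onto the first K eigenvectors
    ratio = np.cumsum(vals) / vals.sum()
    K = int(np.searchsorted(ratio, retain) + 1)
    return Xc @ vecs[:, :K]
```

On a matrix whose columns are linearly dependent, a single component captures essentially all the variance, so K collapses to 1.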
S4, to avoid the situation that a single algorithm model may fail to predict the classification result accurately, this embodiment selects the KNN-logistic regression combination model algorithm to classify and predict the samples to be classified, so as to judge whether the design of a new product in the manufacturing enterprise can be completed within the specified period, and to optimize the data of the design, manufacturing, product and user subjects according to the prediction result.
The method comprises the following specific steps:
S4-1, determining the training set and test set data
In order to determine whether the classification results of the KNN proximity algorithm, the logistic regression algorithm and the KNN proximity-logistic regression combined model algorithm are accurate, a cross-validation method is selected: the data after feature processing are divided into three parts, A, B and C, which are then combined into three groups in a crossed manner. The first group takes A and B as the training set and C as the test set; the second group takes B and C as the training set and A as the test set; the third group takes A and C as the training set and B as the test set;
S4-2, after training the KNN model with the first group of training set data, testing it with the test set data of the same group, and then repeating the operation with the second and third groups of data, so as to obtain the average class-I classification error rate ω1 of the KNN model over the three runs; the specific steps are as follows:
4-2-1) according to the Euclidean distance formula:
d(x, y) = √( (x1 − y1)² + (x2 − y2)² + ... + (xn − yn)² )
calculating the Euclidean distance d between the first group of test set data x and the first group of training set data y;
4-2-2) sorting by the calculated Euclidean distance d and selecting the k nearest points, wherein the value of k must be smaller than the square root of the number of training set samples and must be odd;
4-2-3) determining the frequencies of occurrence of k points in two categories, namely that the design can be completed in a specified period and that the design cannot be completed in the specified period, and taking the category with the highest frequency as the prediction classification of the data to be classified;
4-2-4) according to the classification result, obtaining the class-I classification error rate ω11 of the KNN model algorithm corresponding to the first group of data;
4-2-5) repeating steps 4-2-1) to 4-2-4) twice to obtain the class-I classification error rates ω12 and ω13 of the KNN model algorithm corresponding to the other two groups of data, and finally taking the average ω1 = (ω11 + ω12 + ω13)/3 as the class-I classification error rate of the KNN model algorithm;
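A minimal sketch of steps S4-1 and S4-2 — the three crossed groupings and the averaged error rate — assuming, for illustration, that the class-I error rate is measured as the plain misclassification rate on each held-out part:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k nearest neighbours by Euclidean distance
    (steps 4-2-1 to 4-2-3). train: list of (features, label)."""
    dists = sorted((math.dist(feat, query), label) for feat, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def fold_error(train, test, k):
    """Error rate on one held-out part (step 4-2-4), simplified here to the
    plain misclassification rate."""
    wrong = sum(knn_predict(train, feat, k) != label for feat, label in test)
    return wrong / len(test)

# Toy samples: label 1 = "design finished on schedule", 0 = "not finished".
A = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((1.0, 1.0), 1)]
B = [((0.1, 0.2), 0), ((0.9, 1.1), 1), ((1.1, 0.9), 1)]
C = [((0.0, 0.3), 0), ((1.0, 0.8), 1), ((0.2, 0.0), 0)]

# The three crossed groupings of S4-1: train on two parts, test on the third.
folds = [(A + B, C), (B + C, A), (A + C, B)]
omega1 = sum(fold_error(tr, te, k=3) for tr, te in folds) / 3
print(omega1)  # -> 0.0 on this separable toy data
```

With k = 3 the odd-k requirement is met and ties in the two-class vote cannot occur.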
S4-3, after training the logistic regression model with the first group of training set data, testing it with the test set data of the same group, and then repeating the operation with the second and third groups of data, so as to obtain the average class-I classification error rate ω2 of the logistic regression model over the three runs; the steps are as follows:
4-3-1) determining a predictive function:
based on the Sigmoid function:
g(z) = 1/(1 + e^(−z))
The weight vector is set to θ = (θ0, θ1, θ2, ..., θn),
Taking the first group of training set data as an input vector x = (1, x1, x2, ..., xn); letting z(x) = θ^T·x, the prediction function of the logistic regression algorithm is obtained:
hθ(x) = g(θ^T·x) = 1/(1 + e^(−θ^T·x))
marking whether the product design is finished within a specified period as y, marking y as 1 when the product design is finished on time, and marking y as 0 when the product design is not finished on time;
hθ(x) represents the probability that y = 1 when the input value is x and the weight parameter is θ;
4-3-2) determining a weight vector θ:
For a given data set, a maximum likelihood estimation method may be used to estimate the weight vector θ:
Likelihood function: L(θ) = ∏(i=1..m) [hθ(x_i)]^(y_i)·[1 − hθ(x_i)]^(1−y_i)
Its log-likelihood function: l(θ) = ∑(i=1..m) [ y_i·ln hθ(x_i) + (1 − y_i)·ln(1 − hθ(x_i)) ]
At this time the cost function J(θ) = −(1/m)·l(θ) + (ξ/2m)·∑(j=1..n) θj² is introduced,
further converting the model into a gradient descent task that seeks the minimum value, wherein the second half is the added regularization item, used to address overfitting of the model;
In the above formula, ξ is the penalty strength; a group of candidate penalty values of different magnitudes, such as ξ ∈ [0.01, 0.1, 1, 10, 100], is selected, and each value is cycled through to obtain 5 recall rates after 5-fold cross-validation, so that the recall corresponding to each penalty value is obtained; the ξ corresponding to the highest recall is then selected as the penalty value;
To solve for θ, first take the partial derivative of J(θ) with respect to each θj; then, starting from a given θ, repeatedly subtract the partial derivative multiplied by the step length to obtain a new θ, until the change in θ makes the difference in J(θ) between two successive iterations small enough, i.e. the values of J(θ) calculated in two successive iterations are essentially unchanged, indicating that J(θ) has reached a local minimum. Each θ value is then calculated and substituted into the logistic regression equation hθ(x) to finally obtain the prediction function.
Wherein the partial derivative of J(θ) with respect to θj is:
∂J(θ)/∂θj = (1/m)·∑(i=1..m) ( hθ(x_i) − y_i )·x_i^(j) + (ξ/m)·θj
and the iterative formula of θj after regularization is:
θj := θj·(1 − α·ξ/m) − (α/m)·∑(i=1..m) ( hθ(x_i) − y_i )·x_i^(j)
wherein α is the step length;
4-3-3) inputting the first group of test set data into the prediction function hθ(x) of the logistic regression algorithm trained with the first group of training set data, and classifying the test set data according to the obtained probability values;
4-3-4) according to the classification result, obtaining the class-I classification error rate ω21 of the logistic regression model algorithm corresponding to the first group of data;
4-3-5) repeating steps 4-3-1) to 4-3-4) twice to obtain the class-I classification error rates ω22 and ω23 of the logistic regression model algorithm corresponding to the other two groups of data, and finally taking the average ω2 = (ω21 + ω22 + ω23)/3 as the class-I classification error rate of the logistic regression model algorithm;
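Step 4-3-2) can be sketched as batch gradient descent on the regularized cost; the step length, iteration count and toy data below are illustrative assumptions, and the bias term is left unpenalized as is conventional:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(data, xi=0.1, alpha=0.5, iters=2000):
    """Batch gradient descent on the regularised cost of step 4-3-2.
    data: list of (features, y); an intercept input x0 = 1 is prepended,
    and theta[0] is the bias. xi is the L2 penalty strength and alpha the
    step length; both values here are illustrative assumptions."""
    m = len(data)
    n = len(data[0][0]) + 1
    theta = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for feats, y in data:
            x = [1.0] + list(feats)
            err = sigmoid(sum(t * v for t, v in zip(theta, x))) - y
            for j in range(n):
                grad[j] += err * x[j] / m
        for j in range(n):
            reg = (xi / m) * theta[j] if j > 0 else 0.0  # bias not penalised
            theta[j] -= alpha * (grad[j] + reg)
    return theta

def predict(theta, feats):
    """h_theta(x): predicted probability that the design finishes on time."""
    x = [1.0] + list(feats)
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

# Toy data: y = 1 ("finished on time") when the single feature is large.
data = [((0.1,), 0), ((0.3,), 0), ((0.7,), 1), ((0.9,), 1)]
theta = train_logreg(data)
print(predict(theta, (0.9,)) > 0.5, predict(theta, (0.1,)) < 0.5)  # -> True True
```

The penalty-strength search of the text would wrap `train_logreg` in a loop over candidate ξ values, keeping the one with the highest cross-validated recall.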
s4-4, constructing a KNN proximity-logistic regression combination model:
4-4-1) determination of the prediction function:
The predicted value of the combined model for the i-th sample is denoted by p_i:
p_i = α1·k_i + α2·l_i
wherein k_i and l_i respectively represent the predicted probability values of the KNN model and the logistic regression model for the i-th sample, α1 and α2 respectively represent the weight values of the KNN model and the logistic regression model, and α1 + α2 = 1;
4-4-2) constructing Lagrange loss function:
Wherein ω1 and ω2 are the class-I classification error rates of the sub-models obtained in steps S4-2 and S4-3 respectively, here regarded as penalty parameters of the sub-models, and λ is the Lagrange multiplier;
4-4-3) solving for the optimal values of α1 and α2:
Since L(α1, α2, λ) is a convex function, it has a minimum value, and the minimum point gives the optimal values of α1 and α2;
The optimal values of α1 and α2 can be obtained by solving the above equations using Python.
S4-5, service prediction:
Inputting the data of the sample to be classified into the KNN model and the logistic regression model respectively to obtain their prediction probability values k and l, obtaining the prediction value of the combined model by the formula p = α1·k + α2·l, and judging from this value whether the design of the new product can be completed within the specified period;
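Steps S4-4 and S4-5 can be sketched under the assumption that the Lagrange loss of step 4-4-2), whose formula is not reproduced here, takes the quadratic form L(α1, α2, λ) = ω1·α1² + ω2·α2² + λ·(α1 + α2 − 1); setting its partial derivatives to zero then gives closed-form weights, with the lower-error sub-model weighted higher:

```python
def combine_weights(w1, w2):
    """Closed-form stationary point of the assumed quadratic loss:
    differentiating with respect to a1 and a2 under a1 + a2 = 1 gives
    a1 = w2/(w1 + w2), so the sub-model with the lower class-I error
    rate receives the larger weight."""
    a1 = w2 / (w1 + w2)  # weight of the KNN model
    a2 = w1 / (w1 + w2)  # weight of the logistic regression model
    return a1, a2

def combined_prediction(k, l, w1, w2):
    """Step S4-5: p = a1*k + a2*l for sub-model probabilities k and l."""
    a1, a2 = combine_weights(w1, w2)
    return a1 * k + a2 * l

# KNN erred 10% of the time, logistic regression 30%: trust KNN more.
a1, a2 = combine_weights(0.1, 0.3)
print(round(a1, 6), round(a2, 6))                         # -> 0.75 0.25
print(round(combined_prediction(0.8, 0.6, 0.1, 0.3), 6))  # -> 0.75
```

Whatever the exact loss in the patent, the combination step itself is the weighted sum p = α1·k + α2·l shown above.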
S4-6, optimizing design resources: optimizing the data of the subjects including design, manufacture, product and user according to the prediction result, as follows:
4-6-1) When the prediction result indicates that the design of the new product can be completed within the specified period, subject data with smaller weights θ in the logistic regression algorithm may be appropriately downgraded; for example, when the weight θ of "designer seniority" is small, the personnel participating in the design may be changed from senior engineers to junior or mid-level engineers, saving labor cost.
4-6-2) When the prediction result indicates that the design of the new product cannot be completed within the specified period, subject data with larger weights θ in the logistic regression algorithm may be appropriately upgraded; for example, when the weight θ of "processing equipment" is large, processing equipment of better quality may be selected to process the product.
The above embodiments are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention; therefore, any changes made according to the shapes and principles of the present invention shall be covered by the protection scope of the present invention.

Claims (6)

1. A design resource big data modeling method for manufacturing enterprise whole system optimization design is characterized by comprising the following steps:
s1, acquiring multi-source heterogeneous design resource big data, and converting the multi-source heterogeneous design resource big data into a structured data source with a uniform format;
S2, cleaning the collected data to remove data which does not meet the requirements;
s3, carrying out feature processing on the data meeting the requirements;
S4, carrying out classification prediction on a sample to be classified by adopting a KNN proximity-logistic regression combined model algorithm, so as to judge whether the design of a new product in a manufacturing enterprise can be completed within a specified period, and optimizing data of a main body related to the design, the manufacture, the product and a user according to a prediction result;
the step S4 specifically includes:
S4-1, dividing the data after feature processing into training set and test set data for training and testing the models;
S4-2, after training the KNN model with the training set data, testing it with the test set data and obtaining the class-I classification error rate ω1;
S4-3, after training the logistic regression model with the training set data, testing it with the test set data and obtaining the class-I classification error rate ω2;
S4-4, constructing a Lagrange-based KNN proximity-logistic regression combined model;
S4-5, predicting with the KNN proximity-logistic regression combined model whether the design of a new product in the manufacturing enterprise can be completed within the specified period;
S4-6, optimizing the data of the subjects including design, manufacture, product and user according to the prediction result;
in the step S4-1, in order to determine whether the classification results of the KNN proximity algorithm, the logistic regression algorithm and the KNN proximity-logistic regression combined model algorithm are accurate, a cross-validation method is selected: the data after feature processing are divided into three parts, A, B and C, which are then combined into three groups in a crossed manner; the first group takes A and B as the training set and C as the test set; the second group takes B and C as the training set and A as the test set; the third group takes A and C as the training set and B as the test set;
step S4-2 is specifically to test the KNN model with the test set data of the same group after training it with the first group of training set data, and then to repeat the operation with the second and third groups of data, so as to obtain the average class-I classification error rate ω1 of the KNN model over the three runs; the specific steps are as follows:
4-2-1) according to the Euclidean distance formula:
d(x, y) = √( (x1 − y1)² + (x2 − y2)² + ... + (xn − yn)² )
calculating the Euclidean distance d between the first group of test set data x and the first group of training set data y;
4-2-2) sorting by the calculated Euclidean distance d and selecting the k nearest points, wherein the value of k must be smaller than the square root of the number of training set samples and must be odd;
4-2-3) determining the frequencies of occurrence of k points in two categories, namely that the design can be completed in a specified period and that the design cannot be completed in the specified period, and taking the category with the highest frequency as the prediction classification of the data to be classified;
4-2-4) according to the classification result, obtaining the class-I classification error rate ω11 of the KNN model algorithm corresponding to the first group of data;
4-2-5) repeating steps 4-2-1) to 4-2-4) twice to obtain the class-I classification error rates ω12 and ω13 of the KNN model algorithm corresponding to the other two groups of data, and finally taking the average ω1 = (ω11 + ω12 + ω13)/3 as the class-I classification error rate of the KNN model algorithm;
And step S4-3 is specifically to test the logistic regression model with the test set data of the same group after training it with the first group of training set data, and then to repeat the operation with the second and third groups of data, so as to obtain the average class-I classification error rate ω2 of the logistic regression model over the three runs; the steps are as follows:
4-3-1) determining a predictive function:
based on the Sigmoid function:
g(z) = 1/(1 + e^(−z))
The weight vector is set to θ = (θ0, θ1, θ2, ..., θn),
Taking the first group of training set data as an input vector x = (1, x1, x2, ..., xn); letting z(x) = θ^T·x, the prediction function of the logistic regression algorithm is obtained:
hθ(x) = g(θ^T·x) = 1/(1 + e^(−θ^T·x))
marking whether the product design is finished within a specified period as y, marking y as 1 when the product design is finished on time, and marking y as 0 when the product design is not finished on time;
hθ(x) represents the probability that y = 1 when the input value is x and the weight parameter is θ;
4-3-2) determining a weight vector θ:
For a given data set, a maximum likelihood estimation method is used to estimate the weight vector θ:
Likelihood function: L(θ) = ∏(i=1..m) [hθ(x_i)]^(y_i)·[1 − hθ(x_i)]^(1−y_i)
Its log-likelihood function: l(θ) = ∑(i=1..m) [ y_i·ln hθ(x_i) + (1 − y_i)·ln(1 − hθ(x_i)) ]
At this time the cost function J(θ) = −(1/m)·l(θ) + (ξ/2m)·∑(j=1..n) θj² is introduced,
further converting the model into a gradient descent task that seeks the minimum value, wherein the second half is the added regularization item, used to address overfitting of the model;
In the above formula, ξ is the penalty strength; a group of candidate penalty values of different magnitudes, such as ξ ∈ [0.01, 0.1, 1, 10, 100], is selected, and each value is cycled through to obtain 5 recall rates after 5-fold cross-validation, so that the recall corresponding to each penalty value is obtained; the ξ corresponding to the highest recall is then selected as the penalty value;
To solve for θ, first take the partial derivative of J(θ) with respect to each θj; then, starting from a given θ, repeatedly subtract the partial derivative multiplied by the step length to obtain a new θ, until the change in θ makes the difference in J(θ) between two successive iterations small enough, i.e. the values of J(θ) calculated in two successive iterations are essentially unchanged, indicating that J(θ) has reached a local minimum; each θ value is then calculated and substituted into the logistic regression equation hθ(x) to finally obtain the prediction function;
wherein the partial derivative of J(θ) with respect to θj is:
∂J(θ)/∂θj = (1/m)·∑(i=1..m) ( hθ(x_i) − y_i )·x_i^(j) + (ξ/m)·θj
and the iterative formula of θj after regularization is:
θj := θj·(1 − α·ξ/m) − (α/m)·∑(i=1..m) ( hθ(x_i) − y_i )·x_i^(j)
wherein α is the step length;
4-3-3) inputting the first group of test set data into the prediction function hθ(x) of the logistic regression algorithm trained with the first group of training set data, and classifying the test set data according to the obtained probability values;
4-3-4) according to the classification result, obtaining the class-I classification error rate ω21 of the logistic regression model algorithm corresponding to the first group of data;
4-3-5) repeating steps 4-3-1) to 4-3-4) twice to obtain the class-I classification error rates ω22 and ω23 of the logistic regression model algorithm corresponding to the other two groups of data, and finally taking the average ω2 = (ω21 + ω22 + ω23)/3 as the class-I classification error rate of the logistic regression model algorithm;
the specific process of constructing the KNN proximity-logistic regression combination model based on Lagrange in the step S4-4 is as follows:
4-4-1) determination of the prediction function:
The predicted value of the combined model for the i-th sample is denoted by p_i:
p_i = α1·k_i + α2·l_i
wherein k_i and l_i respectively represent the predicted probability values of the KNN model and the logistic regression model for the i-th sample, α1 and α2 respectively represent the weight values of the KNN model and the logistic regression model, and α1 + α2 = 1;
4-4-2) constructing Lagrange loss function:
Wherein ω1 and ω2 are the class-I classification error rates of the sub-models obtained in steps S4-2 and S4-3 respectively, here regarded as penalty parameters of the sub-models, and λ is the Lagrange multiplier;
4-4-3) solving for the optimal values of α1 and α2:
Since L(α1, α2, λ) is a convex function, it has a minimum value, and the minimum point gives the optimal values of α1 and α2;
The optimal values of α1 and α2 can be obtained by solving the above equations using Python.
2. The method for modeling design resource big data for full-system optimization design of manufacturing enterprises according to claim 1, wherein the specific steps of collecting multi-source heterogeneous design resource big data and converting the multi-source heterogeneous design resource big data into a structured data source with a uniform format are as follows:
s1-1, identifying a data source related to a manufacturing enterprise design resource main body and a storage position of the data source;
S1-2, for relational databases, configuring the data connection between the relational database and HDFS by means of the Sqoop tool, and importing the data in the relational database into the Hadoop HDFS;
S1-3, for data in file formats, parsing the data files with the MapReduce programming model and uploading them to HDFS;
s1-4, integrating all the main body data acquired before in Hive based on a relational model;
s1-5, building a structured main body data set.
3. The method for modeling design resource big data for manufacturing enterprise-wide system optimization design according to claim 1, wherein the data cleaning comprises the steps of:
S2-1, preprocessing data;
s2-2, removing or complementing missing data;
S2-3, removing data with errors in the content;
S2-4, removing data with logic errors;
s2-5, removing unnecessary data;
s2-6, verifying data relevance.
4. The method for modeling design resource big data for manufacturing enterprise-wide system optimization design according to claim 1, wherein the feature processing comprises the steps of:
S3-1, solving the problem of unbalanced positive and negative samples by adopting the SMOTE (Synthetic Minority Over-sampling Technique) oversampling method, so as to avoid the low prediction accuracy that unbalanced samples would cause in the subsequent KNN and logistic regression algorithms;
S3-2, performing feature selection through a variance selection method;
s3-3, performing dimension reduction treatment on the feature matrix dimension after feature selection through a principal component analysis method.
5. The method for modeling design resource big data for manufacturing enterprise-wide system optimization design according to claim 4, wherein the specific process of step S3-1 is as follows:
3-1-1) for each sample x in the minority class, using the Euclidean distance formula:
d(x, y) = √( (x1 − y1)² + (x2 − y2)² + ... + (xn − yn)² )
obtaining the Euclidean distance d from the sample x to every other minority-class sample y;
3-1-2) the majority class sample number is denoted as m, the minority class sample number is denoted as n, let:
Taking the k other samples with the minimum Euclidean distance d to each sample x as the neighbors x_k of the sample x;
3-1-3) for each neighbor x k, a new sample x n is generated in x and x k using a random linear interpolation method:
x_n = x + ε·|x_k − x|
wherein ε is a random value between 0 and 1;
3-1-4) repeating steps 3-1-3) until the minority class samples and the majority class samples are equal or have no difference.
6. The method for modeling design resource big data for manufacturing enterprise-wide system optimization design according to claim 4, wherein the specific process of step S3-3 is as follows:
3-3-1) carrying out normalization treatment on the characteristics;
Conversion using a linear function:
y=(x-MinValue)/(MaxValue-MinValue)
Wherein x and y are the values before and after conversion respectively, and MaxValue and MinValue are the maximum and minimum values of the sample;
3-3-2) calculating the average value of the features of each column, and then subtracting the feature average value of the column from each dimension;
3-3-3) calculating a covariance matrix of the sample features;
3-3-4) calculating eigenvalues and eigenvectors of the covariance matrix;
3-3-5) sorting the calculated characteristic values from large to small;
3-3-6) taking out the first K eigenvectors and eigenvalues, multiplying the initial sample matrix by an eigenvector matrix formed by the K eigenvectors, and obtaining a feature matrix after dimension reduction;
The K value is determined with reference to the retained-variance criterion:
( ∑(i=1..K) λi ) / ( ∑(i=1..n) λi ) ≥ t
The minimum K value satisfying the above inequality is taken, wherein λi are the eigenvalues of the covariance matrix sorted from large to small, n is the total number of eigenvalues, and t is the proportion of variance to be retained.
CN202011049729.XA 2020-09-29 2020-09-29 Design resource big data modeling method for manufacturing enterprise full-system optimization design Active CN112270614B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011049729.XA CN112270614B (en) 2020-09-29 2020-09-29 Design resource big data modeling method for manufacturing enterprise full-system optimization design


Publications (2)

Publication Number Publication Date
CN112270614A CN112270614A (en) 2021-01-26
CN112270614B true CN112270614B (en) 2024-05-10

Family

ID=74349345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011049729.XA Active CN112270614B (en) 2020-09-29 2020-09-29 Design resource big data modeling method for manufacturing enterprise full-system optimization design

Country Status (1)

Country Link
CN (1) CN112270614B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115344830A (en) * 2022-08-02 2022-11-15 无锡致为数字科技有限公司 Event probability estimation method based on big data

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779079A (en) * 2016-11-23 2017-05-31 北京师范大学 A kind of forecasting system and method that state is grasped based on the knowledge point that multimodal data drives
KR20170060603A (en) * 2015-11-24 2017-06-01 윤정호 Method and system on generating predicted information of companies in demand for patent license
CN107203492A (en) * 2017-05-31 2017-09-26 西北工业大学 Product design cloud service platform modularization task replanning and distribution optimization method
KR20180096834A (en) * 2017-02-09 2018-08-30 충북대학교 산학협력단 Method and system for predicting optimal environmental condition in manufacturing process
EP3474196A1 (en) * 2017-10-23 2019-04-24 OneSpin Solutions GmbH Method of selecting a prover
CN110147400A (en) * 2019-05-10 2019-08-20 青岛建邦供应链股份有限公司 Inter-trade data resource integrated system based on big data
CN111507507A (en) * 2020-03-24 2020-08-07 重庆森鑫炬科技有限公司 Big data-based monthly water consumption prediction method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180173847A1 (en) * 2016-12-16 2018-06-21 Jang-Jih Lu Establishing a machine learning model for cancer anticipation and a method of detecting cancer by using multiple tumor markers in the machine learning model for cancer anticipation
US20190216368A1 (en) * 2018-01-13 2019-07-18 Chang Gung Memorial Hospital, Linkou Method of predicting daily activities performance of a person with disabilities


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A comparative study of K-nearest-neighbor and logistic regression classification algorithms; 万会芳, 杜彦璞; Journal of Luoyang Institute of Science and Technology (Natural Science Edition); 2016-09-25 (03); pp. 83-86, 93 *
A KNN classification method for unbalanced samples based on SMOTE; 林泳昌, 朱晓姝; Guangxi Sciences; 2020-07-08 (03); pp. 276-283 *
Design of an after-sales service resource planning system based on a regression time-series model; 窦文章, 吕修磊; Statistics & Decision; 2009-07-10 (13); pp. 23-25 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant