CN112270614A - Design resource big data modeling method for manufacturing enterprise whole system optimization design - Google Patents

Design resource big data modeling method for manufacturing enterprise whole system optimization design Download PDF

Info

Publication number
CN112270614A
CN112270614A (application number CN202011049729.XA; granted as CN112270614B)
Authority
CN
China
Prior art keywords
data
design
value
logistic regression
model
Prior art date
Legal status
Granted
Application number
CN202011049729.XA
Other languages
Chinese (zh)
Other versions
CN112270614B (en)
Inventor
任鸿儒
肖毅
鲁仁全
徐雍
周琪
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202011049729.XA
Publication of CN112270614A
Application granted
Publication of CN112270614B
Legal status: Active

Classifications

    • G06Q50/04 Manufacturing (ICT specially adapted for business processes of specific sectors)
    • G06F16/182 Distributed file systems
    • G06F16/215 Improving data quality; data cleansing, e.g. de-duplication, removing invalid entries
    • G06F18/2135 Feature extraction based on approximation criteria, e.g. principal component analysis
    • G06F18/24143 Classification based on distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes
    • Y02P90/30 Computing systems specially adapted for manufacturing


Abstract

The invention discloses a design resource big data modeling method for whole-system optimization design of a manufacturing enterprise. After big data from subjects such as design, manufacturing, products and users in a manufacturing enterprise are collected, cleaned and feature-processed, an accurate and effective design resource big data model for whole-system optimization design is constructed with a combined KNN (K-nearest-neighbor)-logistic regression model algorithm. Related business in the enterprise can thereby be prejudged, and the data of the design, manufacturing, product and user subjects optimized. The method addresses two shortcomings of existing design resource data models: they consider only data from a single design department rather than integrating and summarizing all design department data, and a single-algorithm data model may fail to predict classification results accurately.

Description

Design resource big data modeling method for manufacturing enterprise whole system optimization design
Technical Field
The invention relates to the technical field of manufacturing industry and big data, in particular to a design resource big data modeling method for the whole system optimization design of a manufacturing enterprise.
Background
Industrial big data is an important strategic resource for the transformation and upgrading of China's manufacturing industry. To make full use of the massive data generated in the design, manufacturing, management and service processes of manufacturing enterprises, methods and technologies for constructing the manufacturing-enterprise data space have become important foundational frontier technologies. The manufacturing-enterprise data space is formed by whole-system, whole-value-chain data generated in business domains such as design, manufacturing, management and service. Besides the 4V characteristics of big data (large volume, high velocity, heterogeneous variety and low veracity), it is multi-modal, cross-scale, high-throughput, strongly correlated and mechanism-intensive, which makes manufacturing big data difficult to model.
Most current manufacturing big data modeling methods target a single business field and do not fully consider the correlated influence of data from other business fields during modeling. A modeling method that runs through multiple business fields and the whole product life cycle is lacking, so core problems in business fields such as design resources, management processes, manufacturing processes and product services cannot be described comprehensively and effectively from a whole-process, whole-system perspective.
Product design is the first link of the product life cycle. Existing design resource data models consider only single-design-department data and do not integrate and summarize all design department data; moreover, such models adopt a single algorithm, so the classification result may not be predicted accurately.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a design resource big data modeling method for whole-system optimization design of a manufacturing enterprise. The method presents the relations among design resource big data in a highly ordered way and, together with whole-process manufacturing, whole-through management and whole-period product service business models, realizes whole-system, whole-value-chain modeling of manufacturing big data, solving the problem that the traditional relational database model cannot reasonably and effectively model manufacturing enterprise big data.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
a design resource big data modeling method for manufacturing enterprise whole system optimization design comprises the following steps:
s1, collecting multi-source heterogeneous design resource big data, and converting the big data into a structured data source with a uniform format;
s2, cleaning the collected data, and removing the data which do not meet the requirements;
s3, performing feature processing on the data meeting the requirements;
s4, classifying and predicting the samples to be classified with a combined KNN (K-nearest-neighbor)-logistic regression model algorithm, judging whether the design of a new product in a manufacturing enterprise can be completed within the specified period, and optimizing the data of the design, manufacturing, product and user subjects according to the prediction result.
Further, the specific steps of step S1, collecting multi-source heterogeneous design resource big data and converting it into a structured data source with a uniform format, are as follows:
s1-1, identifying a data source related to the design resource main body of the manufacturing enterprise and a storage position of the data source;
s1-2, aiming at the relational database, adopting Sqoop technology to configure data connection between the relational database and the HDFS, and importing data in the relational database into the HDFS of Hadoop;
s1-3, analyzing the data file by adopting a MapReduce programming method aiming at the data in the file format and uploading the data file to the HDFS;
s1-4, integrating all the previously acquired main data in Hive based on the relational model;
and S1-5, establishing a structured subject data set.
Further, the data cleansing includes the steps of:
s2-1, preprocessing data;
s2-2, removing or complementing missing data;
s2-3, removing data with errors in content;
s2-4, removing the logically wrong data;
s2-5, removing unnecessary data;
and S2-6, performing data correlation verification.
Further, the feature processing includes the steps of:
s3-1, solving the problem of unbalance of positive and negative samples by adopting an information oversampling SMOTE method, and avoiding the problem of low prediction accuracy caused by sample unbalance existing in the subsequent KNN algorithm and logistic regression algorithm;
s3-2, selecting characteristics by a variance selection method;
and S3-3, performing dimension reduction processing on the feature matrix dimension after feature selection through a principal component analysis method.
Further, the specific process of step S3-1 is as follows:
3-1-1) for each sample x in the minority class, use the formula:

d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )

to compute the Euclidean distance d from sample x to every other minority-class sample y;
3-1-2) let the number of majority-class samples be m and the number of minority-class samples be n, and let:

k = ⌊m / n⌋ − 1

For each sample x, take the k other samples with the smallest Euclidean distance d as the neighbors x_k of x;
3-1-3) for each neighbor x_k, use random linear interpolation between x and x_k to generate a new sample x_n:

x_n = x + ε|x_k − x|

where ε is a random value between 0 and 1;
3-1-4) repeat step 3-1-3) until the numbers of minority-class and majority-class samples are equal or differ only slightly.
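The steps above can be sketched in numpy as follows; the neighbour count k = ⌊m/n⌋ − 1 is an assumption (the patent's formula for k is not reproduced legibly), chosen so that the minority class grows to roughly the majority size:

```python
import numpy as np

def smote(minority, majority, rng=None):
    """Information oversampling (steps 3-1-1 to 3-1-4): for each minority
    sample x, interpolate toward its k nearest minority neighbours x_k
    via x_n = x + eps * |x_k - x| with eps random in [0, 1)."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    m, n = len(majority), len(minority)
    k = max(1, m // n - 1)            # assumed neighbour count, see lead-in
    synthetic = []
    for x in minority:
        d = np.linalg.norm(minority - x, axis=1)       # Euclidean distances
        for xk in minority[np.argsort(d)[1:k + 1]]:    # skip x itself
            synthetic.append(x + rng.random() * np.abs(xk - x))
    if not synthetic:
        return minority
    return np.vstack([minority, np.array(synthetic)])
```

With n minority samples each generating k synthetic points, the class grows to n(1 + k) ≈ m, matching the stopping condition of step 3-1-4).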
Further, the specific process of step S3-3 is as follows:
3-3-1) carrying out normalization processing on the characteristics;
converting by using a linear function:
y=(x-MinValue)/(MaxValue-MinValue)
wherein, x and y are values before and after conversion respectively, and MaxValue and MinValue are the maximum value and the minimum value of the sample respectively;
3-3-2) calculating the average value of each column of features, and then subtracting the average value of the column of features from each dimension;
3-3-3) calculating a covariance matrix of the sample features;
3-3-4) calculating an eigenvalue and an eigenvector of the covariance matrix;
3-3-5) sorting the calculated characteristic values from large to small;
3-3-6) taking out the first K eigenvectors and eigenvalues, and multiplying the initial sample matrix by an eigenvector matrix formed by the K eigenvectors to obtain an eigenvector matrix after dimension reduction;
the value of K is chosen with reference to the following formula:

( Σ_{i=1}^{K} λ_i ) / ( Σ_{i=1}^{n} λ_i ) ≥ t

where λ_i are the eigenvalues of the covariance matrix sorted from large to small and t is the proportion of variance to retain (for example 99%); the minimum K satisfying the inequality is taken.
Further, the step S4 is specifically:
s4-1, dividing the feature-processed data into training set and test set data for training and testing the models;
S4-2, training the KNN model with the training set data, testing it with the test set data, and calculating its class I classification error rate ω1 (the probability of misclassifying a majority-class sample into the minority class);
S4-3, training the logistic regression model with the training set data, testing it with the test set data, and calculating its class I classification error rate ω2;
S4-4, constructing a Lagrange-based combined KNN-logistic regression model;
s4-5, predicting with the combined KNN-logistic regression model whether the design of a new product in a manufacturing enterprise can be completed within the specified period;
and S4-6, optimizing the data of the design, manufacturing, product and user subjects according to the prediction result.
Further, in order to judge whether the classification results of the KNN algorithm, the logistic regression algorithm and the combined KNN-logistic regression model algorithm are accurate, step S4-1 uses a cross-validation method: the feature-processed data is divided into three equal parts A, B and C, which are then cross-combined into three groups: group 1 (training set: A, B; test set: C), group 2 (training set: B, C; test set: A) and group 3 (training set: A, C; test set: B).
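The three-way cross split of step S4-1 can be sketched as follows (the random shuffle before splitting is an assumption; the text does not specify how A, B, C are drawn):

```python
import numpy as np

def three_fold_groups(data, rng=None):
    """Split the feature-processed data into three equal parts A, B, C
    and cross-combine them into the three (training set, test set)
    groups described in step S4-1."""
    rng = np.random.default_rng(rng)
    data = np.asarray(data)
    idx = rng.permutation(len(data))          # assumed random shuffle
    A, B, C = np.array_split(data[idx], 3)
    return [
        (np.vstack([A, B]), C),   # group 1: train A+B, test C
        (np.vstack([B, C]), A),   # group 2: train B+C, test A
        (np.vstack([A, C]), B),   # group 3: train A+C, test B
    ]
```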
Further, in step S4-2, the KNN model is trained with the first group's training set data and tested with the same group's test set data; the operation is then repeated with the second and third groups, and the average of the three class I classification error rates is taken as ω1 of the KNN model. The specific steps are:
4-2-1) according to the Euclidean distance formula:

d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )

calculate the Euclidean distance d between each first-group test set sample x and each first-group training set sample y;
4-2-2) sorting according to the calculated Euclidean distance d, and selecting the minimum k points, wherein the value of k is smaller than the square root of the number of samples in the training set and is an odd number;
4-2-3) determining the frequency of k points in two categories, namely 'design can be completed in a specified period' and 'design cannot be completed in the specified period', and taking the category with the highest frequency as the prediction classification of the data to be classified;
4-2-4) according to the classification results, calculate the class I classification error rate ω11 of the KNN model algorithm for the first group of data;
4-2-5) repeat steps 4-2-1) to 4-2-4) twice to obtain the class I classification error rates ω12 and ω13 for the other two groups of data, and finally take the average ω1 = (ω11 + ω12 + ω13) / 3 as the class I classification error rate of the KNN model algorithm;
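Steps 4-2-1) to 4-2-4) can be sketched as follows; treating label 0 as the majority class ("design cannot be completed in the specified period") is an illustrative assumption:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k):
    """Steps 4-2-1) to 4-2-3): majority vote among the k training
    points closest to x in Euclidean distance (labels in {0, 1})."""
    d = np.linalg.norm(train_X - x, axis=1)   # Euclidean distances
    votes = train_y[np.argsort(d)[:k]]        # labels of the k nearest
    return int(votes.sum() * 2 > k)           # most frequent class

def class1_error(train_X, train_y, test_X, test_y, k, majority_label=0):
    """Step 4-2-4): share of majority-class test samples that are
    misclassified into the minority class."""
    maj = test_y == majority_label
    preds = np.array([knn_predict(train_X, train_y, x, k)
                      for x in test_X[maj]])
    return float((preds != majority_label).mean())
```

Per step 4-2-2), k should be chosen odd and smaller than the square root of the training set size.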
In step S4-3, the logistic regression model is trained with the first group's training set data and tested with the same group's test set data; the operation is then repeated with the second and third groups, and the average of the three class I classification error rates is taken as ω2 of the logistic regression model. The specific steps are:
4-3-1) determining a prediction function:
based on the Sigmoid function:

g(z) = 1 / (1 + e^(−z))

set the weight vector θ = (θ0, θ1, θ2, ..., θn) and take the first-group training set data as the input vector x = (1, x1, x2, ..., xn); letting z(x) = θ^T x, the prediction function of the logistic regression algorithm is obtained:

h_θ(x) = g(θ^T x) = 1 / (1 + e^(−θ^T x))
Whether the product design is finished within the specified period is recorded as y: y = 1 when the design is finished on time, and y = 0 when it is not;
h_θ(x) then expresses the probability that y = 1 given the input x and the weight parameters θ;
4-3-2) determining a weight vector θ:
for a given data set, the maximum likelihood estimation method can be used to estimate the weight vector θ:

likelihood function:

L(θ) = Π_{i=1}^{m} h_θ(x_i)^{y_i} (1 − h_θ(x_i))^{1 − y_i}

its log-likelihood function:

l(θ) = Σ_{i=1}^{m} [ y_i log h_θ(x_i) + (1 − y_i) log(1 − h_θ(x_i)) ]
At this time introduce the cost function:

J(θ) = −(1/m) Σ_{i=1}^{m} [ y_i log h_θ(x_i) + (1 − y_i) log(1 − h_θ(x_i)) ] + (ξ/2m) Σ_{j=1}^{n} θ_j²

which converts the problem into a gradient descent task of finding the minimum value; the latter half is an added regularization term that relieves the overfitting problem of the model.
In the formula, ξ is the penalty-term strength. A group of penalty strengths with different values, such as [0.01, 0.1, 1, 10, 100], is selected; for each value, 5-fold cross-validation yields 5 recall rates, giving a recall rate for each penalty strength, and the ξ with the highest recall rate is selected as the penalty-term strength.
To solve for θ, first take the partial derivative of J(θ) with respect to each θ_j; then, starting from a given θ, repeatedly subtract the partial derivative multiplied by the step length to compute a new θ, until the difference in J(θ) between two iterations is small enough, i.e. the value of J(θ) essentially stops changing, indicating that J(θ) has reached a local minimum. Each θ value is then substituted into the logistic regression equation h_θ(x) to obtain the final prediction function.
wherein the partial derivative of J(θ) with respect to θ_j is:

∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^{m} (h_θ(x_i) − y_i) x_i^(j) + (ξ/m) θ_j

and the regularized iterative formula for θ_j is:

θ_j := θ_j − α [ (1/m) Σ_{i=1}^{m} (h_θ(x_i) − y_i) x_i^(j) + (ξ/m) θ_j ]

where α is the step length.
4-3-3) input the first group's test set data into the prediction function h_θ(x) of the logistic regression algorithm trained on the first group's training set data, and classify the test set data according to the obtained probability values;
4-3-4) according to the classification results, calculate the class I classification error rate ω21 of the logistic regression model algorithm for the first group of data;
4-3-5) repeat steps 4-3-1) to 4-3-4) twice to obtain the class I classification error rates ω22 and ω23 for the other two groups of data, and finally take the average ω2 = (ω21 + ω22 + ω23) / 3 as the class I classification error rate of the logistic regression model algorithm.
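A minimal numpy sketch of the training procedure of steps 4-3-1) to 4-3-3); the step length `alpha`, the iteration cap and leaving the bias term unregularised are assumptions not fixed by the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, xi=1.0, alpha=0.1, iters=2000, tol=1e-9):
    """Gradient descent on the regularised cost J(theta), iterating
    until J changes by less than `tol` between two iterations.
    xi is the penalty-term strength from the text."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])    # x = (1, x1, ..., xn)
    theta = np.zeros(n + 1)
    prev = np.inf
    for _ in range(iters):
        h = sigmoid(Xb @ theta)
        reg = theta.copy()
        reg[0] = 0.0                        # bias unregularised (assumption)
        J = -np.mean(y * np.log(h + 1e-12) + (1 - y) * np.log(1 - h + 1e-12)) \
            + xi / (2 * m) * np.sum(reg ** 2)
        if abs(prev - J) < tol:
            break
        prev = J
        grad = Xb.T @ (h - y) / m + xi / m * reg   # partial derivative of J
        theta -= alpha * grad
    return theta

def predict_prob(theta, X):
    """h_theta(x) = P(y = 1 | x): probability the design finishes on time."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return sigmoid(Xb @ theta)
```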
Further, the specific process of constructing the Lagrange-based combined KNN-logistic regression model in step S4-4 is as follows:
4-4-1) determination of the prediction function:
Let p_i denote the predicted value of the combined model for the i-th sample:

p_i = α1·k_i + α2·l_i

where k_i and l_i are the prediction probability values of the KNN and logistic regression models for the i-th sample respectively, and α1 and α2 are the weights of the KNN and logistic regression models, with α1 + α2 = 1;
4-4-2) construct the Lagrange loss function:

L(α1, α2, λ) = ω1·α1² + ω2·α2² + λ(α1 + α2 − 1)

where ω1 and ω2 are the class I classification error rates of the sub-models obtained in steps S4-2 and S4-3, used as penalty parameters of the sub-models, and λ is the Lagrange multiplier;
4-4-3) solve for the optimum values of α1 and α2:

since L(α1, α2, λ) is a convex function, it has a minimum, and the minimum point gives the optimum α1 and α2:

∂L/∂α1 = 0, ∂L/∂α2 = 0, ∂L/∂λ = 0

Solving the above equations with python yields the optimum values of α1 and α2.
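Assuming the Lagrange loss has the quadratic form L(α1, α2, λ) = ω1·α1² + ω2·α2² + λ(α1 + α2 − 1) (the patent's formula image is not legible, so this form is an assumption), the stationarity conditions give the weights in closed form rather than requiring a numerical solver:

```python
def combine_weights(omega1, omega2):
    """Setting dL/da1 = dL/da2 = dL/dlam = 0 for
    L = w1*a1^2 + w2*a2^2 + lam*(a1 + a2 - 1) gives
    a1 = w2/(w1+w2), a2 = w1/(w1+w2): the sub-model with the larger
    class I error rate receives the smaller weight."""
    a1 = omega2 / (omega1 + omega2)
    a2 = omega1 / (omega1 + omega2)
    return a1, a2

def combined_prediction(k_i, l_i, a1, a2):
    """p_i = a1*k_i + a2*l_i, the combined probability for sample i."""
    return a1 * k_i + a2 * l_i
```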
Compared with the prior art, the principle and the advantages of the scheme are as follows:
According to this scheme, after big data from subjects such as design, manufacturing, products and users in a manufacturing enterprise is collected, cleaned and feature-processed, an accurate and effective design resource big data model for whole-system optimization design is constructed with the combined KNN-logistic regression model algorithm. Related business in the enterprise can thereby be prejudged and the data of the design, manufacturing, product and user subjects optimized, solving the problems that existing design resource data models consider only single-design-department data without integrating and summarizing all of it, and that a single data model may not predict classification results accurately.
In addition, together with whole-process manufacturing, whole-through management and whole-period product service business models, the scheme realizes whole-system, whole-value-chain modeling of manufacturing big data, further solving the problem that the traditional relational database model cannot reasonably and effectively model manufacturing enterprise big data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a design resource big data modeling method for manufacturing enterprise whole system optimization design according to the present invention;
FIG. 2 is a flow chart of data cleaning in the design resource big data modeling method for manufacturing enterprise whole system optimization design according to the invention.
Detailed Description
The invention will be further illustrated with reference to specific examples:
as shown in fig. 1, the design resource big data modeling method for manufacturing enterprise system wide optimization design according to this embodiment includes the following steps:
s1, data acquisition:
s1-1, identifying a data source related to the design resource main body of the manufacturing enterprise and a storage position of the data source;
s1-2, aiming at the relational database, adopting Sqoop technology to configure data connection between the relational database and the HDFS, and importing data in the relational database into the HDFS of Hadoop;
s1-3, analyzing the data file by adopting a MapReduce programming method aiming at the data in the file format and uploading the data file to the HDFS;
s1-4, integrating all the previously acquired main data in Hive based on the relational model;
and S1-5, establishing a structured subject data set.
The collected multisource heterogeneous design resource big data can be converted into a structured data set with a uniform format through the steps.
As shown in fig. 2, the collected data is cleaned to remove data which does not meet the requirements; the method comprises the following specific steps:
s2-1, preprocessing the data: view the metadata, including all information describing the data such as field interpretations, data sources and code tables, so that the data itself can be understood intuitively, some problems can be found preliminarily, and preparations made for later processing;
s2-2, removing or completing missing data: determine the missing range of each data field; data missing key fields is discarded directly, while non-key fields are filled in, including inferring missing values from business knowledge or experience, filling them with calculation results of the same indicator (mean, median, mode, etc.), or filling them with calculation results of different indicators;
s2-3, removing data with errors in content and ensuring the correctness of the data;
s2-4, removing logically wrong data: discarding the data with logic errors according to the service rule to ensure that the data logic is correct;
s2-5, removing unnecessary data: removing data irrelevant to the business rules, and ensuring the relevance of the data;
s2-6, carrying out data correlation verification: for data from multiple sources, correlation verification is necessary; data that fails the verification needs to be cleaned.
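A minimal pandas sketch of steps S2-2 to S2-5; the column names, the mean-fill choice and the business-rule callback are illustrative assumptions:

```python
import pandas as pd

def clean(df, key_fields, business_filter=None):
    """S2-2: drop rows missing any key field, mean-fill non-key numeric
    gaps; S2-4/S2-5: optionally drop rows violating a business rule."""
    df = df.dropna(subset=key_fields).copy()       # discard missing key data
    num = df.select_dtypes("number").columns
    df[num] = df[num].fillna(df[num].mean())       # fill non-key gaps (mean)
    if business_filter is not None:
        df = df[df.apply(business_filter, axis=1)] # keep logically valid rows
    return df.reset_index(drop=True)
```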
And S3, performing feature processing on the data meeting the requirements:
s3-1, handling the class imbalance problem: when there is a serious class imbalance problem in the data, the predicted result is often biased to the class with a large number, which affects the accuracy of the model. A common method for dealing with the class imbalance problem is a random undersampling method, which reduces the size of the majority classes by randomly removing some majority class samples, but this may lose important data, and the sampled data cannot represent all the data, resulting in inaccurate classification results. There is also a random oversampling method in which the scale of the minority class is increased by randomly copying the minority class samples, and although this method does not cause information loss and performs better than the undersampling method, the probability of overfitting is increased.
In this embodiment, to avoid losing important data while relieving overfitting, the information oversampling SMOTE method is selected to solve the class imbalance problem. The specific analysis and calculation process is as follows:
3-1-1) for each sample x in the minority class, use the formula:

d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )

to compute the Euclidean distance d from sample x to every other minority-class sample y;
3-1-2) let the number of majority-class samples be m and the number of minority-class samples be n, and let:

k = ⌊m / n⌋ − 1

For each sample x, take the k other samples with the smallest Euclidean distance d as the neighbors x_k of x;
3-1-3) for each neighbor x_k, use random linear interpolation between x and x_k to generate a new sample x_n:

x_n = x + ε|x_k − x|

where ε is a random value between 0 and 1;
3-1-4) repeat step 3-1-3) until the numbers of minority-class and majority-class samples are equal or differ only slightly.
S3-2, selecting features through a variance selection method, firstly calculating the variance value of each feature, preferentially eliminating the feature with the variance value of 0, and then selecting the feature with the variance value larger than the threshold value according to the threshold value.
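Step S3-2 can be sketched in numpy as follows (zero-variance features are always removed because the comparison `var > threshold` is strict):

```python
import numpy as np

def variance_select(X, threshold=0.0):
    """Step S3-2: compute each feature's variance, eliminate
    zero-variance features, and keep features whose variance
    exceeds the given threshold."""
    var = X.var(axis=0)
    keep = var > threshold
    return X[:, keep], keep
```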
S3-3, after feature selection, an oversized feature matrix would still cause a large amount of calculation and a long model training time, so the dimensionality of the selected feature matrix is reduced through principal component analysis (PCA). The analysis and calculation process is as follows:
3-3-1) carrying out normalization processing on the characteristics;
converting by using a linear function:
y=(x-MinValue)/(MaxValue-MinValue)
wherein, x and y are values before and after conversion respectively, and MaxValue and MinValue are the maximum value and the minimum value of the sample respectively;
3-3-2) calculating the average value of each column of features, and then subtracting the average value of the column of features from each dimension;
3-3-3) calculating a covariance matrix of the sample features;
3-3-4) calculating an eigenvalue and an eigenvector of the covariance matrix;
3-3-5) sorting the calculated characteristic values from large to small;
3-3-6) taking out the first K eigenvectors and eigenvalues, and multiplying the initial sample matrix by an eigenvector matrix formed by the K eigenvectors to obtain an eigenvector matrix after dimension reduction;
the calculation of the value of K refers to the following formula:
Figure BDA0002709175470000121
the minimum value of K is found that satisfies the above equation, where λ is the eigenvalue of the covariance matrix.
S4, to avoid the situation in which a single algorithm model fails to predict the classification result accurately, this embodiment uses a combined KNN-logistic regression model algorithm to classify and predict the samples to be classified, judges whether the design of a new product in a manufacturing enterprise can be completed within the specified period, and optimizes the data of the subjects related to design, manufacturing, product and user according to the prediction result.
The method comprises the following specific steps:
s4-1, determining training set and test set data
In order to judge whether the classification results of the KNN algorithm, the logistic regression algorithm and the combined KNN-logistic regression model algorithm are accurate, a cross-validation method is used: the feature-processed data are divided into three equal parts A, B and C, which are then combined crosswise into three groups. The first group is training set A, B with test set C; the second group is training set B, C with test set A; the third group is training set A, C with test set B;
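The three-way split described above can be sketched as follows (the function name and toy data are illustrative):

```python
def three_fold_groups(data):
    """Split data into three equal parts A, B, C and form the three
    (training set, test set) groups used for cross-validation."""
    n = len(data) // 3
    A, B, C = data[:n], data[n:2 * n], data[2 * n:3 * n]
    return [(A + B, C),  # group 1: train on A, B; test on C
            (B + C, A),  # group 2: train on B, C; test on A
            (A + C, B)]  # group 3: train on A, C; test on B

groups = three_fold_groups(list(range(9)))
```

Each sample appears in exactly one test set across the three groups, so every model is evaluated on every sample exactly once.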
S4-2, after the KNN model is trained with the first group's training set data, it is tested with the same group's test set data; the operation is then repeated with the second and third groups of data, and the average of the three runs is taken as the class I classification error rate ω1 of the KNN model. The specific steps are as follows:
4-2-1) according to the euclidean distance formula:
d = √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)
to calculate the euclidean distance d between the first set of test set data x and the first set of training set data y;
4-2-2) sorting according to the calculated Euclidean distance d, and selecting the minimum k points, wherein the value of k is smaller than the square root of the number of samples in the training set and is an odd number;
4-2-3) determining the frequency of k points in two categories, namely 'design can be completed in a specified period' and 'design cannot be completed in the specified period', and taking the category with the highest frequency as the prediction classification of the data to be classified;
4-2-4) calculating, from the classification results, the class I classification error rate ω11 of the KNN model algorithm on the first group of data;
4-2-5) repeating steps 4-2-1) to 4-2-4) twice to obtain the class I classification error rates ω12 and ω13 of the KNN model algorithm on the other two groups of data, and finally taking the average ω1 = (ω11 + ω12 + ω13)/3 as the class I classification error rate of the KNN model algorithm;
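Steps 4-2-1) to 4-2-4) can be sketched in Python as follows; the names, the toy data and the reading of the "class I" error as the error rate on class-1 ("design finished on time") test samples are illustrative assumptions:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query` (4-2-1 to 4-2-3)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def class1_error_rate(train, test, k):
    """Share of class-1 test samples ('design finished on time') misclassified
    by the model -- one illustrative reading of the 'class I' error rate."""
    pos = [(x, y) for x, y in test if y == 1]
    wrong = sum(1 for x, y in pos if knn_predict(train, x, k) != y)
    return wrong / len(pos)

train = [((0.0, 0.0), 0), ((0.1, 0.2), 0),
         ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1)]
test = [((0.95, 1.0), 1), ((0.05, 0.1), 0)]
err = class1_error_rate(train, test, k=3)
```

Averaging `class1_error_rate` over the three cross-validation groups gives ω1 as described above.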
S4-3, after the logistic regression model is trained with the first group's training set data, it is tested with the same group's test set data; the operation is repeated with the second and third groups of data, and the average of the three runs is taken as the class I classification error rate ω2 of the logistic regression model. The steps are as follows:
4-3-1) determining a prediction function:
based on Sigmoid function:
g(z) = 1/(1 + e^(−z))
the weight vector is set as θ = (θ0, θ1, θ2, …, θn),
and the first group's training set data are used as the input vector x = (1, x1, x2, …, xn); letting z(x) = θᵀx gives the prediction function of the logistic regression algorithm:
hθ(x) = g(θᵀx) = 1/(1 + e^(−θᵀx))
whether the product design is finished within the specified period is recorded as y: y = 1 when the design is finished on time and y = 0 when it is not;
hθ(x) then represents the probability that y = 1 when the input is x and the weight parameter is θ;
4-3-2) determining a weight vector θ:
for a given data set, a maximum likelihood estimation method can be used to estimate the weight vector θ:
likelihood function:
L(θ) = ∏(i=1..m) hθ(x(i))^y(i) · (1 − hθ(x(i)))^(1 − y(i))
its log-likelihood function:
l(θ) = ln L(θ) = Σ(i=1..m) [y(i)·ln hθ(x(i)) + (1 − y(i))·ln(1 − hθ(x(i)))]
introducing the cost function
J(θ) = −(1/m)·l(θ) + (ξ/2m)·Σ(j=1..n) θj²
converts the maximization into a gradient descent task of finding a minimum; the second half is an added regularization item that suppresses overfitting of the model;
in the formula, ξ is the penalty-term strength. A set of candidate strengths ξ with different values, such as [0.01, 0.1, 1, 10, 100], is tried; for each value the recall (recall rate) after 5-fold cross-validation is computed, so that a recall corresponds to every penalty strength, and the ξ with the highest recall is selected as the penalty-term strength;
To solve for θ, the partial derivative of J(θ) with respect to each θj is first obtained; then, starting from some initial θ, the partial derivative multiplied by the step length is repeatedly subtracted to obtain a new θ, until the difference of J(θ) between two iterations is small enough, i.e. its value essentially stops changing, indicating that J(θ) has reached a local minimum. Each θ value is then substituted into the logistic regression equation hθ(x) to obtain the final prediction function.
Wherein the partial derivative of J (θ) with respect to θ is:
∂J(θ)/∂θj = (1/m)·Σ(i=1..m) (hθ(x(i)) − y(i))·xj(i) + (ξ/m)·θj
and the regularized iterative formula for θj is:
θj := θj − η·[(1/m)·Σ(i=1..m) (hθ(x(i)) − y(i))·xj(i) + (ξ/m)·θj]
where η is the step length.
4-3-3) the first group's test set data are input into the prediction function hθ(x) of the logistic regression algorithm trained on the first group's training set data, and the test samples are classified according to the resulting probability values;
4-3-4) the class I classification error rate ω21 of the logistic regression model algorithm on the first group of data is calculated from the classification results;
4-3-5) steps 4-3-1) to 4-3-4) are repeated twice to obtain the class I classification error rates ω22 and ω23 of the logistic regression model algorithm on the other two groups of data, and finally the average ω2 = (ω21 + ω22 + ω23)/3 is taken as the class I classification error rate of the logistic regression model algorithm;
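The training procedure of step 4-3-2) can be sketched with NumPy as follows; the step length, iteration count, toy data and all names are illustrative choices, with the regularization following the ξ penalty term described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, xi=0.1, step=0.5, iters=2000):
    """Gradient descent on the regularised cost J(theta).
    `xi` is the penalty-term strength; step size and iteration count
    are illustrative, not from the patent."""
    X = np.hstack([np.ones((len(X), 1)), np.asarray(X, float)])  # prepend x0 = 1
    y = np.asarray(y, float)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m   # data term of dJ/dtheta
        grad[1:] += (xi / m) * theta[1:]            # regularise all but the bias
        theta -= step * grad                        # theta_j := theta_j - step * gradient
    return theta

def predict_prob(theta, x):
    # h_theta(x): probability that the design is finished on time (y = 1)
    return sigmoid(theta[0] + np.dot(theta[1:], x))

X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
theta = train_logistic(X, y)
```

The fitted prediction function assigns probability above 0.5 to inputs resembling the positive samples and below 0.5 to the negative ones.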
s4-4, constructing a KNN proximity-logistic regression combination model:
4-4-1) determination of the prediction function:
letting pi denote the combined model's predicted value for the ith sample:
pi = α1·ki + α2·li
where ki and li are the prediction probability values of the KNN and logistic regression models for the ith sample, and α1 and α2 are the weight values of the KNN and logistic regression models, with α1 + α2 = 1;
4-4-2) construct the Lagrange loss function:
L(α1, α2, λ) = ω1·α1² + ω2·α2² + λ·(α1 + α2 − 1)
where ω1 and ω2 are the class I classification error rates of the submodels obtained in steps S4-2 and S4-3, used as the penalty parameters of the submodels, and λ is the Lagrange multiplier;
4-4-3) solving for the optimal values of α1 and α2:
due to L (alpha)1,α2λ) is a convex function, there is a minimum value, and the minimum point is α1,α2The optimum value of (d);
Figure BDA0002709175470000152
alpha can be obtained by solving the above formula by python1,α2The optimum value of (c).
S4-5, service prediction:
the data of the sample to be classified are input into the KNN model and the logistic regression model respectively to obtain the prediction probability values k and l; the formula p = α1·k + α2·l then gives the predicted value of the combined model, and whether the design of the new product can be completed within the specified period is judged from this value;
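Under one common reading of the Lagrangian above, in which the loss is quadratic in the weights with the constraint α1 + α2 = 1, the optimal weights have the closed form below; the function names and toy numbers are illustrative assumptions:

```python
def combo_weights(w1, w2):
    """Stationary point of L = w1*a1**2 + w2*a2**2 + lam*(a1 + a2 - 1).
    Setting the partial derivatives to zero gives this closed form
    (the quadratic form of the loss is an assumption consistent with the text)."""
    a1 = w2 / (w1 + w2)  # KNN weight: shrinks as the KNN error rate w1 grows
    a2 = w1 / (w1 + w2)  # logistic regression weight
    return a1, a2

def combined_predict(k, l, a1, a2):
    # step S4-5: p = a1*k + a2*l
    return a1 * k + a2 * l

a1, a2 = combo_weights(w1=0.2, w2=0.1)  # the model with the higher error gets the smaller weight
p = combined_predict(k=0.8, l=0.6, a1=a1, a2=a2)
```

Note how the weighting automatically favours whichever submodel had the lower class I error rate during cross-validation.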
S4-6, optimizing design resources: the data of the subjects related to design, manufacturing, product and user are optimized according to the prediction result, as follows:
4-6-1) when the prediction is that the design of the new product can be completed within the specified period, the subject data with smaller weights θ in the logistic regression algorithm may be appropriately downgraded; for example, when the weight θ of "designer qualification" is small, senior engineers can be replaced by junior and intermediate engineers to save labor cost;
4-6-2) when the prediction is that the design of the new product cannot be completed within the specified period, the subject data with larger weights θ in the logistic regression algorithm are appropriately upgraded; for example, when the weight θ of "processing equipment quality" is large, processing equipment of better quality can be selected to process the product.
The above embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; variations based on the shapes and principles of the present invention are likewise covered within the protection scope of the present invention.

Claims (10)

1. A design resource big data modeling method for manufacturing enterprise whole system optimization design is characterized by comprising the following steps:
s1, collecting multi-source heterogeneous design resource big data, and converting the big data into a structured data source with a uniform format;
s2, cleaning the collected data, and removing the data which do not meet the requirements;
s3, performing feature processing on the data meeting the requirements;
S4, classifying and predicting the samples to be classified by adopting a combined KNN (K-nearest-neighbor)-logistic regression model algorithm, judging whether the design of a new product in a manufacturing enterprise can be completed within the specified period, and optimizing the data of the subjects related to design, manufacturing, product and user according to the prediction result.
2. The design resource big data modeling method for manufacturing enterprise whole-system optimization design according to claim 1, wherein in step S1 the specific steps of collecting the multi-source heterogeneous design resource big data and converting it into a structured data source with a uniform format are as follows:
s1-1, identifying a data source related to the design resource main body of the manufacturing enterprise and a storage position of the data source;
s1-2, aiming at the relational database, adopting Sqoop technology to configure data connection between the relational database and the HDFS, and importing data in the relational database into the HDFS of Hadoop;
s1-3, analyzing the data file by adopting a MapReduce programming method aiming at the data in the file format and uploading the data file to the HDFS;
s1-4, integrating all the previously acquired main data in Hive based on the relational model;
and S1-5, establishing a structured subject data set.
3. The design resource big data modeling method for manufacturing enterprise system-wide optimization design according to claim 1, wherein the data cleaning comprises the steps of:
s2-1, preprocessing data;
s2-2, removing or complementing missing data;
s2-3, removing data with errors in content;
s2-4, removing the logically wrong data;
s2-5, removing unnecessary data;
and S2-6, performing data correlation verification.
4. The design resource big data modeling method for manufacturing enterprise system-wide optimization design according to claim 1, wherein the feature processing comprises the steps of:
s3-1, solving the problem of unbalance of positive and negative samples by adopting an information oversampling SMOTE method, and avoiding the problem of low prediction accuracy caused by sample unbalance existing in the subsequent KNN algorithm and logistic regression algorithm;
s3-2, selecting characteristics by a variance selection method;
and S3-3, performing dimension reduction processing on the feature matrix dimension after feature selection through a principal component analysis method.
5. The modeling method for design resource big data of manufacturing enterprise system-wide optimization design according to claim 4, wherein the specific process of step S3-1 is as follows:
3-1-1) for each sample x in the minority class, the formula is utilized:
d = √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)
calculate the Euclidean distance d from sample x to every other minority-class sample y;
3-1-2) letting m be the number of majority-class samples and n the number of minority-class samples, set:
k = ⌊m/n⌋
and take the k other minority samples with the smallest Euclidean distance d to each sample x as the neighbors χk of x;
3-1-3) for each neighbor χk, a new sample xn is generated between x and χk by random linear interpolation:
xn=x+ε|xk-x|
Wherein epsilon is a random value between 0 and 1;
3-1-4) repeating the steps 3-1-3) until the number of the minority class samples and the number of the majority class samples are equal or have small difference.
6. The modeling method for design resource big data of manufacturing enterprise system-wide optimization design according to claim 4, wherein the specific process of step S3-3 is as follows:
3-3-1) carrying out normalization processing on the characteristics;
converting by using a linear function:
y=(x-MinValue)/(MaxValue-MinValue)
wherein, x and y are values before and after conversion respectively, and MaxValue and MinValue are the maximum value and the minimum value of the sample respectively;
3-3-2) calculating the average value of each column of features, and then subtracting the average value of the column of features from each dimension;
3-3-3) calculating a covariance matrix of the sample features;
3-3-4) calculating an eigenvalue and an eigenvector of the covariance matrix;
3-3-5) sorting the calculated characteristic values from large to small;
3-3-6) taking out the first K eigenvectors and eigenvalues, and multiplying the initial sample matrix by an eigenvector matrix formed by the K eigenvectors to obtain an eigenvector matrix after dimension reduction;
the calculation of the value of K refers to the following formula:
(λ1 + λ2 + … + λK) / (λ1 + λ2 + … + λn) ≥ t
The minimum value of K satisfying the above inequality is taken, where λ1 ≥ λ2 ≥ … ≥ λn are the eigenvalues of the covariance matrix sorted from large to small and t is the proportion of variance to be retained (for example 0.99).
7. The design resource big data modeling method for manufacturing enterprise system-wide optimization design according to claim 1, wherein the step S4 specifically includes:
s4-1, dividing the data after feature processing into a training set and a test set data for training and testing the model;
S4-2, after the KNN model is trained with the training set data, it is tested with the test set data, and the class I classification error rate ω1 of the KNN model is calculated;
S4-3, training the logistic regression model with the training set data, testing it with the test set data, and calculating the class I classification error rate ω2;
S4-4, constructing a KNN proximity-logistic regression combination model based on Lagrange;
s4-5, predicting whether the design of a new product in a manufacturing enterprise can be completed within a specified period by utilizing a KNN proximity-logistic regression combination model;
and S4-6, optimizing data of the subject related to design, manufacture, product and user according to the predicted result.
8. The design resource big data modeling method for manufacturing enterprise whole-system optimization design according to claim 7, wherein in step S4-1, in order to judge whether the classification results of the KNN algorithm, the logistic regression algorithm and the combined KNN-logistic regression model algorithm are accurate, a cross-validation method is selected: the feature-processed data are divided into three equal parts A, B and C, which are then combined crosswise into three groups; the first group is training set A, B with test set C; the second group is training set B, C with test set A; the third group is training set A, C with test set B.
9. The design resource big data modeling method for manufacturing enterprise whole-system optimization design according to claim 8, wherein in step S4-2 the KNN model is trained with the first group's training set data and tested with the same group's test set data; the operation is repeated with the second and third groups of data, and the average of the three runs is taken as the class I classification error rate ω1 of the KNN model; the specific steps are as follows:
4-2-1) according to the euclidean distance formula:
d = √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)
to calculate the euclidean distance d between the first set of test set data x and the first set of training set data y;
4-2-2) sorting according to the calculated Euclidean distance d, and selecting the minimum k points, wherein the value of k is smaller than the square root of the number of samples in the training set and is an odd number;
4-2-3) determining the frequency of k points in two categories, namely 'design can be completed in a specified period' and 'design cannot be completed in the specified period', and taking the category with the highest frequency as the prediction classification of the data to be classified;
4-2-4) calculating, from the classification results, the class I classification error rate ω11 of the KNN model algorithm on the first group of data;
4-2-5) repeating steps 4-2-1) to 4-2-4) twice to obtain the class I classification error rates ω12 and ω13 of the KNN model algorithm on the other two groups of data, and finally taking the average ω1 = (ω11 + ω12 + ω13)/3 as the class I classification error rate of the KNN model algorithm;
and in step S4-3, after the logistic regression model is trained with the first group's training set data, it is tested with the same group's test set data; the operation is repeated with the second and third groups of data, and the average of the three runs is taken as the class I classification error rate ω2 of the logistic regression model; the method comprises the following steps:
4-3-1) determining a prediction function:
based on Sigmoid function:
g(z) = 1/(1 + e^(−z))
the weight vector is set as θ = (θ0, θ1, θ2, …, θn),
and the first group's training set data are used as the input vector x = (1, x1, x2, …, xn); letting z(x) = θᵀx gives the prediction function of the logistic regression algorithm:
hθ(x) = g(θᵀx) = 1/(1 + e^(−θᵀx))
whether the product design is finished within the specified period is recorded as y: y = 1 when the design is finished on time and y = 0 when it is not;
hθ(x) then represents the probability that y = 1 when the input is x and the weight parameter is θ;
4-3-2) determining a weight vector θ:
for a given data set, a maximum likelihood estimation method can be used to estimate the weight vector θ:
likelihood function:
L(θ) = ∏(i=1..m) hθ(x(i))^y(i) · (1 − hθ(x(i)))^(1 − y(i))
its log-likelihood function:
l(θ) = ln L(θ) = Σ(i=1..m) [y(i)·ln hθ(x(i)) + (1 − y(i))·ln(1 − hθ(x(i)))]
introducing the cost function
J(θ) = −(1/m)·l(θ) + (ξ/2m)·Σ(j=1..n) θj²
converts the maximization into a gradient descent task of finding a minimum; the second half is an added regularization item that suppresses overfitting of the model;
in the formula, ξ is the penalty-term strength; a set of candidate strengths ξ with different values, such as [0.01, 0.1, 1, 10, 100], is tried; for each value the recall (recall rate) after 5-fold cross-validation is computed, so that a recall corresponds to every penalty strength, and the ξ with the highest recall is selected as the penalty-term strength;
for solving the θ values, the partial derivative of J(θ) with respect to each θj is first obtained; then, starting from some initial θ, the partial derivative multiplied by the step length is repeatedly subtracted to obtain a new θ, until the difference of J(θ) between two iterations is small enough, i.e. its value essentially stops changing, indicating that J(θ) has reached a local minimum; each θ value is then substituted into the logistic regression equation hθ(x), and finally the prediction function is obtained;
wherein the partial derivative of J (θ) with respect to θ is:
∂J(θ)/∂θj = (1/m)·Σ(i=1..m) (hθ(x(i)) − y(i))·xj(i) + (ξ/m)·θj
and the regularized iterative formula for θj is:
θj := θj − η·[(1/m)·Σ(i=1..m) (hθ(x(i)) − y(i))·xj(i) + (ξ/m)·θj]
where η is the step length.
4-3-3) the first group's test set data are input into the prediction function hθ(x) of the logistic regression algorithm trained on the first group's training set data, and the test samples are classified according to the resulting probability values;
4-3-4) the class I classification error rate ω21 of the logistic regression model algorithm on the first group of data is calculated from the classification results;
4-3-5) steps 4-3-1) to 4-3-4) are repeated twice to obtain the class I classification error rates ω22 and ω23 of the logistic regression model algorithm on the other two groups of data, and finally the average ω2 = (ω21 + ω22 + ω23)/3 is taken as the class I classification error rate of the logistic regression model algorithm.
10. The design resource big data modeling method for manufacturing enterprise whole system optimization design according to claim 9, wherein the specific process of step S4-4 based on Lagrange to construct the KNN neighborhood-logistic regression combination model is as follows:
4-4-1) determination of the prediction function:
letting pi denote the combined model's predicted value for the ith sample:
pi = α1·ki + α2·li
where ki and li are the prediction probability values of the KNN and logistic regression models for the ith sample, and α1 and α2 are the weight values of the KNN and logistic regression models, with α1 + α2 = 1;
4-4-2) construct the Lagrange loss function:
L(α1, α2, λ) = ω1·α1² + ω2·α2² + λ·(α1 + α2 − 1)
where ω1 and ω2 are the class I classification error rates of the submodels obtained in steps S4-2 and S4-3, used as the penalty parameters of the submodels, and λ is the Lagrange multiplier;
4-4-3) solving for the optimal values of α1 and α2:
due to L (alpha)1,α2λ) is a convex function, there is a minimum value, and the minimum point is α1,α2The optimum value of (d);
Figure FDA0002709175460000072
alpha can be obtained by solving the above formula by python1,α2The optimum value of (c).
CN202011049729.XA 2020-09-29 2020-09-29 Design resource big data modeling method for manufacturing enterprise full-system optimization design Active CN112270614B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011049729.XA CN112270614B (en) 2020-09-29 2020-09-29 Design resource big data modeling method for manufacturing enterprise full-system optimization design


Publications (2)

Publication Number Publication Date
CN112270614A true CN112270614A (en) 2021-01-26
CN112270614B CN112270614B (en) 2024-05-10

Family

ID=74349345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011049729.XA Active CN112270614B (en) 2020-09-29 2020-09-29 Design resource big data modeling method for manufacturing enterprise full-system optimization design

Country Status (1)

Country Link
CN (1) CN112270614B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115344830A (en) * 2022-08-02 2022-11-15 无锡致为数字科技有限公司 Event probability estimation method based on big data

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779079A (en) * 2016-11-23 2017-05-31 北京师范大学 A kind of forecasting system and method that state is grasped based on the knowledge point that multimodal data drives
KR20170060603A (en) * 2015-11-24 2017-06-01 윤정호 Method and system on generating predicted information of companies in demand for patent license
CN107203492A (en) * 2017-05-31 2017-09-26 西北工业大学 Product design cloud service platform modularization task replanning and distribution optimization method
US20180173847A1 (en) * 2016-12-16 2018-06-21 Jang-Jih Lu Establishing a machine learning model for cancer anticipation and a method of detecting cancer by using multiple tumor markers in the machine learning model for cancer anticipation
KR20180096834A (en) * 2017-02-09 2018-08-30 충북대학교 산학협력단 Method and system for predicting optimal environmental condition in manufacturing process
EP3474196A1 (en) * 2017-10-23 2019-04-24 OneSpin Solutions GmbH Method of selecting a prover
US20190216368A1 (en) * 2018-01-13 2019-07-18 Chang Gung Memorial Hospital, Linkou Method of predicting daily activities performance of a person with disabilities
CN110147400A (en) * 2019-05-10 2019-08-20 青岛建邦供应链股份有限公司 Inter-trade data resource integrated system based on big data
CN111507507A (en) * 2020-03-24 2020-08-07 重庆森鑫炬科技有限公司 Big data-based monthly water consumption prediction method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WAN HUIFANG; DU YANPU: "A Comparative Study of K-Nearest Neighbor and Logistic Regression Classification Algorithms", Journal of Luoyang Institute of Science and Technology (Natural Science Edition), no. 03, 25 September 2016 (2016-09-25), pages 83 - 86 *
LIN YONGCHANG; ZHU XIAOSHU: "A SMOTE-based KNN Classification Method for Imbalanced Samples", Guangxi Sciences, no. 03, 8 July 2020 (2020-07-08), pages 276 - 283 *
DOU WENZHANG; LYU XIULEI: "Design of an After-sales Service Resource Planning System Based on a Regression Time-series Model", Statistics & Decision, no. 13, 10 July 2009 (2009-07-10), pages 23 - 25 *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant