CN115083549A - Product raw material ratio reverse derivation method based on data mining - Google Patents
Product raw material ratio reverse derivation method based on data mining
- Publication number
- CN115083549A
- Authority
- CN
- China
- Prior art keywords
- raw material
- product
- performance
- product performance
- material ratio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C60/00—Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/20—Identification of molecular entities, parts thereof or of chemical compositions
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention relates to a product raw material ratio reverse derivation method based on data mining, which comprises the following steps: S1, arranging the raw material proportioning data into an initial structured data table, cleaning the initial raw material proportioning data to form an effective structured data table, and performing dimension reduction by principal component analysis to obtain a dimension reduction structured data table; S2, adding different product performance fields to the dimension reduction structured data table, and cleaning the data to obtain a plurality of final product performance training data sets; S3, constructing a product performance prediction model for each final product performance training data set by a random forest regression algorithm; and S4, optimizing with a Bayesian optimization algorithm combined with the different product performance prediction models to obtain the raw material ratio whose predicted performance is closest to the target product performance, and taking that ratio as the raw material ratio of the target product. The invention realizes the reverse derivation of the raw material ratio of rubber products.
Description
Technical Field
The invention relates to a product raw material ratio reverse derivation method based on data mining, and belongs to the technical field of rubber product raw material ratio reverse derivation.
Background
Polymer materials are a rapidly developing field. They come in many kinds, such as rubber, plastics, fibers, paints, adhesives and polymer-based composites, and are widely used in the building, transportation, agriculture, liquid crystal, medical, electrical and electronics industries.
The preparation of rubber, a polymer material, often involves many process conditions. Conditions such as raw material ratio, temperature and time all affect the performance of the final product, so process optimization requires a large number of experiments, including single-factor tests and orthogonal tests carried out by experimenters, and consumes a great deal of time and material. Optimizing the raw material ratio in particular involves variables such as the types of raw materials and their proportions, and for rubber products with many types of raw materials the experimental cost rises sharply. Because the raw material ratio is the most critical factor affecting product performance, a method for optimizing it rapidly is of significant value.
Rubber is synthesized from many raw materials in certain proportions, of which only a small number are main raw materials with relatively large proportions. Different raw material compositions and proportions yield products with very different properties. Traditionally, given the desired properties of a product, the composition and proportions of the raw materials needed to synthesize it have been found by trial and error based on the experience of engineers in the field. This process is time-consuming and laborious, and the experience gained is not accumulated in a reusable form.
Unlike traditional polymer engineering methods, machine learning does not need to model the physical and chemical processes; it only needs to learn the mapping between raw materials and performance from data. This enables both performance prediction from raw materials and reverse derivation of raw materials from product performance. With the other process conditions held the same, it provides more direct, data-based guidance for experimental optimization of the raw material ratio, thereby improving optimization efficiency.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a product raw material ratio reverse derivation method based on data mining, and the mapping relation between a rubber product and the raw material ratio thereof is constructed by utilizing machine learning algorithms such as principal component analysis, random forest regression and the like, so that the reverse derivation of the raw material ratio of the rubber product is realized, and the research and development efficiency of the rubber product is greatly improved.
The technical scheme for solving the technical problems is as follows: a product raw material ratio reverse derivation method based on data mining comprises the following steps:
S1, arranging the rubber initial raw material proportioning data into an initial structured data table, cleaning the data to obtain raw material proportioning data free of errors and duplicates, thereby forming an effective structured data table, and performing dimension reduction on the raw material proportioning data in the effective structured data table by principal component analysis to obtain a dimension reduction structured data table;
S2, adding different product performance fields to the dimension reduction structured data table, and performing data cleaning to obtain a plurality of final product performance training data sets;
S3, constructing a product performance prediction model for each final product performance training data set by a random forest regression algorithm;
and S4, optimizing with a Bayesian optimization algorithm combined with the different product performance prediction models to obtain the raw material ratio whose predicted performance is closest to the target product performance, and taking that ratio as the raw material ratio of the target product.
Further, in step S1, the rubber initial raw material proportioning data in the initial structured data table is first cleaned to obtain raw material proportioning data free of errors and duplicates, and is then arranged into an effective structured data table;
the cleaning of the rubber initial raw material proportioning data specifically comprises the following steps:
removing product samples in which the same raw material appears more than once;
removing duplicate product samples;
removing product samples whose raw material proportions do not sum to 100%.
Furthermore, when dimension reduction is performed on the raw material proportioning data, the principal components that together explain more than 95% of the total variance are retained as the data features after dimension reduction.
Furthermore, in the effective structured data table, each row is a product sample and each column is a raw material name; the proportioning data of raw materials not used by a product sample is filled with 0.
Further, step S2 is specifically: adding different product performance fields after the characteristic columns of the dimension reduction structured data table to obtain an initial product performance training data set; and cleaning the initial product performance training data set, and eliminating product samples corresponding to abnormal values in the product performance field, thereby obtaining a correct final product performance training data set.
Further, in step S3, the specific parameters of the random forest regression algorithm are: number of decision trees n_estimators = 200, maximum depth of each decision tree max_depth = 4, maximum number of features 4, minimum number of samples required to split an internal node min_samples_split = 2, and minimum number of samples per leaf node min_samples_leaf = 1.
Further, in step S4, before the Bayesian optimization algorithm is combined with the different product performance prediction models to obtain the raw material ratio whose predicted performance is closest to the target product performance, the product performance database and the target product performance are min-max normalized to eliminate the influence of differing units and scales. The product in the product performance database with the shortest Euclidean distance to the target product performance is taken as the product closest to the target product performance, and the optimization objective of the Bayesian optimization algorithm is to minimize the Euclidean distance between the performance predicted by the product performance prediction models and the target product performance.
Furthermore, the raw material proportioning optimization range of the target product is designed by taking the main raw material proportioning of the product closest to the performance of the target product as a reference, and the auxiliary raw material proportioning of the target product is consistent with that of the product closest to the performance of the target product.
Further, in step S4, the Bayesian optimization search is performed using the bayes_opt toolkit, with a Gaussian process as the surrogate model for Bayesian optimization and UCB, EI or PI as the acquisition function.
The parameters of the Gaussian process model in the Bayesian optimization are: kernel = Matern(nu=2.5), alpha = 1e-6, n_restarts_optimizer = 5.
The invention has the beneficial effects that:
1. According to the invention, the mapping relation between the rubber product and its raw material ratio is constructed by principal component analysis, random forest regression and Bayesian optimization, so that the reverse derivation of the raw material ratio of the rubber product is realized and the research and development efficiency of rubber is greatly improved.
2. According to the method, the high-dimensional sparse raw material proportioning data is subjected to dimensionality reduction by using principal component analysis, most data information is kept, and meanwhile, the characteristic dimensionality is greatly reduced, so that the prediction precision of a product performance prediction model is favorably improved, and the accuracy of optimizing the raw material proportioning of a target rubber product by using a Bayesian optimization algorithm is favorably improved.
3. According to the invention, the input conditions corresponding to the target output are obtained by Bayesian optimization, which saves computation cost and avoids the infeasibility of exhaustive search when the number of combinations of experimental input conditions is large and the input conditions are continuous features.
Drawings
FIG. 1 is a flow chart showing the reverse derivation of the ratio of raw materials in the example.
FIG. 2 is a flow chart of a reverse experiment of the target product raw material ratio in the example.
Detailed Description
The present invention is described in detail below with reference to the following embodiments so that its objects, features and advantages can be more readily understood. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; those skilled in the art can make similar modifications without departing from the spirit and scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
As shown in the figures, a product raw material ratio reverse derivation method based on data mining includes:
summarizing the raw material proportioning data of various rubber products accumulated in daily research and development work, and cleaning the data to obtain raw material proportioning data free of errors and duplicates. The cleaning specifically comprises the following steps:
removing product samples in which the same raw material appears more than once;
removing duplicate product samples;
removing product samples whose raw material proportions do not sum to 100%.
The raw material proportioning data is then rearranged into an effective structured data table, in which each row is a product sample and each column is a raw material name; the proportioning data of raw materials not used by a product sample is filled with 0.
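The cleaning and table-building steps above can be sketched with pandas as follows. This is a minimal illustration, not the patent's implementation: the long-format input layout, the column names `product_id`, `raw_material` and `ratio`, the tolerance `tol`, and the sample data are all assumptions made for the example.

```python
import pandas as pd

def build_effective_table(records: pd.DataFrame, tol: float = 1e-6) -> pd.DataFrame:
    """Clean long-format proportioning records and pivot them to a wide table.

    `records` is assumed to have the hypothetical columns
    product_id, raw_material, ratio (ratio in percent).
    """
    # 1. Remove product samples in which the same raw material appears more than once.
    repeated = records.duplicated(subset=["product_id", "raw_material"], keep=False)
    bad_ids = set(records.loc[repeated, "product_id"])
    records = records[~records["product_id"].isin(bad_ids)]

    # 2. Rows = product samples, columns = raw material names; unused materials -> 0.
    table = records.pivot(index="product_id", columns="raw_material", values="ratio")
    table = table.fillna(0.0)

    # 3. Remove duplicate product samples (identical formulations), keeping the first.
    table = table.drop_duplicates()

    # 4. Remove product samples whose proportions do not sum to 100%.
    return table[(table.sum(axis=1) - 100.0).abs() <= tol]

# Made-up records: P2 duplicates P1, P3 lists material A twice, P4 sums to 90%.
records = pd.DataFrame(
    [
        ("P1", "A", 60.0), ("P1", "B", 40.0),
        ("P2", "A", 60.0), ("P2", "B", 40.0),
        ("P3", "A", 50.0), ("P3", "A", 30.0), ("P3", "B", 20.0),
        ("P4", "A", 70.0), ("P4", "C", 20.0),
        ("P5", "B", 55.0), ("P5", "C", 45.0),
    ],
    columns=["product_id", "raw_material", "ratio"],
)
table = build_effective_table(records)
print(table)
```

Only P1 and P5 survive in this toy input. A tolerance is used for the 100% check because proportions stored as floating-point numbers rarely sum to exactly 100.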
In this embodiment, raw material proportioning data for 1000 different rubber product models are collected. After data cleaning, the rearranged effective structured data table contains 905 product samples and a total of 97 raw material types.
Principal component analysis is used to reduce the high-dimensional sparse structured data in the effective structured data table to low-dimensional data features. The dimension reduction structured data table obtained in this way still consists of rows and columns: the rows are product samples and the columns are the principal component features after dimension reduction.
In this embodiment, the PCA interface of the sklearn machine learning library is used for feature dimension reduction. Retaining the first 15 principal components reduces the original 97-dimensional raw material features to 15 dimensions while explaining 97% of the total variance (that is, selecting 15 principal components as the data features after dimension reduction retains 97% of the original data information), and removes most of the redundant features.
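As a hedged illustration of this step, the sketch below applies scikit-learn's `PCA` to a synthetic sparse proportioning matrix. The data, the matrix size and the choice of `n_components=0.95` (keep as many components as needed to explain at least 95% of the variance) are assumptions for the example; the embodiment itself keeps a fixed 15 components, which would correspond to `PCA(n_components=15)`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for the effective structured data table:
# 200 product samples x 30 raw materials, each sample using 5 materials
# whose proportions sum to 100%.
n_samples, n_materials = 200, 30
X = np.zeros((n_samples, n_materials))
for row in X:
    used = rng.choice(n_materials, size=5, replace=False)
    row[used] = rng.dirichlet(np.ones(5)) * 100.0

# A float n_components in (0, 1) asks PCA to keep the smallest number of
# components whose cumulative explained variance exceeds that fraction.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()

print(X_reduced.shape, explained)
```

The fitted `pca` object must be kept, because any new candidate raw material ratio has to be projected with `pca.transform` before it can be passed to a model trained on the reduced features.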
Using the product number as the key, different product performance fields such as Density and Melt Flow Rate are added after the data feature columns of the dimension reduction structured data table to obtain a plurality of initial product performance training data sets (each product performance record is matched one-to-one with its raw material proportioning data through the product number). The initial product performance training data sets are then cleaned: product samples corresponding to abnormal values in the product performance field of each data set are removed to obtain the final product performance training data sets. In this embodiment, after removal of the product samples with abnormal performance values, 860 product samples remain in the Density training data set and 866 product samples remain in the Melt Flow Rate training data set.
For the formula data and performance data in each final product performance training data set, the RandomForestRegressor interface of the sklearn machine learning library is used to construct and train a product performance prediction model, with the formula data as the feature variables and the performance data as the target (non-feature) variable. Balancing underfitting against overfitting during training, the parameters of the random forest regression model are set as follows: number of decision trees n_estimators = 200, maximum depth of each decision tree max_depth = 4, maximum number of features 4, minimum number of samples required to split an internal node min_samples_split = 2, and minimum number of samples per leaf node min_samples_leaf = 1.
The specific process is as follows: 80% of the data in the final product performance training data set is randomly extracted as the training set $\{Q_i^0\}$, and the remaining 20% is used as the test set $\{Q_j^1\}$.
The random forest model is trained on the training set $\{Q_i^0\}$, and its parameters are adjusted to obtain the optimal prediction model ϵ. The feature variables X of the test set $\{Q_j^1\}$ are input into the prediction model ϵ to obtain the prediction set $\{P_i^0\}$, which is compared with the actual values $\{Q_{1j}^1\}$ of the non-feature variable y in $\{Q_j^1\}$ to give the explained variance regression score:
$$S_{EVS} = 1 - \frac{\mathrm{var}\{Q_{1j}^1 - P_i^0\}}{\mathrm{var}\{Q_{1j}^1\}}$$
In the formula, $S_{EVS}$ denotes the explained variance regression score and var{ } denotes the variance. When the explained variance regression score is not less than 0.9, the prediction model is considered valid.
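The training and validation procedure can be sketched with scikit-learn as below. The data is synthetic, the random seeds are arbitrary, and mapping "maximum feature number is 4" to `max_features=4` is an interpretation rather than something the text states explicitly. The manual computation assumes the standard definition of the explained variance score, one minus the variance of the residuals divided by the variance of the actual values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import explained_variance_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 500 samples with 15 reduced features and one performance field.
X = rng.normal(size=(500, 15))
y = 1.2 + 0.05 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(scale=0.005, size=500)

# Random 80% / 20% split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters as listed in the text (max_features=4 is an interpretation).
model = RandomForestRegressor(
    n_estimators=200,
    max_depth=4,
    max_features=4,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=0,
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Library score and the same quantity computed by hand.
s_evs = explained_variance_score(y_test, y_pred)
s_evs_manual = 1.0 - np.var(y_test - y_pred) / np.var(y_test)

print(s_evs, s_evs_manual)
```

One such model would be trained per performance field (for example one for Density and one for Melt Flow Rate), and a model would be accepted only if its score reaches the 0.9 threshold given in the text.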
A product performance database consisting of different product performance fields such as Density and Melt Flow Rate is compiled, in which each row is a product sample (identified by its product number) and each column is a product performance field such as Density or Melt Flow Rate;
the product closest to the target product performance is found in the product performance database, and the raw material combination of that product is taken as the raw material combination of the target product;
the raw material proportioning optimization range of the target product is designed on the basis of the main raw materials (those with a proportion greater than 10%) of the product closest to the target product performance;
specifically, the product performance database and the target product performance are min-max normalized to eliminate the influence of differing units and scales, and the product in the product performance database with the minimum Euclidean distance to the target product performance is taken as the product closest to the target product performance. The main raw material proportions of that product plus and minus 10 percentage points serve as the upper and lower limits of the optimization range for the main raw material proportions of the target product, and the secondary raw material proportions of the target product are kept the same as those of the product closest to the target product performance.
For example, the target product performance is Density = 1.256 and Melt Flow Rate = 21.9. The main raw materials and proportions of the product closest to the target product performance are PCGX-002 (23.1%), PCGX-003 (37.45%) and PCGX-007 (36%), and its secondary raw materials and proportions are PCGX-025 (0.15%), PCGX-068 (0.3%), PCGX-061 (2%) and PCGX-020 (1%). The optimization ranges of the main raw material proportions of the target product are then these values plus and minus 10 percentage points, namely PCGX-002 (13.1% to 33.1%), PCGX-003 (27.45% to 47.45%) and PCGX-007 (26% to 46%).
The Bayesian optimization algorithm is combined with the product performance prediction models obtained by the random forest regression algorithm to obtain the main raw material ratio whose predicted performance is closest to the target product performance: PCGX-002 = 28.2%, PCGX-003 = 35.4%, PCGX-007 = 31.3%.
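The nearest-product search and the derivation of the optimization ranges can be illustrated with NumPy as follows. The three-row performance database is invented for the example; the target performance and the main raw material proportions are the values quoted in the embodiment.

```python
import numpy as np

# Hypothetical product performance database: columns are Density, Melt Flow Rate.
perf = np.array([
    [1.10, 10.0],
    [1.25, 22.0],
    [1.40, 35.0],
])
target = np.array([1.256, 21.9])  # target performance from the embodiment

# Min-max normalization using the database's per-column minimum and maximum,
# applied identically to the database and to the target.
col_min, col_max = perf.min(axis=0), perf.max(axis=0)
perf_n = (perf - col_min) / (col_max - col_min)
target_n = (target - col_min) / (col_max - col_min)

# Euclidean distance in normalized space; the smallest distance wins.
dist = np.linalg.norm(perf_n - target_n, axis=1)
nearest = int(np.argmin(dist))

# Main raw materials (proportion > 10%) of the nearest product, from the embodiment.
main = {"PCGX-002": 23.1, "PCGX-003": 37.45, "PCGX-007": 36.0}

# Search range for each main raw material: proportion +/- 10 percentage points.
bounds = {name: (round(r - 10.0, 2), round(r + 10.0, 2)) for name, r in main.items()}

print(nearest, bounds)
```

Without the normalization step, Melt Flow Rate (values in the tens) would dominate the distance and Density (values near 1) would have almost no influence on which product is judged closest.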
Specifically, the Bayesian optimization search is performed using the bayes_opt toolkit, with a Gaussian process as the surrogate model for Bayesian optimization and UCB as the acquisition function. The parameters of the Gaussian process model in the Bayesian optimization are set as follows: kernel = Matern(nu=2.5), alpha = 1e-6, n_restarts_optimizer = 5.
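To make the search loop concrete without depending on a particular version of the toolkit, the sketch below does not call the toolkit at all. It writes the loop out directly with scikit-learn's `GaussianProcessRegressor`, configured with the Gaussian process parameters quoted above, and a UCB acquisition. Everything else is an assumption for illustration: `predict_performance` is a made-up stand-in for the trained prediction models, the target is synthetic, `kappa`, the number of iterations, `normalize_y=True` and the random-candidate maximization of the acquisition are simplifications.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Search bounds (percent) for the three main raw materials, from the embodiment.
bounds = np.array([[13.1, 33.1], [27.45, 47.45], [26.0, 46.0]])

def predict_performance(x):
    """Hypothetical stand-in for the trained models (normalized outputs)."""
    return np.array([0.01 * x[0] + 0.005 * x[2], 0.02 * x[1] - 0.004 * x[0]])

# Synthetic target so that the example has a reachable optimum.
target = predict_performance(np.array([28.2, 35.4, 31.3]))

def objective(x):
    # The loop maximizes, so return the negative Euclidean distance to the target.
    return -np.linalg.norm(predict_performance(x) - target)

def sample(n):
    return rng.uniform(bounds[:, 0], bounds[:, 1], size=(n, len(bounds)))

# Initial random evaluations.
X = sample(5)
y = np.array([objective(x) for x in X])

gp = GaussianProcessRegressor(
    kernel=Matern(nu=2.5), alpha=1e-6, n_restarts_optimizer=5,
    normalize_y=True, random_state=0,
)

kappa = 2.576  # exploration weight in UCB (assumed value)
for _ in range(25):
    gp.fit(X, y)
    candidates = sample(2000)
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(mu + kappa * sigma)]  # UCB acquisition
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

best_x, best_y = X[np.argmax(y)], y.max()
print(best_x, best_y)
```

In the method described here, the objective would instead project the candidate ratio through the fitted PCA, query each random forest model, normalize the predictions and return the negative distance to the normalized target.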
To ensure that the raw material proportions of the target product sum to 100%, the optimized main raw material proportions are scaled by a common factor. The final raw material proportions of the target product are PCGX-002 = 28.7%, PCGX-003 = 36.05%, PCGX-007 = 31.8%, PCGX-025 = 0.15%, PCGX-068 = 0.3%, PCGX-061 = 2%, and PCGX-020 = 1%.
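The rescaling can be checked with a few lines of arithmetic. The sketch assumes purely proportional scaling of the three main raw materials, with the secondary raw materials held fixed; the variable names are illustrative.

```python
# Optimized main raw material proportions and fixed secondary proportions (percent).
main = {"PCGX-002": 28.2, "PCGX-003": 35.4, "PCGX-007": 31.3}
secondary = {"PCGX-025": 0.15, "PCGX-068": 0.3, "PCGX-061": 2.0, "PCGX-020": 1.0}

# Share of the formulation left for the main raw materials.
main_budget = 100.0 - sum(secondary.values())

# Common scale factor that makes the main proportions fill that share exactly.
factor = main_budget / sum(main.values())
scaled = {name: ratio * factor for name, ratio in main.items()}

final = {**scaled, **secondary}
total = sum(final.values())

for name, ratio in final.items():
    print(f"{name}: {ratio:.2f}%")
print(f"total: {total:.2f}%")
```

The secondary raw materials take up 3.45%, leaving 96.55% for the main raw materials. Strictly proportional scaling gives roughly 28.69%, 36.02% and 31.84%, which is close to but not identical to the reported 28.7%, 36.05% and 31.8%; the reported values also sum to 96.55%, so the difference presumably comes from rounding or a slightly different adjustment.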
Under the condition that other process conditions are not changed, a rubber product preparation experiment is carried out according to the raw material proportioning data, the obtained rubber product is subjected to Density and Melt Flow Rate performance measurement, and the specific results are as follows:
The technical features of the embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the embodiments are described above, but any combination of these technical features should be considered within the scope of this specification as long as it involves no contradiction.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A product raw material ratio reverse derivation method based on data mining is characterized by comprising the following steps:
S1, arranging the rubber initial raw material proportioning data into an initial structured data table, cleaning the data to obtain raw material proportioning data free of errors and duplicates, thereby forming an effective structured data table, and performing dimension reduction on the raw material proportioning data in the effective structured data table by principal component analysis to obtain a dimension reduction structured data table;
S2, adding different product performance fields to the dimension reduction structured data table, and performing data cleaning to obtain a plurality of final product performance training data sets;
S3, constructing a product performance prediction model for each final product performance training data set by a random forest regression algorithm;
and S4, optimizing with a Bayesian optimization algorithm combined with the different product performance prediction models to obtain the raw material ratio whose predicted performance is closest to the target product performance, and taking that ratio as the raw material ratio of the target product.
2. The product raw material ratio reverse derivation method based on data mining of claim 1, wherein in step S1 the cleaning of the rubber initial raw material proportioning data specifically comprises:
removing product samples in which the same raw material appears more than once;
removing duplicate product samples;
and removing product samples whose raw material proportions do not sum to 100%.
3. The method for reversely deducing the product raw material ratio based on data mining as claimed in claim 1, wherein the dimension reduction processing is performed on the raw material ratio data, and the principal component capable of explaining that the total variance is more than 95% is reserved as the data characteristic after dimension reduction.
4. The method of claim 1, wherein in the effective structured data table each row is a product sample and each column is a raw material name; the proportioning data of raw materials not used by a product sample is filled with 0.
5. The method of claim 4, wherein the step S2 is specifically performed by: adding different product performance fields after the characteristic columns of the dimension reduction structured data table to obtain an initial product performance training data set; and cleaning the initial product performance training data set, and eliminating product samples corresponding to abnormal values in the product performance field, thereby obtaining a correct final product performance training data set.
6. The method for reversely deriving the product raw material ratio based on data mining as claimed in claim 1, wherein in step S3, the specific parameters of the random forest regression algorithm are as follows: the number of decision trees is 200, the maximum depth of the decision trees is 4, the maximum feature number is 4, the minimum sample number required by internal node subdivision is 2, and the minimum sample number of leaf nodes is 1.
7. The method of claim 1, wherein in step S4, before the optimization through the Bayesian optimization algorithm combined with the different product performance prediction models to obtain the raw material ratio whose predicted performance is closest to the target product performance, the product performance database and the target product performance are min-max normalized to eliminate dimensional influence; the product in the product performance database with the shortest Euclidean distance to the target product performance is taken as the product whose performance is closest to the target product; and the optimization objective of the Bayesian optimization algorithm is to minimize the Euclidean distance between the performance predicted by the product performance prediction models and the target product performance.
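The normalization and nearest-product search of claim 7 can be worked through on a tiny example. The sketch assumes that the minimum and maximum are taken per performance column over the database and then applied to the target as well; the claim does not spell this out. Function names and numbers are hypothetical.

```python
import numpy as np

def min_max_normalize(database, target):
    """Scale each performance column to [0, 1] using the database min and max,
    and apply the same scaling to the target performance vector."""
    lo = database.min(axis=0)
    hi = database.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (database - lo) / span, (target - lo) / span

def closest_product(database, target):
    """Return the row index with the shortest Euclidean distance to the target
    in normalized performance space, together with all distances."""
    db_n, target_n = min_max_normalize(database, target)
    distances = np.linalg.norm(db_n - target_n, axis=1)
    return int(np.argmin(distances)), distances

# Hypothetical database: columns are tensile strength and hardness.
database = np.array([[20.0, 60.0], [25.0, 70.0], [30.0, 65.0]])
target = np.array([26.0, 69.0])
index, distances = closest_product(database, target)
```

Here the target normalizes to (0.6, 0.9) and the second product to (0.5, 1.0), so the second product (index 1) is the closest, at a distance of sqrt(0.02). The same distance, computed on model predictions instead of database rows, is the quantity the optimizer minimizes.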
8. The method of claim 7, wherein the optimization range of the raw material ratio of the target product is designed based on the main raw material ratio of the product whose performance is closest to that of the target product, and the secondary raw material ratio of the target product is kept consistent with that of this closest product.
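One way to read claim 8 is sketched below: the main raw materials of the closest product each get a search interval around their current proportion, while the secondary raw materials are held fixed. The claim does not say how wide the interval is, so the symmetric `margin` of 5 percentage points is an assumption, as are the function name and the recipe. The sketch does not enforce that a candidate ratio sums to 100%.

```python
def design_search_space(reference_recipe, main_materials, margin=5.0):
    """Split the closest product's recipe into search bounds for the main raw
    materials and fixed values for the secondary raw materials."""
    bounds = {}
    fixed = {}
    for name, percent in reference_recipe.items():
        if name in main_materials:
            # Search interval around the reference value, clipped to [0, 100].
            bounds[name] = (max(0.0, percent - margin), min(100.0, percent + margin))
        else:
            # Secondary raw materials keep the reference proportion.
            fixed[name] = percent
    return bounds, fixed

# Hypothetical recipe of the closest product.
closest = {"NR": 60.0, "carbon black": 35.0, "sulfur": 3.0, "accelerator": 2.0}
bounds, fixed = design_search_space(closest, main_materials={"NR", "carbon black"})
```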
9. The method as claimed in claim 7, wherein in step S4 the Bayesian optimization search is performed using the bayesian_opt toolkit, a Gaussian process is used as the surrogate model for the Bayesian optimization, and UCB, EI, or PI is used as the acquisition function for the Bayesian optimization.
10. The method for reversely deriving the product raw material ratio based on data mining as claimed in claim 9, wherein the parameters of the Gaussian process model in the Bayesian optimization are as follows: the kernel function is Matern with nu = 2.5, alpha = 1e-6, and the optimizer is restarted 5 times.
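The search described in claims 9 and 10 can be illustrated with a hand-written loop. This is a stand-in, not the toolkit's API: it uses scikit-learn's `GaussianProcessRegressor` configured with the claimed values (Matern kernel with nu = 2.5, alpha = 1e-6, 5 optimizer restarts) and a UCB acquisition evaluated on random candidates. The function `bayes_opt_ucb`, the `kappa` value, `normalize_y=True`, the candidate count, and the linear toy predictor that replaces the trained performance models are all assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bayes_opt_ucb(objective, bounds, n_init=5, n_iter=15, kappa=2.576, seed=0):
    """Maximize `objective` inside box `bounds` with a GP surrogate and UCB."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    sample = lambda n: rng.uniform(lo, hi, size=(n, len(lo)))
    X = sample(n_init)                                # random initial ratios
    y = np.array([objective(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  n_restarts_optimizer=5, normalize_y=True,
                                  random_state=seed)
    for _ in range(n_iter):
        gp.fit(X, y)
        candidates = sample(500)
        mean, std = gp.predict(candidates, return_std=True)
        x_next = candidates[np.argmax(mean + kappa * std)]   # UCB acquisition
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    best = int(np.argmax(y))
    return X[best], y[best]

def predict_performance(x):
    # Hypothetical linear stand-in for the trained performance prediction models.
    return np.array([0.3 * x[0] + 0.2 * x[1], 0.5 * x[0] + 1.0 * x[1]])

target = predict_performance(np.array([58.0, 37.0]))

def objective(x):
    # Negative Euclidean distance: maximizing it minimizes the distance to the target.
    return -float(np.linalg.norm(predict_performance(x) - target))

best_x, best_y = bayes_opt_ucb(objective, bounds=[(55.0, 65.0), (30.0, 40.0)])
```

Because the loop maximizes, the distance from claim 7 enters with a negative sign. In the claimed method the EI or PI acquisition functions could be used in place of UCB.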
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210838947.4A (CN115083549B) | 2022-07-18 | 2022-07-18 | Product raw material ratio reverse derivation method based on data mining |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115083549A (en) | 2022-09-20 |
CN115083549B (en) | 2023-04-07 |
Family
ID=83258769
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202210838947.4A (CN115083549B) | Active | 2022-07-18 | 2022-07-18 | Product raw material ratio reverse derivation method based on data mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115083549B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116359420A (en) * | 2023-04-11 | 2023-06-30 | 烟台国工智能科技有限公司 | Chromatographic data impurity qualitative analysis method based on clustering algorithm and application |
CN117649898A (en) * | 2024-01-30 | 2024-03-05 | 烟台国工智能科技有限公司 | Liquid crystal material formula analysis method and device based on data mining |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106755972A (en) * | 2016-12-14 | 2017-05-31 | 中国地质大学(武汉) | A kind of method that sintering process comprehensive coke ratio is predicted based on Data Dimensionality Reduction method |
CN109508498A (en) * | 2018-11-14 | 2019-03-22 | 青岛科技大学 | Rubber shock absorber formula designing system and method based on BP artificial neural network |
CN109981016A (en) * | 2019-03-25 | 2019-07-05 | 安徽大学 | A kind of optimal fast square output method of electric bus asynchronous machine based on random forest regression algorithm |
CN111476321A (en) * | 2020-05-18 | 2020-07-31 | 哈尔滨工程大学 | Air flyer identification method based on feature weighting Bayes optimization algorithm |
CN112990592A (en) * | 2021-03-26 | 2021-06-18 | 广东工业大学 | Shared vehicle fault prediction method and system |
CN113782109A (en) * | 2021-09-13 | 2021-12-10 | 烟台国工智能科技有限公司 | Reactant derivation method and reverse synthesis derivation method based on Monte Carlo tree |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116359420A (en) * | 2023-04-11 | 2023-06-30 | 烟台国工智能科技有限公司 | Chromatographic data impurity qualitative analysis method based on clustering algorithm and application |
CN116359420B (en) * | 2023-04-11 | 2023-08-18 | 烟台国工智能科技有限公司 | Chromatographic data impurity qualitative analysis method based on clustering algorithm and application |
CN117649898A (en) * | 2024-01-30 | 2024-03-05 | 烟台国工智能科技有限公司 | Liquid crystal material formula analysis method and device based on data mining |
CN117649898B (en) * | 2024-01-30 | 2024-05-03 | 烟台国工智能科技有限公司 | Liquid crystal material formula analysis method and device based on data mining |
Also Published As
Publication number | Publication date |
---|---|
CN115083549B (en) | 2023-04-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN115083549B (en) | Product raw material ratio reverse derivation method based on data mining | |
CN109863487A (en) | Non- fact type question answering system and method and the computer program for it | |
Gaucel et al. | Learning dynamical systems using standard symbolic regression | |
CN113454661A (en) | System and method for product failure cause analysis, computer readable medium | |
Mielniczuk et al. | Stopping rules for mutual information-based feature selection | |
Laurens et al. | Delineation of the genomics field by hybrid citation-lexical methods: interaction with experts and validation process | |
CN115062501A (en) | Chip packaging design optimization method based on adaptive subproblem selection strategy | |
Sklar | Fast MLE computation for the Dirichlet multinomial | |
CN116680976A (en) | Reverse design method for additive manufacturing metal material based on machine learning | |
CN117334271A (en) | Method for generating molecules based on specified attributes | |
CN115130985A (en) | Production control method and related apparatus, storage medium, and program product | |
Roeva et al. | InterCriteria analysis by pairs and triples of genetic algorithms application for models identification | |
US20240290440A1 (en) | Formulation graph for machine learning of chemical products | |
CN116756662A (en) | Yield prediction method and system for optimizing random forest based on Harris eagle algorithm | |
CN116502943A (en) | Quality tracing method for investment casting product | |
CN113836826A (en) | Key parameter determination method and device, electronic device and storage medium | |
CN116383503A (en) | Knowledge tracking method and system based on countermeasure learning and sequence recommendation | |
CN115270861A (en) | Product composition data monitoring method and device, electronic equipment and storage medium | |
CN115292672A (en) | Formula model construction method, system and device based on machine learning | |
Disanto et al. | Enumeration of compact coalescent histories for matching gene trees and species trees | |
Üresin | Correlation based regression imputation (CBRI) method for missing data imputation | |
JP2010044605A (en) | Device and program for searching database of steel plate production result | |
Jain et al. | Supervised Rank aggregation (SRA): A novel rank aggregation approach for ensemble-based feature selection | |
Bolshoy et al. | Ranking of prokaryotic genomes based on maximization of sortedness of gene lengths | |
Walaa et al. | Testing the Number of Components in a Birnbaum-Saunders Mixture Model under a Random Censoring Scheme |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||