CN117435904A - Single feature ordering and composite feature extraction method - Google Patents

Single feature ordering and composite feature extraction method Download PDF

Info

Publication number
CN117435904A
CN117435904A
Authority
CN
China
Prior art keywords
feature
sample
features
expression
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311753604.9A
Other languages
Chinese (zh)
Other versions
CN117435904B (en
Inventor
胡旺
陈业航
章语
李欣悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202311753604.9A priority Critical patent/CN117435904B/en
Publication of CN117435904A publication Critical patent/CN117435904A/en
Application granted granted Critical
Publication of CN117435904B publication Critical patent/CN117435904B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • G06F18/2111Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • G06F18/2113Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/27Regression, e.g. linear or logistic regression
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16CCOMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C60/00Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation

Abstract

The invention discloses a single feature ordering and composite feature extraction method, belonging to the technical field of data processing. The method comprises the following steps: S1, constructing an input data set; S2, partitioning and clustering; S3, performing symbolic regression per cluster and decoding the symbolic regression results into expressions; S4, ordering single features according to the symbolic regression results; S5, extracting composite features according to the symbolic regression results. The method can effectively improve the interpretability of single-feature selection results and remove irrelevant or redundant features; at the same time, composite features that are interpretable within the domain can be extracted explicitly, promoting knowledge exchange between fields; in addition, selecting the truly relevant features effectively removes the interference caused by noise features, thereby simplifying the model, improving model accuracy, and aiding understanding of the data-generating process.

Description

Single feature ordering and composite feature extraction method
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a single feature ordering and composite feature extraction method.
Background
Feature selection is an important problem in the field of data processing, whose goal is to find the optimal feature subset. Feature selection can eliminate irrelevant or redundant features, thereby reducing the number of features, improving model accuracy, and reducing running time. Moreover, selecting the truly relevant features effectively simplifies the model and assists in understanding the data-generating process.
Feature selection is an NP-hard problem: given a set of candidate features, searching all possible combinations for the optimal feature subset is prohibitively expensive. In the field of feature selection, genetic algorithms determine an optimal feature subset by an evolution-based method: different feature subsets are encoded as a population by a suitable encoding scheme. In each generation, the subsets in the population are evaluated by the accuracy of a prediction model on the target task, and they compete to determine which subsets continue to the next generation. The next generation consists of the contest winners, which undergo crossover (a winning feature set is updated with features of other winners) and mutation (some features are randomly introduced or deleted). After the algorithm has run for a number of generations, the best members of the population constitute the optimal feature subset.
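The genetic-algorithm loop described above can be sketched as follows. Everything here is illustrative: the fitness function is a hypothetical stand-in for training a predictor on the encoded subset (it simply rewards two "truly relevant" features and penalises subset size), and the operator probabilities are arbitrary choices, not values from the source.

```python
import random

def fitness(mask):
    # hypothetical stand-in for a prediction model's accuracy: reward the
    # "truly relevant" features {0, 2}, penalise subset size
    return sum(1.0 for i in (0, 2) if mask[i]) - 0.1 * sum(mask)

def ga_feature_select(n_features, pop_size=20, generations=50, seed=1):
    rng = random.Random(seed)
    # each individual encodes a feature subset as a 0/1 mask
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        # tournament selection: subsets compete on (stand-in) model accuracy
        winners = [max(rng.sample(pop, 3), key=fitness) for _ in range(pop_size)]
        nxt = []
        for a, b in zip(winners[0::2], winners[1::2]):
            cut = rng.randrange(1, n_features)     # crossover: mix two winners
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                 # mutation: flip one feature
                child[rng.randrange(n_features)] ^= 1
            nxt += [a[:], child]
        pop = nxt
    return max(pop, key=fitness)

best = ga_feature_select(6)
```

After the configured number of generations, the best member of the final population is returned as the selected feature mask.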
Symbolic regression is a machine learning technique that aims to identify an underlying mathematical expression. It first builds a population of naive random formulas to represent the relationship between the known independent variables and their dependent-variable targets, in order to predict new data. Each successive generation of programs evolves from the previous one, and the fittest individuals in the population are selected for genetic operations. Symbolic regression draws on Darwin's theory of natural selection: by simulating operations such as gene replication, crossover and mutation among computer programs, and given a sufficiently large initial population and reasonable crossover and mutation probabilities, it avoids falling into local optima and can discover the rules hidden behind observed values from large amounts of real data, giving it wider applicability and higher accuracy than traditional regression methods. Genetic programming is the core algorithm of symbolic regression and, through mechanisms such as user-defined functions, has achieved notable results in fields such as machine learning, artificial intelligence, combinatorial optimization, adaptive systems and control. Based on the properties of functions, genetic programming adopts a binary-tree data structure to represent function expressions, and adapts the genetic operations on binary strings in genetic algorithms into genetic operations on binary trees.
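To illustrate the binary-tree encoding that genetic programming uses for expressions, here is a minimal evaluation sketch. Nested tuples stand in for tree nodes; the protected division is a common genetic-programming convention, not something the source specifies.

```python
def evaluate(node, env):
    # a leaf is either a constant (float) or a variable name looked up in env
    if isinstance(node, float):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node              # internal node: (operator, left, right)
    a, b = evaluate(left, env), evaluate(right, env)
    if op == '+':
        return a + b
    if op == '-':
        return a - b
    if op == '*':
        return a * b
    if op == '/':                       # protected division avoids blow-ups
        return a / b if abs(b) > 1e-12 else 1.0
    raise ValueError('unknown operator: %s' % op)

# the tree for (x0 * x1) + 2.0
tree = ('+', ('*', 'x0', 'x1'), 2.0)
result = evaluate(tree, {'x0': 3.0, 'x1': 4.0})   # 3*4 + 2 = 14.0
```

Crossover on this representation swaps subtrees between two such trees, and mutation replaces a subtree with a freshly generated one.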
Symbolic regression and feature selection both build on the ideas of evolutionary computation: the former uses an evolutionary algorithm to obtain a symbolic expression that better fits the relationship in the data, while the latter uses an evolutionary algorithm to obtain the optimal feature subset that best predicts the label value. However, most existing feature selection methods based on evolutionary algorithms can only extract important features implicitly and cannot provide an interpretable reason, which hinders knowledge exchange and verification across fields. Furthermore, features are not isolated in real life; in many cases they act on the result in combination, and symbolic regression can better reconstruct such composite features when used for feature extraction.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a single feature ordering and composite feature extraction method. From the symbolic-regression expression results, the method performs Pareto non-dominated sorting based on the occurrence frequency of the related features and the average values of their partial derivatives in each expression, thereby obtaining an importance ranking of the related features; and it extracts composite features that conform to domain knowledge by extracting frequent sub-formulas from the symbolic regression results and combining them with domain knowledge.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the single feature ordering and compound feature extracting method is characterized by comprising the following steps:
S1, constructing an input data set: for the sample data to be processed, the parameter to be optimized is selected as the label, and at least 3 features to be screened are selected as the related features; after data preprocessing, the related features of each sample are spliced with the corresponding label to obtain the input data of a single sample, completing the construction of the input data set.
S2, partitioning and clustering: cluster division is carried out on the input data set to obtain the cluster in which each sample lies.
S3, symbolic regression: symbolic regression is carried out separately for each cluster according to the cluster-division result. During symbolic regression, the hyperparameters of each cluster are kept consistent and the root mean square error is used as the fitness function. After the symbolic regression iterations finish, the symbolic regression results are decoded into expressions, giving the expression of each cluster.
S4, single feature ordering: the number of occurrences of each related feature in the expressions is counted to obtain the total occurrence count of each related feature; meanwhile, samples whose fitting errors are smaller than a set threshold are selected for each expression, and the average value of the partial derivative of each related feature in the expression over the selected samples is calculated by finite differences; non-dominated sorting is then carried out according to the total occurrence count of each related feature and its average partial-derivative value in the expressions, yielding a ranking of the degree to which the related features influence the parameter to be optimized.
S5, extracting composite features: substructures whose occurrence frequency is greater than a set threshold are extracted from the expressions, and the extracted substructures are screened by principal component analysis or the correlation-coefficient method to obtain the composite features.
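Step S5 can be sketched as follows under simplifying assumptions: sub-expressions are matched as substrings of the decoded expression strings, and the correlation-coefficient screen keeps substructures whose Pearson correlation with the label exceeds a threshold. The candidate substructures, expressions, data values and the 0.8 threshold are all illustrative, not from the source.

```python
from statistics import mean

def frequent_substructures(expressions, candidates, min_count):
    # count each candidate sub-expression across all decoded expressions
    counts = {c: sum(e.count(c) for e in expressions) for c in candidates}
    return [c for c in candidates if counts[c] > min_count]

def pearson(xs, ys):
    # sample Pearson correlation coefficient
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

exprs = ["(G*e0) + SF", "SF * (G*e0)", "(G*e0) - T"]
kept = frequent_substructures(exprs, ["G*e0", "SF*T"], min_count=2)

# correlation screen: values of a kept substructure vs. the label
sub_vals = [1.0, 2.0, 3.0, 4.0]
labels = [2.1, 3.9, 6.2, 8.0]
keep = abs(pearson(sub_vals, labels)) > 0.8
```

A production version would match subtrees of the expression trees rather than raw substrings, but the counting-then-screening structure is the same.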
Further, the data preprocessing includes: outlier rejection and data normalization;
the outlier-removal process is: outliers in the data sequences are detected using the Pauta (3σ) criterion; if outliers exist, they are removed;
the data-standardization process is: data are standardized based on the mean and standard deviation of the raw data, such that the standardized samples within a single related feature have mean 0 and variance 1.
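The preprocessing above can be sketched as follows, with made-up values; the sketch assumes the Pauta (3σ) criterion is applied per feature and standardization uses the population standard deviation.

```python
from statistics import mean, pstdev

def remove_outliers_3sigma(values):
    # Pauta (3-sigma) criterion: drop values farther than 3 standard
    # deviations from the mean of the raw data
    m, s = mean(values), pstdev(values)
    return [v for v in values if abs(v - m) <= 3 * s]

def standardize(values):
    # z-score standardization: resulting values have mean 0, variance 1
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

feature = [1.0, 1.2, 0.9, 1.1] * 5 + [100.0]   # 100.0 is a gross outlier
clean = remove_outliers_3sigma(feature)
z = standardize(clean)
```

Note that with very few samples a single outlier may not exceed the 3σ band (the maximum possible z-score is bounded by (N−1)/√N), so the criterion is only meaningful on reasonably long sequences.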
Further, the clustering division mode is as follows:
the input data for a single sample is expressed as:

x_i = (f_{i,1}, f_{i,2}, …, f_{i,n}, y_i)    (1)

where f_{i,j} denotes the j-th related feature of sample x_i, j = 1, 2, 3, …, n, and n is the total number of related features of the input data set; y_i denotes the label value of sample x_i.

The number of clusters K is designated, and any K samples are selected from the input data set as initial center points, giving the center-point set C = {c_1, c_2, …, c_K}, where c_1, c_2 and c_K denote the 1st, 2nd and K-th center-point samples respectively. For each remaining sample not selected as a center point, the Euclidean distance between the sample and every center point is calculated using formula (2), and the sample is assigned to the cluster of the center point with the smallest Euclidean distance:

d(x_a, c_b) = sqrt( Σ_{j=1}^{n} (f_{a,j} − f_{b,j})² )    (2)

where d(x_a, c_b) denotes the Euclidean distance between any sample x_a in the input data set and any center-point sample c_b in the center-point set, and f_{a,j} and f_{b,j} denote the values of the j-th feature of sample x_a and center-point sample c_b respectively.
The cluster-division process above is repeated and iterated until the division no longer changes or the maximum number of iterations is reached, completing the cluster division and yielding the clustering result.
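A minimal Python sketch of this cluster-division loop follows. The patent does not say how center points are updated between iterations; this sketch recomputes each center as its cluster's mean, as in standard K-means, and uses illustrative 2-D points.

```python
import random

def euclidean(a, b):
    # Euclidean distance, as in formula (2)
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def kmeans(samples, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(samples, k)]  # k random initial centers
    assign = None
    for _ in range(max_iter):
        # assign every sample to its nearest center
        new = [min(range(k), key=lambda c: euclidean(s, centers[c]))
               for s in samples]
        if new == assign:          # division unchanged -> converged
            break
        assign = new
        for c in range(k):         # update centers as cluster means
            members = [s for s, a in zip(samples, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
labels, _ = kmeans(pts, 2)
```

On these two well-separated point pairs the loop converges in a couple of iterations regardless of which samples are drawn as initial centers.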
Further, in step S3, symbolic regression is implemented using an evolutionary algorithm and tree encoding.
Further, in step S4, the total number of times each related feature appears in the expressions is calculated using formula (4):

C_j = Σ_{i=1}^{m} c_{i,j}    (4)

where m is the number of expressions and c_{i,j} denotes the frequency with which related feature f_j occurs in the i-th expression;

the average value of the partial derivative of each related feature in the expressions is calculated using formula (5):

D_j = (1/m) Σ_{i=1}^{m} | ∂g_i/∂f_j |    (5)

where ∂g_i/∂f_j denotes the partial derivative of the i-th expression g_i with respect to related feature f_j, averaged over the selected samples.
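The two indices can be sketched as follows: the total occurrence count of formula (4), counted here over expression strings, and the average partial derivative of formula (5), estimated by central finite differences (the patent says the derivatives are computed differentially; the stand-in expression, sample values and step size h are illustrative).

```python
def total_occurrences(expr_strings, feature):
    # formula (4): sum the feature's occurrence count over all expressions
    return sum(e.count(feature) for e in expr_strings)

def avg_partial_derivative(exprs, samples, j, h=1e-4):
    # formula (5): average |df/dx_j| over expressions and selected samples,
    # estimated by a central finite difference in coordinate j
    total, count = 0.0, 0
    for f in exprs:
        for x in samples:
            hi, lo = list(x), list(x)
            hi[j] += h
            lo[j] -= h
            total += abs(f(hi) - f(lo)) / (2 * h)
            count += 1
    return total / count

f1 = lambda x: 2 * x[0] + x[1] ** 2        # stand-in for a decoded expression
samples = [[1.0, 1.0], [2.0, 3.0]]
d0 = avg_partial_derivative([f1], samples, 0)   # df/dx0 = 2 everywhere
d1 = avg_partial_derivative([f1], samples, 1)   # df/dx1 = 2*x1 -> mean of 2 and 6
```

Central differences are exact for the linear and quadratic terms used here; in general the step size trades truncation error against rounding error.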
Further, in step S4, a Pareto non-dominated sorting algorithm is used to rank the single features.
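The non-dominated sorting over those two indices (both to be maximised) can be sketched as follows: front 0 holds features dominated by no other feature, front 1 those dominated only by front 0, and so on. The score tuples are illustrative, not from the patent's data.

```python
def dominates(a, b):
    # a dominates b if a is at least as good in every index and
    # strictly better in at least one
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_fronts(points):
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# (occurrence count, average |partial derivative|) per feature
scores = [(10, 0.5), (8, 0.9), (3, 0.2), (10, 0.9)]
fronts = non_dominated_fronts(scores)   # earlier fronts rank higher
```

Feature 3 dominates all others and forms the first front alone; features 0 and 1 are mutually non-dominated and share the second front.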
Compared with other methods in the data-processing field, the method can effectively improve the interpretability of single-feature selection results and eliminate irrelevant or redundant features, thereby reducing the number of features and improving model accuracy; at the same time, composite features that are interpretable within the domain can be explicitly extracted from the symbolic regression results, promoting knowledge exchange between fields; in addition, selecting the truly relevant features effectively removes interference caused by noise features, simplifying the model and assisting in understanding the data-generating process.
Drawings
FIG. 1 is a flow chart of feature selection based on symbolic regression according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a symbolic regression process according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of the accuracy results of different feature selection algorithms according to embodiment 1 of the present invention.
Fig. 4 is a schematic diagram of the accuracy results of different feature selection algorithms according to embodiment 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present invention more apparent, the following detailed description of the present invention will be given with reference to specific examples.
Example 1:
in this embodiment, taking the selection of nickel-base superalloy creep-life features as an example, creep-life data of 100,000 nickel-base superalloy samples are obtained, together with nine corresponding features to be screened: gamma prime volume fraction, shear modulus, domain inversion interfacial energy, stacking fault energy, gamma prime melting temperature, degree of mismatch, initial creep rate, applied stress, and creep temperature. Considering practical process limitations and cost factors, 40,000 samples are selected as the raw data set of this embodiment.
Based on the above-mentioned nickel-based superalloy creep life data set, the present embodiment provides a single feature ordering and composite feature extraction method, the flow of which is shown in fig. 1, specifically including the following steps:
step 1: constructing an input data set;
for each sample in the nickel-base superalloy creep-life raw data set, the life data of the sample is taken as the label and the 9 features to be screened as the related features. The related features are preprocessed as follows: outliers in the data sequences are detected with the Pauta (3σ) criterion and removed if present; the data are then standardized based on the mean and standard deviation of the raw data, so that within each related feature the sample mean is 0 and the variance is 1. The labels are processed as follows: the continuous creep-life values are mapped to discrete label values from 1 to 10 according to the value of the creep life.
Splicing the preprocessed related features with the corresponding labels to obtain input data of a single sample:
x_i = (f_{i,1}, f_{i,2}, …, f_{i,9}, y_i)    (1)

where f_{i,j} denotes the j-th related feature of sample x_i, j = 1, 2, 3, …, 9, and y_i denotes the label value of sample x_i.
Step 2: partitioning and clustering;
the number of clusters K is specified. If K is too small, the number of samples in a single cluster is too large and clustering loses its purpose; if K is too large, the symbolic regression result within a single cluster is not generalizable. Therefore, for the nickel-base superalloy creep-life data set, K is chosen as 20 by an empirical formula.
Any 20 samples of the input data set are selected as initial center points, giving the center-point set C = {c_1, c_2, …, c_20}. For each remaining sample not selected as a center point, the Euclidean distance to every center point is calculated using formula (2):

d(x_a, c_b) = sqrt( Σ_{j=1}^{9} (f_{a,j} − f_{b,j})² )    (2)

where d(x_a, c_b) denotes the Euclidean distance between any sample x_a in the input data set and any center-point sample c_b in the center-point set, and f_{a,j} and f_{b,j} denote the values of the j-th feature of sample x_a and center-point sample c_b; according to the calculation results, each sample is assigned to the cluster of the center point with the smallest Euclidean distance.
The cluster division is repeated and iterated until the division no longer changes or the maximum number of iterations is reached, completing the clustering and yielding the clustering result.
Step 3: symbol regression;
symbolic regression is carried out separately for each cluster according to the cluster-division result of step 2; the flow is shown in fig. 2.
Specifically, when symbolic regression is implemented with the evolutionary algorithm, each generated expression is treated as an individual, and the fitness function of the evolution is the root mean square error RMSE, calculated as:

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)² )    (3)

where N is the total number of samples, y_i is the label value of the i-th sample, and ŷ_i denotes the expression's life prediction for the i-th sample.
In each generation's environmental selection, individuals with smaller root mean square error, i.e., higher fitness, are more likely to survive, so expressions with smaller error are obtained as the number of iterations increases. In the symbolic regression of this embodiment, the number of iterations is set to 1000, the population size to 100, the mutation probability to 0.8, and the crossover probability to 0.4.
During symbolic regression, expressions are encoded as multi-gene binary trees: each gene is a binary tree, several genes together form an expression, and the coefficients combining the genes are determined by least squares. In this example, the tree depth is set to 6 and the maximum number of genes to 4. After the symbolic regression iterations finish, the symbolic regression results are decoded into expressions.
Step 4: sequencing single features;
ordering the single features requires computing two indices. One is the frequency with which a related feature occurs in the expressions: the more frequently it occurs, the more important it is. The other is the average value of the partial derivative of the standardized related feature in the expressions: the larger the average partial derivative, the more sensitive the label is to fluctuations of that feature, and the more important the feature is.
The total number of occurrences of each related feature in the expressions is calculated using formula (4):

C_j = Σ_{i=1}^{m} c_{i,j}    (4)

where m is the number of expressions and c_{i,j} denotes the frequency with which related feature f_j occurs in the i-th expression.
For each expression, the samples whose fitting errors rank in the smallest ten percent are selected, and the average value of the partial derivative of each related feature in the expression over the selected samples is calculated using formula (5):

D_j = (1/m) Σ_{i=1}^{m} | ∂g_i/∂f_j |    (5)

where ∂g_i/∂f_j denotes the partial derivative of the i-th expression g_i with respect to related feature f_j, evaluated on the selected samples.
After the occurrence counts and partial-derivative averages of the related features are obtained, Pareto non-dominated sorting is applied to obtain the ranking of the related features.
In this embodiment, the features with higher occurrence frequency are: gamma prime volume fraction, shear modulus and stacking fault energy; the features with higher average partial derivative are: shear modulus, stacking fault energy and initial creep rate. Thus, according to the Pareto non-dominated sorting, the top 4 single features in this embodiment are: gamma prime volume fraction, shear modulus, stacking fault energy, and initial creep rate.
Step 5: extracting composite features;
according to the symbolic-regression result expressions obtained in step 3, substructures whose occurrence count exceeds 10% of the population size set for the symbolic regression, i.e., more than 10 occurrences, are extracted; the extracted substructures are then screened by the correlation-coefficient method to obtain the composite features.
In this embodiment, the extracted composite substructures are the product of shear modulus and initial creep rate, and the product of stacking fault energy and initial creep rate.
The 4 single features obtained in step 4 and the two composite features obtained in step 5 form a new feature data set, which serves as the set of main features affecting the creep life of nickel-base superalloys; these features give better performance in predicting nickel-base superalloy creep life.
Verification: the creep life of the nickel-base superalloy is predicted using both the raw data set of 9 related features and the new feature data set obtained in this embodiment; the model prediction accuracies are shown in fig. 3. It can be seen that the new feature data set obtained in this embodiment helps to predict the creep-life value of the nickel-base superalloy better.
Example 2:
this example uses the same nickel-base superalloy data as example 1 as the raw data set; the difference is that the initial creep rate is used as the label, and single feature ordering and composite feature extraction are carried out on the other eight features to be screened, namely: gamma prime volume fraction, shear modulus, domain inversion interfacial energy, stacking fault energy, gamma prime melting temperature, degree of mismatch, applied stress, and creep temperature.
The nickel-base superalloy initial-creep-rate data set is subjected to single feature ordering and composite feature extraction using the method described in example 1. The experimental results show that, for the initial creep rate of the nickel-base superalloy, the features occurring more frequently in the symbolic-regression result expressions are stacking fault energy, gamma prime melting temperature and degree of mismatch, while the features with higher average partial derivatives are gamma prime volume fraction, gamma prime melting temperature and degree of mismatch. After non-dominated sorting on occurrence frequency and average partial derivative, the selected single features are gamma prime melting temperature and degree of mismatch.
According to the symbolic-regression result expressions, substructures whose occurrence count exceeds 10% of the population size set for the symbolic regression, i.e., more than 10 occurrences, are extracted; the extracted substructures are then screened by the correlation-coefficient method, and the resulting composite features are the product of gamma prime volume fraction and degree of mismatch, and the product of stacking fault energy and gamma prime melting temperature.
A new feature data set is constructed from the gamma prime melting temperature, the degree of mismatch, the product of gamma prime volume fraction and degree of mismatch, and the product of stacking fault energy and gamma prime melting temperature. The initial creep rate of the nickel-base superalloy is predicted using both the raw data set of 8 related features and the new feature data set constructed in this embodiment; the model prediction accuracies are shown in fig. 4, from which it can be seen that the new feature data set obtained in this embodiment helps to predict the initial creep rate of the nickel-base superalloy better.
While the invention has been described in terms of specific embodiments, any feature disclosed in this specification may, unless expressly stated otherwise, be replaced by an alternative feature serving an equivalent or similar purpose; all of the disclosed features, or all of the steps of a method or process, may be combined in any manner, except for mutually exclusive features and/or steps.

Claims (6)

1. A single feature ordering and composite feature extraction method, characterized by comprising the following steps:
s1, constructing an input data set: for sample data to be processed, selecting parameters to be optimized in the sample data as labels, and selecting at least 3 features to be screened as related features; the relevant characteristics of the samples are spliced with the corresponding labels after data preprocessing, so that input data of a single sample are obtained, and the construction of an input data set is completed;
s2, partitioning and clustering: carrying out cluster division on an input data set to obtain clusters in which each sample is located;
s3, symbolic regression: symbolic regression is carried out separately for each cluster according to the cluster-division result; during symbolic regression, the hyperparameters of each cluster are kept consistent and the root mean square error is used as the fitness function; after the symbolic regression iterations finish, the symbolic regression results are decoded into expressions to obtain the expression of each cluster;
s4, single feature ordering: the number of occurrences of each related feature in the expressions is counted to obtain the total occurrence count of each related feature; meanwhile, samples whose fitting errors are smaller than a set threshold are selected for each expression, and the average value of the partial derivative of each related feature in the expression over the selected samples is calculated by finite differences; non-dominated sorting is then carried out according to the total occurrence count of each related feature and its average partial-derivative value in the expressions, obtaining a ranking of the degree to which the related features influence the parameter to be optimized;
s5, extracting composite features: substructures whose occurrence frequency is greater than a set threshold are extracted from the expressions, and the extracted substructures are screened by principal component analysis or the correlation-coefficient method to obtain the composite features.
2. The single feature ordering and composite feature extraction method of claim 1, wherein the data preprocessing comprises outlier removal and data standardization;
the outlier-removal process is: outliers in the data sequences are detected using the Pauta (3σ) criterion; if outliers exist, they are removed;
the data-standardization process is: data are standardized based on the mean and standard deviation of the raw data, such that the standardized samples within a single related feature have mean 0 and variance 1.
3. The single feature ordering and composite feature extraction method according to claim 2, wherein the cluster division is performed as follows:
the input data for a single sample is expressed as:

x_i = (f_{i,1}, f_{i,2}, …, f_{i,n}, y_i)    (1)

wherein f_{i,j} denotes the j-th related feature of sample x_i, j = 1, 2, 3, …, n, n being the total number of related features of the input data set, and y_i denotes the label value of sample x_i;

the number of clusters K is designated, and any K samples are selected from the input data set as initial center points to obtain the center-point set C = {c_1, c_2, …, c_K}, wherein c_1, c_2 and c_K denote the 1st, 2nd and K-th center-point samples respectively; for each remaining sample not selected as a center point, the Euclidean distance between the sample and every center point is calculated using formula (2), and the sample is assigned to the cluster of the center point with the smallest Euclidean distance:

d(x_a, c_b) = sqrt( Σ_{j=1}^{n} (f_{a,j} − f_{b,j})² )    (2)

wherein d(x_a, c_b) denotes the Euclidean distance between any sample x_a in the input data set and any center-point sample c_b in the center-point set, and f_{a,j} and f_{b,j} denote the values of the j-th feature of sample x_a and center-point sample c_b respectively;
the cluster-division process is repeated and iterated until the division no longer changes or the maximum number of iterations is reached, completing the cluster division and obtaining the clustering result.
4. The single feature ordering and composite feature extraction method according to claim 3, wherein in step S3, symbolic regression is implemented using an evolutionary algorithm and tree encoding.
5. The single feature ordering and composite feature extraction method according to claim 4, wherein in step S4, the total number of occurrences of each related feature in the expressions is calculated using formula (4):

$$F_j = \sum_{k=1}^{m} t_{k,j} \tag{4}$$

where $m$ is the number of expressions and $t_{k,j}$ denotes the number of occurrences of related feature $f_j$ in the $k$-th expression, so that $F_j$ is the occurrence frequency of $f_j$;

and the average value of the partial derivatives of each related feature over the expressions is calculated using formula (5):

$$\bar{D}_j = \frac{1}{m} \sum_{k=1}^{m} \frac{\partial E_k}{\partial f_j} \tag{5}$$

where $\partial E_k / \partial f_j$ denotes the partial derivative of the $k$-th expression $E_k$ with respect to related feature $f_j$.
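The two statistics of claim 5 can be sketched as follows. The expressions, feature names and evaluation point are invented for the example; occurrences are counted naively on string forms (which would over-count if one feature name were a prefix of another), and the partial derivative is estimated by central differences rather than symbolically.

```python
# hypothetical expressions produced by symbolic regression (m = 2), kept
# both as strings (to count feature occurrences) and as functions
# (to differentiate numerically)
expr_strings = ["f1*f2 + f1", "f1 - f3**2"]
expr_funcs = [lambda f1, f2, f3: f1 * f2 + f1,
              lambda f1, f2, f3: f1 - f3 ** 2]

def occurrence_total(strings, name):
    """Formula (4): total occurrences of feature `name` over all expressions."""
    return sum(s.count(name) for s in strings)

def mean_partial_derivative(funcs, arg_index, point, h=1e-6):
    """Formula (5): average over expressions of the partial derivative
    w.r.t. the `arg_index`-th feature, estimated by central differences."""
    total = 0.0
    for f in funcs:
        up = list(point); up[arg_index] += h
        down = list(point); down[arg_index] -= h
        total += (f(*up) - f(*down)) / (2 * h)
    return total / len(funcs)

freq_f1 = occurrence_total(expr_strings, "f1")                      # 2 + 1 = 3
mean_d_f1 = mean_partial_derivative(expr_funcs, 0, [1.0, 2.0, 3.0])  # ((f2+1) + 1)/2
```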
6. The single feature ordering and composite feature extraction method according to claim 4, wherein in step S4, the single feature ordering is performed using a Pareto non-dominated sorting algorithm.
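A minimal sketch of Pareto non-dominated sorting over two per-feature criteria, assuming (as suggested by claim 5, though not stated by claim 6) that the objectives are occurrence frequency and mean partial derivative magnitude, both to be maximized. The score values are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse on every objective and strictly
    better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_fronts(scores):
    """Partition items into successive Pareto fronts (lists of indices):
    front 0 holds the non-dominated items, front 1 those dominated only
    by front 0, and so on."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(scores[j], scores[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

# hypothetical per-feature scores: (occurrence count, mean |partial derivative|)
scores = [(5, 0.2), (3, 0.9), (5, 0.9), (1, 0.1)]
fronts = non_dominated_fronts(scores)   # feature 2 dominates all others
```

This naive O(n²) pass per front is fine for ranking a handful of features; NSGA-II-style bookkeeping would be used at scale.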
CN202311753604.9A 2023-12-20 2023-12-20 Single feature ordering and composite feature extraction method Active CN117435904B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311753604.9A CN117435904B (en) 2023-12-20 2023-12-20 Single feature ordering and composite feature extraction method

Publications (2)

Publication Number Publication Date
CN117435904A true CN117435904A (en) 2024-01-23
CN117435904B CN117435904B (en) 2024-03-15

Family

ID=89551966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311753604.9A Active CN117435904B (en) 2023-12-20 2023-12-20 Single feature ordering and composite feature extraction method

Country Status (1)

Country Link
CN (1) CN117435904B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105719253A (en) * 2016-01-20 2016-06-29 桂林电子科技大学 Kalman filtering phase unwrapping method having heapsort function in embedded manner
US9596196B1 (en) * 2013-10-17 2017-03-14 Amazon Technologies, Inc. Message grouping
US20180137219A1 (en) * 2016-11-14 2018-05-17 General Electric Company Feature selection and feature synthesis methods for predictive modeling in a twinned physical system
CN109800801A (en) * 2019-01-10 2019-05-24 浙江工业大学 K-Means clustering lane method of flow based on Gauss regression algorithm
CN110415111A (en) * 2019-08-01 2019-11-05 信雅达系统工程股份有限公司 Merge the method for logistic regression credit examination & approval with expert features based on user data
CN112257892A (en) * 2020-08-27 2021-01-22 中国石油化工股份有限公司 Method for optimizing complex gas reservoir drainage gas recovery process system
CN112330060A (en) * 2020-11-25 2021-02-05 新智数字科技有限公司 Equipment fault prediction method and device, readable storage medium and electronic equipment
CN113111308A (en) * 2021-03-15 2021-07-13 华南理工大学 Symbolic regression method and system based on data-driven genetic programming algorithm
CN113127864A (en) * 2019-12-31 2021-07-16 奇安信科技集团股份有限公司 Feature code extraction method and device, computer equipment and readable storage medium
CN115035966A (en) * 2022-08-09 2022-09-09 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Superconductor screening method, device and equipment based on active learning and symbolic regression
CN115329269A (en) * 2022-07-01 2022-11-11 四川大学 Differentiable genetic programming symbol regression method
CN115392361A (en) * 2022-08-12 2022-11-25 中国平安财产保险股份有限公司 Intelligent sorting method and device, computer equipment and storage medium
CN116596574A (en) * 2023-06-07 2023-08-15 国网安徽省电力有限公司电力科学研究院 Power grid user portrait construction method and system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HOSEONG JEONG et al.: "Semantic Cluster Operator for Symbolic Regression and Its Applications", Advances in Engineering Software, 8 July 2022, pages 1-22 *
KE SHI et al.: "A Two-Stage Evolutionary Algorithm with Repair Strategy for Heat Component-Constrained Layout Optimization", Advances in Swarm Intelligence, 8 July 2023, page 401, XP047662242, DOI: 10.1007/978-3-031-36622-2_33 *
LIU Yuan et al.: "A Review of Data-Driven Research on Performance Prediction of Wear-Resistant Steels", Journal of Mechanical Engineering, 28 February 2022, pages 31-50 *
PENG Maojun: "Urban Power Grid Spatial Load Forecasting Based on a Geographic Information System Platform", China Master's Theses Full-text Database, Engineering Science and Technology II, 15 February 2007, pages 042-263 *
DENG Yuedan et al.: "Data-Driven Multi-Objective Optimization Design and Development of Nickel-Based Superalloys", Foundry Technology, 18 May 2022, pages 351-356 *

Similar Documents

Publication Publication Date Title
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN112070125A (en) Prediction method of unbalanced data set based on isolated forest learning
US20060230018A1 (en) Mahalanobis distance genetic algorithm (MDGA) method and system
CN109858714B (en) Tobacco shred quality inspection index prediction method, device and system based on improved neural network
CN111080408A (en) Order information processing method based on deep reinforcement learning
CN111832101A (en) Construction method of cement strength prediction model and cement strength prediction method
CN115375031A (en) Oil production prediction model establishing method, capacity prediction method and storage medium
CN111723367A (en) Power monitoring system service scene disposal risk evaluation method and system
CN117349782B (en) Intelligent data early warning decision tree analysis method and system
CN104732067A (en) Industrial process modeling forecasting method oriented at flow object
CN114528949A (en) Parameter optimization-based electric energy metering abnormal data identification and compensation method
CN115409292A (en) Short-term load prediction method for power system and related device
CN115145901A (en) Multi-scale-based time series prediction method and system
CN114548591A (en) Time sequence data prediction method and system based on hybrid deep learning model and Stacking
CN115185804A (en) Server performance prediction method, system, terminal and storage medium
CN116340726A (en) Energy economy big data cleaning method, system, equipment and storage medium
CN110110447B (en) Method for predicting thickness of strip steel of mixed frog leaping feedback extreme learning machine
CN114580546A (en) Industrial pump fault prediction method and system based on federal learning framework
CN113672871A (en) High-proportion missing data filling method and related device
CN117435904B (en) Single feature ordering and composite feature extraction method
CN115618751B (en) Steel plate mechanical property prediction method
CN113033419A (en) Method and system for identifying equipment fault based on evolutionary neural network
CN115600926A (en) Post-project evaluation method and device, electronic device and storage medium
CN112749211B (en) Novel tea yield prediction method based on electric power big data
CN113743453A (en) Population quantity prediction method based on random forest

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant