WO2021259003A1 - Feature recognition method and apparatus, and computer device and storage medium - Google Patents

Feature recognition method and apparatus, and computer device and storage medium

Info

Publication number
WO2021259003A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
candidate
features
meta
training
Prior art date
Application number
PCT/CN2021/096980
Other languages
English (en)
Chinese (zh)
Inventor
孔清扬
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021259003A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • This application relates to the technical field of artificial intelligence, and in particular to a feature recognition method, device, computer equipment, and storage medium.
  • Automated machine learning is a major direction of development in this field.
  • The goal of automated machine learning is to make the aforementioned decisions using automated, data-driven methods.
  • An automated machine learning system determines the best solution automatically, so domain experts no longer need to master a variety of machine learning algorithms.
  • The most important part of automated machine learning is automated feature engineering.
  • The automatic feature construction tool featuretools uses an algorithm called Deep Feature Synthesis (DFS).
  • The DFS algorithm is an automated method for performing feature engineering on relational and temporal data. It generates composite features by applying operations (including sum, average, and count) to the data. However, this brute-force combination of features leads to a curse of dimensionality, which not only fails to improve the accuracy of the model but also degrades its ability to learn. How to select, from the constructed features, those that are genuinely effective for machine learning is therefore an urgent problem to be solved.
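  • For illustration, a minimal featuretools sketch of such DFS-style construction is shown below; the dataframes, columns, and primitive lists are hypothetical examples, and the calls assume the featuretools 1.x API rather than anything prescribed by this application.

```python
# A minimal sketch of DFS-style automatic feature construction with
# featuretools (assumes the featuretools 1.x API; dataframe names,
# index columns, and primitives below are illustrative only).
import featuretools as ft
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2], "join_year": [2018, 2020]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "amount": [25.0, 40.0, 15.0],
})

es = ft.EntitySet(id="shop")
es = es.add_dataframe(dataframe_name="customers", dataframe=customers,
                      index="customer_id")
es = es.add_dataframe(dataframe_name="orders", dataframe=orders,
                      index="order_id")
es = es.add_relationship("customers", "customer_id", "orders", "customer_id")

# DFS stacks aggregation primitives (sum, mean, count) across the
# relationship to produce candidate features for each customer.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["sum", "mean", "count"],
)
print(feature_defs)
```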
  • The main purpose of this application is to provide a feature recognition method, device, computer equipment, and storage medium to solve the problem of how to select effective features from the constructed features.
  • To this end, this application provides a feature recognition method, which includes the following steps:
  • Each first meta feature is input into a probability model, and the probability that each first meta feature is a preset label is calculated as the target probability that the corresponding candidate feature is the preset label, where the probability model is trained based on a random forest model;
  • Each evaluation value is compared with a second preset threshold, and if the evaluation value is greater than the second preset threshold, the candidate feature corresponding to the evaluation value is determined to be a valid feature.
  • This application also provides a feature recognition device, including:
  • the first acquiring unit, configured to acquire multiple original features and generate multiple candidate features from them according to a preset candidate feature generation method;
  • the first generating unit, configured to generate a corresponding first meta feature for each candidate feature according to a preset first meta feature generation method;
  • the first calculation unit, configured to input each first meta feature into a probability model and calculate the probability that each first meta feature is a preset label, as the target probability that each candidate feature is the preset label;
  • the comparing unit, configured to compare the target probability of each candidate feature with a first preset threshold and combine all candidate features whose target probability is greater than or equal to the first preset threshold into a candidate feature set;
  • the second calculation unit, configured to combine each candidate feature in the candidate feature set with the multiple original features to calculate an evaluation value of each candidate feature;
  • the determining unit, configured to compare each evaluation value with a second preset threshold and, if the evaluation value is greater than the second preset threshold, determine that the candidate feature corresponding to the evaluation value is a valid feature.
  • The present application also provides a computer device, including a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program, implements the steps of the above-mentioned feature recognition method:
  • Each first meta feature is input into a probability model, and the probability that each first meta feature is a preset label is calculated as the target probability that the corresponding candidate feature is the preset label, where the probability model is trained based on a random forest model;
  • Each evaluation value is compared with a second preset threshold, and if the evaluation value is greater than the second preset threshold, the candidate feature corresponding to the evaluation value is determined to be a valid feature.
  • This application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the feature recognition method are implemented:
  • Each first meta feature is input into a probability model, and the probability that each first meta feature is a preset label is calculated as the target probability that the corresponding candidate feature is the preset label, where the probability model is trained based on a random forest model;
  • Each evaluation value is compared with a second preset threshold, and if the evaluation value is greater than the second preset threshold, the candidate feature corresponding to the evaluation value is determined to be a valid feature.
  • With this solution, candidate features are generated from the original features according to the preset candidate feature generation method; the corresponding first meta features are generated from the candidate features and input into the pre-trained probability model to calculate the probability that each candidate feature is a preset label. Candidate features whose probability is greater than or equal to the first preset threshold are combined into a candidate feature set, each candidate feature in the set is combined with all the original features to calculate its evaluation value, and candidate features whose evaluation value is greater than the second preset threshold are valid features.
  • This application combines meta-learning to evaluate the constructed features and select those that are effective for machine learning.
  • FIG. 1 is a schematic diagram of the steps of a feature recognition method in an embodiment of the present application;
  • FIG. 2 is a structural block diagram of a feature recognition device in an embodiment of the present application;
  • FIG. 3 is a schematic block diagram of the structure of a computer device in an embodiment of the present application.
  • An embodiment of the present application provides a feature recognition method, including the following steps:
  • Step S1: obtain multiple original features, and generate multiple candidate features from the multiple original features according to a preset candidate feature generation method;
  • Step S2: generate a corresponding first meta feature for each candidate feature according to a preset first meta feature generation method;
  • Step S3: input each first meta feature into a probability model, and calculate the probability that each first meta feature is a preset label as the target probability that each candidate feature is the preset label, where the probability model is trained based on a random forest model;
  • Step S4: compare the target probability of each candidate feature with a first preset threshold, and combine all candidate features whose target probability is greater than or equal to the first preset threshold into a candidate feature set;
  • Step S5: combine each candidate feature in the candidate feature set with the multiple original features to calculate an evaluation value of each candidate feature;
  • Step S6: compare each evaluation value with a second preset threshold, and if the evaluation value is greater than the second preset threshold, determine that the candidate feature corresponding to the evaluation value is a valid feature.
  • Feature engineering is essentially an engineering activity whose purpose is to extract features from raw data to the maximum extent for use by algorithms and models.
  • In automatic feature construction, features are only mechanically combined and merged, and the new features generated this way do not necessarily improve the accuracy of the model. It is therefore necessary to select features that are effective for machine learning from the generated new features.
  • The above-mentioned original features are basic features, and candidate features are generated from each original feature according to a pre-established candidate feature generation method.
  • The pre-established candidate feature generation method may include one or more of the following:
  • Feature transformation: convert continuous and time features in the original features into discrete features, for example by dividing the value range of a numeric feature into several equal segments, converting a time feature into week number, month, year, or an is-weekend flag, or rescaling a continuous feature to a specific distribution, such as normalizing a numeric variable to [0, 1];
  • Feature combination: combine two features into a new feature, for example applying addition, subtraction, multiplication, or division to two continuous variables, or crossing two categorical variables;
  • High-order feature crossing: generate a new feature from no fewer than two features, for example by maximum, minimum, average, variance, or count aggregation.
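  • A minimal pandas sketch of the three generation methods above might look as follows; the column names, bin counts, and aggregations are illustrative assumptions, not values prescribed by this application.

```python
# A minimal sketch of the three candidate-feature generation methods
# above (column names and parameters are illustrative assumptions).
import pandas as pd

df = pd.DataFrame({
    "income": [3500.0, 5200.0, 4100.0, 6100.0],
    "age": [23, 35, 41, 52],
    "signup": pd.to_datetime(["2021-01-04", "2021-02-13",
                              "2021-03-01", "2021-03-27"]),
})

candidates = pd.DataFrame(index=df.index)

# Feature transformation: discretize a continuous feature into equal
# segments, expand a time feature, and normalize to [0, 1].
candidates["income_bin"] = pd.cut(df["income"], bins=4, labels=False)
candidates["signup_month"] = df["signup"].dt.month
candidates["signup_is_weekend"] = (df["signup"].dt.dayofweek >= 5).astype(int)
candidates["age_norm"] = (df["age"] - df["age"].min()) / (
    df["age"].max() - df["age"].min())

# Feature combination: arithmetic on two continuous variables.
candidates["income_per_age"] = df["income"] / df["age"]

# High-order crossing: aggregate over two or more features.
candidates["income_age_mean"] = df[["income", "age"]].mean(axis=1)

print(candidates.head())
```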
  • A corresponding first meta feature is generated for each of the candidate features generated above; the preset first meta feature generation method is used to generate the following kinds of meta-information:
  • Entropy and statistical tests of candidate features: divide all original features into three subgroups by type, namely discrete, numeric, and date-time, use the chi-square test and the t-test to calculate the correlation between each group and the candidate feature, and also compute an entropy-based measure of the candidate feature;
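  • A sketch of how such first meta features could be computed is shown below; the specific scipy.stats tests, binning, and smoothing are assumptions about one plausible realization of the chi-square test, t-test, and entropy measure.

```python
# A sketch of first-meta-feature computation: correlation of a candidate
# feature with type-based subgroups of original features plus an
# entropy-based measure (the concrete tests below are assumptions).
import numpy as np
from scipy import stats

def entropy_of(feature, bins=10):
    """Shannon entropy of a (binned) candidate feature."""
    counts, _ = np.histogram(feature, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def first_meta_features(candidate, discrete_group, numeric_group):
    meta = {"entropy": entropy_of(candidate)}
    # Numeric subgroup: two-sample t-test against each numeric original.
    for i, orig in enumerate(numeric_group):
        t, p = stats.ttest_ind(candidate, orig, equal_var=False)
        meta[f"t_pvalue_num_{i}"] = p
    # Discrete subgroup: chi-square test on a contingency table of the
    # binned candidate feature versus each discrete original feature.
    binned = np.digitize(candidate, np.histogram_bin_edges(candidate, 5))
    for i, orig in enumerate(discrete_group):
        table = np.array([[np.sum((binned == b) & (orig == v))
                           for v in np.unique(orig)]
                          for b in np.unique(binned)])
        chi2, p, _, _ = stats.chi2_contingency(table + 1)  # +1 smoothing
        meta[f"chi2_pvalue_disc_{i}"] = p
    return meta
```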
  • Each first meta feature is input into a probability model that has been pre-trained based on a random forest model.
  • The trained probability model can calculate the probability that each first meta feature is a preset label, which serves as the target probability that the corresponding candidate feature is the preset label; the preset label can be "good" or "bad".
  • The first preset threshold is a user-defined value in [0, 1]; for example, the first preset threshold may be set to 0.5. All candidate features whose target probability is greater than or equal to the first preset threshold are combined into a candidate feature set.
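  • Assuming a scikit-learn random forest plays the role of the probability model, the filtering of steps S3 and S4 might be sketched as follows; the function and variable names are hypothetical.

```python
# A sketch of steps S3-S4: score each candidate's first meta features
# with the trained random forest and keep those whose probability of
# the "good" label clears the first preset threshold (0.5 here).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_candidates(prob_model: RandomForestClassifier,
                      meta_matrix: np.ndarray,
                      candidate_names: list[str],
                      first_threshold: float = 0.5) -> list[str]:
    # Column of predict_proba corresponding to the "good" class.
    good_col = list(prob_model.classes_).index("good")
    target_prob = prob_model.predict_proba(meta_matrix)[:, good_col]
    return [name for name, p in zip(candidate_names, target_prob)
            if p >= first_threshold]
```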
  • Each candidate feature in the candidate feature set is combined with the multiple original features to calculate the evaluation value of each candidate feature.
  • The AUC (Area Under the Curve, a model evaluation metric) or the accuracy can be used as the evaluation value of a candidate feature.
  • The AUC value is the area under the ROC (receiver operating characteristic) curve.
  • Accuracy refers to the percentage of predictions that are correct.
  • The evaluation value corresponding to each candidate feature is compared with a second preset threshold.
  • The second preset threshold can be preset by the user or determined from all the original features.
  • If the evaluation value of a candidate feature is greater than the second preset threshold, the candidate feature is determined to be a valid feature. Since each candidate feature is combined with all the original features, the second preset threshold can be determined from all the original features.
  • An evaluation value greater than the second preset threshold indicates that adding the candidate feature is beneficial to the algorithm or model.
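  • A sketch of steps S5 and S6 under these assumptions follows; using a random forest as the downstream model, a supervised target y, and the AUC of the original features alone as the second preset threshold are illustrative choices, one plausible reading of "determined from all the original features".

```python
# A sketch of steps S5-S6: evaluate each surviving candidate by the AUC
# of a model trained on (original features + candidate), and accept it
# when the AUC exceeds the second preset threshold. Deriving the
# threshold from the original features alone is an assumption.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def auc_of(features: pd.DataFrame, y) -> float:
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, test_size=0.3,
                                              random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    # Assumes a binary target with the positive class in column 1.
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

def valid_features(original: pd.DataFrame, candidates: pd.DataFrame, y):
    second_threshold = auc_of(original, y)   # baseline on originals only
    valid = []
    for name in candidates.columns:
        combined = original.join(candidates[[name]])
        if auc_of(combined, y) > second_threshold:
            valid.append(name)
    return valid
```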
  • In summary, the candidate features constructed from the original features are first input into the probability model to calculate the target probability of the preset label; the candidate features whose target probability is greater than or equal to the first preset threshold are selected, each selected candidate feature is combined with all the original features to calculate its evaluation value, and the candidate features whose evaluation value is greater than the second preset threshold are taken as valid features.
  • By combining meta-learning, this application quickly and effectively selects, from the constructed features, those that are effective for machine learning.
  • Before step S3 of inputting each first meta feature into the probability model and calculating the probability that each first meta feature is a preset label as the target probability that each candidate feature is the preset label, the method further includes:
  • Step S3a: obtain multiple labeled training sets, each labeled training set containing multiple original training features;
  • Step S3b: generate multiple candidate training features from the multiple original training features according to the preset candidate feature generation method;
  • Step S3c: generate a corresponding second meta feature for each labeled training set according to a preset second meta feature generation method, and generate corresponding third meta features for the multiple candidate training features according to the preset first meta feature generation method;
  • Step S3d: assign a label to each candidate training feature;
  • Step S3e: combine the second meta features corresponding to the multiple labeled training sets and the third meta feature corresponding to each candidate training feature to generate new training data sets, and input the new training data sets into the random forest model for training so that the output of the random forest model is the probability of the label, obtaining the trained probability model.
  • As described in step S3a, multiple labeled training sets are obtained from a data set repository, and each labeled training set contains multiple original training features.
  • As described in step S3b, multiple candidate training features are generated from the original training features of each labeled training set according to the preset candidate feature generation method, which is as described in the previous embodiment.
  • As described in step S3c, a corresponding second meta feature is generated for each labeled training set according to the preset second meta feature generation method, and the third meta features corresponding to the multiple candidate training features are generated according to the preset first meta feature generation method.
  • The third meta features include the meta-information described in the previous embodiment.
  • The preset second meta feature generation method is used to generate the following types of meta-information:
  • General information: general statistical information about the labeled training set, such as the number of original training features, statistics on the data size, and other statistics of the original training features;
  • Initial evaluation: statistics on the current performance when the learning algorithm is applied to the original training features; the generated meta-information includes the defined evaluation metric, which can be AUC or accuracy, and the running time;
  • Feature diversity: after dividing all original training features into type-based subgroups, use the chi-square test and the t-test to calculate the similarity of each pair of original training features within a group.
  • A label is assigned to each candidate training feature; the label can be "good" or "bad", indicating whether the candidate training feature can improve the performance of the learner.
  • If the candidate training feature can improve the performance of the learner, its label is "good".
  • As described in step S3e, a new training data set is generated from the second meta features corresponding to the multiple labeled training sets and the third meta feature corresponding to one candidate training feature; as many new training data sets are formed as there are third meta features. The new training data sets are then input into the random forest model for training so that its output is the probability of the label. After every new training data set has been fed into the random forest model for iterative training, the trained probability model is obtained.
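  • A sketch of this training procedure follows; the meta-feature matrices, their shapes, and the forest hyperparameters are assumptions for illustration.

```python
# A sketch of steps S3a-S3e: each training row concatenates a labeled
# training set's second meta features with one candidate training
# feature's third meta features; the random forest is then fitted to
# predict the "good"/"bad" label (all shapes here are assumptions).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_probability_model(second_meta: np.ndarray,   # (n_sets, d2)
                            third_meta: np.ndarray,    # (n_candidates, d3)
                            labels: list[str]):        # "good"/"bad"
    rows = []
    for cand_vec in third_meta:          # one block per candidate feature
        for set_vec in second_meta:      # paired with every training set
            rows.append(np.concatenate([set_vec, cand_vec]))
    X = np.vstack(rows)
    # Each candidate's label is repeated once per labeled training set,
    # matching the candidate-major row order above.
    y = np.repeat(labels, len(second_meta))
    prob_model = RandomForestClassifier(n_estimators=200, random_state=0)
    prob_model.fit(X, y)
    return prob_model
```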
  • The step S3d of assigning a label to each candidate training feature includes:
  • Step S3d1: input the multiple labeled training sets into a learner, and calculate a first learning value of the multiple labeled training sets;
  • Step S3d2: combine the multiple labeled training sets and each candidate training feature into a data set;
  • Step S3d3: input each data set into the learner, and calculate the respective second learning value of each data set;
  • Step S3d4: compare the respective second learning value of each data set with the first learning value, and if the second learning value of a data set is greater than the first learning value, assign a first-type label to the candidate training feature added to that data set.
  • As described in step S3d1, the multiple labeled training sets are input into a learner, and the first learning value of the multiple labeled training sets is calculated.
  • The learner applies a model evaluation criterion and is used to calculate the learning value of an input data set; the learner can calculate the AUC or the accuracy.
  • As described in step S3d2, the multiple labeled training sets and one candidate training feature are combined into a new data set.
  • As many data sets are formed as there are candidate training features, and each data set includes all the labeled training sets.
  • As described in step S3d3, each data set is input into the learner, and the second learning value of each new data set is calculated. Since each candidate training feature is combined with all the labeled training sets, the second learning value of a data set can be regarded as the second learning value of its candidate training feature.
  • As described in step S3d4, each second learning value is compared with the first learning value. If the second learning value is greater than the first learning value, it indicates that the candidate training feature can improve the performance of the learner, and the candidate training feature corresponding to that second learning value is assigned a first-type label, which may be "good". When the second learning value is less than or equal to the first learning value, a second-type label, which may be "bad", is assigned to the corresponding candidate training feature.
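  • A sketch of this labeling procedure follows; the learner here is an out-of-fold AUC computed with scikit-learn, which is one of the two evaluation criteria the text allows (AUC or accuracy), and the data layout is an assumption.

```python
# A sketch of steps S3d1-S3d4: the learner computes a learning value
# (AUC here; accuracy would work equally) for a data set; a candidate
# training feature is labeled "good" when adding it raises that value.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def learning_value(X: pd.DataFrame, y) -> float:
    """Learner: out-of-fold AUC of a random forest on the data set."""
    proba = cross_val_predict(RandomForestClassifier(random_state=0),
                              X, y, cv=3, method="predict_proba")
    return roc_auc_score(y, proba[:, 1])

def assign_labels(train_sets: pd.DataFrame,       # all labeled sets, joined
                  candidate_feats: pd.DataFrame,  # one column per candidate
                  y) -> dict[str, str]:
    first_value = learning_value(train_sets, y)   # first learning value
    labels = {}
    for name in candidate_feats.columns:
        data_set = train_sets.join(candidate_feats[[name]])
        second_value = learning_value(data_set, y)  # second learning value
        labels[name] = "good" if second_value > first_value else "bad"
    return labels
```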
  • In one embodiment, the step S5 of combining each candidate feature in the candidate feature set with the multiple original features to calculate the evaluation value of each candidate feature includes:
  • Step S501: combine the multiple original features and each candidate feature in the candidate feature set into a first target feature set;
  • Step S502: calculate the AUC value of the first target feature set, and use the AUC value of the first target feature set as the evaluation value of the candidate feature.
  • As described in step S501, the multiple original features and each candidate feature in the candidate feature set are combined into a first target feature set; as many first target feature sets are formed as there are candidate features in the candidate feature set.
  • As described in step S502, the AUC value of each first target feature set is calculated. Each first target feature set includes one candidate feature and all the original features, and the AUC value of the first target feature set is taken as the evaluation value of that candidate feature.
  • In another embodiment, the step S5 of combining each candidate feature in the candidate feature set with the multiple original features to calculate the evaluation value of each candidate feature includes:
  • Step S50a: combine the multiple original features and each candidate feature in the candidate feature set into a second target feature set;
  • Step S50b: calculate the AUC value and the accuracy of each second target feature set;
  • As described in step S50a, the multiple original features and one candidate feature in the candidate feature set are combined into a second target feature set; as many second target feature sets are formed as there are candidate features in the candidate feature set, and each second target feature set includes all the original features and one candidate feature from the candidate feature set.
  • The candidate feature differs between second target feature sets, so the AUC value and the accuracy of a second target feature set can be used as the basis for calculating the evaluation value of its candidate feature.
  • The evaluation value of the candidate feature is calculated as M = a*k1 + b*k2, where a is the AUC value of the second target feature set, b is the accuracy of the second target feature set, k1 is the weight of the AUC value, and k2 is the weight of the accuracy.
  • This formula combines the original features and calculates the evaluation value of the candidate feature from both the AUC value and the accuracy, improving the accuracy of the evaluation value of the candidate feature.
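  • A sketch of this weighted evaluation follows; the weights k1 and k2 are not fixed by the text, so the values below are illustrative assumptions.

```python
# A sketch of the weighted evaluation M = a*k1 + b*k2 for one second
# target feature set; the weights k1 and k2 are user choices (values
# below are illustrative assumptions).
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluation_value(model, X_test, y_test,
                     k1: float = 0.7, k2: float = 0.3) -> float:
    a = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])  # AUC
    b = accuracy_score(y_test, model.predict(X_test))             # accuracy
    return a * k1 + b * k2
```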
  • An embodiment of the present application also provides a feature recognition device, including:
  • the first acquiring unit 10, configured to acquire multiple original features and generate multiple candidate features from them according to a preset candidate feature generation method;
  • the first generating unit 20, configured to generate a corresponding first meta feature for each candidate feature according to a preset first meta feature generation method;
  • the first calculation unit 30, configured to input each first meta feature into the probability model and calculate the probability that each first meta feature is a preset label, as the target probability that each candidate feature is the preset label;
  • the comparing unit 40, configured to compare the target probability of each candidate feature with a first preset threshold and combine all candidate features whose target probability is greater than or equal to the first preset threshold into a candidate feature set;
  • the second calculation unit 50, configured to combine each candidate feature in the candidate feature set with the multiple original features to calculate an evaluation value of each candidate feature;
  • the determining unit 60, configured to compare each evaluation value with a second preset threshold and, if the evaluation value is greater than the second preset threshold, determine that the candidate feature corresponding to the evaluation value is a valid feature.
  • The feature recognition device further includes:
  • the second acquiring unit, configured to acquire multiple labeled training sets, each labeled training set containing multiple original training features;
  • the second generating unit, configured to generate multiple candidate training features from the multiple original training features according to the preset candidate feature generation method;
  • the third generating unit, configured to generate a corresponding second meta feature for each labeled training set according to a preset second meta feature generation method, and to generate corresponding third meta features for the multiple candidate training features according to the preset first meta feature generation method;
  • the allocation unit, configured to assign a label to each candidate training feature;
  • the training unit, configured to combine the second meta features corresponding to the multiple labeled training sets and the third meta feature corresponding to each candidate training feature to generate new training data sets, and to input the new training data sets into the random forest model for training so that the output of the random forest model is the probability of the label, obtaining the trained probability model.
  • The allocation unit includes:
  • the first calculation subunit, configured to input the multiple labeled training sets into a learner and calculate the first learning value of the multiple labeled training sets;
  • the combination subunit, configured to combine the multiple labeled training sets and each candidate training feature into a data set;
  • the second calculation subunit, configured to input each data set into the learner and calculate the respective second learning value of each data set;
  • the allocation subunit, configured to compare the respective second learning value of each data set with the first learning value and, if the second learning value of a data set is greater than the first learning value, assign a first-type label to the candidate training feature added to that data set.
  • In one embodiment, the second calculation unit 50 includes:
  • the first composition subunit, configured to combine the multiple original features and each candidate feature in the candidate feature set into a first target feature set;
  • the third calculation subunit, configured to calculate the AUC value of the first target feature set and use the AUC value of the first target feature set as the evaluation value of the candidate feature.
  • In another embodiment, the second calculation unit 50 includes:
  • the second composition subunit, configured to combine the multiple original features and each candidate feature in the candidate feature set into a second target feature set;
  • the fourth calculation subunit, configured to calculate the AUC value and the accuracy of each second target feature set and to calculate the evaluation value of the candidate feature as M = a*k1 + b*k2, where a is the AUC value of the second target feature set, b is the accuracy of the second target feature set, k1 is the weight of the AUC value, and k2 is the weight of the accuracy.
  • An embodiment of the present application also provides a computer device, which may be a server whose internal structure may be as shown in FIG. 3.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus, where the processor of the computer device is used to provide computing and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system, a computer program, and a database.
  • The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium.
  • The database of the computer device is used to store data such as the original feature data and the first meta feature data.
  • The network interface of the computer device is used to communicate with an external terminal through a network connection.
  • The computer program, when executed by the processor, implements a feature recognition method.
  • The above-mentioned processor executes the steps of the above-mentioned feature recognition method:
  • Each first meta feature is input into a probability model, and the probability that each first meta feature is a preset label is calculated as the target probability that the corresponding candidate feature is the preset label, where the probability model is trained based on a random forest model;
  • Each evaluation value is compared with a second preset threshold, and if the evaluation value is greater than the second preset threshold, the candidate feature corresponding to the evaluation value is determined to be a valid feature.
  • Before the processor executes the step of inputting each first meta feature into the probability model and calculating the probability that each first meta feature is a preset label as the target probability that each candidate feature is the preset label, the method further includes:
  • The above-mentioned processor executing the step of assigning a label to each candidate training feature includes:
  • inputting the multiple labeled training sets into a learner, and calculating a first learning value of the multiple labeled training sets;
  • comparing the respective second learning value of each data set with the first learning value, and if the second learning value of a data set is greater than the first learning value, assigning a first-type label to the candidate training feature added to that data set.
  • The above-mentioned processor executing the step of combining each candidate feature in the candidate feature set with the multiple original features to calculate the evaluation value of each candidate feature includes:
  • calculating the AUC value of the first target feature set, and using the AUC value of the first target feature set as the evaluation value of the candidate feature.
  • In another implementation, the processor executing the step of combining each candidate feature in the candidate feature set with the multiple original features to calculate the evaluation value of each candidate feature includes:
  • calculating the evaluation value as M = a*k1 + b*k2, where a is the AUC value of the second target feature set, b is the accuracy of the second target feature set, k1 is the weight of the AUC value, and k2 is the weight of the accuracy.
  • FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the above-mentioned storage medium may be a non-volatile storage medium or a volatile storage medium.
  • A computer program is stored thereon, and when the computer program is executed by a processor, the steps of the feature recognition method are implemented, specifically:
  • Each first meta feature is input into a probability model, and the probability that each first meta feature is a preset label is calculated as the target probability that the corresponding candidate feature is the preset label, where the probability model is trained based on a random forest model;
  • Each evaluation value is compared with a second preset threshold, and if the evaluation value is greater than the second preset threshold, the candidate feature corresponding to the evaluation value is determined to be a valid feature.
  • Before the processor executes the step of inputting each first meta feature into the probability model and calculating the probability that each first meta feature is a preset label as the target probability that each candidate feature is the preset label, the method further includes:
  • The above-mentioned processor executing the step of assigning a label to each candidate training feature includes:
  • inputting the multiple labeled training sets into a learner, and calculating a first learning value of the multiple labeled training sets;
  • comparing the respective second learning value of each data set with the first learning value, and if the second learning value of a data set is greater than the first learning value, assigning a first-type label to the candidate training feature added to that data set.
  • The above-mentioned processor executing the step of combining each candidate feature in the candidate feature set with the multiple original features to calculate the evaluation value of each candidate feature includes:
  • calculating the AUC value of the first target feature set, and using the AUC value of the first target feature set as the evaluation value of the candidate feature.
  • In another implementation, the processor executing the step of combining each candidate feature in the candidate feature set with the multiple original features to calculate the evaluation value of each candidate feature includes:
  • calculating the evaluation value as M = a*k1 + b*k2, where a is the AUC value of the second target feature set, b is the accuracy of the second target feature set, k1 is the weight of the AUC value, and k2 is the weight of the accuracy.
  • With this solution, candidate features are generated from the original features according to the preset candidate feature generation method, and the corresponding first meta features are generated from the candidate features.
  • The first meta features are input into the pre-trained probability model to calculate the probability that each candidate feature is a preset label, and the candidate features whose probability is greater than or equal to the first preset threshold are combined into a candidate feature set.
  • The evaluation value of each candidate feature in the set is calculated by combining it with all the original features.
  • Candidate features whose evaluation value is greater than the second preset threshold are valid features.
  • This application combines meta-learning to evaluate the constructed features and select those that are effective for machine learning.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a feature recognition method and apparatus, a computer device, and a storage medium. The method comprises: generating a plurality of candidate features from a plurality of original features according to a preset candidate feature generation method; generating a corresponding first meta feature from each candidate feature according to a preset first meta feature generation method; inputting the first meta features into a probability model to obtain a target probability that each candidate feature is a preset label; comparing the target probability with a first preset threshold, all candidate features whose target probabilities are greater than or equal to the first preset threshold forming a candidate feature set; combining the candidate features in the candidate feature set with the plurality of original features to calculate evaluation values of the candidate features; and comparing the evaluation values with a second preset threshold, the candidate features whose evaluation values are greater than the second preset threshold being valid features. By means of the feature recognition method and apparatus, computer device, and storage medium described in the present invention, features that are effective for machine learning are selected from constructed features.
PCT/CN2021/096980 2020-06-23 2021-05-28 Feature recognition method and apparatus, and computer device and storage medium WO2021259003A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010583878.8A CN111832631A (zh) 2020-06-23 2020-06-23 特征识别方法、装置、计算机设备和存储介质
CN202010583878.8 2020-06-23

Publications (1)

Publication Number Publication Date
WO2021259003A1 true WO2021259003A1 (fr) 2021-12-30

Family

ID=72898031

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096980 WO2021259003A1 (fr) Feature recognition method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN111832631A (fr)
WO (1) WO2021259003A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114764603A (zh) * 2022-05-07 2022-07-19 支付宝(杭州)信息技术有限公司 针对用户分类模型、业务预测模型确定特征的方法及装置

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832631A (zh) * 2020-06-23 2020-10-27 平安科技(深圳)有限公司 特征识别方法、装置、计算机设备和存储介质
CN112286980B (zh) * 2020-12-03 2021-08-17 北京口袋财富信息科技有限公司 一种基于用户行为的信息推送方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105931224A (zh) * 2016-04-14 2016-09-07 浙江大学 基于随机森林算法的肝脏平扫ct图像病变识别方法
CN109241418A (zh) * 2018-08-22 2019-01-18 中国平安人寿保险股份有限公司 基于随机森林的异常用户识别方法及装置、设备、介质
US10284585B1 (en) * 2016-06-27 2019-05-07 Symantec Corporation Tree rotation in random classification forests to improve efficacy
CN111832631A (zh) * 2020-06-23 2020-10-27 平安科技(深圳)有限公司 特征识别方法、装置、计算机设备和存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105931224A (zh) * 2016-04-14 2016-09-07 浙江大学 基于随机森林算法的肝脏平扫ct图像病变识别方法
US10284585B1 (en) * 2016-06-27 2019-05-07 Symantec Corporation Tree rotation in random classification forests to improve efficacy
CN109241418A (zh) * 2018-08-22 2019-01-18 中国平安人寿保险股份有限公司 基于随机森林的异常用户识别方法及装置、设备、介质
CN111832631A (zh) * 2020-06-23 2020-10-27 平安科技(深圳)有限公司 特征识别方法、装置、计算机设备和存储介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114764603A (zh) * 2022-05-07 2022-07-19 支付宝(杭州)信息技术有限公司 针对用户分类模型、业务预测模型确定特征的方法及装置

Also Published As

Publication number Publication date
CN111832631A (zh) 2020-10-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21830143

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21830143

Country of ref document: EP

Kind code of ref document: A1