CN107766946B - Method and system for generating combined features of machine learning samples - Google Patents

Info

Publication number
CN107766946B
Authority
CN
China
Prior art keywords
features
combined
feature
machine learning
combination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710898898.2A
Other languages
Chinese (zh)
Other versions
CN107766946A (en)
Inventor
戴文渊
杨强
陈雨强
张舒羽
栾淑君
孙迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN201710898898.2A (CN107766946B)
Priority to CN202010658034.5A (CN111797998B)
Publication of CN107766946A
Application granted
Publication of CN107766946B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A method and system for generating combined features of machine learning samples are provided. The method comprises: (A) acquiring unit features that can be combined; (B) providing a user with a graphical interface for setting feature combination configuration items, which define how features are to be combined among the unit features; (C) receiving an input operation performed by the user on the graphical interface to set the feature combination configuration items, and acquiring the feature combination configuration items set by the user according to the input operation; and (D) combining the features to be combined among the unit features based on the acquired feature combination configuration items to generate combined features of the machine learning samples. With the method and system, the user can achieve automatic feature combination merely by setting, through an interactive interface, the configuration items that define how features are combined, which improves both the user experience and the effect of the machine learning model.

Description

Method and system for generating combined features of machine learning samples
Technical Field
The present invention relates generally to the field of artificial intelligence, and more particularly, to a method and system for generating combined features of machine learning samples.
Background
At present, the basic process of training the machine learning model mainly includes:
1. importing a data set (e.g., a data table) containing historical data records;
2. completing feature engineering, in which the attribute information of the data records in the data set is processed in various ways to obtain various features (which may include, for example, combined features), and a feature vector formed from these features can be used as a machine learning sample;
3. training a model, in which the model is learned from the machine learning samples obtained through feature engineering according to a chosen machine learning algorithm (such as a logistic regression algorithm, a decision tree algorithm, or a neural network algorithm).
In the above process, feature generation is important because it affects the quality of the model. Each data record in the data table may include multiple pieces of attribute information (i.e., fields), and a feature may indicate the result of processing (or operating on) the fields, such as a field itself, part of a field, or a combination of fields, so as to better reflect the data distribution and the inherent associations and latent meaning among the fields. Taking the data mining field as an example, once features have been extracted accurately, they can be combined in different ways to help the learning process better extract the regularities in the data and to analyze, from multiple angles, the inherent associations and latent meaning in the data distribution. The quality of feature engineering directly determines how accurately the machine learning problem is described, and in turn affects the quality of the model.
On existing machine learning platforms, the model training process can be completed interactively through a graphical interface, so the user does not need to write program code personally. In the feature engineering stage, however, the user typically has to enter a manually designed feature combination manner into the platform. That is, the user must already know a specific feature combination manner in advance and cannot effectively achieve automatic feature combination by means of the platform.
In addition, to obtain a feature combination manner in advance, the user needs a deep understanding of the business scenario; that is, the user combines features manually on the basis of business experience. The data volume used in machine learning is generally large, so the user sometimes cannot analyze the data comprehensively and ends up formulating ineffective combined features. To improve the effect of the combined features, the user has to keep trying, and with large data volumes and high-dimensional features this work takes a long time. This both increases the workload and reduces work efficiency.
Disclosure of Invention
An object of exemplary embodiments of the present invention is to provide a method and a system for generating combined features of machine learning samples, so as to solve the problem in the prior art that automatic feature combination cannot be conveniently performed in a machine learning system.
According to an exemplary embodiment of the invention, there is provided a method of generating combined features of machine learning samples, comprising: (A) acquiring unit features that can be combined; (B) providing a user with a graphical interface for setting feature combination configuration items, which define how features are to be combined among the unit features; (C) receiving an input operation performed by the user on the graphical interface to set a feature combination configuration item, and acquiring the feature combination configuration item set by the user according to the input operation; and (D) combining the features to be combined among the unit features based on the acquired feature combination configuration items to generate combined features of the machine learning samples.
Optionally, the feature combination configuration item comprises at least one of: a feature configuration item for specifying the features to be combined among the unit features, so that the specified features to be combined are combined in step (D); an evaluation index configuration item for specifying the evaluation index of the combined features, so that in step (D) the effect of the machine learning models corresponding to various combined features is measured according to the specified evaluation index to determine the combination manner of the features to be combined; and a training parameter configuration item for specifying the training parameters of the machine learning model, so that in step (D) the combination manner of the features to be combined is determined by measuring the effect of the machine learning models corresponding to various combined features obtained under the specified training parameters.
Optionally, the feature combination configuration item further includes: a bucketing operation configuration item for specifying one or more bucketing operations to be performed on each of at least one continuous feature among the features to be combined, so that in step (D) the specified one or more bucketing operations are performed on the at least one continuous feature to obtain one or more corresponding bucketing features, and the obtained bucketing features are combined, as a whole, with the other features to be combined.
Optionally, the bucketing operation configuration item is used to specify one or more bucketing operations separately for each continuous feature; or the bucketing operation configuration item is used to specify one or more bucketing operations uniformly for all continuous features.
Optionally, the method further comprises: (E) displaying the generated combined features to the user.
Optionally, in step (E), an evaluation value of each combined feature with respect to the evaluation index is also displayed to the user.
Optionally, the method further comprises: (F) applying the generated combined features directly to a subsequent machine learning step.
Optionally, the method further comprises: (G) applying the combined features selected by the user from the displayed combined features to a subsequent machine learning step.
Optionally, the method further comprises: (H) storing the combination manner of the combined features generated in step (D) in the form of a configuration file.
Optionally, the method further comprises: (I) storing the combination manner of the combined features selected by the user in step (G) in the form of a configuration file.
Optionally, in step (A), the unit features are obtained by performing feature processing on the attribute information of the data records.
According to another exemplary embodiment of the invention, there is provided a system for generating combined features of machine learning samples, comprising: unit feature acquisition means for acquiring unit features that can be combined; display means for providing a user with a graphical interface for setting feature combination configuration items, which define how features are to be combined among the unit features; configuration item acquisition means for receiving an input operation performed by the user on the graphical interface to set a feature combination configuration item, and acquiring the feature combination configuration item set by the user according to the input operation; and combined feature generation means for combining the features to be combined among the unit features based on the acquired feature combination configuration items to generate combined features of the machine learning samples.
Optionally, the feature combination configuration item comprises at least one of: a feature configuration item for specifying the features to be combined among the unit features, so that the combined feature generation means combines the specified features to be combined; an evaluation index configuration item for specifying the evaluation index of the combined features, so that the combined feature generation means measures the effect of the machine learning models corresponding to various combined features according to the specified evaluation index to determine the combination manner of the features to be combined; and a training parameter configuration item for specifying the training parameters of the machine learning model, so that the combined feature generation means determines the combination manner of the features to be combined by measuring the effect of the machine learning models corresponding to various combined features obtained under the specified training parameters.
Optionally, the feature combination configuration item further includes: a bucketing operation configuration item for specifying one or more bucketing operations to be performed on each of at least one continuous feature among the features to be combined, so that the combined feature generation means performs the specified one or more bucketing operations on the at least one continuous feature to obtain one or more corresponding bucketing features, and combines the obtained bucketing features, as a whole, with the other features to be combined.
Optionally, the bucketing operation configuration item is used to specify one or more bucketing operations separately for each continuous feature; or the bucketing operation configuration item is used to specify one or more bucketing operations uniformly for all continuous features.
Optionally, the display means also displays the generated combined feature to the user.
Optionally, the display means further displays the evaluation value of each generated combined feature with respect to the evaluation index to the user.
Optionally, the system further comprises: application means for applying the generated combined features directly to a subsequent machine learning step.
Optionally, the system further comprises: application means for applying the combined features selected by the user from the displayed combined features to a subsequent machine learning step.
Optionally, the system further comprises: storage means for storing the combination manner of the combined features generated by the combined feature generation means in the form of a configuration file.
Optionally, the system further comprises: storage means for storing the combination manner of the combined features selected by the user from the displayed combined features in the form of a configuration file.
Optionally, the unit feature acquisition means obtains the unit features by performing feature processing on the attribute information of the data records.
According to another exemplary embodiment of the present invention, a computer-readable medium for generating combined features of machine-learned samples is provided, wherein a computer program for performing the method of generating combined features of machine-learned samples as described above is recorded on the computer-readable medium.
According to another exemplary embodiment of the present invention, a computing apparatus for generating combined features of machine-learned samples is provided, comprising a storage component and a processor, wherein the storage component has stored therein a set of computer-executable instructions which, when executed by the processor, perform the method of generating combined features of machine-learned samples as described above.
The method and system for generating combined features of machine learning samples according to exemplary embodiments of the present invention provide a convenient, efficient, and interaction-friendly feature combination process. The user can achieve automatic feature combination merely by setting, through an interactive interface, the configuration items that define how features are combined, which improves both the user experience and the effect of the machine learning model.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The above and other objects and features of exemplary embodiments of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings which illustrate exemplary embodiments, wherein:
FIG. 1 illustrates a flow diagram of a method of generating combined features of machine learning samples according to an exemplary embodiment of the invention;
FIG. 2 shows a flow diagram of a method of generating combined features of machine learning samples according to another example embodiment of the present invention;
FIG. 3 illustrates an example of a graphical interface for setting feature combination configuration items according to an exemplary embodiment of the present invention;
FIG. 4 illustrates an example of a feature combination analysis report according to an exemplary embodiment of the present invention;
FIG. 5 illustrates an example of a DAG (directed acyclic graph) for generating combined features of machine learning samples, according to an exemplary embodiment of the invention;
FIG. 6 illustrates a block diagram of a system that generates combined features of machine learning samples according to an exemplary embodiment of the invention.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
Here, machine learning is a necessary product of the development of artificial intelligence research to a certain stage, which is directed to improving the performance of the system itself by means of calculation, using experience. In a computer system, "experience" is usually in the form of "data" from which a "model" can be generated by a machine learning algorithm, i.e. by providing empirical data to a machine learning algorithm, a model can be generated based on these empirical data, which provides a corresponding judgment, i.e. a prediction, in the face of a new situation. Whether the machine learning model is trained or predicted using the trained machine learning model, the data needs to be converted into machine learning samples including various features. Machine learning may be implemented in the form of "supervised learning," "unsupervised learning," or "semi-supervised learning," it being noted that exemplary embodiments of the present invention do not impose particular limitations on specific machine learning algorithms. It should also be noted that other means such as statistical algorithms may also be incorporated during the training and application of the model.
Fig. 1 illustrates a flowchart of a method of generating combined features of machine learning samples according to an exemplary embodiment of the present invention. Here, the method may be performed by a computer program, or by a specialized system or computing device that generates combined features of machine-learned samples, as examples.
In step S10, unit features that can be combined are acquired. Here, the unit feature is the smallest unit that can be combined.
As an example, the unit features may be obtained by performing feature processing on the attribute information of the data records. Here, each data record may be regarded as a description of an event or object, corresponding to an example or a sample. A data record includes attribute information (i.e., fields) that reflects the behavior or nature of the event or object in some respect. The feature processing may be any suitable kind of processing: for example, part of a field's value may be extracted, arithmetic operations such as discretization or taking a logarithm may be applied to the value, or different fields may be combined; the present invention is not limited in this respect. The resulting unit features may indicate the result of such field processing or operations, such as a field itself, part of a field, or a combination of fields.
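The kinds of feature processing listed above can be illustrated with a minimal sketch. The field names ("timestamp", "amount", "city") and the function name are hypothetical and do not come from the patent; the sketch only shows a field used as is, part of a field, an arithmetic operation on a field, and a combination of fields.

```python
import math

def unit_features(record):
    """Derive a few unit features from one data record (a dict of fields).

    All field names here are hypothetical examples.
    """
    hour = record["timestamp"][11:13]  # part of a field: the hour substring
    return {
        # the field itself
        "city": record["city"],
        # part of a field
        "hour": hour,
        # an arithmetic operation on a field: log(1 + x), assuming x >= 0
        "log_amount": math.log1p(record["amount"]),
        # a combination of different fields
        "city_hour": record["city"] + "_" + hour,
    }

record = {"timestamp": "2017-09-28 14:05:00", "amount": 99.0, "city": "beijing"}
features = unit_features(record)
```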
In step S20, the user is provided with a graphical interface for setting feature combination configuration items, which define how features are to be combined among the unit features. According to an exemplary embodiment of the present invention, the unit features may be combined based on the feature combination configuration items set by the user. Specifically, machine learning models corresponding to candidate combined features among the unit features can be trained, the predictive power of each candidate combined feature is reflected by the differences in effect between these machine learning models, and the more important or more effective candidate combined features are then screened out to serve as the combined features of the machine learning samples. As an example, the user may set the feature combination configuration items involved in the above flow through the graphical interface, and may also set other related feature combination configuration items.
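The screening idea described in this step can be sketched as follows. This is only an illustration under loud assumptions: the "model" is a stand-in lookup table of smoothed positive rates rather than any algorithm named in the patent, accuracy stands in for the evaluation index, and all function and field names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def fit_rate_model(keys, labels):
    """Stand-in 'model' (an assumption): smoothed positive rate per feature value."""
    pos, cnt = defaultdict(float), defaultdict(float)
    for key, y in zip(keys, labels):
        pos[key] += y
        cnt[key] += 1
    prior = sum(labels) / len(labels)
    # Add one pseudo-sample at the prior rate so rare values are pulled toward it.
    table = {key: (pos[key] + prior) / (cnt[key] + 1.0) for key in cnt}
    return table, prior

def accuracy(table, prior, keys, labels):
    """Share of samples whose thresholded prediction matches the label."""
    hits = sum(int((table.get(k, prior) >= 0.5) == bool(y)) for k, y in zip(keys, labels))
    return hits / len(labels)

def screen_combinations(train, valid, feature_names, label="y", top_k=1):
    """Score every pairwise candidate combined feature and keep the best top_k."""
    scored = []
    for a, b in combinations(feature_names, 2):
        train_keys = [(r[a], r[b]) for r in train]
        valid_keys = [(r[a], r[b]) for r in valid]
        table, prior = fit_rate_model(train_keys, [r[label] for r in train])
        score = accuracy(table, prior, valid_keys, [r[label] for r in valid])
        scored.append(((a, b), score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy data: the label depends only on the interaction of gender and city.
rows = [
    {"gender": g, "city": c, "device": d, "y": int((g == "m") == (c == "bj"))}
    for g in ("m", "f") for c in ("bj", "sh") for d in ("ios", "android")
]
best = screen_combinations(rows, rows, ["gender", "city", "device"])
```

On this toy data, the combination of "gender" and "city" separates the labels while the other two pairs do not, so it is the one retained. A real system would train actual machine learning models on held-out data rather than reuse the training rows.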
In step S30, an input operation performed on the graphical interface by the user in order to set the feature combination configuration item is received, and the feature combination configuration item set by the user is acquired in accordance with the input operation.
As an example, the graphical interface provided to the user may include an input control corresponding to each feature combination configuration item to select and/or edit content, so that the feature combination configuration item set by the user may be acquired by receiving a selection operation and/or an editing operation of the user.
In step S40, features to be combined among the unit features are combined based on the acquired feature combination configuration items to generate combined features of the machine learning sample.
As an example, the feature combination configuration item may include at least one of: a feature configuration item, an evaluation index configuration item, a training parameter configuration item, and a bucketing operation configuration item. It should be understood that the feature combination configuration items may also include other configuration items that define how features are combined among the unit features.
Specifically, the feature configuration item is used to specify the features to be combined among the unit features so that the specified features to be combined are combined in step S40. As an example, all or part of the unit features acquired in step S10 may be specified as features to be combined by the feature configuration item. Specifically, the feature configuration item may be used to help the user confirm whether all unit features are used as features to be combined, and may also be used to help the user specifically specify each feature to be combined.
The evaluation index configuration item is used to specify the evaluation index of the combined feature, so that the effect of the machine learning model corresponding to various combined features is measured according to the specified evaluation index to determine the combination manner of the features to be combined in step S40. Here, as an example, a machine learning model corresponding to a particular combined feature may indicate that the sample of the machine learning model includes the particular combined feature.
As described above, according to the exemplary embodiments of the present invention, when unit features are combined, whether a combined feature is adopted can be determined by measuring the effect of the machine learning model corresponding to that combined feature. Here, the specified evaluation index may be used to measure the effect of the machine learning models corresponding to various combined features; the better a machine learning model scores on the evaluation index, the more readily its corresponding combined feature is selected as a combined feature of the machine learning samples. As an example, the evaluation index may be any of various model evaluation indexes for measuring the effect of a machine learning model. For example, the evaluation index may be AUC (Area Under the ROC (Receiver Operating Characteristic) Curve), MAE (Mean Absolute Error), a log loss function (logloss), or the like.
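For reference, the three evaluation indexes named above can be computed as in the following sketch. The function names are illustrative, and the AUC uses the simple pairwise definition (quadratic in the number of samples) rather than an optimized implementation. Note that a higher AUC is better, whereas for MAE and logloss a lower value is better, so a ranking by evaluation index has to respect the direction of the chosen metric.

```python
import math

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mae(targets, preds):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)

def logloss(labels, probs, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```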
The training parameter configuration item is used to specify the training parameters of the machine learning model, so that in step S40, the combination mode of the features to be combined is determined by measuring the effect of the machine learning model corresponding to the various combined features obtained under the specified training parameters.
As an example, a training parameter configuration item may include configuration items for one or more different training parameters. For example, the training parameter configuration items may include a learning rate configuration item and/or a configuration item for the number of parameter tuning rounds.
It should be noted, however, that the above examples only illustrate and explain exemplary embodiments of the present invention, which do not necessarily require the user to configure these items. For example, all unit features generated through feature processing may be used as features to be combined by default, a preset evaluation index may be used to measure the machine learning models, or model training may be performed under default training parameters.
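One possible way to represent these configuration items with their defaults is sketched below. The class name, field names, and default values (such as a learning rate of 0.1) are assumptions made for illustration; the patent does not prescribe a data structure or any particular defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class FeatureCombinationConfig:
    # None means "use every unit feature as a feature to be combined".
    features_to_combine: Optional[List[str]] = None
    # Preset evaluation index used when the user does not choose one (assumed).
    evaluation_index: str = "auc"
    # Default training parameters (hypothetical names and values).
    training_params: Dict[str, float] = field(
        default_factory=lambda: {"learning_rate": 0.1, "tuning_rounds": 10}
    )

    def resolve_features(self, unit_features: List[str]) -> List[str]:
        """Return the features to be combined, falling back to all unit features."""
        if self.features_to_combine is None:
            return list(unit_features)
        unknown = [f for f in self.features_to_combine if f not in unit_features]
        if unknown:
            raise ValueError(f"not unit features: {unknown}")
        return list(self.features_to_combine)

default_cfg = FeatureCombinationConfig()
custom_cfg = FeatureCombinationConfig(features_to_combine=["city"], evaluation_index="logloss")
```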
In addition, the feature combination configuration item may further include a bucketing operation configuration item for specifying one or more bucketing operations to be performed on each of at least one continuous feature among the features to be combined, so that in step S40 the specified one or more bucketing operations are performed on the at least one continuous feature to obtain one or more corresponding bucketing features, and the obtained bucketing features are combined, as a whole, with the other features to be combined. As an example, the bucketing operation configuration item may be used to specify one or more bucketing operations separately for each continuous feature. As another example, the bucketing operation configuration item may be used to specify one or more bucketing operations uniformly for all continuous features.
Here, for each continuous feature, each bucketing operation performed on it yields one bucketing feature; accordingly, the feature composed of all of its bucketing features may take the place of the original continuous feature in the automatic combination among the features to be combined. As an example, the bucketing operation configuration item may specify that multiple bucketing operations are to be performed on each continuous feature among the features to be combined, so that in step S40 the specified multiple bucketing operations are performed on each continuous feature to obtain multiple corresponding bucketing features.
In particular, a continuous feature is the counterpart of a discrete feature (e.g., a categorical feature); its value may be a numerical value with some continuity, e.g., age or amount. In contrast, as an example, the values of discrete features have no continuity and may be unordered categories such as "from Beijing", "from Shanghai", or "from Tianjin", or "sex is male" and "sex is female". Accordingly, a bucketing operation is a specific way of discretizing a continuous feature: the value range of the continuous feature is divided into a plurality of intervals (i.e., a plurality of buckets), and the corresponding bucketing feature value is determined based on the divided buckets. That is, according to an exemplary embodiment of the present invention, after at least one bucketing feature has been obtained for a continuous feature by performing at least one bucketing operation, a feature corresponding to that continuous feature may be formed by taking each bucketing feature as one constituent element; this feature may be regarded as a set of bucketing features, which is then combined with other continuous features and/or discrete features. Here, it should be understood that a bucketing operation places each value of the continuous feature into a specific bucket, and in the resulting bucketing feature each dimension may indicate a discrete value (e.g., "0" or "1") showing whether the value of the continuous feature falls into the corresponding bucket, or may indicate a specific continuous value (e.g., the actual value of the continuous feature or its normalized value, or the average, median, or boundary value of the continuous feature's values in the bucket).
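A minimal sketch of two common bucketing manners, equal-width and equal-depth, together with the "0"/"1" per-bucket encoding described above. The function names and the sample ages are illustrative assumptions, and the quantile rule used for the equal-depth edges is one simple choice among several.

```python
from bisect import bisect_right

def equal_width_bucket(value, lo, hi, n_buckets):
    """Index of the equal-width bucket that `value` falls into (clamped to range)."""
    if hi <= lo or n_buckets < 1:
        raise ValueError("need hi > lo and n_buckets >= 1")
    width = (hi - lo) / n_buckets
    return min(max(int((value - lo) / width), 0), n_buckets - 1)

def equal_depth_edges(values, n_buckets):
    """Interior boundaries chosen so each bucket holds roughly the same count."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_buckets] for i in range(1, n_buckets)]

def bucket_by_edges(value, edges):
    """Index of the bucket for `value`, given sorted interior boundaries."""
    return bisect_right(edges, value)

def one_hot(index, n_buckets):
    """Encode a bucket index so each dimension says whether the value is in that bucket."""
    return [1 if i == index else 0 for i in range(n_buckets)]

ages = [18, 22, 25, 31, 40, 47, 58, 63]   # hypothetical continuous feature values
edges = equal_depth_edges(ages, 4)        # boundaries between the four buckets
encoded = one_hot(bucket_by_edges(31, edges), 4)
```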
Accordingly, when discrete values (e.g., for a classification problem) or continuous values (e.g., for a regression problem) of each dimension are specifically applied in machine learning, a combination between discrete values (e.g., cartesian products, etc.) or a combination between continuous values (e.g., arithmetic operation combination, etc.) may be performed.
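The two kinds of combination mentioned here can be sketched as follows: a Cartesian product over discrete values, and an arithmetic combination (multiplication, chosen here only as an example) over continuous values. Function names, the "&" separator, and the sample values are assumptions for illustration.

```python
from itertools import product

def cross_values(values_a, values_b):
    """All possible values of the combined feature formed from two discrete features."""
    return [f"{a}&{b}" for a, b in product(values_a, values_b)]

def cross_sample(sample, names):
    """Value of the combined discrete feature for one sample."""
    return "&".join(str(sample[name]) for name in names)

def multiply_sample(sample, names):
    """One example of an arithmetic combination: the product of continuous values."""
    result = 1.0
    for name in names:
        result *= sample[name]
    return result

vocabulary = cross_values(["male", "female"], ["bucket0", "bucket1"])
crossed = cross_sample({"gender": "male", "age_bucket": 1}, ["gender", "age_bucket"])
```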
As an example, the bucketing operation configuration item may further include a bucketing manner configuration item and/or a bucketing parameter configuration item. The bucketing manner configuration item is used to specify the bucketing manner used by the bucketing operation. The bucketing parameter configuration item is used to specify the bucketing parameters of that bucketing manner. For example, equal-width bucketing or equal-depth bucketing can be specified through the bucketing manner configuration item, and the number of buckets, the bucket width, the bucket depth, and the like can be specified through the bucketing parameter configuration item. Here, the user may manually enter or select the values of the bucketing parameter configuration item; in particular, the user may be prompted to set the widths/depths of the equal-width/equal-depth bucketing operations so that they form a geometric (equal-ratio) or arithmetic (equal-difference) sequence.
Here, as an example, the multiple bucketing operations specified by the bucketing operation configuration item may be bucketing operations that use the same bucketing manner but different bucketing parameters (e.g., the number of buckets, the bucket depth, or the bucket width), or may be bucketing operations that use different bucketing manners. As an example, the feature obtained by performing the specified multiple bucketing operations on a continuous feature may be composed of the bucketing features obtained from each individual bucketing operation, so that the resulting feature can depict certain attributes of the original data record from different angles and at different scales/levels simultaneously.
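As an illustration of several bucketing operations that share a bucketing manner but differ in their parameters, the sketch below applies equal-width bucketing at widths that form a geometric sequence and collects the results as one set of bucketing features. The function names, the key format, and the assumption that values are not below the lower bound are all choices made for this example.

```python
def geometric_widths(base_width, ratio, count):
    """Bucket widths in an equal-ratio (geometric) sequence, e.g. 5, 10, 20."""
    return [base_width * ratio ** i for i in range(count)]

def multi_scale_buckets(value, lo, widths):
    """Apply one equal-width bucketing operation per width; assumes value >= lo.

    Returns a dict with one bucketing feature (a bucket index) per operation.
    """
    return {f"width_{w:g}": int((value - lo) // w) for w in widths}

widths = geometric_widths(5, 2, 3)
age_buckets = multi_scale_buckets(37, 0, widths)
```

The narrow width captures fine distinctions and the wide width captures coarse ones, which is one way to read the "different scales/levels" remark above.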
It should be understood that the above manner of generating the combination features based on the configuration items is merely for explanation and illustration, and the exemplary embodiments of the present invention are not limited to the above examples.
As an example, after generating the combined features of the machine learning samples based on the feature combination configuration item, the method of generating the combined features of the machine learning samples according to an exemplary embodiment of the present invention may further include: the generated combined features are directly applied to subsequent machine learning steps. For example, the model may be learned based on machine learning samples that include at least the generated combined features.
As an example, the method of generating combined features of machine learning samples according to an exemplary embodiment of the present invention may further include: storing the combination mode of the generated combined features in the form of a configuration file, so that it can be directly invoked according to user requirements when subsequent machine learning steps are executed or when other machine learning processes are carried out.
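A minimal sketch of storing a combination mode in a configuration file and reloading it later is given below. The JSON layout, the key "combined_features", the file name, and the feature names "age_bucket_10" and "job" are assumptions made for illustration; the text above does not prescribe a file format.

```python
import json
import os
import tempfile

def save_combinations(path, combinations):
    """Write the combination mode (lists of source feature names) to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"combined_features": combinations}, f, indent=2)

def load_combinations(path):
    """Read a previously saved combination mode back from the JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["combined_features"]

# Each inner list names the features that are combined into one new feature.
combinations = [["default", "month"], ["age_bucket_10", "job"]]

with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "feature_combinations.json")
    save_combinations(config_path, combinations)
    reloaded = load_combinations(config_path)
```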
Fig. 2 shows a flowchart of a method of generating combined features of machine learning samples according to another exemplary embodiment of the invention. As shown in fig. 2, a method of generating a combined feature of machine learning samples according to another exemplary embodiment of the present invention may include step S50 in addition to step S10, step S20, step S30, and step S40 shown in fig. 1. Step S10, step S20, step S30 and step S40 can be implemented with reference to the specific embodiment described with reference to fig. 1, and will not be described herein again.
In step S50, the combined features generated in step S40 are displayed to the user. Here, the combined features may be displayed in any suitable form.
As an example, the evaluation value of each combined feature with respect to the evaluation index is also displayed to the user. Here, the evaluation index may be an evaluation index specified by an evaluation index configuration item set by a user, or may be any other evaluation index.
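As an illustration only, the evaluation value of each candidate combined feature could be computed and ranked as sketched below, taking AUC as the evaluation index. The sketch assumes that some model has already produced one prediction score per sample for each candidate; the function names and the candidate names are hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_candidates(labels, candidate_scores):
    """Return (candidate name, evaluation value) pairs, best evaluation value first."""
    evaluated = {name: auc(labels, s) for name, s in candidate_scores.items()}
    return sorted(evaluated.items(), key=lambda item: item[1], reverse=True)

labels = [1, 0, 1, 0]
candidate_scores = {
    "combine(default, month)": [0.9, 0.1, 0.8, 0.2],
    "combine(default, job)": [0.6, 0.7, 0.8, 0.2],
}
report = rank_candidates(labels, candidate_scores)
```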
As an example, the method of generating combined features of machine learning samples according to another exemplary embodiment of the present invention may further include: the combined features selected by the user from the displayed combined features are applied to a subsequent machine learning step.
As another example, the method of generating combined features of machine learning samples according to another exemplary embodiment of the present invention may further include: storing the combination mode of the combined features selected by the user in the form of a configuration file, so that it can be directly invoked according to user requirements when a subsequent machine learning step is executed or when other machine learning processes are carried out.
As an example, the method of generating combined features of machine learning samples according to another exemplary embodiment of the present invention may further include: applying the combined features selected by the user from the displayed combined features to the subsequent machine learning step, and saving the combination mode of the selected combined features in the form of a configuration file.
An example of setting a feature combination configuration item by a user through a graphical interface according to an exemplary embodiment of the present invention is described below with reference to fig. 3. Fig. 3 illustrates an example of a graphic interface for setting a feature combination configuration item according to an exemplary embodiment of the present invention. It should be understood that the specific interaction details of the exemplary embodiments of the present invention in setting up the respective feature combination configuration items are not limited to the example shown in fig. 3.
As shown in fig. 3, the graphical interface for setting the feature combination configuration items may display content options and/or content input boxes corresponding to the feature configuration item, the evaluation index configuration item, the training parameter configuration item, and the bucket operation configuration item, respectively. Specifically, the feature configuration item may be set according to an input operation in which the user selects the "select all features" option, so that all of the unit features acquired in step S10 are specified as features to be combined; alternatively, a user interface for customizing the features to be combined may pop up according to an input operation in which the user selects the "customization" option, so that the user selects the features to be combined from the candidate unit features (e.g., all the unit features acquired in step S10) provided by that user interface, or inputs identification information of the features to be combined, to complete the setting of the feature configuration item. The evaluation index configuration item may be set according to a selection operation of the user in the pull-down menu, such that the content selected by the user (e.g., "AUC" as shown in fig. 3) is specified as the evaluation index. The user may set the training parameter configuration item through an editing operation (e.g., inputting the value "0.5" as shown in fig. 3) on the corresponding content input box (e.g., the learning rate configuration item as shown in fig. 3). The user may set the bucket operation configuration item through an editing operation (e.g., inputting the value "10/100/1000/10000/100000" as shown in fig. 3) on the content input box corresponding to the bucket operation configuration item (e.g., the bucket parameter configuration item, here a bucket number configuration item, as shown in fig. 3). That is, the bucket operation configuration item set by the user specifies that five bucket operations are performed on each continuous feature among the features to be combined, where the bucket number of the first bucket operation is "10", that of the second is "100", …, and that of the fifth is "100000"; here, the bucket mode may default to equal-width bucketing.
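By way of a hedged illustration, the slash-separated value entered in the bucket number input box could be turned into a list of bucket counts as sketched below. The function name and the validation rules are assumptions; the text above only gives the example input string.

```python
def parse_bucket_counts(text):
    """Parse a slash-separated string such as "10/100/1000" into positive bucket counts."""
    counts = [int(part) for part in text.split("/") if part.strip()]
    if not counts or any(c <= 0 for c in counts):
        raise ValueError("expected one or more positive integers separated by '/'")
    return counts

bucket_counts = parse_bucket_counts("10/100/1000/10000/100000")
# One bucket operation per count, i.e. five operations for this input.
number_of_operations = len(bucket_counts)
```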
An example of displaying the generated combined features to a user according to an exemplary embodiment of the present invention is described below in conjunction with fig. 4. In the example of fig. 4, the combined features are shown in the form of a feature combination analysis report.
As shown in fig. 4, the left side of the upper table displays the unit features obtained in step S10 in the form "output feature name = processing method(field name of original attribute information)"; for example, "discrete_feature_1729_0 = discrete(cons_price_idx)" indicates that the field cons_price_idx is discretized to obtain the unit feature discrete_feature_1729_0. The left side of the lower table displays the combined features generated in step S40 in the form "combine(original feature name 1, original feature name 2, original feature name 3, …)"; for example, discrete_feature_1729_23 is a discrete combined feature obtained by combining the features default and month. The right side of each table shows the evaluation value of each feature with respect to the evaluation index. As an example, the upper table may be omitted so that only the lower table is displayed.
Further, as an example, the user may select a combined feature from the feature combination analysis report shown in FIG. 4 to apply to subsequent machine learning steps and/or save in the form of a configuration file.
According to an exemplary embodiment of the invention, a machine learning process may be performed in the form of a directed acyclic graph (DAG graph), which may encompass all or part of the steps for performing machine learning model training, testing, or prediction. For example, a DAG graph including historical data import steps, data splitting steps, feature extraction steps, automatic feature combination steps may be built for automatic feature combination. That is, the various steps described above may be performed as nodes in a DAG graph.
FIG. 5 illustrates an example of a DAG graph for generating combined features of machine learning samples, according to an exemplary embodiment of the invention.
Referring to fig. 5, the first step is to establish a data import node. For example, as shown in fig. 5, the data import node may be set, in response to a user operation, to import a banking data table named "bank" into the machine learning platform, where the data table may contain a plurality of historical data records.
The second step is to establish a data splitting node and connect the data import node to it, so as to split the imported data table into a training set and a validation set. Data records in the training set are converted into machine learning samples for learning the model, and data records in the validation set are converted into test samples for verifying the effect of the learned model. The data splitting node may be set in response to a user operation to split the imported data table into the training set and the validation set in a set manner.
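A minimal sketch of such a split is shown below, assuming a random split by ratio. The function name, the 80/20 ratio, and the fixed seed are assumptions; the text above leaves the splitting manner to the user's setting.

```python
import random

def split_records(records, train_ratio=0.8, seed=0):
    """Shuffle the records reproducibly and split them into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Ten placeholder records standing in for rows of the imported data table.
training_set, validation_set = split_records(range(10), train_ratio=0.8)
```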
The third step is to establish two feature extraction nodes and connect the data splitting node to each of them, so as to perform feature extraction on the training set and the validation set output by the data splitting node, respectively; for example, by default the left output of the data splitting node is the training set and the right output is the validation set. Feature extraction may be performed on the training set and the validation set based on the feature configuration set by the user in the feature extraction node or on code written by the user. It should be understood that the feature extraction modes for the machine learning samples and the test samples are kept consistent. The user can directly apply the feature extraction mode configured for the left feature extraction node to the right feature extraction node, or the platform can keep the settings of the left and right feature extraction nodes automatically synchronized.
The fourth step is to establish an automatic feature combination node and connect the two feature extraction nodes to it. The automatic feature combination node may be set in response to a user operation; for example, when an operation of the user clicking the "automatic feature combination" node is received, a graphical interface for setting feature combination configuration items as shown in fig. 3 may be provided to the user, so that the user can set the feature combination configuration items through the graphical interface.
After the DAG graph including the above steps is built, the entire DAG graph can be run according to the user's instructions. In the operation process, the machine learning platform can automatically generate the combined features of the machine learning samples according to the configuration items set by the user and output the corresponding combined features.
Further, as an example, a model training node may also be established after the automatic feature combination node, and the automatic feature combination node may be connected to the model training node, so that the extracted features and the generated combined features are applied directly to subsequent model training. Accordingly, the model training node may be set in response to user operations to train the model based on the machine learning samples in a set manner. Therefore, when the whole DAG graph is run, the machine learning model can be learned directly according to the configuration items set by the user.
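The node structure described above can be sketched, purely for illustration, as a dependency mapping whose execution order is obtained by topological sorting. The node identifiers are hypothetical, and the use of Python's standard `graphlib` module is an assumption of this example rather than a statement about how the platform schedules its nodes.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose output it consumes.
dag = {
    "data_import": set(),
    "data_split": {"data_import"},
    "feature_extraction_train": {"data_split"},
    "feature_extraction_validation": {"data_split"},
    "automatic_feature_combination": {
        "feature_extraction_train",
        "feature_extraction_validation",
    },
    "model_training": {"automatic_feature_combination"},
}

# A valid execution order: every node appears after all of its predecessors.
execution_order = list(TopologicalSorter(dag).static_order())
```

The relative order of the two feature extraction nodes is not fixed by the graph, which reflects that they can run independently of each other.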
FIG. 6 illustrates a block diagram of a system that generates combined features of machine learning samples according to an exemplary embodiment of the invention. As shown in fig. 6, the system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention includes: unit feature acquisition means 10, display means 20, configuration item acquisition means 30, and combined feature generation means 40.
The unit feature acquiring device 10 is used to acquire unit features that can be combined.
As an example, the unit feature acquisition means 10 may obtain the unit feature by performing feature processing on the attribute information of the data record.
The display device 20 is used to provide a graphical interface for setting a feature combination configuration item for defining how feature combinations are made between unit features to a user.
The configuration item acquisition means 30 is used for receiving an input operation performed by a user on a graphical interface in order to set a feature combination configuration item, and acquiring the feature combination configuration item set by the user according to the input operation.
The combined feature generating device 40 is configured to combine features to be combined from the unit features based on the acquired feature combination configuration items to generate combined features of the machine learning samples.
Optionally, the feature combination configuration item may include at least one of: a feature configuration item, an evaluation index configuration item, a training parameter configuration item, and a bucket operation configuration item.
Specifically, the feature configuration item is used to specify the features to be combined among the unit features, so that the combined feature generating means 40 combines the specified features to be combined.
The evaluation index configuration item is used for specifying an evaluation index of the combined feature, so that the combined feature generation device 40 measures the effect of the machine learning model corresponding to various combined features according to the specified evaluation index to determine the combination mode of the features to be combined.
The training parameter configuration item is used to specify a training parameter of the machine learning model, so that the combined feature generation apparatus 40 determines a combination mode of the features to be combined by measuring effects of the machine learning model corresponding to various combined features obtained under the specified training parameter.
The bucket operation configuration item is used for designating one or more bucket operations to be executed on at least one continuous feature among the features to be combined, so that the combined feature generation device 40 executes the designated one or more bucket operations on the at least one continuous feature to obtain corresponding one or more bucket features, and combines the obtained bucket features with other features to be combined as a whole.
As an example, the bucket operation configuration item may be used to specify one or more bucket operations for each continuous feature individually. As another example, the bucket operation configuration item may be used to uniformly specify one or more bucket operations for all continuous features.
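As a sketch under stated assumptions, the four configuration items and the two ways of specifying bucket operations (per continuous feature or uniformly) could be held in a structure such as the following. The class name, field names, and default values are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureCombinationConfig:
    """Container for the configuration items set by the user (illustrative only)."""
    features_to_combine: list
    evaluation_index: str = "AUC"
    training_params: dict = field(default_factory=dict)
    uniform_bucket_counts: list = field(default_factory=list)
    per_feature_bucket_counts: dict = field(default_factory=dict)

    def bucket_counts_for(self, feature):
        """Per-feature setting if present, otherwise the uniform setting."""
        return self.per_feature_bucket_counts.get(feature, self.uniform_bucket_counts)

config = FeatureCombinationConfig(
    features_to_combine=["age", "cons_price_idx", "month"],
    training_params={"learning_rate": 0.5},
    uniform_bucket_counts=[10, 100],
    per_feature_bucket_counts={"cons_price_idx": [1000]},
)
```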
As an example, the display device 20 may also display the combined feature generated by the combined feature generating device 40 to the user. Further, as an example, the display device 20 may also display, to the user, the evaluation value of each combined feature generated by the combined feature generation device 40 with respect to the evaluation index.
As an example, the system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention may further include: application means (not shown).
The application means is used to directly apply the combined feature generated by the combined feature generation means 40 to the subsequent machine learning step, or to apply a combined feature selected by the user from the combined features displayed by the display means 20 to the subsequent machine learning step.
As an example, the system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention may further include: storage means (not shown).
The storage means is configured to store the combination mode of the combined features generated by the combined feature generation means 40 in the form of a configuration file, or to store the combination mode of the combined features selected by the user from the combined features displayed by the display means 20 in the form of a configuration file.
It should be understood that the specific implementation of the system for generating combined features of machine learning samples according to the exemplary embodiment of the present invention may be implemented with reference to the related specific implementation described in conjunction with fig. 1 to 5, and will not be described herein again.
The system for generating combined features of machine learning samples according to exemplary embodiments of the present invention may include devices that are respectively configured as software, hardware, firmware, or any combination thereof to perform a particular function. These means may correspond, for example, to a dedicated integrated circuit, to pure software code, or to a module combining software and hardware. Further, one or more functions implemented by these apparatuses may also be collectively performed by components in a physical entity device (e.g., a processor, a client, a server, or the like).
It is to be understood that the method of generating combined features of machine-learned samples according to an exemplary embodiment of the present invention may be implemented by a program recorded on a computer-readable medium, for example, according to an exemplary embodiment of the present invention, there may be provided a computer-readable medium for generating combined features of machine-learned samples, wherein the computer program for executing the following method steps is recorded on the computer-readable medium: (A) acquiring unit features which can be combined; (B) providing a graphical interface for setting feature combination configuration items for defining how feature combinations are to be made between unit features to a user; (C) receiving input operation executed on a graphical interface by a user for setting a feature combination configuration item, and acquiring the feature combination configuration item set by the user according to the input operation; and (D) combining the features to be combined in the unit features based on the acquired feature combination configuration items to generate combined features of the machine learning samples.
The computer program in the computer-readable medium may be executed in an environment deployed in a computer device such as a client, a host, a proxy device, a server, etc., and it should be noted that the computer program may also be used to perform additional steps other than the above steps or perform more specific processing when the above steps are performed, and the contents of the additional steps and the further processing are described with reference to fig. 1 to 5, and will not be described again to avoid repetition.
It should be noted that the system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention may completely rely on the execution of a computer program to realize corresponding functions, i.e., each device corresponds to each step in the functional architecture of the computer program, so that the whole system is called by a special software package (e.g., lib library) to realize the corresponding functions.
On the other hand, the respective means included in the system for generating combined features of machine learning samples according to the exemplary embodiment of the present invention may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor may perform the corresponding operations by reading and executing the corresponding program code or code segments.
For example, exemplary embodiments of the present invention may also be implemented as a computing device comprising a storage component and a processor, the storage component having stored therein a set of computer-executable instructions that, when executed by the processor, perform a method of generating combined features of machine learning samples.
In particular, the computing devices may be deployed in servers or clients, as well as on node devices in a distributed network environment. Further, the computing device may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the set of instructions described above.
The computing device need not be a single computing device, but can be any device or collection of circuits capable of executing the instructions (or sets of instructions) described above, individually or in combination. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
Certain operations described in the method of generating combined features of machine-learned samples according to the exemplary embodiments of the present invention may be implemented by software, certain operations may be implemented by hardware, or a combination of both.
The processor may execute instructions or code stored in one of the memory components, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory component may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the storage component.
Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or a network.
The operations involved in a method of generating combined features of machine learning samples according to an exemplary embodiment of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated according to imprecise boundaries.
For example, as described above, a computing device for generating combined features of machine learning samples according to exemplary embodiments of the present invention may include a storage component and a processor, wherein the storage component has stored therein a set of computer-executable instructions that, when executed by the processor, perform the steps of: (A) acquiring unit features which can be combined; (B) providing a graphical interface for setting feature combination configuration items for defining how feature combinations are to be made between unit features to a user; (C) receiving input operation executed on a graphical interface by a user for setting a feature combination configuration item, and acquiring the feature combination configuration item set by the user according to the input operation; and (D) combining the features to be combined in the unit features based on the acquired feature combination configuration items to generate combined features of the machine learning samples.
While exemplary embodiments of the invention have been described above, it should be understood that the above description is illustrative only and not exhaustive, and that the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention should be subject to the scope of the claims.

Claims (22)

1. A method of generating combined features of machine learning samples, comprising:
(A) acquiring unit features which can be combined;
(B) providing a graphical interface to a user for setting feature combination configuration items, wherein the feature combination configuration items include the following: a feature configuration item, an evaluation index configuration item, and a bucket operation configuration item;
(C) receiving input operation executed on a graphical interface by a user for setting a feature combination configuration item, and acquiring the feature combination configuration item set by the user according to the input operation; and
(D) combining the features to be combined in the unit features based on the obtained feature combination configuration items to generate combined features of the machine learning samples,
wherein the feature configuration item is used for specifying the features to be combined among the unit features, so that the specified features to be combined are combined in various ways in step (D); the evaluation index configuration item is used for specifying the evaluation index of the combined features, so that the effect of the machine learning model corresponding to various combined features is measured according to the specified evaluation index in step (D) to determine the combination mode of the features to be combined; and the bucket operation configuration item is used for specifying one or more bucket operations to be performed on at least one continuous feature among the features to be combined, so that the specified one or more bucket operations are performed on the at least one continuous feature in step (D) to obtain corresponding one or more bucket features, and the obtained bucket features are, as a whole, combined with the other features to be combined in various ways.
2. The method of claim 1, wherein the feature combination configuration items further comprise: a training parameter configuration item used for specifying the training parameters of the machine learning model, so that in step (D) the combination mode of the features to be combined is determined by measuring the effect of the machine learning model corresponding to various combined features obtained under the specified training parameters.
3. The method of claim 1, wherein the bucket operation configuration item is used to specify one or more bucket operations for each continuous feature individually; or the bucket operation configuration item is used to uniformly specify one or more bucket operations for all continuous features.
4. The method of claim 1, further comprising:
(E) displaying the generated combined features to a user.
5. The method according to claim 4, wherein in step (E), the evaluation value of each combined feature with respect to the evaluation index is also displayed to the user.
6. The method of claim 1, further comprising:
(F) directly applying the generated combined features to subsequent machine learning steps.
7. The method of claim 4 or 5, further comprising:
(G) applying the combined features selected by the user from the displayed combined features to a subsequent machine learning step.
8. The method of claim 1, further comprising:
(H) storing the combination mode of the combined features generated in step (D) in the form of a configuration file.
9. The method of claim 7, further comprising:
(I) storing the combination mode of the combined features selected by the user in step (G) in the form of a configuration file.
10. The method according to claim 1, wherein in step (A), the unit features are obtained by performing feature processing on attribute information of the data record.
11. A system for generating combined features of machine-learned samples, comprising:
unit feature acquisition means for acquiring unit features that can be combined;
display means for providing a graphical interface for setting feature combination configuration items to a user, wherein the feature combination configuration items include the following: a feature configuration item, an evaluation index configuration item, and a bucket operation configuration item;
configuration item acquisition means for receiving an input operation performed by a user on a graphical interface in order to set a feature combination configuration item, and acquiring the feature combination configuration item set by the user according to the input operation; and
a combined feature generation means for combining features to be combined among the unit features based on the acquired feature combination configuration items to generate combined features of the machine learning samples,
wherein, the feature configuration item is used for appointing the features to be combined in the unit features, so that the appointed features to be combined are combined in various ways in the step (D); an evaluation index configuration item, which is used for specifying the evaluation index of the combined features, so that the effect of the machine learning model corresponding to various combined features is measured according to the specified evaluation index in the step (D) to determine the combination mode of the features to be combined; and the barrel operation configuration item is used for appointing one or more barrel operation to be executed on at least one continuous characteristic in the characteristics to be combined respectively, so that the combined characteristic generating device executes the appointed one or more barrel operation on the at least one continuous characteristic respectively to obtain one or more corresponding barrel characteristics, and the obtained barrel characteristics are combined with other characteristics to be combined in various ways as a whole.
12. The system of claim 11, wherein the feature combination configuration item further comprises: and the training parameter configuration item is used for appointing the training parameters of the machine learning model, so that the combined feature generation device determines the combination mode of the features to be combined by measuring the effect of the machine learning model corresponding to various combined features obtained under the appointed training parameters.
13. The system of claim 11, wherein the bucketing operation configuration item is used to specify one or more bucketing operations for each continuous feature individually; or the bucketing operation configuration item is used to specify one or more bucketing operations uniformly for all continuous features.
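The two alternatives in claim 13 can be pictured with a small, hypothetical configuration resolver. The function name, the argument names, and the operation strings such as `"equal_width:10"` are assumptions made for illustration; the claims do not define a concrete format.

```python
def resolve_bucket_ops(continuous_features, uniform_ops=None, per_feature_ops=None):
    """Return a mapping from each continuous feature to its list of bucketing operations.

    Exactly one of uniform_ops (applied to every continuous feature) or
    per_feature_ops (a dict keyed by feature name) must be given.
    """
    if (uniform_ops is None) == (per_feature_ops is None):
        raise ValueError("specify exactly one of uniform_ops or per_feature_ops")
    if uniform_ops is not None:
        return {name: list(uniform_ops) for name in continuous_features}
    missing = [name for name in continuous_features if name not in per_feature_ops]
    if missing:
        raise ValueError(f"no bucketing operations specified for: {missing}")
    return {name: list(per_feature_ops[name]) for name in continuous_features}


features = ["age", "income"]

# Uniform: the same operations for all continuous features.
uniform = resolve_bucket_ops(features, uniform_ops=["equal_width:10"])

# Individual: operations chosen separately for each continuous feature.
individual = resolve_bucket_ops(
    features,
    per_feature_ops={"age": ["equal_width:10", "equal_width:4"], "income": ["equal_width:100"]},
)
print(uniform)
print(individual)
```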
14. The system of claim 11, wherein,
the display means also displays the generated combined feature to the user.
15. The system according to claim 14, wherein the display means further displays the evaluation value of each generated combined feature with respect to the evaluation index to the user.
16. The system of claim 11, further comprising:
application means for directly applying the generated combined features to a subsequent machine learning step.
17. The system of claim 14 or 15, further comprising:
application means for applying the combined features selected by the user from among the displayed combined features to a subsequent machine learning step.
18. The system of claim 11, further comprising:
storage means for storing, in the form of a configuration file, the combination manner of the combined features generated by the combined feature generation means.
19. The system of claim 17, further comprising:
storage means for storing, in the form of a configuration file, the combination manner of the combined features selected by the user from among the displayed combined features.
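Claims 18 and 19 store the combination manner as a configuration file without fixing a file format. The following sketch assumes JSON and a hypothetical layout (`combined_features`, `features`); the function names and feature names are likewise illustrative.

```python
import json
import os
import tempfile


def save_combinations(path, combinations):
    """Write the combination manner of each combined feature to a JSON configuration file."""
    payload = {"combined_features": [{"features": list(c)} for c in combinations]}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)


def load_combinations(path):
    """Read the combination manners back so they can be reused in a later run."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    return [tuple(item["features"]) for item in payload["combined_features"]]


# Hypothetical combined features, for example those selected by the user.
selected = [("age_bucket", "city"), ("city", "gender")]

with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "combined_features.json")
    save_combinations(config_path, selected)
    restored = load_combinations(config_path)

print(restored)
```

Storing only the names of the constituent features, rather than the computed values, keeps the file small and lets the same combinations be regenerated on new data records.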
20. The system according to claim 11, wherein the unit feature obtaining means obtains the unit feature by performing feature processing on attribute information of the data record.
21. A computer-readable medium for generating combined features of machine learning samples, wherein a computer program for performing the method of generating combined features of machine learning samples according to any one of claims 1 to 10 is recorded on the computer-readable medium.
22. A computing device for generating combined features of machine learning samples, comprising a storage component and a processor, wherein the storage component stores a set of computer-executable instructions which, when executed by the processor, perform the method of generating combined features of machine learning samples according to any one of claims 1 to 10.
CN201710898898.2A 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples Active CN107766946B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710898898.2A CN107766946B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples
CN202010658034.5A CN111797998B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710898898.2A CN107766946B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010658034.5A Division CN111797998B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples

Publications (2)

Publication Number Publication Date
CN107766946A CN107766946A (en) 2018-03-06
CN107766946B CN107766946B (en) 2020-06-23

Family

ID=61267329

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010658034.5A Active CN111797998B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples
CN201710898898.2A Active CN107766946B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010658034.5A Active CN111797998B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples

Country Status (1)

Country Link
CN (2) CN111797998B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710949A (en) * 2018-04-26 2018-10-26 第四范式(北京)技术有限公司 Method and system for creating a machine learning modeling template
CN112130723B (en) * 2018-05-25 2023-04-18 第四范式(北京)技术有限公司 Method and system for performing feature processing on data
CN108985459A (en) * 2018-05-30 2018-12-11 华为技术有限公司 Method and apparatus for training a model
CN110895718A (en) * 2018-09-07 2020-03-20 第四范式(北京)技术有限公司 Method and system for training machine learning model
CN109634961B (en) * 2018-12-05 2021-06-04 杭州大拿科技股份有限公司 Test paper sample generation method and device, electronic equipment and storage medium
CN109685583B (en) * 2019-01-10 2020-12-25 博拉网络股份有限公司 Supply chain demand prediction method based on big data
CN110956272B (en) * 2019-11-01 2023-08-08 第四范式(北京)技术有限公司 Method and system for realizing data processing
CN110851500B (en) * 2019-11-07 2022-10-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN111625692B (en) * 2020-05-27 2023-08-22 抖音视界有限公司 Feature extraction method, device, electronic equipment and computer readable medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103353936A (en) * 2013-07-26 2013-10-16 上海交通大学 Method and system for face identification
CN105260171A (en) * 2015-09-10 2016-01-20 深圳市创梦天地科技有限公司 Virtual item generation method and apparatus
CN105677353A (en) * 2016-01-08 2016-06-15 北京物思创想科技有限公司 Feature extraction method and machine learning method and device thereof
CN106127531A (en) * 2016-07-14 2016-11-16 北京物思创想科技有限公司 Method and system for performing differentiated pricing based on machine learning
CN106779088A (en) * 2016-12-06 2017-05-31 北京物思创想科技有限公司 Method and system for executing a machine learning process
CN107045503A (en) * 2016-02-05 2017-08-15 华为技术有限公司 Method and device for determining a feature set

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4891197B2 (en) * 2007-11-01 2012-03-07 キヤノン株式会社 Image processing apparatus and image processing method
US9152884B2 (en) * 2012-06-05 2015-10-06 Drvision Technologies Llc Teachable pattern scoring method
CN106897918A (en) * 2017-02-24 2017-06-27 上海易贷网金融信息服务有限公司 Method for constructing a hybrid machine learning credit scoring model

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103353936A (en) * 2013-07-26 2013-10-16 上海交通大学 Method and system for face identification
CN105260171A (en) * 2015-09-10 2016-01-20 深圳市创梦天地科技有限公司 Virtual item generation method and apparatus
CN105677353A (en) * 2016-01-08 2016-06-15 北京物思创想科技有限公司 Feature extraction method and machine learning method and device thereof
CN107045503A (en) * 2016-02-05 2017-08-15 华为技术有限公司 Method and device for determining a feature set
CN106127531A (en) * 2016-07-14 2016-11-16 北京物思创想科技有限公司 Method and system for performing differentiated pricing based on machine learning
CN106779088A (en) * 2016-12-06 2017-05-31 北京物思创想科技有限公司 Method and system for executing a machine learning process

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
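Title
"Microblog sentiment analysis based on multi-feature combination of SVM and CRF" (基于SVM和CRF多特征组合的微博情感分析); Li Tingting (李婷婷) et al.; Application Research of Computers (《计算机应用研究》); 20150430; Vol. 32, No. 4; pp. 979-980, Section 3.1 *
"Combined feature selection algorithm based on mutual information" (基于互信息的组合特征选择算法); Li Yezi (李叶紫) et al.; Computer Systems & Applications (《计算机系统应用》); 20170815; Vol. 26, No. 8; pp. 173-179 *
"Research on feature selection methods and algorithms" (特征选择方法与算法的研究); Li Min (李敏) et al.; Computer Technology and Development (《计算机技术与发展》); 20121231; Vol. 23, No. 12; pp. 16-21 *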

Also Published As

Publication number Publication date
CN107766946A (en) 2018-03-06
CN111797998A (en) 2020-10-20
CN111797998B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
CN107766946B (en) Method and system for generating combined features of machine learning samples
WO2019129060A1 (en) Method and system for automatically generating machine learning sample
CN107844837B (en) Method and system for adjusting and optimizing algorithm parameters aiming at machine learning algorithm
CN112101562B (en) Implementation method and system of machine learning modeling process
US11327935B2 (en) Intelligent data quality
US10671507B2 (en) Application performance analytics platform
CN108008942B (en) Method and system for processing data records
CN113822440A (en) Method and system for determining feature importance of machine learning samples
CN108228861B (en) Method and system for performing feature engineering for machine learning
US20200019881A1 (en) Feature processing method and feature processing system for machine learning
EP3126957A1 (en) Scalable business process intelligence and predictive analytics for distributed architectures
CN112990486A (en) Method and system for generating combined features of machine learning samples
CN110188910A (en) Method and system for providing an online prediction service using a machine learning model
CN107273979B (en) Method and system for performing machine learning prediction based on service level
CN111797927A (en) Method and system for determining important features of machine learning samples
US20230252274A1 (en) Method of providing neural network model and electronic apparatus for performing the same
CN116882520A (en) Prediction method and system for predetermined prediction problem
CN114443639A (en) Method and system for processing data table and automatically training machine learning model
JP2019082874A (en) Design support device and design support system
JP2021500639A (en) Prediction engine for multi-step pattern discovery and visual analysis recommendations
CN110895718A (en) Method and system for training machine learning model
US20210326761A1 (en) Method and System for Uniform Execution of Feature Extraction
CN111078500A (en) Method and device for adjusting operation configuration parameters, computer equipment and storage medium
CN114764296A (en) Machine learning model training method and device, electronic equipment and storage medium
KR20210143460A (en) Apparatus for feature recommendation and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant