CN108090516A - Method and system for automatically generating features of machine learning samples - Google Patents
Method and system for automatically generating features of machine learning samples
- Publication number
- CN108090516A (application number CN201711445538.3A)
- Authority
- CN
- China
- Prior art keywords
- feature
- machine learning
- generation
- data
- learning sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
A method and system for automatically generating features of machine learning samples are provided. The method includes: (A) obtaining a data table specified by a user, wherein each row of the data table corresponds to one data record and each column of the data table corresponds to one field; (B) declaring the feature type corresponding to each non-target value field in the data table, wherein the feature types include discrete features and/or continuous features; (C) processing each non-target value field into a unit feature according to the declared feature type; (D) performing feature combination based on the generated unit features to generate combined features; and (E) obtaining the features of the machine learning samples based on the generated unit features and combined features. With the method and system, features of machine learning samples can be generated automatically from a data table, which lowers the barrier to entry for feature engineering and improves both its ease of use and its efficiency.
Description
Technical field
The present invention relates generally to the field of artificial intelligence and, more particularly, to a method and system for automatically generating features of machine learning samples.
Background
With the emergence of massive data, people tend to use machine learning techniques to mine value from data.
The basic process of training a machine learning model mainly includes:
1. Importing a data set (for example, a data table) containing historical data records;
2. Completing feature engineering, in which the attribute information of the data records in the data set is processed in various ways to obtain individual features; the feature vectors formed from these features can serve as machine learning samples;
3. Training a model, in which the model is learned from the machine learning samples obtained through feature engineering, according to a chosen machine learning algorithm (for example, a logistic regression algorithm, a decision tree algorithm, a neural network algorithm, etc.).
In the above process, feature generation is critical, since it affects the quality of the model. Each data record in a data table may include multiple pieces of attribute information (that is, fields), and a feature may represent a field itself or the result of various field processing (or operations) such as combinations of fields, so as to better reflect the data distribution and the intrinsic associations and latent meaning among fields. The quality of feature construction therefore directly determines how accurately the machine learning problem is characterized, which in turn affects the quality of the model.
On existing machine learning platforms, the training flow of a machine learning model can be completed through interaction with a graphical interface, without the user writing program code. In the feature engineering stage, however, manually designed feature generation methods usually still have to be entered into the platform by hand. That is, the user must define the features of the machine learning samples in advance. On the one hand, this requires the user to have a deep understanding of the business scenario, since the user sets features based on business experience. On the other hand, the volume of data used in machine learning is usually large, and the user sometimes cannot analyze the data comprehensively, which can lead to some invalid features being set. To improve the effectiveness of the features, the user has to keep experimenting, and with large data volumes and high-dimensional features this takes a long time. As a result, not only must the user understand the business scenario deeply, but the user's workload increases and the efficiency of machine learning decreases.
Summary of the invention
An exemplary embodiment of the present invention provides a method and system for automatically generating features of machine learning samples, to solve the problem in the prior art that features of machine learning samples cannot be generated conveniently.
According to an exemplary embodiment of the present invention, a method for automatically generating features of machine learning samples is provided, including: (A) obtaining a data table specified by a user, wherein each row of the data table corresponds to one data record and each column of the data table corresponds to one field; (B) declaring the feature type corresponding to each non-target value field in the data table, wherein the feature types include discrete features and/or continuous features; (C) processing each non-target value field into a unit feature according to the declared feature type; (D) performing feature combination based on the generated unit features to generate combined features; and (E) obtaining the features of the machine learning samples based on the generated unit features and combined features.
Optionally, the method is executed automatically by starting an operator corresponding to an automatic feature generation step.
Optionally, the operator corresponds to a node in a directed acyclic graph corresponding to a machine learning flow.
Optionally, the non-target value fields are obtained by removing the target value field specified by the user from all fields in the data table.
Optionally, the operator provides an exception prompt when it is started without the user having specified a target value field.
Optionally, in step (B), automatically or according to an instruction from the user, either all non-target value fields are declared as discrete features, or each non-target value field is declared as a discrete feature or a continuous feature according to its field value data type.
Optionally, step (D) includes: performing various combinations on all of the generated unit features to obtain candidate combined features, or performing various combinations on the unit features of higher feature importance among all of the generated unit features to obtain candidate combined features; and screening combined features from the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
Optionally, in step (E), all of the generated unit features and all of the combined features are used as the features of the machine learning samples; or, the features of higher feature importance among all of the generated unit features and all of the combined features are used as the features of the machine learning samples; or, the unit features of higher feature importance among all of the generated unit features, together with all of the generated combined features, are used as the features of the machine learning samples; or, the combined features of higher feature importance among all of the generated combined features, together with all of the generated unit features, are used as the features of the machine learning samples.
Optionally, the method further includes: (F) displaying the obtained features of the machine learning samples to the user.
Optionally, in step (F), the feature importance of each feature is also displayed to the user.
Optionally, the method further includes: (G) directly applying the obtained features of the machine learning samples to a subsequent machine learning step.
Optionally, in step (C), for each non-target value field whose field value data type is continuous and which is declared as a discrete feature, one or more binning operations are performed to obtain one or more corresponding binning features, and the resulting binning features as a whole are taken as one unit feature.
According to another exemplary embodiment of the present invention, a system for automatically generating features of machine learning samples is provided, including: a data table acquisition device for obtaining a data table specified by a user, wherein each row of the data table corresponds to one data record and each column of the data table corresponds to one field; a declaration device for declaring the feature type corresponding to each non-target value field in the data table, wherein the feature types include discrete features and/or continuous features; a unit feature generation device for processing each non-target value field into a unit feature according to the declared feature type; a combined feature generation device for performing feature combination based on the generated unit features to generate combined features; and a feature acquisition device for obtaining the features of the machine learning samples based on the generated unit features and combined features.
Optionally, the system operates automatically when an operator corresponding to an automatic feature generation step is started.
Optionally, the operator corresponds to a node in a directed acyclic graph corresponding to a machine learning flow.
Optionally, the non-target value fields are obtained by removing the target value field specified by the user from all fields in the data table.
Optionally, the system further includes: a prompting device for providing an exception prompt when the operator is started without the user having specified a target value field.
Optionally, the declaration device, automatically or according to an instruction from the user, either declares all non-target value fields as discrete features, or declares each non-target value field as a discrete feature or a continuous feature according to its field value data type.
Optionally, the combined feature generation device includes: a candidate combined feature acquisition unit for performing various combinations on all of the generated unit features to obtain candidate combined features, or for performing various combinations on the unit features of higher feature importance among all of the generated unit features to obtain candidate combined features; and a combined feature screening unit for screening combined features from the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
Optionally, the feature acquisition device uses all of the generated unit features and all of the combined features as the features of the machine learning samples; or, the feature acquisition device uses the features of higher feature importance among all of the generated unit features and all of the combined features as the features of the machine learning samples; or, the feature acquisition device uses the unit features of higher feature importance among all of the generated unit features, together with all of the generated combined features, as the features of the machine learning samples; or, the feature acquisition device uses the combined features of higher feature importance among all of the generated combined features, together with all of the generated unit features, as the features of the machine learning samples.
Optionally, the system further includes: a display device for displaying the obtained features of the machine learning samples to the user.
Optionally, the display device also displays the feature importance of each feature to the user.
Optionally, the system further includes: an application device for directly applying the obtained features of the machine learning samples to a subsequent machine learning step.
Optionally, for each non-target value field whose field value data type is continuous and which is declared as a discrete feature, the unit feature generation device performs one or more binning operations to obtain one or more corresponding binning features, and takes the resulting binning features as a whole as one unit feature.
According to another exemplary embodiment of the present invention, a computer-readable medium for automatically generating features of machine learning samples is provided, on which a computer program for executing the method for automatically generating features of machine learning samples described above is recorded.
According to another exemplary embodiment of the present invention, a computing device for automatically generating features of machine learning samples is provided, including a storage unit and a processor, wherein a set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, the method for automatically generating features of machine learning samples described above is performed.
With the method and system for automatically generating features of machine learning samples according to exemplary embodiments of the present invention, features of machine learning samples can be generated automatically from a data table, which lowers the barrier to entry for feature engineering and improves both its ease of use and its efficiency.
Additional aspects and/or advantages of the general inventive concept will be set forth in part in the description that follows, and in part will be apparent from the description or may be learned by practice of the general inventive concept.
Brief description of the drawings
The above and other objects and features of exemplary embodiments of the present invention will become clearer from the following description taken in conjunction with the accompanying drawings, which illustrate embodiments by way of example, in which:
Fig. 1 shows a flowchart of a method for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention;
Fig. 2 shows an example in which a user specifies the feature types corresponding to non-target value fields according to an exemplary embodiment of the present invention;
Fig. 3 shows a flowchart of a method for automatically generating features of machine learning samples according to another exemplary embodiment of the present invention;
Fig. 4 shows a flowchart of a method for automatically generating features of machine learning samples according to another exemplary embodiment of the present invention;
Fig. 5 shows a flowchart of a method for automatically generating features of machine learning samples according to another exemplary embodiment of the present invention;
Fig. 6 shows an example of a DAG for training a machine learning model according to an exemplary embodiment of the present invention;
Fig. 7 shows a block diagram of a system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention.
Detailed description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like components throughout. The embodiments are described below with reference to the drawings in order to explain the present invention.
Here, machine learning is an inevitable outcome of the development of artificial intelligence research to a certain stage; it is devoted to improving the performance of a system itself through computation, using experience. In computer systems, "experience" usually exists in the form of "data". A "model" can be generated from data by a machine learning algorithm; that is, when empirical data is provided to a machine learning algorithm, a model can be generated based on that data, and when faced with a new situation, the model provides a corresponding judgment, namely a prediction result. Whether a machine learning model is being trained or a trained machine learning model is being used for prediction, the data must be converted into machine learning samples that include various features. Machine learning may be implemented in the form of "supervised learning", "unsupervised learning", or "semi-supervised learning"; it should be noted that exemplary embodiments of the present invention place no particular limitation on the specific machine learning algorithm. It should also be noted that other means, such as statistical algorithms, may be combined during model training and application.
Fig. 1 shows a flowchart of a method for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention. Here, as an example, the method may be performed by a computer program, or by a dedicated system or computing device for automatically generating features of machine learning samples.
As an example, the method may be executed automatically by starting an operator corresponding to an automatic feature generation step. In other words, when the operator corresponding to the automatic feature generation step is started, the method is executed automatically. Further, as an example, the operator corresponds to a node in a directed acyclic graph (DAG) corresponding to a machine learning flow. For example, the DAG corresponding to the machine learning flow may include a feature generation node; when the entire DAG is run and execution reaches the feature generation node, the method is executed automatically. A DAG for training a machine learning model according to an exemplary embodiment of the present invention is described in detail below with reference to Fig. 6.
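The relationship between operators and DAG nodes can be sketched as follows. This is only an illustration under stated assumptions: the node names, the operator functions, and the shared context dictionary are hypothetical and do not come from the patent, and the feature generation operator is a trivial stand-in for steps S101 to S105.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical operators; each one reads from and writes to a shared context.
def import_data(ctx):
    ctx["table"] = [{"age": 31, "city": "Beijing", "label": 1}]

def generate_features(ctx):
    # Stand-in for the automatic feature generation method (steps S101-S105).
    ctx["features"] = sorted(k for k in ctx["table"][0] if k != "label")

def train_model(ctx):
    ctx["model"] = "trained on " + ", ".join(ctx["features"])

OPERATORS = {
    "import_data": import_data,
    "generate_features": generate_features,
    "train_model": train_model,
}

# Each node maps to the set of nodes it depends on.
DAG = {
    "import_data": set(),
    "generate_features": {"import_data"},
    "train_model": {"generate_features"},
}

def run_dag(dag, operators):
    """Run every operator once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(dag).static_order())
    for node in order:
        operators[node](ctx)  # reaching a node starts its operator
    return order, ctx

order, ctx = run_dag(DAG, OPERATORS)
```

Running the whole graph reaches the feature generation node after the data import node, at which point its operator runs without further user input.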
Referring to Fig. 1, in step S101, a data table specified by a user is obtained. Here, each row of the data table corresponds to one data record, and each column of the data table corresponds to one field. In other words, each data record in the data table has a field value corresponding to each field. As an example, each data record may be regarded as a description of an event or object and corresponds to one example or sample; each field may describe the manifestation or nature of the event or object in some aspect (for example, name, age, occupation, etc.).
As an example, a graphical interface for specifying the data table may be provided to the user, and the data table specified by the user is determined according to the input operations the user performs on the graphical interface.
In step S102, the feature type corresponding to each non-target value field in the data table is declared, wherein the feature types include discrete features and/or continuous features.
Here, the target value field is the field corresponding to the label that the machine learning technique is to predict; in the case of supervised learning, this field corresponds to the prediction target. The non-target value fields are the fields in the data table other than the target value field.
In the case of supervised learning, as an example, the non-target value fields may be obtained by removing the target value field specified by the user from all fields in the data table. As an example, a graphical interface for specifying the target value field may be provided to the user, and the target value field specified by the user is determined according to the input operations the user performs on the graphical interface. Further, as an example, the operator may provide an exception prompt when it is started without the user having specified a target value field, to remind the user to specify one.
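For the supervised case, the derivation of the non-target value fields can be sketched as below. The function name, the example field names, and the use of a Python exception to stand in for the exception prompt are assumptions made for illustration only.

```python
def non_target_fields(table_fields, target_field):
    """Return all fields except the user-specified target value field."""
    if target_field is None:
        # Stands in for the exception prompt shown when no target is specified.
        raise ValueError("Please specify a target value field.")
    if target_field not in table_fields:
        raise ValueError(f"Unknown target value field: {target_field!r}")
    return [f for f in table_fields if f != target_field]

# Hypothetical field names for illustration.
fields = ["name", "age", "occupation", "clicked"]
remaining = non_target_fields(fields, "clicked")
```

Here `remaining` keeps the original column order, which makes later per-field processing reproducible.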
Moreover, it should be understood that the data table may or may not include a target value field.
A continuous feature is the counterpart of a discrete feature (for example, a categorical feature); its values are numerical values with a certain continuity, for example, age or amount of money. In contrast, as an example, the values of a discrete feature have no continuity; for example, they may be unordered categories such as "from Beijing", "from Shanghai", or "from Tianjin", or "gender is male" and "gender is female".
As an example, automatically or according to an instruction from the user, either all non-target value fields may be declared as discrete features, or each non-target value field may be declared as a discrete feature or a continuous feature according to its field value data type.
As an example, the field value data type of a field may be continuous (for example, numeric, such as the integer type int) or discrete (for example, text, such as the string type string). As an example, declaring each non-target value field as a discrete feature or a continuous feature according to its field value data type may include: declaring the non-target value fields whose field value data type is discrete as discrete features, and declaring the non-target value fields whose field value data type is continuous as continuous features.
As an example, a graphical interface for specifying the feature types corresponding to the non-target value fields may be provided to the user, and according to the input operations the user performs on the graphical interface, either all non-target value fields are declared as discrete features, or each non-target value field is declared as a discrete feature or a continuous feature according to its field value data type.
An example in which a user specifies the feature types corresponding to non-target value fields through a graphical interface according to an exemplary embodiment of the present invention is described with reference to Fig. 2. As shown in Fig. 2, the graphical interface for specifying the feature types corresponding to non-target value fields may display a radio button "all discrete" and a radio button "discrete+continuous" (exactly one of the two can be selected). In response to the user selecting "all discrete", all non-target value fields in the data table are declared as discrete features. In response to the user selecting "discrete+continuous", each non-target value field is declared as a discrete feature or a continuous feature according to its data type; here, the data type of a field may be determined automatically from the characteristics of its field values, and the field is then declared as a discrete feature or a continuous feature depending on whether its data type is discrete or continuous. In addition, the graphical interface may also display a control for specifying the target value field, which the user can operate to specify the target value field. The left side of the graphical interface may also display the field name and field value data type of each field in the data table.
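The two declaration modes can be sketched as follows. The mode strings reuse the radio button labels from Fig. 2; the function names and the rule that a field counts as continuous only when every value is numeric are assumptions for illustration, since the text does not fix how the data type is inferred.

```python
def infer_value_type(values):
    """Treat a field as continuous only if every value is an int or float."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "continuous" if numeric else "discrete"

def declare_feature_types(columns, mode):
    """columns maps each non-target field name to the list of its values."""
    if mode not in ("all discrete", "discrete+continuous"):
        raise ValueError(f"Unknown mode: {mode!r}")
    if mode == "all discrete":
        return {name: "discrete" for name in columns}
    return {name: infer_value_type(vals) for name, vals in columns.items()}

# Hypothetical columns for illustration.
columns = {"age": [31, 45], "city": ["Beijing", "Shanghai"]}
all_discrete = declare_feature_types(columns, "all discrete")
by_type = declare_feature_types(columns, "discrete+continuous")
```

Under "all discrete", a numeric field such as `age` is still declared discrete, which is the case that the binning operations of step S103 address.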
Referring back to Fig. 1, in step S103, each non-target value field is processed into a unit feature according to the declared feature type. In other words, each non-target value field is separately processed into one unit feature according to the declared feature type.
As an example, each non-target value field whose field value data type is continuous and which is declared as a discrete feature may be discretized to obtain one unit feature.
It should be understood that a unit feature here is a feature that corresponds to a single field; depending on how its value is defined, it may itself have one or more dimensions. Optionally, for each non-target value field whose field value data type is continuous and which is declared as a discrete feature, one or more binning operations may be performed to obtain one or more corresponding binning features, and the resulting binning features as a whole are taken as one unit feature.
Here, a binning operation is a particular way of discretizing a field of continuous type: the value range of the field is divided into multiple intervals (that is, multiple buckets), and the corresponding binning feature value is determined based on the resulting buckets. Binning operations can broadly be divided into supervised binning and unsupervised binning, each of which includes several specific binning methods. For example, supervised binning may include minimum entropy binning and minimum description length binning, while unsupervised binning may include equal-width binning, equal-depth binning, and binning based on k-means clustering. For each binning method, corresponding binning parameters, such as width or depth, can be set.
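Two of the unsupervised methods named above, equal-width and equal-depth binning, can be sketched in a few lines. This is a minimal illustration rather than the patent's implementation; the function names and the sample ages are hypothetical, and the bucket count plays the role of the binning parameter.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bucket index using n_bins buckets of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum would land in bucket n_bins, so clamp it to the last bucket.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_depth_bins(values, n_bins):
    """Assign bucket indices so that buckets hold roughly equal counts.

    Ranks are used directly, so tied values may fall into adjacent buckets.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

ages = [18, 22, 25, 31, 40, 58, 63, 90]
by_width = equal_width_bins(ages, 4)   # [0, 0, 0, 0, 1, 2, 2, 3]
by_depth = equal_depth_bins(ages, 4)   # [0, 0, 1, 1, 2, 2, 3, 3]
```

The same field yields different bucket assignments under the two methods: equal-width buckets are sensitive to the outlying value 90, while equal-depth buckets each receive two records.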
It should be noted that exemplary embodiment according to the present invention, to field value data type for continuous type and be declared as from
Dissipate the non-targeted value field execution of feature divides bucket computing not limit a point species for bucket mode, does not also limit a point ginseng for bucket computing
Number, also, the specific representation for dividing bucket feature accordingly generated is also unrestricted.
As an example, the multiple binning operations performed on a non-target value field whose field value data type is continuous and which is declared as a discrete feature may differ in binning method and/or binning parameters. For example, the multiple binning operations may be of the same kind but with different operation parameters (for example, depth or width), or they may be of different kinds. Accordingly, each binning operation yields one binning feature, and these binning features together form one binning group feature. The binning group feature reflects the different binning operations, which improves the effectiveness of the machine learning material and provides a better basis for training the machine learning model and for prediction.
That is, according to exemplary embodiments of the present invention, for each non-target value field whose field value data type is continuous and which is declared as a discrete feature, at least one binning operation may be performed to obtain at least one corresponding binning feature; the feature corresponding to the field is obtained by taking each binning feature as one component, and this feature is used as the unit feature. Here, it should be understood that performing a binning operation causes the values of such a non-target value field to be discretized into specific buckets. In the resulting binning features, each dimension may indicate either a discrete value (for example, "0" or "1") showing whether a continuous value was assigned to the bucket, or a specific continuous value (for example, the actual or normalized value of the continuous feature, or the mean, median, or boundary value of the continuous feature values in the bucket). Accordingly, when the discrete values (for example, for classification problems) or continuous values (for example, for regression problems) of the dimensions are applied in machine learning, combinations between discrete values (for example, Cartesian products) or combinations between continuous values (for example, arithmetic combinations) may be performed.
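A binning group feature built from several binning operations of the same kind with different parameters might look like the sketch below. It assumes equal-width binning and the "0"/"1" dimension encoding mentioned above; the function names, the value range, and the bucket counts are hypothetical choices for illustration.

```python
def equal_width_bucket(value, lo, hi, n_bins):
    """Index of the equal-width bucket that value falls into within [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)  # clamp values at or beyond the edges

def one_hot(index, size):
    """Each dimension is 1 if the value was assigned to that bucket, else 0."""
    return [1 if i == index else 0 for i in range(size)]

def binning_group_feature(value, lo, hi, bin_counts):
    """Concatenate the binning features produced by several binning operations."""
    group = []
    for n_bins in bin_counts:
        group.extend(one_hot(equal_width_bucket(value, lo, hi, n_bins), n_bins))
    return group

# One field value (an age of 40) binned at two granularities: 2 and 4 buckets.
feature = binning_group_feature(40, lo=18, hi=90, bin_counts=[2, 4])
```

The result has 2 + 4 = 6 dimensions and is treated as a single unit feature for the field, so a coarse and a fine view of the same value are available to later steps.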
In step S104, feature combination is performed based on the generated unit features to generate combined features.
As an example, various combinations may be performed on all of the generated unit features to obtain candidate combined features, or on the unit features of higher feature importance among all of the generated unit features; combined features may then be screened from the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature. Specifically, a machine learning model corresponding to each candidate combined feature may be trained. Because the effect of that model reflects the feature importance (for example, predictive power) of the candidate combined feature, combined features can be screened from the candidates by measuring the effect of each corresponding model; for example, the better the effect of the machine learning model, the more likely the corresponding candidate combined feature is to be selected as a combined feature. As an example, a specified model evaluation metric may be used to evaluate the effect of the machine learning model corresponding to each candidate combined feature. As an example, the model evaluation metric may be specified automatically or according to an instruction from the user.
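The screening idea can be sketched as follows, under strong simplifying assumptions that are not from the patent: candidates are limited to pairs of discrete unit features, the "model" for a candidate is a majority vote per combined value fitted on training rows, and its effect is plain accuracy on held-out rows. All names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def combine(row, fields):
    """Combined feature value: the tuple of the participating field values."""
    return tuple(row[f] for f in fields)

def model_effect(train, valid, fields, target):
    """Accuracy on valid of a majority-vote model keyed on the combined value."""
    stats = defaultdict(lambda: [0, 0])  # combined value -> [positives, count]
    for row in train:
        entry = stats[combine(row, fields)]
        entry[0] += row[target]
        entry[1] += 1
    total_pos = sum(r[target] for r in train)
    default = 1 if 2 * total_pos >= len(train) else 0  # for unseen values
    correct = 0
    for row in valid:
        key = combine(row, fields)
        if key in stats:
            pos, n = stats[key]
            pred = 1 if 2 * pos >= n else 0  # ties predict 1
        else:
            pred = default
        correct += int(pred == row[target])
    return correct / len(valid)

def screen_combinations(train, valid, unit_features, target, top_k):
    """Score every pair of unit features and keep the top_k by model effect."""
    scored = [(model_effect(train, valid, pair, target), pair)
              for pair in combinations(unit_features, 2)]
    scored.sort(key=lambda item: -item[0])  # stable sort keeps ties in order
    return scored[:top_k]

def make_row(a, b, c):
    # Toy label that depends on a and b jointly, and on neither alone.
    return {"a": a, "b": b, "c": c, "label": int((a == "x") != (b == "p"))}

train = [make_row(a, b, c) for a in "xy" for b in "pq" for c in "mn"]
valid = [make_row("x", "p", "m"), make_row("x", "q", "n"),
         make_row("y", "p", "n"), make_row("y", "q", "m")]
selected = screen_combinations(train, valid, ["a", "b", "c"], "label", top_k=1)
```

On this toy data only the pair (`a`, `b`) determines the label, so it is the candidate that survives screening. A real system would substitute an actual learning algorithm and one of the evaluation metrics named in the text.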
As an example, the model evaluation metric may be AUC (Area Under the ROC Curve, where ROC stands for Receiver Operating Characteristic), MAE (Mean Absolute Error), or the logarithmic loss function (logloss), etc.
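For reference, the three metrics can be computed directly from their standard definitions. The sketch below uses the pairwise-ranking formulation of AUC, which is quadratic in the number of samples and intended only for small illustrations; the function names and sample values are hypothetical.

```python
import math

def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mae(targets, predictions):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

auc_value = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 0.75
mae_value = mae([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])      # 0.5
ll_value = log_loss([1, 0], [0.5, 0.5])                # ln 2
```

Note that a higher AUC indicates a better model, whereas lower MAE and logloss are better, so any ranking by "model effect" has to account for the direction of the chosen metric.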
As an example, the unit features whose feature importance satisfies a first preset condition among all of the unit features may be combined in various ways to obtain candidate combined features. For example, the unit features whose feature importance lies within a first preset threshold range may be combined in various ways to obtain candidate combined features; alternatively, all of the unit features may be sorted from high to low by feature importance, and the top first predetermined number of unit features may be combined in various ways to obtain candidate combined features.
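The two forms of the first preset condition can be sketched as below. The function names, the dictionary of importance values, and the choice of an open interval for the threshold range are assumptions made for illustration; the description does not say whether the range bounds are inclusive.

```python
def select_in_range(importance, low, high):
    """Keep the features whose importance lies strictly between low and high.

    importance maps a feature name to its importance value (for example, the
    AUC of the model trained on that feature). Open bounds are an assumption.
    """
    return [name for name, value in importance.items() if low < value < high]

def select_top_n(importance, n):
    """Sort features from high to low importance and keep the first n."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:n]
```

For instance, with `importance = {"age": 0.71, "city": 0.5, "job": 0.64, "id": 1.0}`, the range (0.5, 1.0) keeps `age` and `job`, while the top two by rank are `id` and `age`. The two conditions can therefore select different features from the same importance values.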
As an example, the feature importance of a unit feature may be determined by measuring the effect of the machine learning model corresponding to that feature: the better the effect of the corresponding machine learning model, the higher the feature importance of the unit feature. For example, the feature importance of a unit feature may be measured by the evaluation value, on the model evaluation index, of the machine learning model corresponding to the feature. Here, as an example, the model evaluation index may be specified automatically or according to an instruction from the user.
In step S105, the features of the machine learning sample are obtained based on the generated unit features and combined features.
As an example, all of the generated unit features and all of the combined features may be taken as the features of the machine learning sample.
As another example, the features of higher feature importance among all of the generated unit features and all of the combined features may be taken as the features of the machine learning sample. As an example, the features whose feature importance satisfies a second preset condition among all of the unit features and all of the combined features may be taken as the features of the machine learning sample. For example, the features whose feature importance lies within a second preset threshold range may be taken as the features of the machine learning sample; alternatively, all of the unit features and all of the combined features may be sorted together from high to low by feature importance, and the top second predetermined number of features may be taken as the features of the machine learning sample.
As another example, the unit features of higher feature importance among all of the generated unit features, together with all of the generated combined features, may be taken as the features of the machine learning sample. As an example, all of the combined features, together with the unit features whose feature importance satisfies a third preset condition, may be taken as the features of the machine learning sample. For example, all of the combined features, together with the unit features whose feature importance lies within a third preset threshold range, may be taken as the features of the machine learning sample; alternatively, all of the unit features may be sorted from high to low by feature importance, and the top third predetermined number of unit features, together with all of the combined features, may be taken as the features of the machine learning sample.
As another example, all of the generated unit features, together with the combined features of higher feature importance among all of the generated combined features, may be taken as the features of the machine learning sample. As an example, all of the unit features, together with the combined features whose feature importance satisfies a fourth preset condition, may be taken as the features of the machine learning sample. For example, all of the unit features, together with the combined features whose feature importance lies within a fourth preset threshold range, may be taken as the features of the machine learning sample; alternatively, all of the combined features may be sorted from high to low by feature importance, and the top fourth predetermined number of combined features, together with all of the unit features, may be taken as the features of the machine learning sample.
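The four alternatives for step S105 can be summarised in one small dispatcher. This is a sketch under stated assumptions: the strategy names, the function names, and the use of the top-n form of each preset condition (rather than the threshold-range form) are all choices made here for brevity.

```python
def top_n(importance, n):
    """Names of the n features with the highest importance, best first."""
    return sorted(importance, key=importance.get, reverse=True)[:n]

def sample_features(unit_imp, comb_imp, strategy, n=None):
    """Pick the final features of the machine learning sample.

    unit_imp and comb_imp map unit / combined feature names to importance.
    The strategy labels are hypothetical names for the four alternatives.
    """
    if strategy == "all":
        return list(unit_imp) + list(comb_imp)
    if strategy == "top_overall":
        # Rank unit and combined features together.
        return top_n({**unit_imp, **comb_imp}, n)
    if strategy == "top_units_all_combined":
        return top_n(unit_imp, n) + list(comb_imp)
    if strategy == "all_units_top_combined":
        return list(unit_imp) + top_n(comb_imp, n)
    raise ValueError(f"unknown strategy: {strategy}")
```

The second strategy ranks both kinds of feature in one list, so a strong combined feature can displace a weak unit feature; the third and fourth strategies rank only one kind and keep the other kind whole.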
In addition, as an example, the method for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may further include: after step S105, displaying the obtained features of the machine learning sample to the user. Further, the feature importance of each feature may also be displayed to the user.
As an example, the method for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may further include: after step S105, directly applying the obtained features of the machine learning sample to a subsequent machine learning step. For example, a model may be learned directly based on the obtained features of the machine learning sample.
Fig. 3 shows a flowchart of a method for automatically generating features of a machine learning sample according to another exemplary embodiment of the present invention.
Referring to Fig. 3, in step S201, a data table specified by the user is obtained.
In step S202, the feature type corresponding to each non-target-value field in the data table is declared.
In step S203, each non-target-value field is processed into a unit feature according to the declared feature type.
In step S204, all of the generated unit features are combined in various ways to obtain candidate combined features, and combined features are screened out from the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
In step S205, all of the generated unit features and all of the combined features are taken as the features of the machine learning sample.
Fig. 4 shows a flowchart of a method for automatically generating features of a machine learning sample according to another exemplary embodiment of the present invention.
Referring to Fig. 4, in step S301, a data table specified by the user is obtained.
In step S302, the feature type corresponding to each non-target-value field in the data table is declared.
In step S303, each non-target-value field is processed into a unit feature according to the declared feature type.
In step S304, the unit features of higher feature importance among all of the generated unit features are combined in various ways to obtain candidate combined features, and combined features are screened out from the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
In step S305, the unit features of higher feature importance among all of the generated unit features, together with all of the generated combined features, are taken as the features of the machine learning sample.
As an example, the feature importance of a feature may be measured by the evaluation value, on the model evaluation index AUC, of the machine learning model corresponding to that feature. In step S304, the unit features whose corresponding AUC value is greater than 0.5 and less than 1 among all of the generated unit features may be combined in various ways to obtain candidate combined features. In step S305, the unit features whose corresponding AUC value is greater than 0.5 and less than 1 among all of the generated unit features, together with all of the generated combined features, may be taken as the features of the machine learning sample.
Fig. 5 shows a flowchart of a method for automatically generating features of a machine learning sample according to another exemplary embodiment of the present invention.
Referring to Fig. 5, in step S401, a data table specified by the user is obtained.
In step S402, the feature type corresponding to each non-target-value field in the data table is declared.
In step S403, each non-target-value field is processed into a unit feature according to the declared feature type.
In step S404, all of the generated unit features are combined in various ways to obtain candidate combined features, and combined features are screened out from the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
In step S405, the features of higher feature importance among all of the generated unit features and all of the combined features are taken as the features of the machine learning sample.
As an example, the feature importance of a feature may be measured by the evaluation value, on the model evaluation index AUC, of the machine learning model corresponding to that feature. In step S405, the features whose corresponding AUC value is greater than 0.5 and less than 1 among all of the generated unit features and all of the combined features may be taken as the features of the machine learning sample.
Some exemplary methods for automatically generating features of a machine learning sample have been listed above. However, those skilled in the art should understand that exemplary embodiments of the present invention are not limited to these methods, and any appropriate manner of generating or screening features (unit features, candidate combined features, or combined features) may be employed.
According to an exemplary embodiment of the present invention, a machine learning process may be executed in the form of a directed acyclic graph (DAG). The machine learning process may cover all or part of the steps for training, testing, or prediction of a machine learning model. For example, a DAG comprising at least one of the following steps may be established for training a machine learning model: a historical data import step, a data splitting step, a feature generation step, a logistic regression step, and a model prediction step. That is, each of the above steps may be executed as a node in the DAG.
Fig. 6 shows an example of a DAG for training a machine learning model according to an exemplary embodiment of the present invention.
Referring to Fig. 6, first step: establish a data import node. As an example, the data import node may be configured in response to a user operation so as to obtain a banking business data table named "bank" (that is, to import the data table into the machine learning platform), where the data table may contain multiple historical data records.
Second step: establish a data splitting node and connect the data import node to it, so that the imported data table is split into a training set and a validation set. The data records in the training set are converted into machine learning samples used to learn the model, and the data records in the validation set are converted into test samples used to verify the effect of the learned model. The data splitting node may be configured in response to a user operation so that the imported data table is split into the training set and the validation set in the configured manner.
Third step: establish two feature generation nodes and connect the data splitting node to each of them, so that feature generation is performed separately on the training set and the validation set output by the data splitting node. For example, by default the left output of the data splitting node is the training set and the right output is the validation set. It should be understood that the features of the machine learning samples and of the test samples are generated in the same, mutually consistent manner. The feature generation nodes may be configured in response to a user operation, for example, by specifying the target value field, the feature types corresponding to the non-target-value fields, the measurement index of feature importance, and so on.
Fourth step: establish an algorithm node (for example, a logistic regression node), that is, a model training node, and connect the left feature generation node to the logistic regression node, so that a machine learning model is trained on the machine learning samples using the logistic regression algorithm. The logistic regression node may be configured in response to a user operation so that the machine learning model is trained according to the configured logistic regression algorithm.
Fifth step: establish a model prediction node, and connect the logistic regression node and the right feature generation node to the model prediction node, so that the effect of the trained machine learning model is verified on the test samples. The model prediction node may be configured in response to a user operation so that the effect of the machine learning model is verified in the configured manner.
After the DAG comprising the above steps has been established, the entire DAG may be run according to an instruction from the user. When execution reaches a feature generation node, the method for automatically generating features of a machine learning sample according to the above exemplary embodiments may be executed automatically.
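The node wiring of the five steps above can be written down as a dependency map and run in topological order. This is only a sketch of the idea: the node names, the `run` helper, and the convention that each operator receives its predecessors' outputs as a dictionary are assumptions introduced here. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose output it consumes (hypothetical names).
dag = {
    "data_import": set(),
    "data_split": {"data_import"},
    "feature_gen_train": {"data_split"},   # left output: training set
    "feature_gen_valid": {"data_split"},   # right output: validation set
    "logistic_regression": {"feature_gen_train"},
    "model_prediction": {"logistic_regression", "feature_gen_valid"},
}

def run_order(graph):
    """Return one valid execution order in which predecessors come first."""
    return list(TopologicalSorter(graph).static_order())

def run(graph, operators):
    """Execute each node's operator once all of its predecessors have run.

    operators maps a node name to a callable that takes a dict of
    {predecessor name: predecessor output} and returns the node's output.
    """
    results = {}
    for node in run_order(graph):
        inputs = {dep: results[dep] for dep in graph[node]}
        results[node] = operators[node](inputs)
    return results
```

Because the two feature generation nodes depend only on the splitting node, their relative order is not fixed; only the constraint that every node runs after its predecessors is guaranteed. `TopologicalSorter` also raises `CycleError` if the graph is not acyclic.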
Fig. 7 shows a block diagram of a system for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention. As shown in Fig. 7, the system includes: a data table acquisition device 10, a declaration device 20, a unit feature generation device 30, a combined feature generation device 40, and a feature acquisition device 50.
Specifically, the data table acquisition device 10 is used to obtain a data table specified by the user, where one row of the data table corresponds to one data record and one column of the data table corresponds to one field.
The declaration device 20 is used to declare the feature type corresponding to each non-target-value field in the data table, where the feature types include discrete features and/or continuous features.
As an example, the non-target-value fields may be obtained by removing the target value field specified by the user from all of the fields in the data table.
As an example, the declaration device 20 may, automatically or according to an instruction from the user, declare all of the non-target-value fields as discrete features, or declare each non-target-value field as a discrete feature or a continuous feature in accordance with the data type of its field values.
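The declaration step can be illustrated as follows. This is a hedged sketch: the column-oriented `table` dictionary, the function name, and in particular the mapping "numeric values are continuous, everything else is discrete" are assumptions, since the description only says that the declared type corresponds to the field value data type. Raising an error when no target value field is given is likewise an assumed way of signalling that situation.

```python
def declare_feature_types(table, target_field, all_discrete=False):
    """Declare a feature type for every non-target-value field.

    table maps a field name to its column of values. The target value field
    is removed first; the remaining fields are declared "discrete" or
    "continuous". With all_discrete=True every field is declared discrete.
    """
    if target_field is None or target_field not in table:
        # Assumption: a missing target value field is treated as an error.
        raise ValueError("a target value field must be specified")
    declared = {}
    for field, values in table.items():
        if field == target_field:
            continue
        # bool is excluded because it is a subclass of int in Python.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        declared[field] = "continuous" if numeric and not all_discrete else "discrete"
    return declared
```

For a table such as `{"age": [23, 41.5], "job": ["clerk", "nurse"], "label": [0, 1]}` with target field `label`, the default call declares `age` as continuous and `job` as discrete, while `all_discrete=True` declares both as discrete.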
The unit feature generation device 30 is used to process each non-target-value field into a unit feature according to the declared feature type.
As an example, for each non-target-value field whose field value data type is continuous but which has been declared as a discrete feature, the unit feature generation device 30 may perform one or more binning operations to obtain one or more corresponding binned features, and may treat the obtained binned features as a whole as one unit feature.
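One way to picture "several binning operations treated as a single unit feature" is shown below. The sketch assumes equal-width binning and represents the unit feature as a tuple of bucket indices per record; both choices, and the function names, are illustrative assumptions, since the passage above does not fix a particular binning operation.

```python
def equal_width_bucket(values, n_buckets):
    """Map each continuous value to a bucket index in [0, n_buckets - 1]."""
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 when all values are equal, to avoid dividing by zero.
    width = (hi - lo) / n_buckets or 1.0
    # The maximum value would land in bucket n_buckets, so clamp it.
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]

def bucket_unit_feature(values, bucket_counts):
    """Apply one binning operation per entry in bucket_counts and bundle
    the resulting binned features, row by row, into one unit feature."""
    per_operation = [equal_width_bucket(values, n) for n in bucket_counts]
    return list(zip(*per_operation))
```

With `values = [0, 25, 50, 75, 100]` and `bucket_counts = [2, 4]`, each record carries a coarse and a fine bucket index, for example `(0, 1)` for the value 25. Keeping several granularities side by side lets a later screening step decide which resolution is useful.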
The combined feature generation device 40 is used to perform feature combination based on the generated unit features, so as to generate combined features.
As an example, the combined feature generation device 40 may include a candidate combined feature acquisition unit (not shown) and a combined feature screening unit (not shown).
The candidate combined feature acquisition unit is used to combine all of the generated unit features in various ways to obtain candidate combined features, or to combine the unit features of higher feature importance among all of the generated unit features in various ways to obtain candidate combined features.
The combined feature screening unit is used to screen out combined features from the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
The feature acquisition device 50 is used to obtain the features of the machine learning sample based on the generated unit features and combined features.
As an example, the feature acquisition device 50 may take all of the generated unit features and all of the combined features as the features of the machine learning sample.
As another example, the feature acquisition device 50 may take the features of higher feature importance among all of the generated unit features and all of the combined features as the features of the machine learning sample.
As another example, the feature acquisition device 50 may take the unit features of higher feature importance among all of the generated unit features, together with all of the generated combined features, as the features of the machine learning sample.
As another example, the feature acquisition device 50 may take the combined features of higher feature importance among all of the generated combined features, together with all of the generated unit features, as the features of the machine learning sample.
As an example, the system for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may further include a display device (not shown), which is used to display to the user the features of the machine learning sample obtained by the feature acquisition device 50. Further, as an example, the display device may also display the feature importance of each feature to the user.
As an example, the system for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may further include an application device (not shown), which is used to apply the features of the machine learning sample obtained by the feature acquisition device 50 directly to a subsequent machine learning step.
As an example, the system for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may be made to perform its operations automatically by starting an operator corresponding to the automatic feature generation step.
As an example, the operator may correspond to a node in a directed acyclic graph corresponding to the machine learning process.
In addition, as an example, the system for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may further include a reminder device (not shown), which is used to give an abnormality prompt when the operator is started without the user having specified a target value field.
It should be understood that the specific implementation of the system for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may follow the related specific implementations described with reference to Figs. 1 to 6, and is not described again here.
The devices included in the system for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may each be configured as software, hardware, firmware, or any combination thereof that performs a specific function. For example, these devices may correspond to dedicated integrated circuits, to pure software code, or to modules that combine software and hardware. In addition, one or more functions implemented by these devices may also be performed collectively by components in a physical device (for example, a processor, a client, or a server).
It should be understood that the method for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may be implemented by a program recorded on a computer-readable medium. For example, according to an exemplary embodiment of the present invention, a computer-readable medium for automatically generating features of a machine learning sample may be provided, on which is recorded a computer program for performing the following method steps: (A) obtaining a data table specified by the user, where one row of the data table corresponds to one data record and one column of the data table corresponds to one field; (B) declaring the feature type corresponding to each non-target-value field in the data table, where the feature types include discrete features and/or continuous features; (C) processing each non-target-value field into a unit feature according to the declared feature type; (D) performing feature combination based on the generated unit features to generate combined features; and (E) obtaining the features of the machine learning sample based on the generated unit features and combined features.
The computer program in the above computer-readable medium may run in an environment deployed in computer equipment such as a client, a host, an agent device, or a server. It should be noted that the computer program may also be used to perform additional steps beyond the above steps, or to perform more specific processing when performing the above steps. These additional steps and this further processing have been described with reference to Figs. 1 to 6 and, to avoid repetition, are not described again here.
It should be noted that the system for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may rely entirely on the running of a computer program to realize the corresponding functions; that is, each device corresponds to a step in the functional architecture of the computer program, so that the whole system is invoked through a dedicated software package (for example, a lib library) to realize the corresponding functions.
On the other hand, each device included in the system for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor can perform the corresponding operations by reading and running the corresponding program code or code segments.
For example, an exemplary embodiment of the present invention may also be implemented as a computing device that includes a storage unit and a processor. A set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, the method for automatically generating features of a machine learning sample is performed.
Specifically, the computing device may be deployed in a server or a client, or on a node device in a distributed network environment. In addition, the computing device may be a PC, a tablet device, a personal digital assistant, a smartphone, a web application, or any other device capable of executing the above instruction set.
Here, the computing device need not be a single computing device; it may be any aggregate of devices or circuits capable of executing the above instructions (or instruction set) individually or jointly. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (for example, via wireless transmission).
In the computing device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
Some of the operations described in the method for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may be implemented in software, some may be implemented in hardware, and these operations may also be implemented by a combination of software and hardware.
The processor may run instructions or code stored in one of the storage units, where the storage unit may also store data. Instructions and data may also be sent and received over a network via a network interface device, where the network interface device may employ any known transmission protocol.
The storage unit may be integrated with the processor, for example, with RAM or flash memory arranged within an integrated circuit microprocessor or the like. In addition, the storage unit may comprise a separate device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage unit and the processor may be operatively coupled, or may communicate with each other, for example, through an I/O port or a network connection, so that the processor can read files stored in the storage unit.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, or a touch input device). All components of the computing device may be connected to each other via a bus and/or a network.
The operations involved in the method for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operate according to non-exact boundaries.
For example, as described above, a computing device for automatically generating features of a machine learning sample according to an exemplary embodiment of the present invention may include a storage unit and a processor, where a set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, the following steps are performed: (A) obtaining a data table specified by the user, where one row of the data table corresponds to one data record and one column of the data table corresponds to one field; (B) declaring the feature type corresponding to each non-target-value field in the data table, where the feature types include discrete features and/or continuous features; (C) processing each non-target-value field into a unit feature according to the declared feature type; (D) performing feature combination based on the generated unit features to generate combined features; and (E) obtaining the features of the machine learning sample based on the generated unit features and combined features.
Exemplary embodiments of the present invention have been described above. It should be understood that the foregoing description is exemplary only and not exhaustive, and that the present invention is not limited to the disclosed exemplary embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.
Claims (10)
1. A method for automatically generating features of a machine learning sample, comprising:
(A) obtaining a data table specified by a user, wherein one row of the data table corresponds to one data record and one column of the data table corresponds to one field;
(B) declaring the feature type corresponding to each non-target-value field in the data table, wherein the feature types include discrete features and/or continuous features;
(C) processing each non-target-value field into a unit feature according to the declared feature type;
(D) performing feature combination based on the generated unit features to generate combined features; and
(E) obtaining the features of the machine learning sample based on the generated unit features and combined features.
2. The method according to claim 1, wherein the method is executed automatically by starting an operator corresponding to an automatic feature generation step.
3. The method according to claim 2, wherein the operator corresponds to a node in a directed acyclic graph corresponding to a machine learning process.
4. The method according to claim 1, wherein, in step (B),
all of the non-target-value fields are declared as discrete features, automatically or according to an instruction from the user, or each non-target-value field is declared as a discrete feature or a continuous feature in accordance with the data type of its field values.
5. The method according to claim 1, wherein step (D) comprises:
combining all of the generated unit features in various ways to obtain candidate combined features, or combining the unit features of higher feature importance among all of the generated unit features in various ways to obtain candidate combined features; and
screening out combined features from the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
6. The method according to claim 1, wherein, in step (E),
all of the generated unit features and all of the combined features are taken as the features of the machine learning sample;
or the features of higher feature importance among all of the generated unit features and all of the combined features are taken as the features of the machine learning sample;
or the unit features of higher feature importance among all of the generated unit features, together with all of the generated combined features, are taken as the features of the machine learning sample;
or the combined features of higher feature importance among all of the generated combined features, together with all of the generated unit features, are taken as the features of the machine learning sample.
7. The method according to claim 4, wherein, in step (C),
for each non-target-value field whose field value data type is continuous and which is declared as a discrete feature, one or more binning operations are performed to obtain one or more corresponding binned features, and the obtained binned features as a whole are treated as one unit feature.
8. a kind of system for the feature for automatically generating machine learning sample, including:
Tables of data acquisition device, for obtaining user's specified data table, wherein, a line of tables of data corresponds to a data note
Record, the corresponding field of a row of tables of data;
State device, for the characteristic type corresponding to each non-targeted value field in claim data table, wherein, characteristic type
Including discrete features and/or continuous feature;
Unit character generating means, for handling each non-targeted value field for unit feature according to the characteristic type of statement;
Assemblage characteristic generating means, for carrying out combinations of features based on the unit character of generation, to generate assemblage characteristic;And
Feature acquisition device obtains the feature of machine learning sample for the unit character based on generation and assemblage characteristic.
9. A computer-readable medium for automatically generating features of a machine learning sample, wherein a computer program for performing the method of automatically generating features of a machine learning sample according to any one of claims 1 to 7 is recorded on the computer-readable medium.
10. A computing device for automatically generating features of a machine learning sample, comprising a storage unit and a processor, wherein a set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, the method of automatically generating features of a machine learning sample according to any one of claims 1 to 7 is performed.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711445538.3A CN108090516A (en) | 2017-12-27 | 2017-12-27 | Method and system for automatically generating features of a machine learning sample |
PCT/CN2018/123910 WO2019129060A1 (en) | 2017-12-27 | 2018-12-26 | Method and system for automatically generating machine learning sample |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711445538.3A CN108090516A (en) | 2017-12-27 | 2017-12-27 | Method and system for automatically generating features of a machine learning sample |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108090516A true CN108090516A (en) | 2018-05-29 |
Family
ID=62179713
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711445538.3A Pending CN108090516A (en) | 2017-12-27 | 2017-12-27 | Method and system for automatically generating features of a machine learning sample |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108090516A (en) |
WO (1) | WO2019129060A1 (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408592A (en) * | 2018-10-12 | 2019-03-01 | 北京聚云位智信息科技有限公司 | AI characteristic engineering knowledge base in decision-making type distributed database system and implementation method thereof |
CN109634961A (en) * | 2018-12-05 | 2019-04-16 | 杭州大拿科技股份有限公司 | Test paper sample generation method and device, electronic equipment and storage medium |
CN109697066A (en) * | 2018-12-28 | 2019-04-30 | 第四范式(北京)技术有限公司 | Method and system for realizing data sheet splicing and automatically training machine learning model |
CN109739855A (en) * | 2018-12-28 | 2019-05-10 | 第四范式(北京)技术有限公司 | Method and system for realizing data sheet splicing and automatically training machine learning model |
WO2019129060A1 (en) * | 2017-12-27 | 2019-07-04 | 第四范式(北京)技术有限公司 | Method and system for automatically generating machine learning sample |
CN110297833A (en) * | 2019-07-05 | 2019-10-01 | 税安科技(杭州)有限公司 | Bordereau error correction method |
CN110443864A (en) * | 2019-07-24 | 2019-11-12 | 北京大学 | Automatic artistic font generation method based on single-stage small-amount sample learning |
CN110457329A (en) * | 2019-08-16 | 2019-11-15 | 第四范式(北京)技术有限公司 | Method and device for realizing personalized recommendation |
CN110851500A (en) * | 2019-11-07 | 2020-02-28 | 北京集奥聚合科技有限公司 | Method for generating expert characteristic dimension required by machine learning modeling |
CN111325578A (en) * | 2020-02-20 | 2020-06-23 | 深圳市腾讯计算机系统有限公司 | Sample determination method and device of prediction model, medium and equipment |
CN111832740A (en) * | 2019-12-30 | 2020-10-27 | 上海氪信信息技术有限公司 | Method for deriving machine learning characteristics from structured data in real time |
CN112184279A (en) * | 2019-07-05 | 2021-01-05 | 上海哔哩哔哩科技有限公司 | AUC index rapid calculation method and device and computer equipment |
CN112380205A (en) * | 2020-11-17 | 2021-02-19 | 北京融七牛信息技术有限公司 | Method and system for automatically generating characteristics of distributed architecture |
CN112434032A (en) * | 2020-11-17 | 2021-03-02 | 北京融七牛信息技术有限公司 | Automatic feature generation system and method |
WO2022089652A1 (en) * | 2020-11-02 | 2022-05-05 | 第四范式(北京)技术有限公司 | Method and system for processing data tables and automatically training machine learning model |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11062792B2 (en) | 2017-07-18 | 2021-07-13 | Analytics For Life Inc. | Discovering genomes to use in machine learning techniques |
US11139048B2 (en) | 2017-07-18 | 2021-10-05 | Analytics For Life Inc. | Discovering novel features to use in machine learning techniques, such as machine learning techniques for diagnosing medical conditions |
CN112347320A (en) * | 2020-11-05 | 2021-02-09 | 杭州数梦工场科技有限公司 | Associated field recommendation method and device for data table field |
CN112613983B (en) * | 2020-12-25 | 2023-11-21 | 北京知因智慧科技有限公司 | Feature screening method and device in machine modeling process and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105677353A (en) * | 2016-01-08 | 2016-06-15 | 北京物思创想科技有限公司 | Feature extraction method and machine learning method and device thereof |
CN107316082A (en) * | 2017-06-15 | 2017-11-03 | 第四范式(北京)技术有限公司 | Method and system for determining feature importance of a machine learning sample |
CN107392319A (en) * | 2017-07-20 | 2017-11-24 | 第四范式(北京)技术有限公司 | Method and system for generating combined features of a machine learning sample |
CN107451266A (en) * | 2017-07-31 | 2017-12-08 | 北京京东尚科信息技术有限公司 | Method and device for processing data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108090516A (en) * | 2017-12-27 | 2018-05-29 | 第四范式(北京)技术有限公司 | Method and system for automatically generating features of a machine learning sample |
- 2017-12-27: CN application CN201711445538.3A, published as CN108090516A, status: active (Pending)
- 2018-12-26: WO application PCT/CN2018/123910, published as WO2019129060A1, status: active (Application Filing)
Non-Patent Citations (1)
Title |
---|
管震 et al.: "《云,就该这么玩儿》", 31 July 2015 * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019129060A1 (en) * | 2017-12-27 | 2019-07-04 | 第四范式(北京)技术有限公司 | Method and system for automatically generating machine learning sample |
CN109408592A (en) * | 2018-10-12 | 2019-03-01 | 北京聚云位智信息科技有限公司 | AI characteristic engineering knowledge base in decision-making type distributed database system and implementation method thereof |
CN109408592B (en) * | 2018-10-12 | 2021-09-24 | 北京聚云位智信息科技有限公司 | AI characteristic engineering knowledge base in decision-making type distributed database system and implementation method thereof |
CN109634961A (en) * | 2018-12-05 | 2019-04-16 | 杭州大拿科技股份有限公司 | Test paper sample generation method and device, electronic equipment and storage medium |
CN109634961B (en) * | 2018-12-05 | 2021-06-04 | 杭州大拿科技股份有限公司 | Test paper sample generation method and device, electronic equipment and storage medium |
CN109697066A (en) * | 2018-12-28 | 2019-04-30 | 第四范式(北京)技术有限公司 | Method and system for realizing data sheet splicing and automatically training machine learning model |
CN109739855A (en) * | 2018-12-28 | 2019-05-10 | 第四范式(北京)技术有限公司 | Method and system for realizing data sheet splicing and automatically training machine learning model |
CN109697066B (en) * | 2018-12-28 | 2021-02-05 | 第四范式(北京)技术有限公司 | Method and system for realizing data sheet splicing and automatically training machine learning model |
CN112184279A (en) * | 2019-07-05 | 2021-01-05 | 上海哔哩哔哩科技有限公司 | AUC index rapid calculation method and device and computer equipment |
CN110297833A (en) * | 2019-07-05 | 2019-10-01 | 税安科技(杭州)有限公司 | Bordereau error correction method |
CN110443864A (en) * | 2019-07-24 | 2019-11-12 | 北京大学 | Automatic artistic font generation method based on single-stage small-amount sample learning |
CN110443864B (en) * | 2019-07-24 | 2021-03-02 | 北京大学 | Automatic artistic font generation method based on single-stage small-amount sample learning |
CN110457329B (en) * | 2019-08-16 | 2022-05-06 | 第四范式(北京)技术有限公司 | Method and device for realizing personalized recommendation |
CN110457329A (en) * | 2019-08-16 | 2019-11-15 | 第四范式(北京)技术有限公司 | Method and device for realizing personalized recommendation |
CN110851500B (en) * | 2019-11-07 | 2022-10-28 | 北京集奥聚合科技有限公司 | Method for generating expert characteristic dimension required by machine learning modeling |
CN110851500A (en) * | 2019-11-07 | 2020-02-28 | 北京集奥聚合科技有限公司 | Method for generating expert characteristic dimension required by machine learning modeling |
CN111832740A (en) * | 2019-12-30 | 2020-10-27 | 上海氪信信息技术有限公司 | Method for deriving machine learning characteristics from structured data in real time |
CN111325578A (en) * | 2020-02-20 | 2020-06-23 | 深圳市腾讯计算机系统有限公司 | Sample determination method and device of prediction model, medium and equipment |
CN111325578B (en) * | 2020-02-20 | 2023-10-31 | 深圳市腾讯计算机系统有限公司 | Sample determination method and device of prediction model, medium and equipment |
WO2022089652A1 (en) * | 2020-11-02 | 2022-05-05 | 第四范式(北京)技术有限公司 | Method and system for processing data tables and automatically training machine learning model |
CN112434032A (en) * | 2020-11-17 | 2021-03-02 | 北京融七牛信息技术有限公司 | Automatic feature generation system and method |
CN112380205A (en) * | 2020-11-17 | 2021-02-19 | 北京融七牛信息技术有限公司 | Method and system for automatically generating characteristics of distributed architecture |
CN112380205B (en) * | 2020-11-17 | 2024-04-02 | 北京融七牛信息技术有限公司 | Automatic feature generation method and system of distributed architecture |
CN112434032B (en) * | 2020-11-17 | 2024-04-05 | 北京融七牛信息技术有限公司 | Automatic feature generation system and method |
Also Published As
Publication number | Publication date |
---|---|
WO2019129060A1 (en) | 2019-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108090516A (en) | | Method and system for automatically generating features of a machine learning sample |
CN110399770B (en) | | Generating machine learning models for objects based on enhancing objects with physical properties |
US11836578B2 (en) | | Utilizing machine learning models to process resource usage data and to determine anomalous usage of resources |
US20200287923A1 (en) | | Unsupervised learning to simplify distributed systems management |
CN107844837A (en) | | Method and system for tuning algorithm parameters of a machine learning algorithm |
CN107704871A (en) | | Method and system for generating combined features of a machine learning sample |
CN107004185A (en) | | Pipeline generation for data stream actuated control |
Dakos | | Identifying best-indicator species for abrupt transitions in multispecies communities |
CN107766946A (en) | | Method and system for generating combined features of a machine learning sample |
CN107392319A (en) | | Method and system for generating combined features of a machine learning sample |
CN107316082A (en) | | Method and system for determining feature importance of a machine learning sample |
US11574430B2 (en) | | Method and system for creating animal type avatar using human face |
US10262079B1 (en) | | Determining anonymized temporal activity signatures of individuals |
CN107908566A (en) | | Automatic test management method, device, terminal device and storage medium |
CN108008942A (en) | | Method and system for processing data records |
CN109783859A (en) | | Model building method, device and computer readable storage medium |
CN109313720A (en) | | Augmented neural network with sparsely accessed external memory |
CN110995459B (en) | | Abnormal object identification method, device, medium and electronic equipment |
US20190303836A1 (en) | | Determining optimal workforce types to fulfill occupational roles in an organization based on occupational attributes |
CN111243682A (en) | | Method, device, medium and apparatus for predicting toxicity of drug |
CN107679549A (en) | | Method and system for generating combined features of a machine learning sample |
CN104731843B (en) | | Method and system for balancing provenance and accuracy tradeoffs in data modeling |
CN107273979A (en) | | Method and system for performing machine learning prediction based on service level |
CN109858528A (en) | | Recommender system training method, device, computer equipment and storage medium |
CN107909087A (en) | | Method and system for generating combined features of a machine learning sample |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||