CN108090516A - Method and system for automatically generating features of machine learning samples - Google Patents

Method and system for automatically generating features of machine learning samples

Info

Publication number
CN108090516A
Authority
CN
China
Prior art keywords
feature
machine learning
generation
data
learning sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711445538.3A
Other languages
Chinese (zh)
Inventor
杨强
戴文渊
陈雨强
孙迪
杨慧斌
刘守湘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN201711445538.3A
Publication of CN108090516A
Priority to PCT/CN2018/123910 (WO2019129060A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

A method and system for automatically generating features of machine learning samples are provided. The method includes: (A) obtaining a data table specified by a user, where each row of the data table corresponds to one data record and each column corresponds to one field; (B) declaring the feature type corresponding to each non-target-value field in the data table, where the feature types include discrete features and/or continuous features; (C) processing each non-target-value field into a unit feature according to its declared feature type; (D) combining the generated unit features to generate combined features; and (E) obtaining the features of the machine learning samples based on the generated unit features and combined features. With this method and system, the features of machine learning samples can be generated automatically from a data table, which lowers the barrier to using feature engineering, makes feature engineering easier to use, and improves its efficiency.

Description

Method and system for automatically generating features of machine learning samples
Technical field
The present invention relates generally to the field of artificial intelligence and, more particularly, to a method and system for automatically generating features of machine learning samples.
Background
With the emergence of massive amounts of data, people increasingly use machine learning techniques to mine value from data.
The basic process of training a machine learning model mainly includes:
1. Importing a data set (for example, a data table) containing historical data records;
2. Completing feature engineering, in which the attribute information of the data records in the data set is processed in various ways to obtain individual features; the feature vectors formed from these features can serve as machine learning samples;
3. Training the model, in which a model is learned from the machine learning samples obtained through feature engineering, according to a chosen machine learning algorithm (for example, logistic regression, decision trees, neural networks, etc.).
In this process, feature generation is critical because it affects the quality of the model. Each data record in a data table may include multiple items of attribute information (that is, fields), and a feature may represent a field itself or the result of processing fields in various ways, such as combining (or computing over) fields, so as to better reflect the data distribution and the intrinsic associations and latent meaning among fields. The quality of feature construction therefore directly determines how accurately the machine learning problem is characterized, and in turn affects the quality of the model.
On existing machine learning platforms, the training flow of a machine learning model can be completed interactively through a graphical interface, without the user writing program code. In the feature engineering stage, however, manually designed feature generation rules usually still have to be entered into the platform by hand. That is, the user must define the features of the machine learning samples in advance. On the one hand, this requires the user to understand the business scenario deeply, because features are set on the basis of business experience. On the other hand, the volume of data used in machine learning is usually large, and the user often cannot analyze it comprehensively, so some ineffective features may be set. To improve the effectiveness of the features, the user must then experiment repeatedly, which takes a long time with large data volumes and high-dimensional features. This approach therefore not only demands deep business understanding but also increases the user's workload and reduces the efficiency of machine learning.
Summary of the invention
Exemplary embodiments of the present invention provide a method and system for automatically generating features of machine learning samples, to solve the problem in the prior art that features of machine learning samples cannot be generated conveniently.
According to an exemplary embodiment of the present invention, a method for automatically generating features of machine learning samples is provided, including: (A) obtaining a data table specified by a user, where each row of the data table corresponds to one data record and each column corresponds to one field; (B) declaring the feature type corresponding to each non-target-value field in the data table, where the feature types include discrete features and/or continuous features; (C) processing each non-target-value field into a unit feature according to its declared feature type; (D) combining the generated unit features to generate combined features; and (E) obtaining the features of the machine learning samples based on the generated unit features and combined features.
Optionally, the method is executed automatically when an operator corresponding to the automatic feature generation step is started.
Optionally, the operator corresponds to a node in a directed acyclic graph that corresponds to the machine learning flow.
Optionally, the non-target-value fields are obtained by removing the target value field specified by the user from all fields in the data table.
Optionally, the operator issues an exception prompt when it is started without the user having specified a target value field.
Optionally, in step (B), automatically or according to a user instruction, all non-target-value fields are declared as discrete features, or each non-target-value field is declared as a discrete feature or a continuous feature according to the data type of its field values.
Optionally, step (D) includes: forming various combinations of all generated unit features to obtain candidate combined features, or forming various combinations of the unit features with higher feature importance among all generated unit features to obtain candidate combined features; and screening combined features out of the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
Optionally, in step (E), all generated unit features and all generated combined features are used as the features of the machine learning samples; or the features with higher feature importance among all generated unit features and combined features are used as the features of the machine learning samples; or the unit features with higher feature importance among all generated unit features, together with all generated combined features, are used as the features of the machine learning samples; or the combined features with higher feature importance among all generated combined features, together with all generated unit features, are used as the features of the machine learning samples.
Optionally, the method further includes: (F) displaying the obtained features of the machine learning samples to the user.
Optionally, in step (F), the feature importance of each feature is also displayed to the user.
Optionally, the method further includes: (G) applying the obtained features of the machine learning samples directly to a subsequent machine learning step.
Optionally, in step (C), for each non-target-value field whose field value data type is continuous and which is declared as a discrete feature, one or more binning operations are performed to obtain one or more corresponding binned features, and the obtained binned features together are treated as one unit feature.
According to another exemplary embodiment of the present invention, a system for automatically generating features of machine learning samples is provided, including: a data table acquisition device for obtaining a data table specified by a user, where each row of the data table corresponds to one data record and each column corresponds to one field; a declaration device for declaring the feature type corresponding to each non-target-value field in the data table, where the feature types include discrete features and/or continuous features; a unit feature generation device for processing each non-target-value field into a unit feature according to its declared feature type; a combined feature generation device for combining the generated unit features to generate combined features; and a feature acquisition device for obtaining the features of the machine learning samples based on the generated unit features and combined features.
Optionally, the operations of the system are executed automatically when an operator corresponding to the automatic feature generation step is started.
Optionally, the operator corresponds to a node in a directed acyclic graph that corresponds to the machine learning flow.
Optionally, the non-target-value fields are obtained by removing the target value field specified by the user from all fields in the data table.
Optionally, the system further includes: a reminder device for issuing an exception prompt when the operator is started without the user having specified a target value field.
Optionally, the declaration device, automatically or according to a user instruction, declares all non-target-value fields as discrete features, or declares each non-target-value field as a discrete feature or a continuous feature according to the data type of its field values.
Optionally, the combined feature generation device includes: a candidate combined feature acquisition unit for forming various combinations of all generated unit features to obtain candidate combined features, or for forming various combinations of the unit features with higher feature importance among all generated unit features to obtain candidate combined features; and a combined feature screening unit for screening combined features out of the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
Optionally, the feature acquisition device uses all generated unit features and all generated combined features as the features of the machine learning samples; or uses the features with higher feature importance among all generated unit features and combined features as the features of the machine learning samples; or uses the unit features with higher feature importance among all generated unit features, together with all generated combined features, as the features of the machine learning samples; or uses the combined features with higher feature importance among all generated combined features, together with all generated unit features, as the features of the machine learning samples.
Optionally, the system further includes: a display device for displaying the obtained features of the machine learning samples to the user.
Optionally, the display device also displays the feature importance of each feature to the user.
Optionally, the system further includes: an application device for applying the obtained features of the machine learning samples directly to a subsequent machine learning step.
Optionally, for each non-target-value field whose field value data type is continuous and which is declared as a discrete feature, the unit feature generation device performs one or more binning operations to obtain one or more corresponding binned features, and treats the obtained binned features together as one unit feature.
According to another exemplary embodiment of the present invention, a computer-readable medium for automatically generating features of machine learning samples is provided, on which a computer program for performing the method described above is recorded.
According to another exemplary embodiment of the present invention, a computing device for automatically generating features of machine learning samples is provided, including a storage unit and a processor, where the storage unit stores a set of computer-executable instructions that, when executed by the processor, perform the method described above.
With the method and system for automatically generating features of machine learning samples according to exemplary embodiments of the present invention, the features of machine learning samples can be generated automatically from a data table, which lowers the barrier to using feature engineering, makes feature engineering easier to use, and improves its efficiency.
Additional aspects and/or advantages of the general inventive concept will be set forth in part in the description that follows, and in part will be apparent from the description or may be learned by practicing the general inventive concept.
Description of the drawings
The above and other objects and features of exemplary embodiments of the present invention will become clearer from the following description, taken in conjunction with the accompanying drawings that illustrate the embodiments by way of example, in which:
Fig. 1 shows a flowchart of a method for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention;
Fig. 2 shows an example in which a user specifies the feature types corresponding to non-target-value fields according to an exemplary embodiment of the present invention;
Fig. 3 shows a flowchart of a method for automatically generating features of machine learning samples according to another exemplary embodiment of the present invention;
Fig. 4 shows a flowchart of a method for automatically generating features of machine learning samples according to another exemplary embodiment of the present invention;
Fig. 5 shows a flowchart of a method for automatically generating features of machine learning samples according to another exemplary embodiment of the present invention;
Fig. 6 shows an example of a DAG for training a machine learning model according to an exemplary embodiment of the present invention;
Fig. 7 shows a block diagram of a system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention.
Detailed description
Reference will now be made in detail to embodiments of the present invention, examples of which are shown in the accompanying drawings, where like reference numerals refer to like components throughout. The embodiments are described below with reference to the drawings in order to explain the present invention.
Machine learning is an inevitable outcome of artificial intelligence research reaching a certain stage; it aims to improve the performance of a system itself by computational means, using experience. In a computer system, "experience" usually exists in the form of "data". A machine learning algorithm can generate a "model" from data; that is, when empirical data is supplied to a machine learning algorithm, a model can be generated from it, and when faced with a new situation the model provides a corresponding judgment, namely a prediction result. Whether a machine learning model is being trained or a trained model is being used for prediction, the data must be converted into machine learning samples that include various features. Machine learning may be implemented as "supervised learning", "unsupervised learning", or "semi-supervised learning"; it should be noted that exemplary embodiments of the present invention place no particular restriction on the specific machine learning algorithm. It should also be noted that other means, such as statistical algorithms, may be combined during model training and application.
Fig. 1 shows a flowchart of a method for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention. Here, as an example, the method may be performed by a computer program, or by a dedicated system or computing device for automatically generating features of machine learning samples.
As an example, the method may be executed automatically by starting an operator corresponding to the automatic feature generation step. In other words, when that operator is started, the method is executed automatically. Further, as an example, the operator corresponds to a node in a directed acyclic graph (DAG) corresponding to the machine learning flow. For example, the DAG corresponding to the machine learning flow may include a feature generation node, and when the whole DAG is run and execution reaches the feature generation node, the method is executed automatically. A DAG for training a machine learning model according to an exemplary embodiment of the present invention is described in detail below with reference to Fig. 6.
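The operator-as-DAG-node idea can be illustrated with a minimal sketch. It assumes a three-node flow and uses Python's standard `graphlib` module for the topological ordering; the node names and the `run_dag` helper are hypothetical and are not taken from the patent or from any particular platform.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def run_dag(nodes, edges):
    """Run each node's operator once all of its upstream nodes have run."""
    # TopologicalSorter expects a mapping {node: set of predecessor nodes}.
    predecessors = {name: set() for name in nodes}
    for upstream, downstream in edges:
        predecessors[downstream].add(upstream)
    executed = []
    for name in TopologicalSorter(predecessors).static_order():
        nodes[name]()  # starting the operator executes its step
        executed.append(name)
    return executed

# Hypothetical flow: the operators only record what they would do.
log = []
nodes = {
    "import_data": lambda: log.append("data table loaded"),
    "feature_generation": lambda: log.append("features generated automatically"),
    "train_model": lambda: log.append("model trained"),
}
edges = [("import_data", "feature_generation"),
         ("feature_generation", "train_model")]
order = run_dag(nodes, edges)
```

Running the whole graph reaches the feature generation node only after the data import node has finished, which mirrors the behavior described for the operator.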
Referring to Fig. 1, in step S101, a data table specified by the user is obtained. Here, each row of the data table corresponds to one data record and each column corresponds to one field. In other words, every data record in the data table has a field value for each field. As an example, each data record may be regarded as a description of an event or object, corresponding to one example or sample, and each field may describe the behavior or a property of the event or object in one respect (for example, name, age, occupation, etc.).
As an example, a graphical interface for specifying the data table may be provided to the user, and the data table specified by the user is determined from the input operations the user performs on that interface.
In step S102, the feature type corresponding to each non-target-value field in the data table is declared, where the feature types include discrete features and/or continuous features.
Here, the target value field is the field corresponding to the label that the machine learning technique is to predict; that is, in the case of supervised learning, it corresponds to the prediction target. The non-target-value fields are the fields in the data table other than the target value field.
In the case of supervised learning, as an example, the non-target-value fields may be obtained by removing the target value field specified by the user from all fields in the data table. As an example, a graphical interface for specifying the target value field may be provided to the user, and the target value field specified by the user is determined from the input operations the user performs on that interface. Further, as an example, the operator may issue an exception prompt when it is started without the user having specified a target value field, to remind the user to specify one.
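A minimal sketch of this field selection follows. The exception class, the function name, and the `supervised` flag are hypothetical; the patent only states that an exception prompt is issued when no target value field has been specified.

```python
class TargetFieldNotSpecified(Exception):
    """Hypothetical exception standing in for the operator's exception prompt."""

def non_target_fields(all_fields, target_field=None, supervised=True):
    """Return every field except the user-specified target value field."""
    if supervised and target_field is None:
        # Assumption: the prompt is modeled as a raised exception.
        raise TargetFieldNotSpecified(
            "Please specify a target value field before starting the operator.")
    return [field for field in all_fields if field != target_field]

fields = non_target_fields(["age", "city", "label"], target_field="label")
```

The `supervised=False` path reflects the case where the data table contains no target value field at all.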
Moreover, it should be understood that the data table may or may not include a target value field.
A continuous feature is the counterpart of a discrete feature (for example, a categorical feature); its values may be numerical values with a certain continuity, for example, age or an amount of money. By contrast, as an example, the values of a discrete feature have no continuity; for example, it may be an unordered categorical feature such as "from Beijing", "from Shanghai", or "from Tianjin", or "gender is male" and "gender is female".
As an example, automatically or according to a user instruction, all non-target-value fields may be declared as discrete features, or each non-target-value field may be declared as a discrete feature or a continuous feature according to the data type of its field values.
As an example, the field value data type of a field may be continuous (for example, numeric, such as integer int) or discrete (for example, text, such as character string). As an example, the step of declaring each non-target-value field as a discrete feature or a continuous feature according to its field value data type may include: declaring the non-target-value fields in the data table whose field value data type is discrete as discrete features, and declaring the non-target-value fields whose field value data type is continuous as continuous features.
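The two declaration modes can be sketched as follows. This is an illustration under stated assumptions: the data table is represented as a dictionary of columns, a column counts as continuous when every value is an `int` or `float`, and the mode strings `"all_discrete"` and `"discrete+continuous"` are hypothetical names.

```python
def infer_value_type(values):
    """Treat a column as continuous if every value is an int or float (bools excluded)."""
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                  for v in values)
    return "continuous" if numeric else "discrete"

def declare_feature_types(table, target_field=None, mode="discrete+continuous"):
    """Map each non-target-value field to a declared feature type."""
    declared = {}
    for field, values in table.items():
        if field == target_field:
            continue  # the target value field is never declared as a feature
        if mode == "all_discrete":
            declared[field] = "discrete"
        else:
            declared[field] = infer_value_type(values)
    return declared

table = {
    "age": [23, 41, 35],
    "city": ["Beijing", "Shanghai", "Tianjin"],
    "label": [0, 1, 0],
}
by_type = declare_feature_types(table, target_field="label")
all_discrete = declare_feature_types(table, target_field="label", mode="all_discrete")
```

In the second mode a numeric field such as `age` is still declared discrete, which is the case that later calls for discretization.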
As an example, a graphical interface for specifying the feature types corresponding to non-target-value fields may be provided to the user, and, according to the input operations the user performs on that interface, all non-target-value fields are declared as discrete features, or each non-target-value field is declared as a discrete feature or a continuous feature according to the data type of its field values.
An example in which the user specifies, through a graphical interface, the feature types corresponding to non-target-value fields according to an exemplary embodiment of the present invention is described below with reference to Fig. 2. As shown in Fig. 2, the graphical interface for specifying the feature types may display a radio button "all discrete" and a radio button "discrete+continuous" (only one of the two can be selected). In response to the user selecting "all discrete", all non-target-value fields in the data table are declared as discrete features. In response to the user selecting "discrete+continuous", each non-target-value field is declared as a discrete or continuous feature according to its data type; here, the data type of a field may be determined automatically from the characteristics of its field values, and the field is then declared as a discrete feature or a continuous feature according to whether that data type is discrete or continuous. In addition, the graphical interface may display a control for specifying the target value field, which the user operates to specify the target value field. The left side of the graphical interface may also display the field name and field value data type of each field in the data table.
Referring back to Fig. 1, in step S103, each non-target-value field is processed into a unit feature according to its declared feature type. In other words, each non-target-value field is processed into one unit feature according to its declared feature type.
As an example, each non-target-value field whose field value data type is continuous and which is declared as a discrete feature may be discretized to obtain one unit feature.
It should be understood that a unit feature here is a feature that corresponds to a single field; the feature itself may have one or more dimensions depending on how its values are defined. Optionally, for each non-target-value field whose field value data type is continuous and which is declared as a discrete feature, one or more binning operations may be performed to obtain one or more corresponding binned features, and the obtained binned features together are treated as one unit feature.
Here, a binning operation is a particular way of discretizing a continuous field: the value range of the continuous field is divided into multiple intervals (that is, multiple bins), and the corresponding binned feature value is determined from the bins so divided. Binning operations can generally be divided into supervised binning and unsupervised binning, and each type includes several specific binning methods. For example, supervised binning may include minimum-entropy binning and minimum description length binning, while unsupervised binning may include equal-width binning, equal-depth binning, and binning based on k-means clustering. For each binning method, corresponding binning parameters can be set, for example, width or depth.
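Two of the unsupervised methods named above, equal-width and equal-depth binning, can be sketched in a few lines. This is a simplified illustration rather than the patent's implementation; the function names are hypothetical, and tied values in the equal-depth version are split by rank, which a production implementation might handle differently.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # The maximum value would fall into bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_depth_bins(values, n_bins):
    """Assign values to bins that each hold roughly the same number of records."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

ages = [18, 22, 25, 31, 40, 58]
width_bins = equal_width_bins(ages, 4)   # bin width is (58 - 18) / 4 = 10
depth_bins = equal_depth_bins(ages, 3)   # two records per bin
```

Here the width parameter is implied by the number of bins, and the depth parameter by the number of records per bin.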
It should be noted that, according to exemplary embodiments of the present invention, the binning operations performed on a non-target-value field whose field value data type is continuous and which is declared as a discrete feature are not limited in the kind of binning method or in the binning parameters, and the specific representation of the resulting binned features is likewise not limited.
As an example, the multiple binning operations performed on a non-target-value field whose field value data type is continuous and which is declared as a discrete feature may differ in binning method and/or binning parameters. For example, they may be binning operations of the same kind with different parameters (for example, depth or width), or binning operations of different kinds. Correspondingly, each binning operation yields one binned feature, and these binned features together form one binning group feature. The binning group feature reflects the different binning operations, which improves the effectiveness of the machine learning material and provides a better basis for training and prediction with the machine learning model.
That is, according to an exemplary embodiment of the present invention, at least one binning operation may be performed on each non-target-value field whose field value data type is continuous and which is declared as a discrete feature, to obtain at least one corresponding binned feature; the feature corresponding to the field is then formed with each binned feature as one component, and this feature is used as the unit feature. It should be understood that performing a binning operation discretizes such a field by placing each value into a specific bin. In the binned features obtained after conversion, each dimension may indicate either a discrete value (for example, "0" or "1") showing whether a value of the continuous feature was assigned to the bin, or a specific continuous value (for example, the actual value of the continuous feature or its normalized value, or the mean, median, or boundary value of the continuous feature values in the bin). Correspondingly, when the discrete values (for example, for classification problems) or continuous values (for example, for regression problems) of each dimension are applied in machine learning, discrete values may be combined with one another (for example, by Cartesian product) and continuous values may be combined with one another (for example, by arithmetic operations).
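As an illustration of a binning group feature, the sketch below applies equal-width binning with two different bin counts to one field and concatenates the resulting one-hot ("0"/"1") encodings into a single unit feature per record. The function names, the choice of equal-width binning, and the one-hot representation are assumptions made for the example; the text above leaves the binning method and the representation open.

```python
def equal_width_index(value, lo, hi, n_bins):
    """Index of the equal-width bin that holds value, with the maximum clamped."""
    if hi == lo:
        return 0
    return min(int((value - lo) * n_bins / (hi - lo)), n_bins - 1)

def one_hot(index, size):
    """Vector of 0s with a single 1 marking the bin the value fell into."""
    return [1 if i == index else 0 for i in range(size)]

def bin_group_feature(values, bin_counts):
    """Concatenate one binned feature per binning operation into one unit feature."""
    lo, hi = min(values), max(values)
    rows = []
    for v in values:
        row = []
        for n_bins in bin_counts:  # same method, different parameters
            row.extend(one_hot(equal_width_index(v, lo, hi, n_bins), n_bins))
        rows.append(row)
    return rows

ages = [18, 30, 58]
unit_feature = bin_group_feature(ages, bin_counts=(2, 4))
```

Each record ends up with a six-dimensional unit feature: two dimensions from the coarse binning and four from the finer one.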
In step S104, the generated unit features are combined to generate combined features.
As an example, all generated unit features may be combined in various ways to obtain candidate combined features, or the unit features with higher feature importance among all generated unit features may be combined in various ways to obtain candidate combined features. Combined features can then be screened out of the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature. Specifically, a machine learning model corresponding to each candidate combined feature can be trained. Because the effect of that model reflects the feature importance (for example, predictive power) of the candidate combined feature, combined features can be screened out by measuring the effect of the model corresponding to each candidate; for example, the better the effect of the machine learning model, the more likely the corresponding candidate combined feature is to be selected as a combined feature. As an example, a specified model evaluation metric may be used to evaluate the effect of the machine learning model corresponding to each candidate combined feature. As an example, the model evaluation metric may be specified automatically or according to a user instruction.
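The generate-then-screen procedure can be sketched as follows. Several simplifications are assumed and are not taken from the patent: only pairwise Cartesian-product combinations are formed, the "machine learning model" for each candidate is replaced by a stand-in that predicts the positive rate observed for each combined value in the training rows, and the effect is measured by AUC on held-out rows. All function names are hypothetical.

```python
from itertools import combinations

def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def combine(table, fields):
    """Cartesian-product combination: one tuple of field values per record."""
    return list(zip(*(table[f] for f in fields)))

def score_feature(feature, labels, n_train):
    """Stand-in model: predict each value's positive rate seen in the training rows."""
    totals = {}
    for value, y in zip(feature[:n_train], labels[:n_train]):
        count, positives = totals.get(value, (0, 0))
        totals[value] = (count + 1, positives + y)
    prior = sum(labels[:n_train]) / n_train  # fallback for unseen values
    preds = [totals[v][1] / totals[v][0] if v in totals else prior
             for v in feature[n_train:]]
    return auc(labels[n_train:], preds)

def screen_combinations(table, labels, n_train, keep):
    """Score every pairwise candidate and keep the best-scoring ones."""
    scored = [(score_feature(combine(table, pair), labels, n_train), pair)
              for pair in combinations(sorted(table), 2)]
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:keep]]

# Toy data: the label is the XOR of a and b, so only the (a, b) pair is predictive.
table = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "c": [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1],
}
labels = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
selected = screen_combinations(table, labels, n_train=8, keep=1)
```

On this toy data neither `a` nor `b` is informative alone, which is why a combined feature can carry signal that the unit features do not.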
As an example, the model evaluation metric may be AUC (Area Under ROC Curve, that is, the area under the receiver operating characteristic curve), MAE (Mean Absolute Error), or the logarithmic loss function (logloss).
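For reference, the three metrics named above can be computed as in the following sketch. It assumes binary labels encoded as 0 and 1, and it computes AUC by direct pairwise comparison of positive and negative scores, which is equivalent to the area under the ROC curve but only practical for small inputs. The function names are hypothetical.

```python
import math

def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mae(targets, predictions):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

def logloss(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood of binary labels; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Note that a higher AUC is better, whereas lower MAE and logloss are better, so any ranking of features by "effect" has to account for the direction of the chosen metric.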
As an example, the unit features whose feature importance satisfies a first preset condition among all unit features may be combined in various ways to obtain candidate combined features. For example, the unit features whose feature importance lies within a first preset threshold range may be combined in various ways to obtain candidate combined features; alternatively, all unit features may be sorted by feature importance from high to low, and the first predetermined number of top-ranked unit features may be combined in various ways to obtain candidate combined features.
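The two selection rules above (a threshold range, or a top-ranked count) could look like the following sketch. The importance values are made up for illustration, and treating the range as an open interval is an assumption; the text only says the importance lies "within" the range.

```python
def select_by_range(importance, low, high):
    """Keep features whose importance lies strictly between low and high."""
    return [name for name, value in importance.items() if low < value < high]

def select_top_n(importance, n):
    """Sort features by importance from high to low and keep the first n."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:n]

# Hypothetical per-feature importance, e.g. the AUC of a model built on each feature.
unit_importance = {"age_bucket": 0.72, "city": 0.55, "record_id": 0.50, "job": 0.61}

in_range = select_by_range(unit_importance, 0.5, 1.0)
top_two = select_top_n(unit_importance, 2)
```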
As an example, the feature importance of a unit feature may be determined by measuring the effect of the machine learning model corresponding to that feature: the better the effect of the corresponding model, the higher the feature importance of the unit feature. For example, the value that the machine learning model corresponding to a feature achieves on the model evaluation metric may be used to measure the feature importance of the unit feature. Here, as an example, the model evaluation metric may be specified automatically or according to a user instruction.
In step S105, the features of the machine learning sample are obtained based on the generated unit features and combined features.
As an example, all generated unit features and all generated combined features may be used as the features of the machine learning sample.
As another example, the features of higher feature importance among all generated unit features and all generated combined features may be used as the features of the machine learning sample. As an example, the features whose feature importance satisfies a second preset condition among all unit features and all combined features may be used as the features of the machine learning sample. For example, the features whose feature importance lies within a second preset threshold range may be used; alternatively, all unit features and all combined features may be sorted together by feature importance from high to low, and the second predetermined number of top-ranked features may be used as the features of the machine learning sample.
As another example, the unit features of higher feature importance among all generated unit features, together with all generated combined features, may be used as the features of the machine learning sample. As an example, all combined features, together with the unit features whose feature importance satisfies a third preset condition, may be used as the features of the machine learning sample. For example, all combined features may be used together with the unit features whose feature importance lies within a third preset threshold range; alternatively, all unit features may be sorted by feature importance from high to low, and the third predetermined number of top-ranked unit features may be used together with all combined features as the features of the machine learning sample.
As another example, all generated unit features, together with the combined features of higher feature importance among all generated combined features, may be used as the features of the machine learning sample. As an example, all unit features, together with the combined features whose feature importance satisfies a fourth preset condition, may be used as the features of the machine learning sample. For example, all unit features may be used together with the combined features whose feature importance lies within a fourth preset threshold range; alternatively, all combined features may be sorted by feature importance from high to low, and the fourth predetermined number of top-ranked combined features may be used together with all unit features as the features of the machine learning sample.
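The four alternatives for step S105 differ only in which group is filtered by importance. A compact sketch, assuming importance is available as a dictionary per group and using a single hypothetical `keep` predicate for the preset condition (the strategy names and all values are illustrative, not taken from the text):

```python
def final_features(unit_importance, combo_importance, strategy,
                   keep=lambda value: 0.5 < value < 1.0):
    """Assemble the sample's features under one of four selection strategies."""
    units = list(unit_importance)
    combos = list(combo_importance)
    good_units = [f for f in units if keep(unit_importance[f])]
    good_combos = [f for f in combos if keep(combo_importance[f])]
    if strategy == "all":                # all unit + all combined features
        return units + combos
    if strategy == "important_all":      # filter both groups
        return good_units + good_combos
    if strategy == "important_units":    # filter unit features only
        return good_units + combos
    if strategy == "important_combos":   # filter combined features only
        return units + good_combos
    raise ValueError(f"unknown strategy: {strategy}")

# Made-up importance values for illustration.
unit_importance = {"age_bucket": 0.72, "city": 0.50, "job": 0.61}
combo_importance = {("age_bucket", "city"): 0.68, ("city", "job"): 0.49}

chosen = final_features(unit_importance, combo_importance, "important_units")
```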
In addition, as an example, the method for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may further include: after step S105, displaying the obtained features of the machine learning sample to the user. Further, the feature importance of each feature may also be displayed to the user.
As an example, the method for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may further include: after step S105, applying the obtained features of the machine learning sample directly to a subsequent machine learning step. For example, a model may be learned directly from the obtained features of the machine learning sample.
Fig. 3 shows a flowchart of a method for automatically generating features of machine learning samples according to another exemplary embodiment of the present invention.
Referring to Fig. 3, in step S201, a data table specified by the user is obtained.
In step S202, the feature type corresponding to each non-target-value field in the data table is declared.
In step S203, each non-target-value field is processed into a unit feature according to the declared feature type.
In step S204, all generated unit features are combined in various ways to obtain candidate combined features, and combined features are screened out of the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
In step S205, all generated unit features and all combined features are used as the features of the machine learning sample.
Fig. 4 shows a flowchart of a method for automatically generating features of machine learning samples according to another exemplary embodiment of the present invention.
Referring to Fig. 4, in step S301, a data table specified by the user is obtained.
In step S302, the feature type corresponding to each non-target-value field in the data table is declared.
In step S303, each non-target-value field is processed into a unit feature according to the declared feature type.
In step S304, the unit features of higher feature importance among all generated unit features are combined in various ways to obtain candidate combined features, and combined features are screened out of the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
In step S305, the unit features of higher feature importance among all generated unit features, together with all generated combined features, are used as the features of the machine learning sample.
As an example, the value that the machine learning model corresponding to a feature achieves on the model evaluation metric AUC may be used to measure the feature importance of that feature. In step S304, the unit features whose corresponding AUC value is greater than 0.5 and less than 1 among all generated unit features may be combined in various ways to obtain candidate combined features; and in step S305, the unit features whose corresponding AUC value is greater than 0.5 and less than 1, together with all generated combined features, may be used as the features of the machine learning sample.
Fig. 5 shows a flowchart of a method for automatically generating features of machine learning samples according to another exemplary embodiment of the present invention.
Referring to Fig. 5, in step S401, a data table specified by the user is obtained.
In step S402, the feature type corresponding to each non-target-value field in the data table is declared.
In step S403, each non-target-value field is processed into a unit feature according to the declared feature type.
In step S404, all generated unit features are combined in various ways to obtain candidate combined features, and combined features are screened out of the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
In step S405, the features of higher feature importance among all generated unit features and all combined features are used as the features of the machine learning sample.
As an example, the value that the machine learning model corresponding to a feature achieves on the model evaluation metric AUC may be used to measure the feature importance of that feature. In step S405, the features whose corresponding AUC value is greater than 0.5 and less than 1 among all generated unit features and all combined features may be used as the features of the machine learning sample.
Some exemplary methods for automatically generating features of machine learning samples have been listed above. However, those skilled in the art should understand that exemplary embodiments of the present invention are not limited to these methods, and any suitable way of generating or screening features (unit features, candidate combined features, or combined features) may be employed.
According to an exemplary embodiment of the present invention, a machine learning process may be executed in the form of a directed acyclic graph (DAG), and the process may cover all or some of the steps for training, testing, or prediction with a machine learning model. For example, a DAG comprising at least one of the following steps may be built for machine learning model training: a historical data import step, a data splitting step, a feature generation step, a logistic regression step, and a model prediction step. That is, each of these steps may be executed as a node in the DAG.
Fig. 6 shows an example of a DAG for training a machine learning model according to an exemplary embodiment of the present invention.
Referring to Fig. 6, the first step is to establish a data import node. As an example, in response to a user operation, the data import node may be configured to obtain a banking business data table named "bank" (that is, to import the data table into the machine learning platform), where the data table may contain a plurality of historical data records.
The second step is to establish a data splitting node and connect the data import node to it, so that the imported data table is split into a training set and a validation set. Data records in the training set are converted into machine learning samples for learning the model, and data records in the validation set are converted into test samples for verifying the effect of the learned model. In response to a user operation, the data splitting node may be configured to split the imported data table into the training set and the validation set in a set manner.
The third step is to establish two feature generation nodes and connect the data splitting node to each of them, so that feature generation is performed separately on the training set and the validation set output by the data splitting node; for example, by default the left output of the data splitting node is the training set and the right output is the validation set. It should be understood that the feature generation manner is the same for machine learning samples and for test samples. In response to a user operation, the feature generation nodes may be configured; for example, the target value field, the feature types corresponding to the non-target-value fields, the metric used for feature importance, and so on may be specified.
The fourth step is to establish an algorithm node (for example, a logistic regression node), that is, a model training node, and connect the left feature generation node to the logistic regression node, so that a machine learning model is trained from the machine learning samples using the logistic regression algorithm. In response to a user operation, the logistic regression node may be configured to train the machine learning model according to the configured logistic regression algorithm.
The fifth step is to establish a model prediction node and connect the logistic regression node and the right feature generation node to it, so that the effect of the trained machine learning model is verified on the test samples. In response to a user operation, the model prediction node may be configured to verify the effect of the machine learning model according to the configured verification mode.
After the DAG comprising the above steps has been built, the entire DAG may be run according to a user instruction. When execution reaches a feature generation node, the method for automatically generating features of machine learning samples of the above exemplary embodiments may be performed automatically.
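The five-step graph above can be written down as a dependency map and run in topological order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the node names are hypothetical labels for the nodes of Fig. 6, and the node bodies are omitted, so it shows only the execution order, not the platform's actual scheduler.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose output it consumes.
dag = {
    "import_data": set(),
    "split_data": {"import_data"},
    "features_train": {"split_data"},   # left output: training set
    "features_valid": {"split_data"},   # right output: validation set
    "logistic_regression": {"features_train"},
    "model_prediction": {"logistic_regression", "features_valid"},
}

# static_order() yields every node after all of its predecessors.
run_order = list(TopologicalSorter(dag).static_order())

for node in run_order:
    # A real platform would invoke the operator bound to this node here.
    print("running", node)
```

The two feature generation nodes have no dependency on each other, so their relative order is not fixed; only the predecessor constraints are guaranteed.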
Fig. 7 shows a block diagram of a system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention. As shown in Fig. 7, the system includes: a data table acquisition device 10, a declaration device 20, a unit feature generation device 30, a combined feature generation device 40, and a feature acquisition device 50.
Specifically, the data table acquisition device 10 is used to obtain a data table specified by the user, where each row of the data table corresponds to one data record and each column corresponds to one field.
The declaration device 20 is used to declare the feature type corresponding to each non-target-value field in the data table, where the feature types include discrete features and/or continuous features.
As an example, the non-target-value fields may be obtained by removing the target value field specified by the user from all fields in the data table.
As an example, the declaration device 20 may, automatically or according to a user instruction, declare all non-target-value fields as discrete features, or declare each non-target-value field as a discrete feature or a continuous feature corresponding to its field value data type.
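A possible shape for the declaration step is sketched below. The mapping from data type to feature type (numeric types become continuous, everything else discrete) is an assumption made for illustration, as are the type names and the function `declare_feature_types`; the text only states that the declared type corresponds to the field value data type. The raised error stands in for the abnormal prompt given when no valid target value field is specified.

```python
def declare_feature_types(schema, target_field, all_discrete=False):
    """Declare a feature type for every field except the target value field."""
    if target_field not in schema:
        raise ValueError("target value field is missing or not specified")
    declared = {}
    for field, dtype in schema.items():
        if field == target_field:
            continue  # the target value field is never turned into a feature
        if all_discrete or dtype not in ("int", "float"):
            declared[field] = "discrete"
        else:
            declared[field] = "continuous"
    return declared

# Hypothetical schema of a user-specified data table.
schema = {"age": "float", "city": "string", "clicked": "int"}

by_type = declare_feature_types(schema, "clicked")
forced = declare_feature_types(schema, "clicked", all_discrete=True)
```

Declaring everything as discrete pairs naturally with binning, since continuous fields are then discretized before being combined.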
The unit feature generation device 30 is used to process each non-target-value field into a unit feature according to the declared feature type.
As an example, for each non-target-value field whose field value data type is continuous and which is declared as a discrete feature, the unit feature generation device 30 may perform one or more binning operations to obtain one or more corresponding binned features, and take the obtained binned features as a whole as one unit feature.
The combined feature generation device 40 is used to perform feature combination based on the generated unit features to generate combined features.
As an example, the combined feature generation device 40 may include a candidate combined feature acquisition unit (not shown) and a combined feature screening unit (not shown).
The candidate combined feature acquisition unit is used to combine all generated unit features in various ways to obtain candidate combined features, or to combine the unit features of higher feature importance among all generated unit features in various ways to obtain candidate combined features.
The combined feature screening unit is used to screen combined features out of the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
The feature acquisition device 50 is used to obtain the features of the machine learning sample based on the generated unit features and combined features.
As an example, the feature acquisition device 50 may use all generated unit features and all combined features as the features of the machine learning sample.
As another example, the feature acquisition device 50 may use the features of higher feature importance among all generated unit features and all combined features as the features of the machine learning sample.
As another example, the feature acquisition device 50 may use the unit features of higher feature importance among all generated unit features, together with all generated combined features, as the features of the machine learning sample.
As another example, the feature acquisition device 50 may use the combined features of higher feature importance among all generated combined features, together with all generated unit features, as the features of the machine learning sample.
As an example, the system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may further include a display device (not shown), which is used to display to the user the features of the machine learning sample obtained by the feature acquisition device 50. Further, as an example, the display device may also display the feature importance of each feature to the user.
As an example, the system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may further include an application device (not shown), which is used to apply the features of the machine learning sample obtained by the feature acquisition device 50 directly to a subsequent machine learning step.
As an example, the system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may be made to perform its operations automatically by starting an operator corresponding to an automatic feature generation step.
As an example, the operator may correspond to a node in a directed acyclic graph corresponding to a machine learning process.
In addition, as an example, the system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may further include a reminder device (not shown), which is used to give an abnormal prompt when the operator is started without the user having specified a target value field.
It should be understood that the specific implementation of the system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may follow the related implementations described with reference to Figs. 1 to 6, and details are not repeated here.
The devices included in the system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may each be configured as software, hardware, firmware, or any combination thereof that performs a specific function. For example, these devices may correspond to dedicated integrated circuits, to pure software code, or to modules that combine software and hardware. In addition, one or more functions realized by these devices may also be performed collectively by components of a physical device (for example, a processor, a client, or a server).
It should be understood that the method for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may be implemented by a program recorded on a computer-readable medium. For example, according to an exemplary embodiment of the present invention, a computer-readable medium for automatically generating features of machine learning samples may be provided, on which is recorded a computer program for performing the following method steps: (A) obtaining a data table specified by a user, where each row of the data table corresponds to one data record and each column corresponds to one field; (B) declaring the feature type corresponding to each non-target-value field in the data table, where the feature types include discrete features and/or continuous features; (C) processing each non-target-value field into a unit feature according to the declared feature type; (D) performing feature combination based on the generated unit features to generate combined features; and (E) obtaining the features of the machine learning sample based on the generated unit features and combined features.
The computer program on the above computer-readable medium may run in an environment deployed on computer equipment such as a client, a host, an agent device, or a server. It should be noted that the computer program may also be used to perform additional steps beyond those above, or to perform more specific processing when carrying out those steps; these additional steps and further processing have been described with reference to Figs. 1 to 6 and are not repeated here.
It should be noted that the system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may rely entirely on the running of a computer program to realize the corresponding functions; that is, each device corresponds to a step in the functional architecture of the computer program, so that the whole system is invoked through a dedicated software package (for example, a lib library) to realize the corresponding functions.
On the other hand, each device included in the system for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor can perform the corresponding operations by reading and running the program code or code segments.
For example, an exemplary embodiment of the present invention may also be implemented as a computing device that includes a storage unit and a processor. A set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, the method for automatically generating features of machine learning samples is performed.
Specifically, the computing device may be deployed on a server or a client, or on a node device in a distributed network environment. In addition, the computing device may be a PC, a tablet device, a personal digital assistant, a smartphone, a web application, or any other device capable of executing the above set of instructions.
Here, the computing device need not be a single computing device; it may be any device or collection of circuits capable of executing the above instructions (or set of instructions), individually or jointly. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (for example, via wireless transmission).
In the computing device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
Some of the operations described in the method for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may be implemented in software, some may be implemented in hardware, and these operations may also be implemented by a combination of software and hardware.
The processor may run instructions or code stored in the storage unit, where the storage unit may also store data. Instructions and data may also be sent and received over a network via a network interface device, where the network interface device may use any known transport protocol.
The storage unit may be integrated with the processor, for example with RAM or flash memory arranged within an integrated circuit microprocessor. In addition, the storage unit may comprise a separate device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage unit and the processor may be operatively coupled, or may communicate with each other, for example through an I/O port or a network connection, so that the processor can read files stored in the storage unit.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, or a touch input device). All components of the computing device may be connected to one another via a bus and/or a network.
The operations involved in the method for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated according to non-exact boundaries.
For example, as described above, the computing device for automatically generating features of machine learning samples according to an exemplary embodiment of the present invention may include a storage unit and a processor, where a set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, the following steps are performed: (A) obtaining a data table specified by a user, where each row of the data table corresponds to one data record and each column corresponds to one field; (B) declaring the feature type corresponding to each non-target-value field in the data table, where the feature types include discrete features and/or continuous features; (C) processing each non-target-value field into a unit feature according to the declared feature type; (D) performing feature combination based on the generated unit features to generate combined features; and (E) obtaining the features of the machine learning sample based on the generated unit features and combined features.
Exemplary embodiments of the present invention have been described above. It should be understood that the foregoing description is exemplary only and not exhaustive, and that the invention is not limited to the disclosed exemplary embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the scope of protection of the invention should be determined by the scope of the claims.

Claims (10)

1. A method for automatically generating features of machine learning samples, comprising:
(A) obtaining a data table specified by a user, wherein each row of the data table corresponds to one data record and each column of the data table corresponds to one field;
(B) declaring the feature type corresponding to each non-target-value field in the data table, wherein the feature types include discrete features and/or continuous features;
(C) processing each non-target-value field into a unit feature according to the declared feature type;
(D) performing feature combination based on the generated unit features to generate combined features; and
(E) obtaining the features of the machine learning sample based on the generated unit features and combined features.
2. The method according to claim 1, wherein the method is performed automatically by starting an operator corresponding to an automatic feature generation step.
3. The method according to claim 2, wherein the operator corresponds to a node in a directed acyclic graph corresponding to a machine learning process.
4. The method according to claim 1, wherein, in step (B),
all non-target-value fields are declared as discrete features, automatically or according to a user instruction; or each non-target-value field is declared as a discrete feature or a continuous feature corresponding to its field value data type.
5. The method according to claim 1, wherein step (D) comprises:
combining all generated unit features in various ways to obtain candidate combined features, or combining the unit features of higher feature importance among all generated unit features in various ways to obtain candidate combined features; and
screening combined features out of the candidate combined features by measuring the effect of the machine learning model corresponding to each candidate combined feature.
6. The method according to claim 1, wherein, in step (E),
all generated unit features and all combined features are used as the features of the machine learning sample;
or, the features of higher feature importance among all generated unit features and all combined features are used as the features of the machine learning sample;
or, the unit features of higher feature importance among all generated unit features, together with all generated combined features, are used as the features of the machine learning sample;
or, the combined features of higher feature importance among all generated combined features, together with all generated unit features, are used as the features of the machine learning sample.
7. The method according to claim 4, wherein, in step (C),
for each non-target-value field whose field value data type is continuous and which is declared as a discrete feature, one or more binning operations are performed to obtain one or more corresponding binned features, and the obtained binned features as a whole are taken as one unit feature.
8. A system for automatically generating features of machine learning samples, comprising:
A data table acquisition device for acquiring a data table specified by a user, wherein each row of the data table corresponds to one data record and each column of the data table corresponds to one field;
A declaration device for declaring the feature type corresponding to each non-target-value field in the data table, wherein the feature types include discrete features and/or continuous features;
A unit feature generation device for processing each non-target-value field into a unit feature according to the declared feature type;
A combined feature generation device for performing feature combination based on the generated unit features to generate combined features; and
A feature acquisition device for obtaining the features of the machine learning sample based on the generated unit features and combined features.
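The data flow through the five devices can be sketched end to end in a few lines. This is a simplified, hypothetical sketch: the table is assumed to be a list of dictionaries, the declared types are passed in as a plain mapping, unit features are produced by simple casting, and combined features are formed only by crossing pairs of discrete fields. None of these choices is specified by the claim.

```python
from itertools import combinations
from typing import Any, Dict, List


def generate_sample_features(
    table: List[Dict[str, Any]],
    target: str,
    declared_types: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Turn each data record (row) into a dict of unit and combined features."""
    # Every column except the target value field is a candidate field.
    fields = [f for f in table[0] if f != target]
    discrete = [f for f in fields if declared_types[f] == "discrete"]
    samples = []
    for record in table:
        # Unit features: discrete fields as strings, continuous fields as floats.
        unit = {
            f: str(record[f]) if declared_types[f] == "discrete" else float(record[f])
            for f in fields
        }
        # Combined features: cross each pair of discrete unit features.
        combined = {
            f"{a}&{b}": f"{unit[a]}_{unit[b]}" for a, b in combinations(discrete, 2)
        }
        # Final sample features: unit features plus combined features.
        samples.append({**unit, **combined})
    return samples


# Usage with a one-row data table.
rows = [{"city": "BJ", "job": "dev", "age": 30, "label": 1}]
types = {"city": "discrete", "job": "discrete", "age": "continuous"}
features = generate_sample_features(rows, "label", types)
```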
9. A computer-readable medium for automatically generating features of machine learning samples, wherein a computer program for performing the method for automatically generating features of machine learning samples according to any one of claims 1 to 7 is recorded on the computer-readable medium.
10. A computing device for automatically generating features of machine learning samples, comprising a storage component and a processor, wherein a set of computer-executable instructions is stored in the storage component, and when the set of computer-executable instructions is executed by the processor, the method for automatically generating features of machine learning samples according to any one of claims 1 to 7 is performed.
CN201711445538.3A 2017-12-27 2017-12-27 Automatically generate the method and system of the feature of machine learning sample Pending CN108090516A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711445538.3A CN108090516A (en) 2017-12-27 2017-12-27 Automatically generate the method and system of the feature of machine learning sample
PCT/CN2018/123910 WO2019129060A1 (en) 2017-12-27 2018-12-26 Method and system for automatically generating machine learning sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711445538.3A CN108090516A (en) 2017-12-27 2017-12-27 Automatically generate the method and system of the feature of machine learning sample

Publications (1)

Publication Number Publication Date
CN108090516A true CN108090516A (en) 2018-05-29

Family

ID=62179713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711445538.3A Pending CN108090516A (en) 2017-12-27 2017-12-27 Automatically generate the method and system of the feature of machine learning sample

Country Status (2)

Country Link
CN (1) CN108090516A (en)
WO (1) WO2019129060A1 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408592A (en) * 2018-10-12 2019-03-01 北京聚云位智信息科技有限公司 The Feature Engineering knowledge base and its implementation of AI in a kind of decision type distributed data base system
CN109634961A (en) * 2018-12-05 2019-04-16 杭州大拿科技股份有限公司 A kind of paper sample generating method, device, electronic equipment and storage medium
CN109697066A (en) * 2018-12-28 2019-04-30 第四范式(北京)技术有限公司 Realize the method and system of tables of data splicing and automatic training machine learning model
CN109739855A (en) * 2018-12-28 2019-05-10 第四范式(北京)技术有限公司 Realize the method and system of tables of data splicing and automatic training machine learning model
WO2019129060A1 (en) * 2017-12-27 2019-07-04 第四范式(北京)技术有限公司 Method and system for automatically generating machine learning sample
CN110297833A (en) * 2019-07-05 2019-10-01 税安科技(杭州)有限公司 A kind of bordereau error correction method
CN110443864A (en) * 2019-07-24 2019-11-12 北京大学 A kind of characters in a fancy style body automatic generation method based on single phase a small amount of sample learning
CN110457329A (en) * 2019-08-16 2019-11-15 第四范式(北京)技术有限公司 A kind of method and device for realizing personalized recommendation
CN110851500A (en) * 2019-11-07 2020-02-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN111325578A (en) * 2020-02-20 2020-06-23 深圳市腾讯计算机系统有限公司 Prediction model sample determination method, prediction model sample determination device, prediction model sample determination medium, and prediction model sample determination device
CN111832740A (en) * 2019-12-30 2020-10-27 上海氪信信息技术有限公司 Method for deriving machine learning characteristics from structured data in real time
CN112184279A (en) * 2019-07-05 2021-01-05 上海哔哩哔哩科技有限公司 AUC index rapid calculation method and device and computer equipment
CN112380205A (en) * 2020-11-17 2021-02-19 北京融七牛信息技术有限公司 Method and system for automatically generating characteristics of distributed architecture
CN112434032A (en) * 2020-11-17 2021-03-02 北京融七牛信息技术有限公司 Automatic feature generation system and method
WO2022089652A1 (en) * 2020-11-02 2022-05-05 第四范式(北京)技术有限公司 Method and system for processing data tables and automatically training machine learning model

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11062792B2 (en) 2017-07-18 2021-07-13 Analytics For Life Inc. Discovering genomes to use in machine learning techniques
US11139048B2 (en) 2017-07-18 2021-10-05 Analytics For Life Inc. Discovering novel features to use in machine learning techniques, such as machine learning techniques for diagnosing medical conditions
CN112347320A (en) * 2020-11-05 2021-02-09 杭州数梦工场科技有限公司 Associated field recommendation method and device for data table field
CN112613983B (en) * 2020-12-25 2023-11-21 北京知因智慧科技有限公司 Feature screening method and device in machine modeling process and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677353A (en) * 2016-01-08 2016-06-15 北京物思创想科技有限公司 Feature extraction method and machine learning method and device thereof
CN107316082A (en) * 2017-06-15 2017-11-03 第四范式(北京)技术有限公司 For the method and system for the feature importance for determining machine learning sample
CN107392319A (en) * 2017-07-20 2017-11-24 第四范式(北京)技术有限公司 Generate the method and system of the assemblage characteristic of machine learning sample
CN107451266A (en) * 2017-07-31 2017-12-08 北京京东尚科信息技术有限公司 For processing data method and its equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090516A (en) * 2017-12-27 2018-05-29 第四范式(北京)技术有限公司 Automatically generate the method and system of the feature of machine learning sample

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
管震 等: "《云,就该这么玩儿》", 31 July 2015 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019129060A1 (en) * 2017-12-27 2019-07-04 第四范式(北京)技术有限公司 Method and system for automatically generating machine learning sample
CN109408592A (en) * 2018-10-12 2019-03-01 北京聚云位智信息科技有限公司 The Feature Engineering knowledge base and its implementation of AI in a kind of decision type distributed data base system
CN109408592B (en) * 2018-10-12 2021-09-24 北京聚云位智信息科技有限公司 AI characteristic engineering knowledge base in decision-making type distributed database system and implementation method thereof
CN109634961A (en) * 2018-12-05 2019-04-16 杭州大拿科技股份有限公司 A kind of paper sample generating method, device, electronic equipment and storage medium
CN109634961B (en) * 2018-12-05 2021-06-04 杭州大拿科技股份有限公司 Test paper sample generation method and device, electronic equipment and storage medium
CN109697066A (en) * 2018-12-28 2019-04-30 第四范式(北京)技术有限公司 Realize the method and system of tables of data splicing and automatic training machine learning model
CN109739855A (en) * 2018-12-28 2019-05-10 第四范式(北京)技术有限公司 Realize the method and system of tables of data splicing and automatic training machine learning model
CN109697066B (en) * 2018-12-28 2021-02-05 第四范式(北京)技术有限公司 Method and system for realizing data sheet splicing and automatically training machine learning model
CN112184279A (en) * 2019-07-05 2021-01-05 上海哔哩哔哩科技有限公司 AUC index rapid calculation method and device and computer equipment
CN110297833A (en) * 2019-07-05 2019-10-01 税安科技(杭州)有限公司 A kind of bordereau error correction method
CN110443864A (en) * 2019-07-24 2019-11-12 北京大学 A kind of characters in a fancy style body automatic generation method based on single phase a small amount of sample learning
CN110443864B (en) * 2019-07-24 2021-03-02 北京大学 Automatic artistic font generation method based on single-stage small-amount sample learning
CN110457329B (en) * 2019-08-16 2022-05-06 第四范式(北京)技术有限公司 Method and device for realizing personalized recommendation
CN110457329A (en) * 2019-08-16 2019-11-15 第四范式(北京)技术有限公司 A kind of method and device for realizing personalized recommendation
CN110851500B (en) * 2019-11-07 2022-10-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN110851500A (en) * 2019-11-07 2020-02-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN111832740A (en) * 2019-12-30 2020-10-27 上海氪信信息技术有限公司 Method for deriving machine learning characteristics from structured data in real time
CN111325578A (en) * 2020-02-20 2020-06-23 深圳市腾讯计算机系统有限公司 Prediction model sample determination method, prediction model sample determination device, prediction model sample determination medium, and prediction model sample determination device
CN111325578B (en) * 2020-02-20 2023-10-31 深圳市腾讯计算机系统有限公司 Sample determination method and device of prediction model, medium and equipment
WO2022089652A1 (en) * 2020-11-02 2022-05-05 第四范式(北京)技术有限公司 Method and system for processing data tables and automatically training machine learning model
CN112434032A (en) * 2020-11-17 2021-03-02 北京融七牛信息技术有限公司 Automatic feature generation system and method
CN112380205A (en) * 2020-11-17 2021-02-19 北京融七牛信息技术有限公司 Method and system for automatically generating characteristics of distributed architecture
CN112380205B (en) * 2020-11-17 2024-04-02 北京融七牛信息技术有限公司 Automatic feature generation method and system of distributed architecture
CN112434032B (en) * 2020-11-17 2024-04-05 北京融七牛信息技术有限公司 Automatic feature generation system and method

Also Published As

Publication number Publication date
WO2019129060A1 (en) 2019-07-04

Similar Documents

Publication Publication Date Title
CN108090516A (en) Automatically generate the method and system of the feature of machine learning sample
CN110399770B (en) Generating machine learning models for objects based on enhancing objects with physical properties
US11836578B2 (en) Utilizing machine learning models to process resource usage data and to determine anomalous usage of resources
US20200287923A1 (en) Unsupervised learning to simplify distributed systems management
CN107844837A (en) The method and system of algorithm parameter tuning are carried out for machine learning algorithm
CN107704871A (en) Generate the method and system of the assemblage characteristic of machine learning sample
CN107004185A (en) The pipeline generation of the control actuated for data flow
Dakos Identifying best-indicator species for abrupt transitions in multispecies communities
CN107766946A (en) Generate the method and system of the assemblage characteristic of machine learning sample
CN107392319A (en) Generate the method and system of the assemblage characteristic of machine learning sample
CN107316082A (en) For the method and system for the feature importance for determining machine learning sample
US11574430B2 (en) Method and system for creating animal type avatar using human face
US10262079B1 (en) Determining anonymized temporal activity signatures of individuals
CN107908566A (en) Automatic test management method, device, terminal device and storage medium
CN108008942A (en) The method and system handled data record
CN109783859A (en) Model building method, device and computer readable storage medium
CN109313720A (en) The strength neural network of external memory with sparse access
CN110995459B (en) Abnormal object identification method, device, medium and electronic equipment
US20190303836A1 (en) Determining optimal workforce types to fulfill occupational roles in an organization based on occupational attributes
CN111243682A (en) Method, device, medium and apparatus for predicting toxicity of drug
CN107679549A (en) Generate the method and system of the assemblage characteristic of machine learning sample
CN104731843B (en) The method and system of origin and accuracy tradeoff in equilibrium criterion modeling
CN107273979A (en) The method and system of machine learning prediction are performed based on service class
CN109858528A (en) Recommender system training method, device, computer equipment and storage medium
CN107909087A (en) Generate the method and system of the assemblage characteristic of machine learning sample

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination