CN107766946A - Method and system for generating combined features of machine learning samples - Google Patents

Method and system for generating combined features of machine learning samples

Info

Publication number
CN107766946A
Authority
CN
China
Prior art keywords
configuration item
feature
features
machine learning
combinations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710898898.2A
Other languages
Chinese (zh)
Other versions
CN107766946B (en)
Inventor
戴文渊
杨强
陈雨强
张舒羽
栾淑君
孙迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN202010658034.5A (CN111797998B)
Priority to CN201710898898.2A (CN107766946B)
Publication of CN107766946A
Application granted
Publication of CN107766946B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system for generating combined features of machine learning samples are provided. The method includes: (A) obtaining unit features that can be combined; (B) providing a user with a graphical interface for setting feature combination configuration items, where the feature combination configuration items define how feature combination is performed among the unit features; (C) receiving an input operation that the user performs on the graphical interface to set the feature combination configuration items, and obtaining the feature combination configuration items set by the user from the input operation; and (D) combining the features to be combined among the unit features based on the obtained feature combination configuration items, to generate the combined features of the machine learning samples. With the method and system, the user only needs to set, through an interactive interface, the configuration items that define how features are combined in order to achieve automatic feature combination, which improves both the user experience and the effect of the machine learning model.

Description

Method and system for generating combined features of machine learning samples
Technical field
The present invention relates generally to the field of artificial intelligence and, more particularly, to a method and system for generating combined features of machine learning samples.
Background
At present, the basic process of training a machine learning model mainly includes:
1. Importing a data set (for example, a data table) containing historical data records;
2. Completing feature engineering, in which the attribute information of the data records in the data set is processed in various ways to obtain individual features (which may include, for example, combined features); the feature vectors formed from these features can serve as machine learning samples;
3. Training a model, in which the model is learned, according to a chosen machine learning algorithm (for example, a logistic regression algorithm, a decision tree algorithm, or a neural network algorithm), from the machine learning samples obtained through feature engineering.
In the above process, the generation of features is very important, since it affects the quality of the model. Each data record in a data table may include multiple pieces of attribute information (that is, fields), and a feature may indicate a field itself, a part of a field, a combination of fields, or another result of field processing (or computation), so as to better reflect the data distribution and the intrinsic associations and latent meanings among fields. Taking data mining as an example, once features have been accurately extracted, different combinations may further be formed between features to help the learning process extract regularities in the data and to analyze, from multiple angles, the intrinsic associations and latent meanings in the data distribution. The quality of feature engineering directly determines how accurately the machine learning problem is characterized, and thus affects the quality of the model.
On existing machine learning platforms, the training flow of a machine learning model can be completed through interaction based on a graphical interface, without the user having to write program code personally. In the feature engineering stage, however, manually devised feature combinations often still have to be entered into the platform system by hand. That is, the user needs to obtain specific feature combinations in advance, and automatic feature combination cannot be effectively achieved through the platform.
Moreover, to obtain feature combinations in advance, the user needs a deep understanding of the business scenario; that is, the user combines features manually by relying on business experience. The amount of data used in machine learning is generally large, so the user sometimes cannot analyze the data comprehensively and ends up devising some ineffective combined features. To improve the effect of the combined features, the user has to make repeated attempts, and with large data volumes and high-dimensional features such work takes a long time. This not only increases the workload but also reduces work efficiency.
Summary of the invention
An object of exemplary embodiments of the present invention is to provide a method and system for generating combined features of machine learning samples, to solve the problem in the prior art that automatic feature combination cannot be conveniently performed in a machine learning system.
According to an exemplary embodiment of the present invention, there is provided a method for generating combined features of machine learning samples, including: (A) obtaining unit features that can be combined; (B) providing a user with a graphical interface for setting feature combination configuration items, where the feature combination configuration items define how feature combination is performed among the unit features; (C) receiving an input operation that the user performs on the graphical interface to set the feature combination configuration items, and obtaining the feature combination configuration items set by the user from the input operation; and (D) combining the features to be combined among the unit features based on the obtained feature combination configuration items, to generate the combined features of the machine learning samples.
Optionally, the feature combination configuration items include at least one of the following: a feature configuration item, for specifying the features to be combined among the unit features, so that the specified features to be combined are combined in step (D); an evaluation index configuration item, for specifying an evaluation index of combined features, so that in step (D) the effect of the machine learning models corresponding to the various combined features is measured according to the specified evaluation index in order to determine how the features to be combined are combined; and a training parameter configuration item, for specifying training parameters of the machine learning model, so that in step (D) how the features to be combined are combined is determined by measuring the effect of the machine learning models corresponding to the various combined features that are obtained under the specified training parameters.
Optionally, the feature combination configuration items further include a binning operation configuration item, for specifying one or more binning operations to be performed on each of at least one continuous feature among the features to be combined, so that in step (D) the specified one or more binning operations are performed on the at least one continuous feature to obtain the corresponding one or more binned features, and the obtained binned features are combined, as a whole, with the other features to be combined.
Optionally, the binning operation configuration item is used to specify one or more binning operations separately for each continuous feature; or the binning operation configuration item is used to specify one or more binning operations uniformly for all continuous features.
Optionally, the method further includes: (E) displaying the generated combined features to the user.
Optionally, in step (E), the evaluation value of each combined feature with respect to the evaluation index is also displayed to the user.
Optionally, the method further includes: (F) applying the generated combined features directly to subsequent machine learning steps.
Optionally, the method further includes: (G) applying the combined features that the user selects from the displayed combined features to subsequent machine learning steps.
Optionally, the method further includes: (H) saving, in the form of a configuration file, the combination manner of the combined features generated in step (D).
Optionally, the method further includes: (I) saving, in the form of a configuration file, the combination manner of the combined features selected by the user in step (G).
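As a minimal sketch of saving a combination manner in the form of a configuration file, the following assumes a JSON file whose single key, `combined_features`, lists the unit features that make up each combined feature. The file format, the key name, and the helper names `save_combinations` and `load_combinations` are illustrative assumptions; the text does not specify them.

```python
import json
import os
import tempfile

def save_combinations(combos, path):
    """Write the combination manner (tuples of unit feature names) as JSON."""
    payload = {"combined_features": [list(c) for c in combos]}  # assumed layout
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)

def load_combinations(path):
    """Read the combination manner back so that it can be reapplied later."""
    with open(path, encoding="utf-8") as f:
        return [tuple(c) for c in json.load(f)["combined_features"]]

# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "combinations.json")
    save_combinations([("age", "city"), ("city", "hour")], config_path)
    restored = load_combinations(config_path)
```

Storing only the names of the combined unit features, rather than the feature values, is one way such a file could let the same combination manner be reapplied to new data.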
Optionally, in step (A), the unit features are obtained by performing feature processing on the attribute information of data records.
According to another exemplary embodiment of the present invention, there is provided a system for generating combined features of machine learning samples, including: a unit feature acquisition device, for obtaining unit features that can be combined; a display device, for providing a user with a graphical interface for setting feature combination configuration items, where the feature combination configuration items define how feature combination is performed among the unit features; a configuration item acquisition device, for receiving an input operation that the user performs on the graphical interface to set the feature combination configuration items, and obtaining the feature combination configuration items set by the user from the input operation; and a combined feature generation device, for combining the features to be combined among the unit features based on the obtained feature combination configuration items, to generate the combined features of the machine learning samples.
Optionally, the feature combination configuration items include at least one of the following: a feature configuration item, for specifying the features to be combined among the unit features, so that the combined feature generation device combines the specified features to be combined; an evaluation index configuration item, for specifying an evaluation index of combined features, so that the combined feature generation device measures, according to the specified evaluation index, the effect of the machine learning models corresponding to the various combined features in order to determine how the features to be combined are combined; and a training parameter configuration item, for specifying training parameters of the machine learning model, so that the combined feature generation device determines how the features to be combined are combined by measuring the effect of the machine learning models corresponding to the various combined features that are obtained under the specified training parameters.
Optionally, the feature combination configuration items further include a binning operation configuration item, for specifying one or more binning operations to be performed on each of at least one continuous feature among the features to be combined, so that the combined feature generation device performs the specified one or more binning operations on the at least one continuous feature to obtain the corresponding one or more binned features, and combines the obtained binned features, as a whole, with the other features to be combined.
Optionally, the binning operation configuration item is used to specify one or more binning operations separately for each continuous feature; or the binning operation configuration item is used to specify one or more binning operations uniformly for all continuous features.
Optionally, the display device also displays the generated combined features to the user.
Optionally, the display device also displays to the user the evaluation value of each generated combined feature with respect to the evaluation index.
Optionally, the system further includes: an application device, for applying the generated combined features directly to subsequent machine learning steps.
Optionally, the system further includes: an application device, for applying the combined features that the user selects from the displayed combined features to subsequent machine learning steps.
Optionally, the system further includes: a saving device, for saving, in the form of a configuration file, the combination manner of the combined features generated by the combined feature generation device.
Optionally, the system further includes: a saving device, for saving, in the form of a configuration file, the combination manner of the combined features that the user selects from the displayed combined features.
Optionally, the unit feature acquisition device obtains the unit features by performing feature processing on the attribute information of data records.
According to another exemplary embodiment of the present invention, there is provided a computer-readable medium for generating combined features of machine learning samples, where a computer program for performing the method for generating combined features of machine learning samples described above is recorded on the computer-readable medium.
According to another exemplary embodiment of the present invention, there is provided a computing device for generating combined features of machine learning samples, including a storage component and a processor, where a set of computer-executable instructions is stored in the storage component, and when the set of computer-executable instructions is executed by the processor, the method for generating combined features of machine learning samples described above is performed.
The method and system for generating combined features of machine learning samples according to exemplary embodiments of the present invention provide a convenient, efficient, and interaction-friendly feature combination process: the user only needs to set, through an interactive interface, the configuration items that define how features are combined in order to achieve automatic feature combination, which improves both the user experience and the effect of the machine learning model.
Additional aspects and/or advantages of the general inventive concept will be set forth in part in the description that follows, and in part will be apparent from the description or may be learned by practice of the general inventive concept.
Brief description of the drawings
The above and other objects and features of exemplary embodiments of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings, which illustrate the embodiments by way of example, in which:
Fig. 1 shows a flowchart of a method for generating combined features of machine learning samples according to an exemplary embodiment of the present invention;
Fig. 2 shows a flowchart of a method for generating combined features of machine learning samples according to another exemplary embodiment of the present invention;
Fig. 3 shows an example of a graphical interface for setting feature combination configuration items according to an exemplary embodiment of the present invention;
Fig. 4 shows an example of a feature combination analysis report according to an exemplary embodiment of the present invention;
Fig. 5 shows an example of a DAG (directed acyclic graph) for generating combined features of machine learning samples according to an exemplary embodiment of the present invention;
Fig. 6 shows a block diagram of a system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention.
Detailed description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, where like reference numerals refer to like parts throughout. The embodiments are described below with reference to the drawings in order to explain the present invention.
Here, machine learning is an inevitable product of the development of artificial intelligence research to a certain stage; it is devoted to improving the performance of a system itself through computational means, using experience. In a computer system, "experience" usually exists in the form of "data". Through a machine learning algorithm, a "model" can be produced from data; that is, when empirical data are supplied to a machine learning algorithm, a model can be produced based on those data, and when faced with a new situation the model provides a corresponding judgment, that is, a prediction result. Whether a machine learning model is being trained or a trained model is being used for prediction, the data need to be converted into machine learning samples comprising various features. Machine learning may be implemented as "supervised learning", "unsupervised learning", or "semi-supervised learning"; it should be noted that exemplary embodiments of the present invention place no particular limitation on the specific machine learning algorithm. It should also be noted that other means, such as statistical algorithms, may be combined in the course of training and applying the model.
Fig. 1 shows a flowchart of a method for generating combined features of machine learning samples according to an exemplary embodiment of the present invention. Here, as an example, the method may be performed by a computer program, or by a dedicated system or computing device for generating combined features of machine learning samples.
In step S10, unit features that can be combined are obtained. Here, a unit feature is the smallest unit on which feature combination can be performed.
As an example, the unit features may be obtained by performing feature processing on the attribute information of data records. Here, each data record can be regarded as a description of an event or object, corresponding to an example or sample. A data record includes attribute information (that is, fields) reflecting the behavior or nature of the event or object in some respect. As an example, the feature processing may be any suitable kind of feature processing; for example, a part of a field's value may be extracted, or the value may be subjected to discretization, taking the logarithm, or other arithmetic operations, or different fields may be combined; the present invention places no limitation on this. A resulting unit feature may indicate a field itself, a part of a field, a combination of fields, or another result of field processing or computation.
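The kinds of feature processing listed above can be illustrated with a small sketch. The record layout (`city`, `time`, `amount`, `age`), the age threshold, and the function name `unit_features` are hypothetical and chosen only for illustration.

```python
import math

def unit_features(record):
    """Derive unit features from one data record (a dict of fields)."""
    return {
        # A field used as it is.
        "city": record["city"],
        # Part of a field: keep only the hour of an "HH:MM" time string.
        "hour": record["time"][:2],
        # Arithmetic on a field: logarithm of (1 + amount).
        "log_amount": math.log1p(record["amount"]),
        # Discretization of a field: a coarse age band (threshold is assumed).
        "age_band": "minor" if record["age"] < 18 else "adult",
    }

row = {"city": "Beijing", "time": "14:35", "amount": 99.0, "age": 30}
features = unit_features(row)
```

Each entry of the returned dict is a candidate unit feature, that is, a smallest unit that later steps may combine.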
In step S20, a graphical interface for setting feature combination configuration items is provided to the user, where the feature combination configuration items define how feature combination is performed among the unit features. According to exemplary embodiments of the present invention, the combination of unit features may be performed based on the feature combination configuration items set by the user. Specifically, machine learning models corresponding to candidate combined features formed from the unit features may be trained, the predictive power of each candidate combined feature may be inferred from the differences in effect between the machine learning models, and the more important or effective candidate combined features may then be selected as the combined features of the machine learning samples. As an example, the user may set, through the graphical interface, the feature combination configuration items involved in the above flow, and may also set other related feature combination configuration items.
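The selection idea described above (train a model per candidate combined feature, compare the effects, keep the better candidates) can be sketched as follows. This is only a toy under stated assumptions: the "model" is a per-value positive-rate lookup standing in for a real learner such as logistic regression, candidates are limited to pairs, the effect is measured by accuracy, and all names (`fit_rate_model`, `rank_pairs`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def fit_rate_model(keys, labels):
    """Stand-in 'model': the positive rate observed for each feature value."""
    pos, cnt = defaultdict(int), defaultdict(int)
    for k, y in zip(keys, labels):
        pos[k] += y
        cnt[k] += 1
    prior = sum(labels) / len(labels)
    return lambda k: pos[k] / cnt[k] if cnt[k] else prior

def accuracy(model, keys, labels):
    """Share of rows where the thresholded prediction matches the label."""
    hits = sum((model(k) >= 0.5) == bool(y) for k, y in zip(keys, labels))
    return hits / len(labels)

def rank_pairs(train, valid, names, label="y"):
    """Score every pairwise candidate combined feature, best first."""
    scores = {}
    for a, b in combinations(names, 2):
        def key(r, a=a, b=b):
            return (r[a], r[b])  # Cartesian-product combination of two values
        model = fit_rate_model([key(r) for r in train], [r[label] for r in train])
        scores[(a, b)] = accuracy(model, [key(r) for r in valid],
                                  [r[label] for r in valid])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: the label is a XOR b, while c is unrelated noise.
rows = [{"a": a, "b": b, "c": c, "y": a ^ b}
        for a in (0, 1) for b in (0, 1) for c in (0, 1)]
ranked = rank_pairs(rows, rows, ["a", "b", "c"])
```

On this toy data only the combination of `a` and `b` separates the labels, so it ranks first; a real system would evaluate on held-out data rather than on the training rows.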
In step S30, an input operation that the user performs on the graphical interface to set the feature combination configuration items is received, and the feature combination configuration items set by the user are obtained from the input operation.
As an example, the graphical interface provided to the user may include an input control corresponding to each feature combination configuration item, for selecting and/or editing content, so that the feature combination configuration items set by the user are obtained by receiving the user's selection operations and/or editing operations.
In step S40, the features to be combined among the unit features are combined based on the obtained feature combination configuration items, to generate the combined features of the machine learning samples.
As an example, the feature combination configuration items may include at least one of the following: a feature configuration item, an evaluation index configuration item, a training parameter configuration item, and a binning operation configuration item. It should be understood that the feature combination configuration items may also include other configuration items that define how feature combination is performed among the unit features.
Specifically, the feature configuration item is used to specify the features to be combined among the unit features, so that the specified features to be combined are combined in step S40. As an example, all or some of the unit features obtained in step S10 may be specified as features to be combined through the feature configuration item. In particular, the feature configuration item may be used to help the user confirm whether all unit features are to be used as features to be combined, and may also be used to help the user specify each feature to be combined individually.
The evaluation index configuration item is used to specify an evaluation index of combined features, so that in step S40 the effect of the machine learning models corresponding to the various combined features is measured according to the specified evaluation index in order to determine how the features to be combined are combined. Here, as an example, a machine learning model corresponding to a particular combined feature may be one whose samples include that particular combined feature.
As described above, according to exemplary embodiments of the present invention, when unit features are combined, whether to adopt a combined feature may be determined by measuring the effect of the machine learning model corresponding to that combined feature. Here, the evaluation index that has been set may be used to measure the effect of the machine learning models corresponding to the various combined features; the higher the evaluation index of a machine learning model, the more readily the combined feature corresponding to that model is determined to be a combined feature of the machine learning samples. As an example, the evaluation index may be any of various model evaluation indices for measuring the effect of a machine learning model. For example, the evaluation index may be AUC (Area Under the ROC Curve, where ROC stands for Receiver Operating Characteristic), MAE (Mean Absolute Error), or the logarithmic loss function (logloss).
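For reference, the three evaluation indices named above can be computed as in the following sketch, which uses their standard definitions in plain Python. The function names and the clipping constant `eps` are illustrative choices, not part of the described method.

```python
import math

def auc(labels, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as half), which equals the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mae(targets, preds):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)

def logloss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

auc_value = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
mae_value = mae([1, 2, 3], [1, 2, 5])
logloss_value = logloss([1, 0], [0.5, 0.5])
```

Note that a larger AUC indicates a better model, whereas smaller MAE and logloss values do; an implementation that ranks candidates by "higher is better" would presumably need to invert the error-type indices first.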
The training parameter configuration item is used to specify training parameters of the machine learning model, so that in step S40 how the features to be combined are combined is determined by measuring the effect of the machine learning models corresponding to the various combined features that are obtained under the specified training parameters.
As an example, the training parameter configuration item may include configuration items for one or more different training parameters. For example, the training parameter configuration item may include a learning rate configuration item and/or a configuration item for the number of parameter-tuning rounds.
It should be noted, however, that the above examples serve only to illustrate and explain exemplary embodiments of the present invention, and exemplary embodiments of the present invention do not necessarily require the user to configure the above items. For example, all unit features produced by feature processing may be used as features to be combined by default; or a preset evaluation index may be used to measure the machine learning models; or model training may be carried out under default training parameters.
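The fallback behaviour described above (use every unit feature, a preset evaluation index, and default training parameters unless the user sets them) might be organized as in this sketch. The class and function names, the dictionary keys, and the default values (`"auc"`, a learning rate of 0.1) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class FeatureCombinationConfig:
    # None means "use every unit feature as a feature to be combined".
    features: Optional[List[str]] = None
    # Preset evaluation index (assumed default).
    metric: str = "auc"
    # Default training parameters (assumed values).
    training_params: Dict[str, float] = field(
        default_factory=lambda: {"learning_rate": 0.1})

def resolve_config(user_input, unit_feature_names):
    """Overlay whatever the user set in the interface on top of the defaults."""
    cfg = FeatureCombinationConfig()
    if user_input.get("features"):
        cfg.features = list(user_input["features"])
    else:
        cfg.features = list(unit_feature_names)
    cfg.metric = user_input.get("metric", cfg.metric)
    cfg.training_params.update(user_input.get("training_params", {}))
    return cfg

default_cfg = resolve_config({}, ["age", "city"])
custom_cfg = resolve_config(
    {"metric": "mae", "training_params": {"learning_rate": 0.05}}, ["age"])
```

Here `user_input` stands for the values collected from the input controls of the graphical interface; an empty dict reproduces the fully default behaviour.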
In addition, the feature combination configuration items may further include a binning operation configuration item, which is used to specify one or more binning operations to be performed on each of at least one continuous feature among the features to be combined, so that in step S40 the specified one or more binning operations are performed on the at least one continuous feature to obtain the corresponding one or more binned features, and the obtained binned features are combined, as a whole, with the other features to be combined. As an example, the binning operation configuration item may be used to specify one or more binning operations separately for each continuous feature. As another example, the binning operation configuration item may be used to specify one or more binning operations uniformly for all continuous features.
Here, for each continuous feature, each binning operation performed on it produces one binned feature; accordingly, the feature made up of all the binned features can replace the original continuous feature and take part in the automatic combination among the features to be combined. As an example, the binning operation configuration item may specify multiple binning operations to be performed on each continuous feature among the features to be combined, so that in step S40 the specified multiple binning operations are performed on each such continuous feature to obtain the corresponding multiple binned features.
Specifically, a continuous feature is a kind of feature contrasted with a discrete feature (for example, a categorical feature); its values are numerical values with a certain continuity, such as an age or an amount of money. In contrast, as an example, the values of a discrete feature have no continuity; for example, they may be unordered categories such as "from Beijing", "from Shanghai", or "from Tianjin", or "gender is male" and "gender is female". Correspondingly, a binning operation refers to a specific way of discretizing a continuous feature: the value range of the continuous feature is divided into multiple intervals (that is, multiple buckets), and the corresponding binned feature value is determined based on the resulting buckets. That is, according to exemplary embodiments of the present invention, for each continuous feature, after at least one binning operation has been performed and the corresponding at least one binned feature obtained, a feature corresponding to the continuous feature can be obtained by taking each binned feature as one component; this feature can be regarded as a set of binned features and is combined with continuous features and/or discrete features. Here, it should be understood that performing a binning operation discretizes the continuous feature and places it into the corresponding specific bucket. In the resulting binned features, each dimension may indicate a discrete value (for example, "0" or "1") showing whether the continuous feature has been assigned to the bucket, or may indicate a specific continuous value (for example, the actual feature value of the continuous feature or its normalized value, or the mean, median, or boundary value of the continuous feature values in the bucket). Correspondingly, when the discrete values (for example, for classification problems) or continuous values (for example, for regression problems) of each dimension are applied in machine learning, combinations between discrete values (for example, a Cartesian product) or combinations between continuous values (for example, arithmetic combinations) may be performed.
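A minimal sketch of the discrete case described above: a continuous value is assigned to a bucket, encoded with one "0"/"1" dimension per bucket, and then combined with a discrete feature value by a Cartesian-product style cross. The bucket boundaries, the string encoding of the crossed value, and the function names are hypothetical.

```python
def bucket_index(value, edges):
    """Index of the bucket that holds value; edges are the inner boundaries."""
    return sum(value >= e for e in edges)

def one_hot(index, size):
    """One dimension per bucket: 1 if the value fell in it, otherwise 0."""
    return [1 if i == index else 0 for i in range(size)]

def cross(a, b):
    """Cartesian-product combination of two discrete feature values."""
    return f"{a}&{b}"

age_edges = [18, 40, 65]                      # three boundaries, four buckets
idx = bucket_index(30, age_edges)             # age 30 falls in the second bucket
binned = one_hot(idx, len(age_edges) + 1)
combined = cross(f"age_bucket={idx}", "city=Beijing")
```

The crossed value is itself a discrete feature value, so it can be encoded and fed to a model like any other categorical feature.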
As an example, the bucketing operation configuration item may further include a bucketing mode configuration item and/or a bucketing parameter configuration item. The bucketing mode configuration item specifies the bucketing mode used by a bucketing operation. The bucketing parameter configuration item specifies the bucketing parameters of that mode. For example, the bucketing mode configuration item may specify equal-width bucketing or equal-depth bucketing, and the bucketing parameter configuration item may specify the number of buckets, the bucket width, the bucket depth, and so on. Here, the user may manually enter or select the value of the bucketing parameter configuration item; in particular, the user may be prompted to set the individual widths or depths for equal-width or equal-depth bucketing according to a geometric or arithmetic progression.
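As a rough illustration of the difference between the two bucketing modes, the following sketch computes bucket boundaries from a small sample. The configuration keys `mode` and `num_buckets`, the function names, and the sample values are hypothetical; equal-width bucketing splits the value range evenly, while equal-depth bucketing places roughly the same number of samples in each bucket.

```python
def equal_width_boundaries(values, num_buckets):
    """Boundaries that split [min, max] into intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / num_buckets
    return [low + width * i for i in range(1, num_buckets)]


def equal_depth_boundaries(values, num_buckets):
    """Boundaries chosen so each bucket holds about the same number of samples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // num_buckets] for i in range(1, num_buckets)]


def make_boundaries(values, config):
    """Dispatch on a hypothetical bucketing mode configuration item."""
    if config["mode"] == "equal_width":
        return equal_width_boundaries(values, config["num_buckets"])
    if config["mode"] == "equal_depth":
        return equal_depth_boundaries(values, config["num_buckets"])
    raise ValueError(f"unknown bucketing mode: {config['mode']}")


# Illustrative sample with one large outlier.
sample = [1, 2, 3, 4, 5, 6, 7, 100]
width_bounds = make_boundaries(sample, {"mode": "equal_width", "num_buckets": 4})
depth_bounds = make_boundaries(sample, {"mode": "equal_depth", "num_buckets": 4})
print(width_bounds)
print(depth_bounds)
```

With an outlier in the sample, the equal-width boundaries leave most samples in the first bucket, whereas the equal-depth boundaries spread them evenly; this is one reason a user might want to choose the mode explicitly.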
Here, as an example, the multiple bucketing operations specified by the bucketing operation configuration item may be bucketing operations that use the same bucketing mode with different bucketing parameters (for example, number of buckets, bucket depth, or bucket width), or bucketing operations that use different bucketing modes. As an example, the feature obtained by performing the specified multiple bucketing operations on a continuous feature may be composed jointly of the features obtained by performing each bucketing operation on that continuous feature; the resulting feature corresponding to the continuous feature can then characterize certain attributes of the original data record from different angles and at different scales or levels simultaneously.
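The idea of composing one feature from several bucketing operations can be sketched as follows. This is an assumption-laden illustration: the function names and the value range are hypothetical, and only equal-width bucketing with different bucket counts is shown.

```python
def bucket_index(value, low, high, num_buckets):
    """Equal-width bucket index of `value` within [low, high), clamped at the ends."""
    if value <= low:
        return 0
    if value >= high:
        return num_buckets - 1
    return min(int((value - low) * num_buckets / (high - low)), num_buckets - 1)


def multi_bucket_feature(value, low, high, bucket_counts):
    """Apply one bucketing operation per bucket count and keep every result.

    The returned list has one component per bucketing operation, so the same
    raw value is described at several granularities at once.
    """
    return [bucket_index(value, low, high, n) for n in bucket_counts]


feature = multi_bucket_feature(37.5, low=0, high=100, bucket_counts=[10, 100, 1000])
print(feature)
```

The coarse component groups the value with many neighbours, while the fine components separate it from them; the combined feature carries both views.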
It should be understood that the above ways of generating combined features based on configuration items are given only as examples for explanation and illustration; exemplary embodiments of the present invention are not limited to them.
As an example, after the combined features of the machine learning sample are generated based on the feature combination configuration items, the method for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may further include: applying the generated combined features directly to a subsequent machine learning step. For example, a model may be learned based on machine learning samples that include at least the generated combined features.
As an example, the method for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may further include: saving the combination manner of the generated combined features in the form of a configuration file, so that it can be invoked directly, as the user requires, when the subsequent machine learning step is performed or when another machine learning process is carried out.
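Saving a combination manner as a configuration file could look like the sketch below. The patent does not specify a file format; JSON, the key `combined_features`, the file name, and the feature names `age` and `job` are assumptions chosen for illustration (`default` and `month` appear in the example report of Fig. 4).

```python
import json
import os
import tempfile


def save_combinations(path, combinations):
    """Write the list of feature combinations to a JSON configuration file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"combined_features": combinations}, handle, indent=2)


def load_combinations(path):
    """Read the feature combinations back for reuse in a later step."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["combined_features"]


# Each inner list names the unit features that form one combined feature.
combinations = [["default", "month"], ["age", "job", "month"]]

config_path = os.path.join(tempfile.mkdtemp(), "feature_combinations.json")
save_combinations(config_path, combinations)
restored = load_combinations(config_path)
print(restored)
```

Because only the combination manner is stored, not the feature values, the same file could be applied to a different data set that has the same fields.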
Fig. 2 shows a flowchart of a method for generating combined features of a machine learning sample according to another exemplary embodiment of the present invention. As shown in Fig. 2, the method according to this other exemplary embodiment may include step S50 in addition to steps S10, S20, S30, and S40 shown in Fig. 1. Steps S10, S20, S30, and S40 may be implemented with reference to the embodiment described with respect to Fig. 1 and are not described again here.
In step S50, the combined features generated in step S40 are displayed to the user. Here, the specific combination manner of the combined features may be displayed in any effective form.
As an example, an evaluation value of each combined feature with respect to an evaluation index is also displayed to the user. Here, the evaluation index may be the evaluation index specified by the evaluation index configuration item set by the user, or any other evaluation index.
As an example, the method for generating combined features of a machine learning sample according to another exemplary embodiment of the present invention may further include: applying the combined features that the user selects from the displayed combined features to a subsequent machine learning step.
As another example, the method for generating combined features of a machine learning sample according to another exemplary embodiment of the present invention may further include: saving the combination manner of the combined features selected by the user in the form of a configuration file, so that it can be invoked directly, as the user requires, when the subsequent machine learning step is performed or when another machine learning process is carried out.
As an example, the method for generating combined features of a machine learning sample according to another exemplary embodiment of the present invention may further include: applying the combined features that the user selects from the displayed combined features to a subsequent machine learning step, and saving the combination manner of the selected combined features in the form of a configuration file.
An example in which a user sets feature combination configuration items through a graphical interface according to an exemplary embodiment of the present invention is described below with reference to Fig. 3. Fig. 3 shows an example of a graphical interface for setting feature combination configuration items according to an exemplary embodiment of the present invention. It should be understood that the specific interaction details of exemplary embodiments of the present invention when setting each feature combination configuration item are not limited to the example shown in Fig. 3.
As shown in Fig. 3, the graphical interface for setting feature combination configuration items may display content options and/or content input boxes corresponding respectively to the feature configuration item, the evaluation index configuration item, the training parameter configuration item, and the bucketing operation configuration item. Specifically, the feature configuration item may be set according to an input operation in which the user checks the "select all features" option, so that all unit features acquired in step S10 are specified as features to be combined. Alternatively, according to an input operation in which the user checks the "custom" option, a user interface for customizing the features to be combined may pop up, and the user either selects the features to be combined from the candidate unit features offered in that interface (for example, all unit features acquired in step S10) or enters identification information of the features to be combined, which completes the setting of the feature configuration item. The evaluation index configuration item may be set according to the user's selection in a drop-down menu, so that the content selected by the user (for example, "AUC" as shown in Fig. 3) is specified as the evaluation index. The user may set the training parameter configuration item by editing the content input box corresponding to it (for example, the learning rate configuration item shown in Fig. 3), for example by entering the value "0.5" as shown in Fig. 3. The user may set the bucketing operation configuration item by editing the content input box corresponding to it (for example, the bucketing parameter configuration item, that is, the number-of-buckets configuration item, shown in Fig. 3), for example by entering the value "10/100/1000/10000/100000" as shown in Fig. 3. That is, the bucketing operation configuration item set by the user specifies that five bucketing operations are to be performed on each continuous feature among the features to be combined: the number of buckets is "10" for the first bucketing operation, "100" for the second, and so on, up to "100000" for the fifth. Here, the bucketing mode defaults to equal-width bucketing.
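How the values entered in the Fig. 3 interface might be turned into configuration items can be sketched as follows. The dictionary keys and the parsing function are hypothetical; only the entered values ("AUC", "0.5", "10/100/1000/10000/100000") and the equal-width default are taken from the description above.

```python
def parse_bucket_counts(text):
    """Split a slash-separated entry such as "10/100/1000" into bucket counts."""
    counts = [int(part) for part in text.split("/") if part.strip()]
    if not counts or any(count <= 0 for count in counts):
        raise ValueError("bucket counts must be positive integers")
    return counts


# Hypothetical in-memory form of the feature combination configuration items.
feature_combination_config = {
    "features_to_combine": "all",            # the "select all features" option
    "evaluation_index": "AUC",               # chosen from the drop-down menu
    "training_parameters": {"learning_rate": float("0.5")},
    "bucketing": {
        "mode": "equal_width",               # default bucketing mode
        "num_buckets": parse_bucket_counts("10/100/1000/10000/100000"),
    },
}

# One bucketing operation is performed per bucket count.
num_operations = len(feature_combination_config["bucketing"]["num_buckets"])
print(num_operations, feature_combination_config["bucketing"]["num_buckets"])
```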
An example of displaying the generated combined features to the user according to an exemplary embodiment of the present invention is described below with reference to Fig. 4. In the example of Fig. 4, the combined features are displayed in the form of a feature combination analysis report.
As shown in Fig. 4, the left side of the upper table shows the unit features acquired in step S10 in the form "output feature name=processing method(field name of original attribute information)". For example, discrete_feature_1729_0=discrete(cons_price_idx) indicates that the discretized value of the field cons_price_idx is used as the unit feature discrete_feature_1729_0. The left side of the lower table shows the combined features generated in step S40 in the form "output feature name=processing method(combine(original feature name 1, original feature name 2, original feature name 3, ...))". For example, discrete_feature_1729_23=discrete(combine(default, month)) indicates that the discrete feature obtained by combining the features default and month is the new combined feature discrete_feature_1729_23. The right side of both tables shows the evaluation value of each feature with respect to the evaluation index. As an example, the upper table may be omitted so that only the lower table is displayed.
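The report entry format described above is regular enough to be read back by a program. The following sketch parses the two example entries; the regular expressions and the returned dictionary layout are assumptions, not part of the patent.

```python
import re

ENTRY_PATTERN = re.compile(r"^(?P<name>\w+)\s*=\s*(?P<method>\w+)\((?P<arg>.*)\)$")
COMBINE_PATTERN = re.compile(r"^combine\((?P<sources>.*)\)$")


def parse_report_entry(entry):
    """Split an entry "name=method(...)" into its name, method, and source fields."""
    match = ENTRY_PATTERN.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognized report entry: {entry!r}")
    argument = match.group("arg").strip()
    combine = COMBINE_PATTERN.match(argument)
    if combine:
        # A combined feature lists several source features inside combine(...).
        sources = [part.strip() for part in combine.group("sources").split(",")]
    else:
        # A unit feature names a single field of the original attribute information.
        sources = [argument]
    return {
        "name": match.group("name"),
        "method": match.group("method"),
        "sources": sources,
        "is_combined": combine is not None,
    }


unit_entry = parse_report_entry("discrete_feature_1729_0=discrete(cons_price_idx)")
combined_entry = parse_report_entry(
    "discrete_feature_1729_23=discrete(combine(default, month))"
)
print(unit_entry)
print(combined_entry)
```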
Further, as an example, the user may select combined features from the feature combination analysis report shown in Fig. 4, so that they are applied to a subsequent machine learning step and/or saved in the form of a configuration file.
According to an exemplary embodiment of the present invention, a machine learning process may be executed in the form of a directed acyclic graph (DAG); the process may cover all or some of the steps for training, testing, or making predictions with a machine learning model. For example, for automatic feature combination, a DAG may be built that includes a historical data import step, a data split step, a feature extraction step, and an automatic feature combination step. That is, each of these steps may be executed as a node in the DAG.
Fig. 5 shows an example of a DAG for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention.
Referring to Fig. 5, in the first step, a data import node is established. For example, as shown in Fig. 5, the data import node may be configured in response to a user operation so that a banking business data table named "bank" is imported into the machine learning platform; the data table may contain multiple historical data records.
In the second step, a data split node is established and the data import node is connected to it, so that the imported data table is split into a training set and a validation set. The data records in the training set are converted into machine learning samples for learning a model, and the data records in the validation set are converted into test samples for verifying the effect of the learned model. The data split node may be configured in response to a user operation so that the imported data table is split into the training set and the validation set in the configured manner.
In the third step, two feature extraction nodes are established and the data split node is connected to each of them, so that feature extraction is performed separately on the training set and the validation set output by the data split node; for example, by default the left output of the data split node is the training set and the right output is the validation set. Feature extraction may be performed on the training set and the validation set based on the feature configuration set by the user in the feature extraction nodes or on code written by the user. It should be understood that the feature extraction manner is the same for machine learning samples and test samples. The user may apply the feature extraction manner configured for the left feature extraction node directly to the right feature extraction node, or the platform may synchronize the settings of the two nodes automatically.
In the fourth step, an automatic feature combination node is established and the two feature extraction nodes are each connected to it. The automatic feature combination node may be configured in response to a user operation. For example, when an operation in which the user clicks the "automatic feature combination" node is received, a graphical interface for setting feature combination configuration items, such as the one shown in Fig. 3, may be provided so that the user can set the feature combination configuration items through it.
After the DAG including the above steps has been built, the entire DAG may be run according to the user's instruction. During the run, the machine learning platform automatically generates the combined features of the machine learning sample according to the configuration items set by the user and outputs them.
In addition, as an example, a model training node may be established after the automatic feature combination node, and the automatic feature combination node may be connected to it, so that the extracted features and the generated combined features are applied directly to subsequent model training. Accordingly, the model training node may be configured in response to a user operation so that the model is trained on the machine learning samples in the configured manner. When the entire DAG is run, the machine learning model can thus be learned directly according to the configuration items set by the user.
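The node ordering implied by the DAG steps above can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The node names and the `run_dag` helper are hypothetical, and the node actions are left empty; the sketch only shows that each node runs after the nodes connected into it.

```python
from graphlib import TopologicalSorter

# Each key is a node; its value is the set of nodes that must run before it.
dag = {
    "data_import": set(),
    "data_split": {"data_import"},
    "feature_extraction_train": {"data_split"},
    "feature_extraction_validation": {"data_split"},
    "auto_feature_combination": {
        "feature_extraction_train",
        "feature_extraction_validation",
    },
    "model_training": {"auto_feature_combination"},
}


def run_dag(graph, actions):
    """Run every node after all of its predecessors and return the order used."""
    executed = []
    for node in TopologicalSorter(graph).static_order():
        action = actions.get(node)
        if action is not None:
            action()  # placeholder for the real work of the node
        executed.append(node)
    return executed


order = run_dag(dag, actions={})
print(order)
```

The two feature extraction nodes have no dependency on each other, so their relative order is not fixed; a real platform could run them in parallel.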
Fig. 6 shows a block diagram of a system for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention. As shown in Fig. 6, the system includes a unit feature acquisition device 10, a display device 20, a configuration item acquisition device 30, and a combined feature generation device 40.
The unit feature acquisition device 10 is used to acquire unit features that can be combined.
As an example, the unit feature acquisition device 10 may obtain the unit features by performing feature processing on attribute information of data records.
The display device 20 is used to provide the user with a graphical interface for setting feature combination configuration items, where the feature combination configuration items define how feature combination is performed between unit features.
The configuration item acquisition device 30 is used to receive input operations that the user performs on the graphical interface in order to set the feature combination configuration items, and to acquire the feature combination configuration items set by the user according to those input operations.
The combined feature generation device 40 is used to combine the features to be combined among the unit features based on the acquired feature combination configuration items, so as to generate the combined features of the machine learning sample.
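The division of work among the devices can be pictured with a small sketch in which each device is reduced to one function. All function names, the dictionary keys, and the record values are hypothetical, the display device 20 is replaced by a plain dictionary standing in for user input, and the combination shown is a simple concatenation of discrete values rather than the patent's actual procedure.

```python
def acquire_unit_features(record):
    """Stand-in for device 10: treat each attribute of a data record as a unit feature."""
    return dict(record)


def acquire_configuration(user_input):
    """Stand-in for device 30: turn interface input into a configuration item."""
    return {"features_to_combine": list(user_input["selected_features"])}


def generate_combined_feature(unit_features, config):
    """Stand-in for device 40: combine the selected unit features into one feature."""
    names = config["features_to_combine"]
    name = "combine(" + ", ".join(names) + ")"
    value = "&".join(str(unit_features[n]) for n in names)
    return name, value


# Illustrative data record; the values are made up.
record = {"default": "no", "month": "may", "cons_price_idx": 93.994}
user_input = {"selected_features": ["default", "month"]}  # replaces the GUI of device 20

unit_features = acquire_unit_features(record)
config = acquire_configuration(user_input)
combined_name, combined_value = generate_combined_feature(unit_features, config)
print(combined_name, combined_value)
```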
Optionally, the feature combination configuration items may include at least one of the following: a feature configuration item, an evaluation index configuration item, a training parameter configuration item, and a bucketing operation configuration item.
Specifically, the feature configuration item is used to specify the features to be combined among the unit features, so that the combined feature generation device 40 combines the specified features to be combined.
The evaluation index configuration item is used to specify the evaluation index for combined features, so that the combined feature generation device 40 determines the combination manner of the features to be combined by measuring, according to the specified evaluation index, the effect of the machine learning models corresponding to the various combined features.
The training parameter configuration item is used to specify the training parameters of the machine learning model, so that the combined feature generation device 40 determines the combination manner of the features to be combined by measuring the effect of the machine learning models corresponding to the various combined features obtained under the specified training parameters.
The bucketing operation configuration item is used to specify one or more bucketing operations to be performed on each of at least one continuous feature among the features to be combined, so that the combined feature generation device 40 performs the specified one or more bucketing operations on the at least one continuous feature to obtain the corresponding one or more bucketed features, and combines the obtained bucketed features as a whole with the other features to be combined.
As an example, the bucketing operation configuration item may be used to specify one or more bucketing operations for each continuous feature individually. As another example, the bucketing operation configuration item may be used to specify one or more bucketing operations uniformly for all continuous features.
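The role of the evaluation index configuration item can be illustrated by ranking candidate combined features by AUC. This is only a sketch under strong assumptions: the model training step is omitted, the prediction scores are made-up toy numbers standing in for the outputs of models trained with each candidate, the feature name `job` is hypothetical, and the patent does not state that candidates are compared in exactly this way.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed by comparing every positive/negative pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def rank_candidates(labels, candidate_scores):
    """Sort candidate combined features by the evaluation value of their model."""
    evaluated = {name: auc(labels, scores) for name, scores in candidate_scores.items()}
    return sorted(evaluated.items(), key=lambda item: item[1], reverse=True)


# Toy validation labels and toy model outputs (illustrative, not real results).
labels = [1, 0, 1, 0]
candidate_scores = {
    "combine(default, job)": [0.6, 0.7, 0.8, 0.1],
    "combine(default, month)": [0.9, 0.2, 0.8, 0.4],
}
ranking = rank_candidates(labels, candidate_scores)
print(ranking)
```

A different evaluation index would only require replacing the `auc` function, which mirrors the way the configuration item lets the user choose the index.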
As an example, the display device 20 may also display to the user the combined features generated by the combined feature generation device 40. Further, as an example, the display device 20 may also display to the user the evaluation value, with respect to the evaluation index, of each combined feature generated by the combined feature generation device 40.
As an example, the system for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may further include an application device (not shown).
The application device is used to apply the combined features generated by the combined feature generation device 40 directly to a subsequent machine learning step, or to apply the combined features that the user selects from those displayed by the display device 20 to a subsequent machine learning step.
As an example, the system for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may further include a saving device (not shown).
The saving device is used to save, in the form of a configuration file, the combination manner of the combined features generated by the combined feature generation device 40, or the combination manner of the combined features that the user selects from those displayed by the display device 20.
It should be understood that the specific implementation of the system for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may refer to the related implementations described with reference to Figs. 1 to 5, and is not described again here.
The devices included in the system for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may each be configured as software, hardware, firmware, or any combination thereof that performs a specific function. For example, these devices may correspond to dedicated integrated circuits, to pure software code, or to modules combining software and hardware. In addition, one or more functions implemented by these devices may also be performed collectively by components in a physical device (for example, a processor, a client, or a server).
It should be understood that the method for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may be implemented by a program recorded on a computer-readable medium. For example, according to an exemplary embodiment of the present invention, a computer-readable medium for generating combined features of a machine learning sample may be provided, on which is recorded a computer program for performing the following method steps: (A) acquiring unit features that can be combined; (B) providing a user with a graphical interface for setting feature combination configuration items, where the feature combination configuration items define how feature combination is performed between unit features; (C) receiving input operations that the user performs on the graphical interface in order to set the feature combination configuration items, and acquiring the feature combination configuration items set by the user according to those input operations; and (D) combining the features to be combined among the unit features based on the acquired feature combination configuration items, so as to generate the combined features of the machine learning sample.
The computer program on the above computer-readable medium may run in an environment deployed on computer equipment such as a client, a host, an agent device, or a server. It should be noted that the computer program may also be used to perform additional steps beyond the above steps, or to perform more specific processing when performing the above steps; these additional steps and further processing have been described with reference to Figs. 1 to 5 and are not repeated here.
It should be noted that the system for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may rely entirely on the running of a computer program to realize the corresponding functions; that is, each device corresponds to a step in the functional architecture of the computer program, so that the entire system is invoked through a dedicated software package (for example, a lib library) to realize the corresponding functions.
On the other hand, each device included in the system for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor can perform the corresponding operations by reading and running the program code or code segments.
For example, an exemplary embodiment of the present invention may also be implemented as a computing device that includes a storage component and a processor. The storage component stores a set of computer-executable instructions which, when executed by the processor, performs the method for generating combined features of a machine learning sample.
Specifically, the computing device may be deployed in a server or a client, or on a node device in a distributed network environment. In addition, the computing device may be a PC, a tablet device, a personal digital assistant, a smartphone, a web application, or any other device capable of executing the above set of instructions.
Here, the computing device need not be a single computing device; it may be any device or collection of circuits capable of executing the above instructions (or instruction set) individually or jointly. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (for example, via wireless transmission).
In the computing device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
Some of the operations described in the method for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may be implemented in software and some in hardware; these operations may also be implemented by a combination of software and hardware.
The processor may run instructions or code stored in one of the storage components, where the storage components may also store data. Instructions and data may also be sent and received over a network via a network interface device, where the network interface device may use any known transmission protocol.
The storage component may be integrated with the processor, for example, with RAM or flash memory arranged within an integrated circuit microprocessor. In addition, the storage component may include a separate device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled, or may communicate with each other, for example through an I/O port or a network connection, so that the processor can read files stored in the storage component.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, or touch input device). All components of the computing device may be connected to one another via a bus and/or a network.
The operations involved in the method for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operate according to non-exact boundaries.
For example, as described above, a computing device for generating combined features of a machine learning sample according to an exemplary embodiment of the present invention may include a storage component and a processor, where the storage component stores a set of computer-executable instructions which, when executed by the processor, performs the following steps: (A) acquiring unit features that can be combined; (B) providing a user with a graphical interface for setting feature combination configuration items, where the feature combination configuration items define how feature combination is performed between unit features; (C) receiving input operations that the user performs on the graphical interface in order to set the feature combination configuration items, and acquiring the feature combination configuration items set by the user according to those input operations; and (D) combining the features to be combined among the unit features based on the acquired feature combination configuration items, so as to generate the combined features of the machine learning sample.
Exemplary embodiments of the present invention have been described above. It should be understood that the foregoing description is exemplary only and not exhaustive, and the invention is not limited to the disclosed exemplary embodiments. Many modifications and changes will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the scope of protection of the invention shall be defined by the scope of the claims.

Claims (10)

1. A method for generating combined features of a machine learning sample, including:
(A) acquiring unit features that can be combined;
(B) providing a user with a graphical interface for setting feature combination configuration items, wherein the feature combination configuration items define how feature combination is performed between unit features;
(C) receiving input operations that the user performs on the graphical interface in order to set the feature combination configuration items, and acquiring the feature combination configuration items set by the user according to the input operations; and
(D) combining the features to be combined among the unit features based on the acquired feature combination configuration items, so as to generate the combined features of the machine learning sample.
2. The method according to claim 1, wherein the feature combination configuration items include at least one of the following: a feature configuration item for specifying the features to be combined among the unit features, so that the specified features to be combined are combined in step (D); an evaluation index configuration item for specifying the evaluation index for combined features, so that in step (D) the combination manner of the features to be combined is determined by measuring, according to the specified evaluation index, the effect of the machine learning models corresponding to the various combined features; and a training parameter configuration item for specifying the training parameters of the machine learning model, so that in step (D) the combination manner of the features to be combined is determined by measuring the effect of the machine learning models corresponding to the various combined features obtained under the specified training parameters.
3. The method according to claim 2, wherein the feature combination configuration items further include: a bucketing operation configuration item for specifying one or more bucketing operations to be performed on each of at least one continuous feature among the features to be combined, so that in step (D) the specified one or more bucketing operations are performed on the at least one continuous feature to obtain the corresponding one or more bucketed features, and the obtained bucketed features are combined as a whole with the other features to be combined.
4. The method according to claim 3, wherein the bucketing operation configuration item is used to specify one or more bucketing operations for each continuous feature individually; or the bucketing operation configuration item is used to specify one or more bucketing operations uniformly for all continuous features.
5. The method according to claim 1, further comprising:
(E) displaying the generated combined features to the user.
6. The method according to claim 5, wherein in step (E), an evaluation value of each combined feature with respect to the evaluation index is also displayed to the user.
7. The method according to claim 1, further comprising:
(F) applying the generated combined features directly to a subsequent machine learning step.
8. A system for generating combined features of machine learning samples, comprising:
A unit feature acquisition device for obtaining unit features that can be combined;
A display device for providing a user with a graphical interface for setting a feature combination configuration item, wherein the feature combination configuration item is used to define how features are combined among the unit features;
A configuration item acquisition device for receiving an input operation performed by the user on the graphical interface to set the feature combination configuration item, and for obtaining, according to the input operation, the feature combination configuration item set by the user; and
A combined feature generation device for combining the features to be combined among the unit features based on the obtained feature combination configuration item, to generate combined features of machine learning samples.
9. A computer-readable medium for generating combined features of machine learning samples, wherein a computer program for executing the method for generating combined features of machine learning samples according to any one of claims 1 to 7 is recorded on the computer-readable medium.
10. A computing device for generating combined features of machine learning samples, comprising a storage component and a processor, wherein a set of computer-executable instructions is stored in the storage component, and when the set of computer-executable instructions is executed by the processor, the method for generating combined features of machine learning samples according to any one of claims 1 to 7 is performed.
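The configuration items of claims 2 to 4 and the combining step (D) can be illustrated with a minimal sketch. It is not the patented implementation: the class `FeatureCombinationConfig`, its field names, the equal-width binning rule, and the string-concatenation cross are all assumptions made for this example. The model-based search that would use the evaluation index and training parameters to choose a combination manner is not shown; those two fields are carried only to mirror the claimed configuration items.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


@dataclass
class FeatureCombinationConfig:
    """Hypothetical container mirroring the configuration items of claims 2-4."""
    # Feature configuration item: which unit features are to be combined.
    features_to_combine: List[str]
    # Evaluation index configuration item (not used in this sketch).
    evaluation_index: str = "auc"
    # Training parameter configuration item (not used in this sketch).
    training_params: Dict[str, float] = field(default_factory=dict)
    # Binning operation configuration item: continuous feature name -> bin counts,
    # one entry per binning operation to perform on that feature.
    binning_ops: Dict[str, List[int]] = field(default_factory=dict)


def equal_width_bin(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Assumed binning rule: equal-width bins over [lo, hi], index clamped to range."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def bin_feature(values: Sequence[float], bin_counts: Sequence[int]) -> List[Tuple[int, ...]]:
    """Apply every configured binning operation and keep the results together,
    so the binned features take part in the combination as a whole."""
    lo, hi = min(values), max(values)
    return [tuple(equal_width_bin(v, lo, hi, n) for n in bin_counts) for v in values]


def generate_combined_features(
    samples: Dict[str, list], config: FeatureCombinationConfig
) -> Dict[str, List[str]]:
    """Step (D), sketched: bin configured continuous features, then cross each pair."""
    prepared = {}
    for name in config.features_to_combine:
        column = samples[name]
        if name in config.binning_ops:
            prepared[name] = bin_feature(column, config.binning_ops[name])
        else:
            prepared[name] = list(column)

    combined = {}
    for a, b in combinations(config.features_to_combine, 2):
        # Pairwise cross, represented here as a concatenated string value.
        combined[f"{a}&{b}"] = [f"{x}|{y}" for x, y in zip(prepared[a], prepared[b])]
    return combined


if __name__ == "__main__":
    samples = {"age": [20, 40, 60], "city": ["A", "B", "A"]}
    config = FeatureCombinationConfig(
        features_to_combine=["age", "city"],
        binning_ops={"age": [2, 4]},  # two binning operations on one continuous feature
    )
    print(generate_combined_features(samples, config))
```

Keying `binning_ops` by feature name corresponds to the first alternative of claim 4 (operations specified per continuous feature). The second alternative, a single list applied uniformly to all continuous features, could be modelled by assigning the same list to every continuous feature name.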
CN201710898898.2A 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples Active CN107766946B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010658034.5A CN111797998B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples
CN201710898898.2A CN107766946B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710898898.2A CN107766946B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010658034.5A Division CN111797998B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples

Publications (2)

Publication Number Publication Date
CN107766946A true CN107766946A (en) 2018-03-06
CN107766946B CN107766946B (en) 2020-06-23

Family

ID=61267329

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010658034.5A Active CN111797998B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples
CN201710898898.2A Active CN107766946B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010658034.5A Active CN111797998B (en) 2017-09-28 2017-09-28 Method and system for generating combined features of machine learning samples

Country Status (1)

Country Link
CN (2) CN111797998B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681426A (en) * 2018-05-25 2018-10-19 第四范式(北京)技术有限公司 Method and system for performing feature processing on data
CN108710949A (en) * 2018-04-26 2018-10-26 第四范式(北京)技术有限公司 Method and system for creating machine learning modeling templates
CN108985459A (en) * 2018-05-30 2018-12-11 华为技术有限公司 Method and apparatus for training a model
CN109634961A (en) * 2018-12-05 2019-04-16 杭州大拿科技股份有限公司 Test paper sample generation method and device, electronic equipment and storage medium
CN109685583A (en) * 2019-01-10 2019-04-26 博拉网络股份有限公司 Supply chain demand prediction method based on big data
CN110851500A (en) * 2019-11-07 2020-02-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN110895718A (en) * 2018-09-07 2020-03-20 第四范式(北京)技术有限公司 Method and system for training machine learning model
CN110956272A (en) * 2019-11-01 2020-04-03 第四范式(北京)技术有限公司 Method and system for realizing data processing
CN111625692A (en) * 2020-05-27 2020-09-04 北京字节跳动网络技术有限公司 Feature extraction method, device, electronic equipment and computer readable medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090116693A1 (en) * 2007-11-01 2009-05-07 Canon Kabushiki Kaisha Image processing apparatus and image processing method
CN103353936A (en) * 2013-07-26 2013-10-16 上海交通大学 Method and system for face identification
US20130322741A1 (en) * 2012-06-05 2013-12-05 DRVision Technologies LLC. Teachable pattern scoring method
CN105260171A (en) * 2015-09-10 2016-01-20 深圳市创梦天地科技有限公司 Virtual item generation method and apparatus
CN105677353A (en) * 2016-01-08 2016-06-15 北京物思创想科技有限公司 Feature extraction method and machine learning method and device thereof
CN106127531A (en) * 2016-07-14 2016-11-16 北京物思创想科技有限公司 Method and system for performing differentiated pricing based on machine learning
CN106779088A (en) * 2016-12-06 2017-05-31 北京物思创想科技有限公司 Method and system for executing a machine learning process
CN107045503A (en) * 2016-02-05 2017-08-15 华为技术有限公司 Method and device for determining a feature set

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897918A (en) * 2017-02-24 2017-06-27 上海易贷网金融信息服务有限公司 A kind of hybrid machine learning credit scoring model construction method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090116693A1 (en) * 2007-11-01 2009-05-07 Canon Kabushiki Kaisha Image processing apparatus and image processing method
US20130322741A1 (en) * 2012-06-05 2013-12-05 DRVision Technologies LLC. Teachable pattern scoring method
CN103353936A (en) * 2013-07-26 2013-10-16 上海交通大学 Method and system for face identification
CN105260171A (en) * 2015-09-10 2016-01-20 深圳市创梦天地科技有限公司 Virtual item generation method and apparatus
CN105677353A (en) * 2016-01-08 2016-06-15 北京物思创想科技有限公司 Feature extraction method and machine learning method and device thereof
CN107045503A (en) * 2016-02-05 2017-08-15 华为技术有限公司 Method and device for determining a feature set
CN106127531A (en) * 2016-07-14 2016-11-16 北京物思创想科技有限公司 Method and system for performing differentiated pricing based on machine learning
CN106779088A (en) * 2016-12-06 2017-05-31 北京物思创想科技有限公司 Method and system for executing a machine learning process

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
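Li Yezi et al.: "Combined feature selection algorithm based on mutual information", Computer Systems & Applications *
Li Tingting et al.: "Microblog sentiment analysis based on multi-feature combination of SVM and CRF", Application Research of Computers *
Li Min et al.: "Research on feature selection methods and algorithms", Computer Technology and Development *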

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710949A (en) * 2018-04-26 2018-10-26 第四范式(北京)技术有限公司 Method and system for creating machine learning modeling templates
CN108681426A (en) * 2018-05-25 2018-10-19 第四范式(北京)技术有限公司 Method and system for performing feature processing on data
CN108681426B (en) * 2018-05-25 2020-08-11 第四范式(北京)技术有限公司 Method and system for performing feature processing on data
CN108985459A (en) * 2018-05-30 2018-12-11 华为技术有限公司 Method and apparatus for training a model
CN110895718A (en) * 2018-09-07 2020-03-20 第四范式(北京)技术有限公司 Method and system for training machine learning model
CN109634961A (en) * 2018-12-05 2019-04-16 杭州大拿科技股份有限公司 Test paper sample generation method and device, electronic equipment and storage medium
CN109634961B (en) * 2018-12-05 2021-06-04 杭州大拿科技股份有限公司 Test paper sample generation method and device, electronic equipment and storage medium
CN109685583A (en) * 2019-01-10 2019-04-26 博拉网络股份有限公司 Supply chain demand prediction method based on big data
CN109685583B (en) * 2019-01-10 2020-12-25 博拉网络股份有限公司 Supply chain demand prediction method based on big data
CN110956272A (en) * 2019-11-01 2020-04-03 第四范式(北京)技术有限公司 Method and system for realizing data processing
CN110956272B (en) * 2019-11-01 2023-08-08 第四范式(北京)技术有限公司 Method and system for realizing data processing
CN110851500A (en) * 2019-11-07 2020-02-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN110851500B (en) * 2019-11-07 2022-10-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN111625692A (en) * 2020-05-27 2020-09-04 北京字节跳动网络技术有限公司 Feature extraction method, device, electronic equipment and computer readable medium
CN111625692B (en) * 2020-05-27 2023-08-22 抖音视界有限公司 Feature extraction method, device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN111797998B (en) 2024-06-11
CN107766946B (en) 2020-06-23
CN111797998A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN107766946A (en) Method and system for generating combined features of machine learning samples
CN107844837A (en) Method and system for tuning algorithm parameters of a machine learning algorithm
CN108090516A (en) Method and system for automatically generating features of machine learning samples
Osaba et al. A tutorial on the design, experimentation and application of metaheuristic algorithms to real-world optimization problems
Sedlmair et al. Visual parameter space analysis: A conceptual framework
CN108710949A (en) Method and system for creating machine learning modeling templates
CN108008942A (en) Method and system for processing data records
US20180365557A1 (en) Information processing method and information processing apparatus
CN107908566A (en) Automatic test management method, device, terminal device and storage medium
CN104798043B (en) Data processing method and computer system
CN110045953A (en) Method and computing device for generating business rule expressions
CN108228861A (en) Method and system for performing feature engineering for machine learning
US20210027514A1 (en) Method and system for creating animal type avatar using human face
CN104834479A (en) Method and system for automatically optimizing configuration of storage system facing cloud platform
CN107316082A (en) Method and system for determining feature importance of machine learning samples
CN108960264A (en) Training method and device for a classification model
CN107273979A (en) Method and system for performing machine learning prediction based on service level
CN107578140A (en) Guide analysis system and method
WO2019108371A1 (en) Training neural networks to detect similar three-dimensional objects using fuzzy identification
CN110197004B (en) Circuit simulation method and device based on mobile terminal, computer medium and equipment
Fischer et al. Towards a survey on static and dynamic hypergraph visualizations
CN107909087A (en) Method and system for generating combined features of machine learning samples
CN108108820A (en) Method and system for selecting features of machine learning samples
CN107909141A (en) Data analysis method and device based on grey wolf optimization algorithm
CN107679549A (en) Method and system for generating combined features of machine learning samples

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant