CN107766946A - Method and system for generating combined features of machine learning samples - Google Patents
Method and system for generating combined features of machine learning samples
- Publication number
- CN107766946A, CN201710898898.2A
- Authority
- CN
- China
- Prior art keywords
- configuration item
- feature
- features
- machine learning
- combinations
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Electrically Operated Instructional Devices (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method and system for generating combined features of machine learning samples are provided. The method includes: (A) obtaining unit features that can be combined; (B) providing a user with a graphical interface for setting feature combination configuration items, wherein the feature combination configuration items define how feature combination is performed among the unit features; (C) receiving an input operation performed by the user on the graphical interface to set the feature combination configuration items, and obtaining the feature combination configuration items set by the user according to the input operation; and (D) combining the features to be combined among the unit features based on the obtained feature combination configuration items, to generate the combined features of the machine learning samples. With the method and system, the user only needs to set, through an interactive interface, the configuration items that define how feature combination is performed in order to achieve automatic feature combination, which improves both the user experience and the effect of the machine learning model.
Description
Technical field
The present invention relates generally to the field of artificial intelligence and, more particularly, to a method and system for generating combined features of machine learning samples.
Background
At present, the basic process of training a machine learning model mainly includes:
1. importing a data set (for example, a data table) containing historical data records;
2. completing feature engineering, in which the attribute information of the data records in the data set is processed in various ways to obtain features (which may include, for example, combined features); the feature vectors formed by these features can serve as machine learning samples;
3. training the model, in which a model is learned, according to a configured machine learning algorithm (for example, a logistic regression algorithm, a decision tree algorithm, a neural network algorithm, etc.), from the machine learning samples obtained through feature engineering.
In the above process, the processing that produces features is important because it affects the quality of the model. Each data record in a data table may include multiple items of attribute information (that is, fields), and a feature may indicate a field itself, part of a field, or the result of various field processing (or operations) such as a combination of fields, so as to better reflect the data distribution and the intrinsic associations and latent meanings among fields. Taking data mining as an example, once features have been extracted accurately, different combinations may also be formed among the features to help the learning process better distill the regularities in the data, analyzing the intrinsic associations and latent meanings in the data distribution from multiple angles. The quality of feature engineering directly determines how accurately the machine learning problem is characterized, and in turn affects the quality of the model.
On existing machine learning platforms, the training flow of a machine learning model can be completed interactively through a graphical interface, without the user personally writing program code. In the feature engineering stage, however, manually determined feature combination modes often still have to be entered into the platform system by hand. That is, the user needs to work out specific feature combination modes in advance, and automatic feature combination cannot be effectively achieved through the platform.
Moreover, to obtain feature combination modes in advance, the user needs a deep understanding of the business scenario; that is, the user combines features manually on the basis of business experience. The volume of data used in machine learning is typically large, so the user sometimes cannot analyze the data comprehensively and ends up formulating ineffective combined features. To improve the effect of the combined features, the user has to keep experimenting, and with large data volumes and high-dimensional features such work takes a long time. This both increases the workload and reduces working efficiency.
Summary of the invention
Exemplary embodiments of the present invention aim to provide a method and system for generating combined features of machine learning samples, so as to solve the prior-art problem that automatic feature combination is difficult to perform conveniently in a machine learning system.
According to an exemplary embodiment of the present invention, a method for generating combined features of machine learning samples is provided, including: (A) obtaining unit features that can be combined; (B) providing a user with a graphical interface for setting feature combination configuration items, wherein the feature combination configuration items define how feature combination is performed among the unit features; (C) receiving an input operation performed by the user on the graphical interface to set the feature combination configuration items, and obtaining the feature combination configuration items set by the user according to the input operation; and (D) combining the features to be combined among the unit features based on the obtained feature combination configuration items, to generate the combined features of the machine learning samples.
Optionally, the feature combination configuration items include at least one of the following: a feature configuration item, used to specify the features to be combined among the unit features, so that the specified features to be combined are combined in step (D); an evaluation index configuration item, used to specify the evaluation index of the combined features, so that in step (D) the effect of the machine learning models corresponding to the various combined features is measured according to the specified evaluation index to determine how the features to be combined are combined; and a training parameter configuration item, used to specify the training parameters of the machine learning model, so that in step (D) the combination mode of the features to be combined is determined by measuring the effect of the machine learning models corresponding to the various combined features obtained under the specified training parameters.
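The three configuration items above can be pictured as a small settings object. The following is a minimal sketch only: the class name `FeatureCombinationConfig`, its field names, and the default values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FeatureCombinationConfig:
    """Hypothetical container for the feature combination configuration items."""

    # Feature configuration item: unit features to combine (None means "all").
    features_to_combine: Optional[List[str]] = None
    # Evaluation index configuration item, e.g. "auc", "mae" or "logloss".
    evaluation_index: str = "auc"
    # Training parameter configuration item (values here are placeholders).
    training_params: Dict[str, float] = field(
        default_factory=lambda: {"learning_rate": 0.1}
    )

    def resolve_features(self, unit_features: List[str]) -> List[str]:
        """Return the features to be combined, validating any explicit choice."""
        if self.features_to_combine is None:
            return list(unit_features)
        unknown = [f for f in self.features_to_combine if f not in unit_features]
        if unknown:
            raise ValueError(f"unknown unit features: {unknown}")
        return list(self.features_to_combine)


config = FeatureCombinationConfig(features_to_combine=["age", "city"])
selected = config.resolve_features(["age", "city", "gender"])
```

Leaving `features_to_combine` unset mirrors the default behaviour the description mentions later, where all unit features are treated as features to be combined.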
Optionally, the feature combination configuration items further include a binning operation configuration item, used to specify one or more binning operations to be performed on each of at least one continuous feature among the features to be combined, so that in step (D) the specified one or more binning operations are performed on each of the at least one continuous feature to obtain the corresponding one or more binned features, and the obtained binned features are combined as a whole with the other features to be combined.
Optionally, the binning operation configuration item is used to specify one or more binning operations for each continuous feature individually; alternatively, the binning operation configuration item is used to specify one or more binning operations uniformly for all continuous features.
Optionally, the method further includes: (E) displaying the generated combined features to the user.
Optionally, in step (E), the evaluation value of each combined feature with respect to the evaluation index is also displayed to the user.
Optionally, the method further includes: (F) applying the generated combined features directly to subsequent machine learning steps.
Optionally, the method further includes: (G) applying the combined features selected by the user from the displayed combined features to subsequent machine learning steps.
Optionally, the method further includes: (H) saving the combination modes of the combined features generated in step (D) in the form of a configuration file.
Optionally, the method further includes: (I) saving the combination modes of the combined features selected by the user in step (G) in the form of a configuration file.
Optionally, in step (A), the unit features are obtained by performing feature processing on the attribute information of data records.
According to another exemplary embodiment of the present invention, a system for generating combined features of machine learning samples is provided, including: a unit feature acquisition device for obtaining unit features that can be combined; a display device for providing a user with a graphical interface for setting feature combination configuration items, wherein the feature combination configuration items define how feature combination is performed among the unit features; a configuration item acquisition device for receiving an input operation performed by the user on the graphical interface to set the feature combination configuration items, and for obtaining the feature combination configuration items set by the user according to the input operation; and a combined feature generation device for combining the features to be combined among the unit features based on the obtained feature combination configuration items, to generate the combined features of the machine learning samples.
Optionally, the feature combination configuration items include at least one of the following: a feature configuration item, used to specify the features to be combined among the unit features, so that the combined feature generation device combines the specified features to be combined; an evaluation index configuration item, used to specify the evaluation index of the combined features, so that the combined feature generation device measures, according to the specified evaluation index, the effect of the machine learning models corresponding to the various combined features to determine how the features to be combined are combined; and a training parameter configuration item, used to specify the training parameters of the machine learning model, so that the combined feature generation device determines the combination mode of the features to be combined by measuring the effect of the machine learning models corresponding to the various combined features obtained under the specified training parameters.
Optionally, the feature combination configuration items further include a binning operation configuration item, used to specify one or more binning operations to be performed on each of at least one continuous feature among the features to be combined, so that the combined feature generation device performs the specified one or more binning operations on each of the at least one continuous feature to obtain the corresponding one or more binned features, and combines the obtained binned features as a whole with the other features to be combined.
Optionally, the binning operation configuration item is used to specify one or more binning operations for each continuous feature individually; alternatively, the binning operation configuration item is used to specify one or more binning operations uniformly for all continuous features.
Optionally, the display device also displays the generated combined features to the user.
Optionally, the display device also displays to the user the evaluation value of each generated combined feature with respect to the evaluation index.
Optionally, the system further includes: an application device for applying the generated combined features directly to subsequent machine learning steps.
Optionally, the system further includes: an application device for applying the combined features selected by the user from the displayed combined features to subsequent machine learning steps.
Optionally, the system further includes: a saving device for saving, in the form of a configuration file, the combination modes of the combined features generated by the combined feature generation device.
Optionally, the system further includes: a saving device for saving, in the form of a configuration file, the combination modes of the combined features selected by the user from the displayed combined features.
Optionally, the unit feature acquisition device obtains the unit features by performing feature processing on the attribute information of data records.
According to another exemplary embodiment of the present invention, a computer-readable medium for generating combined features of machine learning samples is provided, wherein a computer program for performing the method for generating combined features of machine learning samples described above is recorded on the computer-readable medium.
According to another exemplary embodiment of the present invention, a computing device for generating combined features of machine learning samples is provided, including a storage component and a processor, wherein a set of computer-executable instructions is stored in the storage component, and when the set of computer-executable instructions is executed by the processor, the method for generating combined features of machine learning samples described above is performed.
The method and system for generating combined features of machine learning samples according to exemplary embodiments of the present invention provide a convenient, efficient, and interaction-friendly feature combination process. The user only needs to set, through an interactive interface, the configuration items that define how feature combination is performed in order to achieve automatic feature combination, which improves both the user experience and the effect of the machine learning model.
Additional aspects and/or advantages of the general inventive concept will be set forth in part in the description that follows, and in part will be apparent from the description or may be learned by practice of the general inventive concept.
Brief description of the drawings
The above and other objects and features of exemplary embodiments of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings, which illustrate embodiments by way of example, in which:
Fig. 1 shows a flowchart of a method for generating combined features of machine learning samples according to an exemplary embodiment of the present invention;
Fig. 2 shows a flowchart of a method for generating combined features of machine learning samples according to another exemplary embodiment of the present invention;
Fig. 3 shows an example of a graphical interface for setting feature combination configuration items according to an exemplary embodiment of the present invention;
Fig. 4 shows an example of a feature combination analysis report according to an exemplary embodiment of the present invention;
Fig. 5 shows an example of a DAG diagram for generating combined features of machine learning samples according to an exemplary embodiment of the present invention;
Fig. 6 shows a block diagram of a system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention.
Detailed description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like parts throughout. The embodiments are described below with reference to the drawings in order to explain the present invention.
Here, machine learning is an inevitable outcome of the development of artificial intelligence research to a certain stage; it is devoted to improving the performance of a system itself through computation, by making use of experience. In a computer system, "experience" usually exists in the form of "data". Machine learning algorithms can produce a "model" from data; that is, when empirical data is supplied to a machine learning algorithm, a model can be produced from that data, and when faced with a new situation the model provides a corresponding judgment, namely a prediction result. Whether a machine learning model is being trained or a trained model is being used for prediction, the data needs to be converted into machine learning samples that include various features. Machine learning may be implemented as "supervised learning", "unsupervised learning", or "semi-supervised learning"; it should be noted that exemplary embodiments of the present invention place no particular limitation on the specific machine learning algorithm. It should also be noted that other means, such as statistical algorithms, may be combined during the training and application of the model.
Fig. 1 shows a flowchart of a method for generating combined features of machine learning samples according to an exemplary embodiment of the present invention. Here, as an example, the method may be performed by a computer program, or by a dedicated system or computing device for generating combined features of machine learning samples.
In step S10, unit features that can be combined are obtained. Here, a unit feature is the smallest unit on which feature combination can be performed.
As an example, unit features may be obtained by performing feature processing on the attribute information of data records. Here, each data record may be regarded as a description of an event or object, corresponding to an example or sample. A data record includes attribute information (that is, fields) reflecting the performance or nature of the event or object in some respect. As an example, the feature processing may be any suitable kind of feature processing; for example, part of a field's value may be extracted, or the value may be subjected to discretization, to arithmetic operations such as taking a logarithm, or to combination with different fields; the present invention places no limitation on this. The resulting unit feature may indicate a field itself, part of a field, or the result of various field processing or operations such as a combination of fields.
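The kinds of feature processing listed above can be sketched as follows. The record layout, the field names (`city`, `signup_date`, `age`, `amount`), and the age threshold are illustrative assumptions and do not come from the patent.

```python
import math


def unit_features(record):
    """Derive illustrative unit features from one data record (a dict of fields)."""
    return {
        # A field used as-is.
        "city": record["city"],
        # Part of a field: keep only the year of an ISO date string.
        "signup_year": record["signup_date"][:4],
        # Discretization: map the numeric age to a coarse band.
        "age_band": "minor" if record["age"] < 18 else "adult",
        # Arithmetic operation: log(1 + amount) compresses large values.
        "log_amount": math.log1p(record["amount"]),
    }


record = {"city": "Beijing", "signup_date": "2017-09-28", "age": 34, "amount": 0.0}
features = unit_features(record)
```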
In step S20, a graphical interface for setting feature combination configuration items is provided to the user, wherein the feature combination configuration items define how feature combination is performed among the unit features. According to exemplary embodiments of the present invention, the combination of unit features may be performed based on the feature combination configuration items set by the user. In particular, machine learning models corresponding to candidate combined features formed from the unit features may be trained, the predictive power of each candidate combined feature may be inferred from the differences in effect among the machine learning models, and the more important or effective candidate combined features may then be selected as the combined features of the machine learning samples. As an example, the user may set, through the graphical interface, the feature combination configuration items involved in the above flow, and may also set other related feature combination configuration items.
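The selection idea described above (train one model per candidate combined feature, then compare effects) can be sketched as below. This is an illustration under stated assumptions, not the patented procedure: the "model" is a deliberately trivial stand-in that predicts the mean label per combined value, the metric is mean absolute error (lower is better), and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def combine(rows, names):
    """Combined feature value = tuple of the unit feature values (a cross)."""
    return [tuple(row[n] for n in names) for row in rows]


def fit_group_means(keys, labels):
    """Stand-in model: mean label for each combined-feature value."""
    sums = defaultdict(lambda: [0.0, 0])
    for key, y in zip(keys, labels):
        sums[key][0] += y
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def mean_abs_error(model, default, keys, labels):
    preds = [model.get(key, default) for key in keys]
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)


def rank_candidates(train, valid, label, unit_names, order=2):
    """Score every candidate combination; the best (lowest error) comes first."""
    y_train = [r[label] for r in train]
    y_valid = [r[label] for r in valid]
    default = sum(y_train) / len(y_train)
    scored = []
    for names in combinations(unit_names, order):
        model = fit_group_means(combine(train, names), y_train)
        error = mean_abs_error(model, default, combine(valid, names), y_valid)
        scored.append((error, names))
    return sorted(scored)


# Toy data: the label is the XOR of a and b, while c is irrelevant.
rows = [{"a": a, "b": b, "c": c, "y": a ^ b}
        for a in (0, 1) for b in (0, 1) for c in (0, 1)]
ranking = rank_candidates(rows, rows, "y", ["a", "b", "c"])
```

On this toy data only the combination of `a` and `b` explains the label, so it ranks first; neither unit feature is predictive on its own, which is the situation where combined features help most.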
In step S30, an input operation performed by the user on the graphical interface to set the feature combination configuration items is received, and the feature combination configuration items set by the user are obtained according to the input operation.
As an example, the graphical interface provided to the user may include an input control corresponding to each feature combination configuration item for selecting and/or editing content, so that the feature combination configuration items set by the user are obtained by receiving the user's selection and/or editing operations.
In step S40, the features to be combined among the unit features are combined based on the obtained feature combination configuration items, to generate the combined features of the machine learning samples.
As an example, the feature combination configuration items may include at least one of the following: a feature configuration item, an evaluation index configuration item, a training parameter configuration item, and a binning operation configuration item. It should be understood that the feature combination configuration items may also include other configuration items that define how feature combination is performed among the unit features.
In particular, the feature configuration item is used to specify the features to be combined among the unit features, so that the specified features to be combined are combined in step S40. As an example, all or some of the unit features obtained in step S10 may be designated as features to be combined through the feature configuration item. In particular, the feature configuration item may be used to let the user confirm whether all unit features are to be used as features to be combined, and may also be used to let the user specify each feature to be combined individually.
The evaluation index configuration item is used to specify the evaluation index of the combined features, so that in step S40 the effect of the machine learning models corresponding to the various combined features is measured according to the specified evaluation index to determine how the features to be combined are combined. Here, as an example, saying that a machine learning model corresponds to a particular combined feature may mean that the samples of that machine learning model include the particular combined feature.
As described above, according to exemplary embodiments of the present invention, when unit features are combined, whether to adopt a combined feature may be determined by measuring the effect of the machine learning model corresponding to that combined feature. Here, the configured evaluation index may be used to measure the effect of the machine learning models corresponding to the various combined features; the higher the evaluation index of a machine learning model, the more readily the combined feature corresponding to that model is confirmed as a combined feature of the machine learning samples. As an example, the evaluation index may be any of various model evaluation indexes used to measure the effect of a machine learning model. For example, the evaluation index may be AUC (Area Under the ROC Curve, where ROC stands for Receiver Operating Characteristic), MAE (Mean Absolute Error), or the logarithmic loss function (logloss).
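For reference, the three evaluation indexes named above can be computed as in the following plain-Python sketch. The function names and the `HIGHER_IS_BETTER` table are illustrative; the AUC here uses the pairwise (rank-based) definition, which is quadratic in the number of samples and intended only for small examples.

```python
import math


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mae(targets, preds):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)


def logloss(labels, probs, eps=1e-15):
    """Binary logarithmic loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


# Direction of each index: AUC is better when higher, MAE and logloss when lower.
HIGHER_IS_BETTER = {"auc": True, "mae": False, "logloss": False}

example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because MAE and logloss are error measures, a selection routine would typically negate them or compare in the opposite direction so that "better" is treated consistently across indexes.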
The training parameter configuration item is used to specify the training parameters of the machine learning model, so that in step S40 the combination mode of the features to be combined is determined by measuring the effect of the machine learning models corresponding to the various combined features obtained under the specified training parameters.
As an example, the training parameter configuration item may include configuration items for one or more different training parameters. For example, the training parameter configuration item may include a learning rate configuration item and/or a configuration item for the number of parameter-tuning rounds.
It should be noted, however, that the above examples serve only to illustrate and explain exemplary embodiments of the present invention, and that exemplary embodiments do not necessarily require the user to configure the above items. For example, all unit features produced by feature processing may be used as features to be combined by default, a preset evaluation index may be used to measure the machine learning models, or model training may be carried out under default training parameters.
In addition, the feature combination configuration items may further include a binning operation configuration item, used to specify one or more binning operations to be performed on each of at least one continuous feature among the features to be combined, so that in step S40 the specified one or more binning operations are performed on each of the at least one continuous feature to obtain the corresponding one or more binned features, and the obtained binned features are combined as a whole with the other features to be combined. As an example, the binning operation configuration item may be used to specify one or more binning operations for each continuous feature individually. As another example, the binning operation configuration item may be used to specify one or more binning operations uniformly for all continuous features.
Here, for each continuous feature, each binning operation performed on it produces one binned feature; accordingly, the feature composed of all of its binned features may replace the original continuous feature in the automatic combination among the features to be combined. As an example, the binning operation configuration item may specify multiple binning operations to be performed on each continuous feature among the features to be combined, so that in step S40 the specified multiple binning operations are performed on each such continuous feature to obtain the corresponding multiple binned features.
In particular, a continuous feature is a feature that contrasts with a discrete feature (for example, a categorical feature); its values are numerical values with a certain continuity, for example, age or amount of money. In contrast, as an example, the values of a discrete feature have no continuity; such a feature may be an unordered category such as "from Beijing", "from Shanghai" or "from Tianjin", or "gender is male" or "gender is female". Correspondingly, a binning operation is a specific way of discretizing a continuous feature: the value range of the continuous feature is divided into multiple intervals (that is, multiple buckets), and the corresponding binned feature value is determined from the resulting buckets. That is, according to exemplary embodiments of the present invention, for each continuous feature, after at least one binning operation has been performed and the corresponding at least one binned feature has been obtained, a feature corresponding to the continuous feature can be obtained by using each binned feature as one component. This feature can be regarded as a set of binned features, and it is combined with continuous features and/or discrete features. Here, it should be understood that the binning operation discretizes the continuous feature and places it in a corresponding specific bucket. In the resulting binned features, each dimension may indicate either a discrete value showing whether the continuous feature was assigned to the bucket (for example, "0" or "1") or a specific continuous value (for example, the actual feature value of the continuous feature or its normalized value, or the mean, median, or boundary value of the continuous feature values in the bucket). Accordingly, when the discrete values (for example, for classification problems) or continuous values (for example, for regression problems) of the dimensions are applied in machine learning, combinations between discrete values (for example, a Cartesian product) or between continuous values (for example, arithmetic combinations) may be performed.
As an example, the binning operation configuration item may further include a binning mode configuration item and/or a binning parameter configuration item. The binning mode configuration item is used to specify the binning mode used by the binning operation. The binning parameter configuration item is used to specify the binning parameters of the binning mode. For example, an equal-width binning mode or an equal-depth binning mode may be specified through the binning mode configuration item, and the number of buckets, the bucket width, the bucket depth, and so on may be specified through the binning parameter configuration item. Here, the user may manually enter or select the value of a binning parameter configuration item; in particular, the user may be prompted to set the widths or depths of the buckets for equal-width or equal-depth binning according to a geometric or arithmetic relationship.
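The difference between the two binning modes can be sketched as below: equal-width binning splits the value range into intervals of the same length, while equal-depth binning chooses boundaries so that each bucket holds roughly the same number of samples. The function names and the exact boundary convention are illustrative assumptions.

```python
from bisect import bisect_right


def equal_width_edges(values, num_buckets):
    """Interior boundaries that split [min, max] into intervals of equal length."""
    low, high = min(values), max(values)
    step = (high - low) / num_buckets
    return [low + step * i for i in range(1, num_buckets)]


def equal_depth_edges(values, num_buckets):
    """Interior boundaries taken at evenly spaced positions of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // num_buckets] for i in range(1, num_buckets)]


def assign(value, edges):
    """Bucket index = number of boundaries that are less than or equal to value."""
    return bisect_right(edges, value)


def bucket_counts(values, edges):
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[assign(v, edges)] += 1
    return counts


# A skewed sample: one large outlier stretches the value range.
values = [1, 2, 3, 4, 5, 6, 7, 100]
width_counts = bucket_counts(values, equal_width_edges(values, 4))
depth_counts = bucket_counts(values, equal_depth_edges(values, 4))
```

On skewed data like this, equal-width binning leaves most buckets nearly empty, whereas equal-depth binning spreads the samples evenly, which is one reason a user might want to choose the mode per feature.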
Here, as an example, the multiple binning operations specified by the binning operation configuration item may be operations that use the same binning mode with different binning parameters (for example, number of bins, bin depth, or bin width), or operations that use different binning modes. As an example, the feature obtained by performing the specified multiple binning operations on a continuous feature may be composed jointly of the features obtained by performing each binning operation on that continuous feature. The resulting feature corresponding to the continuous feature can thus characterize certain attributes of the original data record from different angles and at different scales or levels simultaneously.
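One way to picture a feature composed of several binning results is the sketch below, which applies equal-width binning at several granularities to the same value and keeps all results together. The function name, the tuple layout, and the fixed value range are assumptions made for illustration only.

```python
def multi_scale_bins(value, lo, hi, bin_counts):
    """Bin one continuous value at several granularities and return all results together."""
    features = []
    for num_bins in bin_counts:
        width = (hi - lo) / num_bins
        index = min(max(int((value - lo) / width), 0), num_bins - 1)
        # Each entry records which granularity produced which bin index.
        features.append((num_bins, index))
    return features


age_feature = multi_scale_bins(37.0, 0.0, 100.0, [10, 100])
```

Here the coarse binning places the value in a decade-wide bin while the fine binning keeps unit resolution, so the composed feature carries both views of the same attribute.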
It should be understood that the above ways of generating combined features based on the configuration items are merely illustrative examples, and exemplary embodiments of the present invention are not limited to them.
As an example, after the combined features of the machine learning samples are generated based on the feature combination configuration items, the method of generating combined features of machine learning samples according to an exemplary embodiment of the present invention may further include directly applying the generated combined features to subsequent machine learning steps. For example, a model may be learned from machine learning samples that include at least the generated combined features.
As an example, the method of generating combined features of machine learning samples according to an exemplary embodiment of the present invention may further include saving the combination manner of the generated combined features in the form of a configuration file, so that it can be invoked directly at the user's request when subsequent machine learning steps are performed or when other machine learning processes are carried out.
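The patent does not specify a configuration file format. As a minimal sketch, assuming JSON and a hypothetical "combined_features" key, saving and later reloading the combination manner could look like this:

```python
import json
import os
import tempfile


def save_combinations(combinations, path):
    """Write the combination manner (which features are combined) to a JSON file."""
    payload = {"combined_features": [list(combo) for combo in combinations]}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)


def load_combinations(path):
    """Read the combination manner back so another process can reuse it."""
    with open(path, encoding="utf-8") as handle:
        return [tuple(combo) for combo in json.load(handle)["combined_features"]]


# Round trip through a temporary file; the file name is illustrative.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "feature_combinations.json")
    save_combinations([("default", "month"), ("job", "month")], config_path)
    restored = load_combinations(config_path)
```

Storing only the feature names, rather than the generated values, keeps the file small and lets a later workflow regenerate the combined features on new data.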
Fig. 2 shows a flowchart of a method of generating combined features of machine learning samples according to another exemplary embodiment of the present invention. As shown in Fig. 2, in addition to steps S10, S20, S30, and S40 shown in Fig. 1, the method according to this other exemplary embodiment may further include step S50. Steps S10, S20, S30, and S40 can be implemented with reference to the embodiment described for Fig. 1 and are not described again here.
In step S50, the combined features generated in step S40 are displayed to the user. Here, the specific combination manner of the combined features may be displayed in any effective form.
As an example, the evaluation value of each combined feature with respect to an evaluation metric is also displayed to the user. Here, the evaluation metric may be the one specified by the evaluation metric configuration item set by the user, or any other evaluation metric.
As an example, the method of generating combined features of machine learning samples according to another exemplary embodiment of the present invention may further include applying the combined features that the user selects from the displayed combined features to subsequent machine learning steps.
As another example, the method may further include saving the combination manner of the combined features selected by the user in the form of a configuration file, so that it can be invoked directly at the user's request when subsequent machine learning steps are performed or when other machine learning processes are carried out.
As a further example, the method may include both applying the combined features that the user selects from the displayed combined features to subsequent machine learning steps and saving the combination manner of the selected combined features in the form of a configuration file.
An example in which the user sets the feature combination configuration items through a graphical interface according to an exemplary embodiment of the present invention is described below with reference to Fig. 3. Fig. 3 shows an example of a graphical interface for setting the feature combination configuration items according to an exemplary embodiment of the present invention. It should be understood that the specific interaction details for setting each feature combination configuration item are not limited to the example shown in Fig. 3.
As shown in Fig. 3, the graphical interface for setting the feature combination configuration items may display content options and/or content input boxes corresponding respectively to the feature configuration item, the evaluation metric configuration item, the training parameter configuration item, and the binning operation configuration item. Specifically, the feature configuration item may be set according to an input operation in which the user checks the "select all features" option, so that all the unit features obtained in step S10 are designated as features to be combined. Alternatively, according to an input operation in which the user checks the "custom" option, a user interface for customizing the features to be combined pops up, and the feature configuration item is set either by the user selecting the features to be combined from the candidate unit features provided in that interface (for example, all the unit features obtained in step S10) or by the user entering identification information of the features to be combined. The evaluation metric configuration item may be set according to the user's selection in a drop-down menu, so that the content the user selects (for example, "AUC" as shown in Fig. 3) is designated as the evaluation metric. The user may set the training parameter configuration item by editing the corresponding content input box (for example, the learning rate configuration item shown in Fig. 3), for example by entering the value "0.5" as shown in Fig. 3. The user may set the binning operation configuration item by editing the content input box corresponding to it (for example, the binning parameter configuration item, that is, the number-of-bins configuration item, shown in Fig. 3), for example by entering the value "10/100/1000/10000/100000" as shown in Fig. 3. In other words, the binning operation configuration item set by the user specifies that five binning operations are performed on each continuous feature among the features to be combined: the number of bins for the first binning operation is "10", for the second "100", ..., and for the fifth "100000". Here, the binning mode defaults to equal-width binning.
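To make the Fig. 3 example concrete, the sketch below turns the values a user might enter into a configuration dictionary. The dictionary keys and the helper name are hypothetical. Only the example values ("AUC", "0.5", "10/100/1000/10000/100000", and equal-width binning as the default) come from the description above.

```python
def parse_bin_counts(text):
    """Split a slash-separated input such as '10/100/1000' into one bin count per binning operation."""
    counts = [int(part) for part in text.split("/") if part.strip()]
    if not counts or any(count <= 0 for count in counts):
        raise ValueError("every bin count must be a positive integer")
    return counts


# Hypothetical in-memory form of the configuration items set through the interface.
feature_combination_config = {
    "features": "all",              # the "select all features" option
    "evaluation_metric": "AUC",     # chosen from the drop-down menu
    "learning_rate": 0.5,           # training parameter
    "binning_mode": "equal_width",  # default mode stated in the description
    "bin_counts": parse_bin_counts("10/100/1000/10000/100000"),
}
```

Each entry of `bin_counts` corresponds to one binning operation, so this input yields five operations per continuous feature.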
An example of displaying the generated combined features to the user according to an exemplary embodiment of the present invention is described below with reference to Fig. 4. In the example of Fig. 4, the combined features are displayed in the form of a feature combination analysis report.
As shown in Fig. 4, the left side of the upper table shows the unit features obtained in step S10 in the form "output feature name=processing method(field name of the original attribute information)". For example, discrete_feature_1729_0=discrete(cons_price_idx) indicates that the field cons_price_idx, processed as a discrete value, serves as the unit feature discrete_feature_1729_0. The left side of the lower table shows the combined features generated in step S40 in the form "output feature name=processing method(combine(original feature name 1, original feature name 2, original feature name 3, ...))". For example, discrete_feature_1729_23=discrete(combine(default, month)) indicates that the discrete feature obtained by combining the features default and month is the new combined feature discrete_feature_1729_23. The right side of both tables shows the evaluation value of each feature with respect to the evaluation metric. As an example, the upper table may be omitted and only the lower table displayed.
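The report format quoted above is regular enough to be read back by a program. The following sketch parses one combined-feature line into its parts. The regular expression and function name are assumptions, and the sketch handles only the "combine(...)" form shown in the example.

```python
import re

# output_name = method(combine(feature_1, feature_2, ...))
_COMBINED_PATTERN = re.compile(r"^(\w+)=(\w+)\(combine\((.*)\)\)$")


def parse_combined_feature(line):
    """Split a report line into output feature name, processing method, and source feature names."""
    match = _COMBINED_PATTERN.match(line.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a combined-feature line: {line!r}")
    output_name, method, arguments = match.groups()
    return output_name, method, arguments.split(",")


parsed = parse_combined_feature(
    "discrete_feature_1729_23=discrete(combine(default, month))"
)
```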
Further, as an example, the user may select combined features from the feature combination analysis report shown in Fig. 4, to be applied to subsequent machine learning steps and/or saved in the form of a configuration file.
According to an exemplary embodiment of the present invention, the machine learning workflow may be executed in the form of a directed acyclic graph (DAG), and the workflow may cover all or part of the steps for training, testing, or making predictions with a machine learning model. For example, for automatic feature combination, a DAG may be built that includes a historical data import step, a data splitting step, a feature extraction step, and an automatic feature combination step. That is, each of these steps can be executed as a node in the DAG.
Fig. 5 shows an example of a DAG for generating combined features of machine learning samples according to an exemplary embodiment of the present invention.
Referring to Fig. 5, the first step is to establish a data import node. For example, as shown in Fig. 5, the data import node may be configured in response to a user operation so that a banking business data table named "bank" is imported into the machine learning platform, where the data table may contain multiple historical data records.
The second step is to establish a data splitting node and connect the data import node to it, so that the imported data table is split into a training set and a validation set. The data records in the training set are converted into machine learning samples for learning the model, and the data records in the validation set are converted into test samples for verifying the effect of the learned model. The data splitting node may be configured in response to a user operation so that the imported data table is split into the training set and the validation set in the configured manner.
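The patent leaves the splitting manner to the user's configuration. One common choice, shown here purely as an assumed example, is a seeded random split by ratio:

```python
import random


def split_records(records, train_ratio=0.8, seed=0):
    """Shuffle the records reproducibly and split them into a training set and a validation set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


training_set, validation_set = split_records(range(10), train_ratio=0.8)
```

Fixing the seed keeps the split stable across runs, which matters when evaluation values of different combined features are compared.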
The third step is to establish two feature extraction nodes and connect the data splitting node to each of them, so that feature extraction is performed separately on the training set and the validation set output by the data splitting node. For example, by default the left output of the data splitting node is the training set and the right output is the validation set. Feature extraction may be performed on the training set and the validation set based on the feature configuration set by the user in the feature extraction node or on code the user writes. It should be understood that the feature extraction manner for the machine learning samples and that for the test samples are correspondingly consistent. The user may apply the feature extraction manner configured for the left feature extraction node directly to the right feature extraction node, or the platform may keep the two automatically synchronized.
The fourth step is to establish an automatic feature combination node and connect the two feature extraction nodes to it. The automatic feature combination node may be configured in response to a user operation. For example, when an operation of the user clicking the "automatic feature combination" node is received, a graphical interface for setting the feature combination configuration items, such as the one shown in Fig. 3, may be provided to the user so that the user can set the configuration items through it.
After the DAG including the above steps is established, the whole DAG may be run according to the user's instruction. During the run, the machine learning platform automatically generates the combined features of the machine learning samples according to the configuration items set by the user and outputs the corresponding combined features.
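The four steps above form a small dependency graph. As a sketch of how a platform might derive a valid execution order, the following uses Python's standard `graphlib` module (available from Python 3.9). The node names are illustrative labels for the nodes of Fig. 5 and are not identifiers from the patent.

```python
from graphlib import TopologicalSorter

# Each key lists the nodes it depends on (its upstream nodes in the DAG).
dag = {
    "data_split": {"data_import"},
    "extract_train": {"data_split"},
    "extract_validation": {"data_split"},
    "auto_feature_combination": {"extract_train", "extract_validation"},
}

# static_order() yields every node only after all of its upstream nodes.
execution_order = list(TopologicalSorter(dag).static_order())
```

The two feature extraction nodes have no dependency on each other, so their relative order is not fixed and they could be run in parallel.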
In addition, as an example, a model training node may be established after the automatic feature combination node, and the automatic feature combination node connected to it, so that the extracted features and the generated combined features are applied directly to subsequent model training. Accordingly, the model training node may be configured in response to a user operation so that the model is trained on the machine learning samples in the configured manner. When the whole DAG is run, the machine learning model can then be learned directly according to the configuration items set by the user.
Fig. 6 shows a block diagram of a system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention. As shown in Fig. 6, the system includes a unit feature acquisition device 10, a display device 20, a configuration item acquisition device 30, and a combined feature generation device 40.
The unit feature acquisition device 10 is used to obtain unit features that can be combined.
As an example, the unit feature acquisition device 10 may obtain the unit features by performing feature processing on the attribute information of data records.
The display device 20 is used to provide the user with a graphical interface for setting feature combination configuration items, where the feature combination configuration items define how feature combination is performed among the unit features.
The configuration item acquisition device 30 is used to receive the input operations that the user performs on the graphical interface to set the feature combination configuration items, and to obtain the feature combination configuration items set by the user according to those input operations.
The combined feature generation device 40 is used to combine the features to be combined among the unit features based on the obtained feature combination configuration items, so as to generate the combined features of the machine learning samples.
Optionally, the feature combination configuration items may include at least one of the following: a feature configuration item, an evaluation metric configuration item, a training parameter configuration item, and a binning operation configuration item.
Specifically, the feature configuration item specifies the features to be combined among the unit features, so that the combined feature generation device 40 combines the specified features to be combined.
The evaluation metric configuration item specifies the evaluation metric for combined features, so that the combined feature generation device 40 determines the combination manner of the features to be combined by measuring, according to the specified evaluation metric, the effect of the machine learning models corresponding to the various combined features.
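Since AUC is the example metric shown in Fig. 3, the sketch below computes it in pure Python and uses it to rank candidate combined features. The pairwise formulation, the function names, and the scores in the usage lines are assumptions for illustration. The patent does not describe how the metric is computed or how candidates are searched.

```python
def auc(labels, scores):
    """Area under the ROC curve: the share of (positive, negative) pairs ranked correctly, ties counting half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def rank_by_metric(evaluations):
    """Order candidate combined features from best to worst evaluation value."""
    return sorted(evaluations, key=evaluations.get, reverse=True)


example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
ranking = rank_by_metric({"combine(default,month)": 0.71, "combine(job,month)": 0.64})
```

The pairwise computation is quadratic in the number of samples, which is acceptable for a sketch but would normally be replaced by a rank-based computation on larger validation sets.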
The training parameter configuration item specifies the training parameters of the machine learning model, so that the combined feature generation device 40 determines the combination manner of the features to be combined by measuring the effect of the machine learning models that correspond to the various combined features and are obtained under the specified training parameters.
The binning operation configuration item specifies one or more binning operations to be performed on at least one continuous feature among the features to be combined, so that the combined feature generation device 40 performs the specified one or more binning operations on the at least one continuous feature to obtain the corresponding one or more binned features, and combines the obtained binned features, taken as a whole, with the other features to be combined.
As an example, the binning operation configuration item may specify one or more binning operations separately for each continuous feature. As another example, the binning operation configuration item may specify one or more binning operations uniformly for all continuous features.
As an example, the display device 20 may also display to the user the combined features generated by the combined feature generation device 40. Further, as an example, the display device 20 may also display to the user the evaluation value, with respect to the evaluation metric, of each combined feature generated by the combined feature generation device 40.
As an example, the system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention may further include an application device (not shown).
The application device is used to apply the combined features generated by the combined feature generation device 40 directly to subsequent machine learning steps, or to apply to subsequent machine learning steps the combined features that the user selects from those displayed by the display device 20.
As an example, the system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention may further include a saving device (not shown).
The saving device is used to save, in the form of a configuration file, the combination manner of the combined features generated by the combined feature generation device 40, or the combination manner of the combined features that the user selects from those displayed by the display device 20.
It should be understood that the specific implementation of the system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention can follow the related implementations described with reference to Figs. 1 to 5, which are not repeated here.
The devices included in the system may each be configured as software, hardware, firmware, or any combination thereof that performs a specific function. For example, these devices may correspond to dedicated integrated circuits, to pure software code, or to modules combining software and hardware. In addition, one or more functions implemented by these devices may also be performed collectively by components in a physical device (for example, a processor, a client, or a server).
It should be understood that the method of generating combined features of machine learning samples according to an exemplary embodiment of the present invention may be implemented by a program recorded on a computer-readable medium. For example, according to an exemplary embodiment of the present invention, a computer-readable medium for generating combined features of machine learning samples may be provided, on which is recorded a computer program for performing the following method steps: (A) obtaining unit features that can be combined; (B) providing the user with a graphical interface for setting feature combination configuration items, where the feature combination configuration items define how feature combination is performed among the unit features; (C) receiving the input operations that the user performs on the graphical interface to set the feature combination configuration items, and obtaining the feature combination configuration items set by the user according to those input operations; and (D) combining the features to be combined among the unit features based on the obtained feature combination configuration items, so as to generate the combined features of the machine learning samples.
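Step (D) can be pictured with the following minimal sketch, which takes the unit features of one data record and a configuration obtained in step (C) and crosses every pair of selected features. Pairwise crossing, the "features" key, and the output naming are assumptions. The patent does not fix the combination strategy in this passage.

```python
from itertools import combinations


def generate_combined_features(record, config):
    """Combine the selected unit features of one record pairwise, following the configuration."""
    selected = config.get("features", "all")
    names = list(record) if selected == "all" else [n for n in record if n in selected]
    combined = {}
    for first, second in combinations(names, 2):
        # The combined value is the cross of the two unit feature values.
        combined[f"combine({first},{second})"] = f"{record[first]}&{record[second]}"
    return combined


unit_features = {"default": "no", "month": "may", "job": "admin"}
combined = generate_combined_features(unit_features, {"features": ["default", "month"]})
```

With the setting `"all"`, the same call would cross all three unit features pairwise and return three combined features.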
The computer program in the above computer-readable medium may run in an environment deployed on computer equipment such as a client, a host, a proxy device, or a server. It should be noted that the computer program may also be used to perform additional steps beyond those above, or to perform more specific processing when performing those steps. These additional steps and further processing have been described with reference to Figs. 1 to 5 and are not repeated here.
It should be noted that the system for generating combined features of machine learning samples according to an exemplary embodiment of the present invention may rely entirely on the running of a computer program to implement the corresponding functions. That is, each device corresponds to a step in the functional architecture of the computer program, so that the whole system is invoked through a dedicated software package (for example, a lib library) to implement the corresponding functions.
On the other hand, each device included in the system may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor can perform the corresponding operations by reading and running the corresponding program code or code segments.
For example, an exemplary embodiment of the present invention may also be implemented as a computing device that includes a storage component and a processor, the storage component storing a set of computer-executable instructions that, when executed by the processor, performs the method of generating combined features of machine learning samples.
Specifically, the computing device may be deployed in a server or a client, or on a node device in a distributed network environment. In addition, the computing device may be a PC, a tablet, a personal digital assistant, a smartphone, a web application, or any other device capable of executing the above instruction set.
Here, the computing device need not be a single computing device; it may be any aggregate of devices or circuits capable of executing the above instructions (or instruction set) individually or jointly. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (for example, via wireless transmission).
In the computing device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
Some of the operations described in the method of generating combined features of machine learning samples according to an exemplary embodiment of the present invention may be implemented in software and some in hardware; these operations may also be implemented by a combination of software and hardware.
The processor may run instructions or code stored in one of the storage components, where the storage components may also store data. Instructions and data may also be sent and received over a network via a network interface device, which may use any known transmission protocol.
The storage component may be integrated with the processor, for example with RAM or flash memory arranged within an integrated-circuit microprocessor. In addition, the storage component may include a separate device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled, or may communicate with each other, for example through an I/O port or a network connection, so that the processor can read files stored in the storage component.
In addition, the computing device may include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, or a touch input device). All components of the computing device may be connected to one another via a bus and/or a network.
The operations involved in the method of generating combined features of machine learning samples according to an exemplary embodiment of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operate along non-exact boundaries.
For example, as described above, a computing device for generating combined features of machine learning samples according to an exemplary embodiment of the present invention may include a storage component and a processor, the storage component storing a set of computer-executable instructions that, when executed by the processor, performs the following steps: (A) obtaining unit features that can be combined; (B) providing the user with a graphical interface for setting feature combination configuration items, where the feature combination configuration items define how feature combination is performed among the unit features; (C) receiving the input operations that the user performs on the graphical interface to set the feature combination configuration items, and obtaining the feature combination configuration items set by the user according to those input operations; and (D) combining the features to be combined among the unit features based on the obtained feature combination configuration items, so as to generate the combined features of the machine learning samples.
Exemplary embodiments of the present invention have been described above. It should be understood that the foregoing description is exemplary only and not exhaustive, and that the invention is not limited to the disclosed exemplary embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the scope of protection of the invention shall be defined by the scope of the claims.
Claims (10)
1. A method of generating combined features of machine learning samples, comprising:
(A) obtaining unit features that can be combined;
(B) providing a user with a graphical interface for setting feature combination configuration items, wherein the feature combination configuration items define how feature combination is performed among the unit features;
(C) receiving input operations that the user performs on the graphical interface to set the feature combination configuration items, and obtaining the feature combination configuration items set by the user according to the input operations; and
(D) combining the features to be combined among the unit features based on the obtained feature combination configuration items, so as to generate the combined features of the machine learning samples.
2. The method according to claim 1, wherein the feature combination configuration items include at least one of the following: a feature configuration item for specifying the features to be combined among the unit features, so that the specified features to be combined are combined in step (D); an evaluation metric configuration item for specifying the evaluation metric for combined features, so that in step (D) the combination manner of the features to be combined is determined by measuring, according to the specified evaluation metric, the effect of the machine learning models corresponding to the various combined features; and a training parameter configuration item for specifying the training parameters of the machine learning model, so that in step (D) the combination manner of the features to be combined is determined by measuring the effect of the machine learning models that correspond to the various combined features and are obtained under the specified training parameters.
3. The method according to claim 2, wherein the feature combination configuration items further include a binning operation configuration item for specifying one or more binning operations to be performed on at least one continuous feature among the features to be combined, so that in step (D) the specified one or more binning operations are performed on the at least one continuous feature to obtain the corresponding one or more binned features, and the obtained binned features, taken as a whole, are combined with the other features to be combined.
4. The method according to claim 3, wherein the binning operation configuration item is used to specify one or more binning operations separately for each continuous feature, or the binning operation configuration item is used to specify one or more binning operations uniformly for all continuous features.
5. The method according to claim 1, further comprising:
(E) displaying the generated combined features to the user.
6. The method according to claim 5, wherein in step (E), the evaluation value of each combined feature with respect to the evaluation metric is also displayed to the user.
7. The method according to claim 1, further comprising:
(F) directly applying the generated combined features to subsequent machine learning steps.
8. A system for generating combined features of machine learning samples, comprising:
a unit feature acquisition device for acquiring unit features that can be combined;
a display device for providing a user with a graphical interface for setting a feature combination configuration item, wherein the feature combination
configuration item defines how feature combination is performed between the unit features;
a configuration item acquisition device for receiving an input operation that the user performs on the graphical interface to set the feature combination
configuration item, and for acquiring the feature combination configuration item set by the user according to the input operation; and
a combined feature generation device for combining the features to be combined among the unit features based on the acquired feature combination
configuration item, to generate the combined features of the machine learning samples.
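The data flow between the devices of claim 8 can be sketched as below. This is a simplified illustration, not the patented implementation: the graphical interface is replaced by a plain dictionary standing in for the configuration item the user would set, combination is assumed to mean concatenating feature values, and the keys `features_to_combine` and `order` as well as all function names are hypothetical.

```python
from itertools import combinations
from typing import Any, Dict, List

Sample = Dict[str, str]


def acquire_unit_features(sample: Sample) -> List[str]:
    """Unit feature acquisition: here every field of the sample is a unit feature."""
    return sorted(sample)


def generate_combined_features(sample: Sample,
                               config: Dict[str, Any]) -> Dict[str, str]:
    """Combine the configured unit features into combined features.

    `config` stands in for the feature combination configuration item
    that would be obtained from the user's input on the graphical interface.
    """
    unit_features = acquire_unit_features(sample)
    # Only features that exist in the sample can take part in a combination.
    names = [n for n in config["features_to_combine"] if n in unit_features]
    combined: Dict[str, str] = {}
    for group in combinations(names, int(config["order"])):
        combined["&".join(group)] = "&".join(sample[n] for n in group)
    return combined


# Assumed sample and configuration item.
sample = {"gender": "F", "city": "BJ", "device": "ios"}
config = {"features_to_combine": ["gender", "city", "device"], "order": 2}

combined = generate_combined_features(sample, config)
```

With `order` set to 2, every pair of the listed unit features yields one combined feature for the sample.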
9. A computer-readable medium for generating combined features of machine learning samples, wherein a computer program
for executing the method for generating combined features of machine learning samples according to any one of claims 1 to 7
is recorded on the computer-readable medium.
10. A computing device for generating combined features of machine learning samples, comprising a storage component and a processor,
wherein a set of computer-executable instructions is stored in the storage component, and when the set of computer-executable instructions
is executed by the processor, the method for generating combined features of machine learning samples according to any one of claims 1 to 7 is performed.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010658034.5A CN111797998B (en) | 2017-09-28 | 2017-09-28 | Method and system for generating combined features of machine learning samples |
CN201710898898.2A CN107766946B (en) | 2017-09-28 | 2017-09-28 | Method and system for generating combined features of machine learning samples |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710898898.2A CN107766946B (en) | 2017-09-28 | 2017-09-28 | Method and system for generating combined features of machine learning samples |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010658034.5A Division CN111797998B (en) | 2017-09-28 | 2017-09-28 | Method and system for generating combined features of machine learning samples |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107766946A (en) | 2018-03-06 |
CN107766946B (en) | 2020-06-23 |
Family
ID=61267329
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010658034.5A Active CN111797998B (en) | 2017-09-28 | 2017-09-28 | Method and system for generating combined features of machine learning samples |
CN201710898898.2A Active CN107766946B (en) | 2017-09-28 | 2017-09-28 | Method and system for generating combined features of machine learning samples |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010658034.5A Active CN111797998B (en) | 2017-09-28 | 2017-09-28 | Method and system for generating combined features of machine learning samples |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN111797998B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108681426A (en) * | 2018-05-25 | 2018-10-19 | 第四范式(北京)技术有限公司 | Method and system for performing feature processing on data |
CN108710949A (en) * | 2018-04-26 | 2018-10-26 | 第四范式(北京)技术有限公司 | Method and system for creating a machine learning modeling template |
CN108985459A (en) * | 2018-05-30 | 2018-12-11 | 华为技术有限公司 | Method and apparatus for training a model |
CN109634961A (en) * | 2018-12-05 | 2019-04-16 | 杭州大拿科技股份有限公司 | Test paper sample generation method and device, electronic equipment and storage medium |
CN109685583A (en) * | 2019-01-10 | 2019-04-26 | 博拉网络股份有限公司 | Supply chain demand prediction method based on big data |
CN110851500A (en) * | 2019-11-07 | 2020-02-28 | 北京集奥聚合科技有限公司 | Method for generating expert characteristic dimension required by machine learning modeling |
CN110895718A (en) * | 2018-09-07 | 2020-03-20 | 第四范式(北京)技术有限公司 | Method and system for training machine learning model |
CN110956272A (en) * | 2019-11-01 | 2020-04-03 | 第四范式(北京)技术有限公司 | Method and system for realizing data processing |
CN111625692A (en) * | 2020-05-27 | 2020-09-04 | 北京字节跳动网络技术有限公司 | Feature extraction method, device, electronic equipment and computer readable medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090116693A1 (en) * | 2007-11-01 | 2009-05-07 | Canon Kabushiki Kaisha | Image processing apparatus and image processing method |
CN103353936A (en) * | 2013-07-26 | 2013-10-16 | 上海交通大学 | Method and system for face identification |
US20130322741A1 (en) * | 2012-06-05 | 2013-12-05 | DRVision Technologies LLC. | Teachable pattern scoring method |
CN105260171A (en) * | 2015-09-10 | 2016-01-20 | 深圳市创梦天地科技有限公司 | Virtual item generation method and apparatus |
CN105677353A (en) * | 2016-01-08 | 2016-06-15 | 北京物思创想科技有限公司 | Feature extraction method and machine learning method and device thereof |
CN106127531A (en) * | 2016-07-14 | 2016-11-16 | 北京物思创想科技有限公司 | Method and system for performing differentiated pricing based on machine learning |
CN106779088A (en) * | 2016-12-06 | 2017-05-31 | 北京物思创想科技有限公司 | Method and system for executing a machine learning process |
CN107045503A (en) * | 2016-02-05 | 2017-08-15 | 华为技术有限公司 | Method and device for determining a feature set |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897918A (en) * | 2017-02-24 | 2017-06-27 | 上海易贷网金融信息服务有限公司 | Hybrid machine learning credit scoring model construction method |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090116693A1 (en) * | 2007-11-01 | 2009-05-07 | Canon Kabushiki Kaisha | Image processing apparatus and image processing method |
US20130322741A1 (en) * | 2012-06-05 | 2013-12-05 | DRVision Technologies LLC. | Teachable pattern scoring method |
CN103353936A (en) * | 2013-07-26 | 2013-10-16 | 上海交通大学 | Method and system for face identification |
CN105260171A (en) * | 2015-09-10 | 2016-01-20 | 深圳市创梦天地科技有限公司 | Virtual item generation method and apparatus |
CN105677353A (en) * | 2016-01-08 | 2016-06-15 | 北京物思创想科技有限公司 | Feature extraction method and machine learning method and device thereof |
CN107045503A (en) * | 2016-02-05 | 2017-08-15 | 华为技术有限公司 | Method and device for determining a feature set |
CN106127531A (en) * | 2016-07-14 | 2016-11-16 | 北京物思创想科技有限公司 | Method and system for performing differentiated pricing based on machine learning |
CN106779088A (en) * | 2016-12-06 | 2017-05-31 | 北京物思创想科技有限公司 | Method and system for executing a machine learning process |
Non-Patent Citations (3)
Title |
---|
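Li Yezi et al.: "Combined feature selection algorithm based on mutual information", Computer Systems & Applications * |
Li Tingting et al.: "Microblog sentiment analysis based on multi-feature combination of SVM and CRF", Application Research of Computers * |
Li Min et al.: "Research on feature selection methods and algorithms", Computer Technology and Development * |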
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710949A (en) * | 2018-04-26 | 2018-10-26 | 第四范式(北京)技术有限公司 | Method and system for creating a machine learning modeling template |
CN108681426A (en) * | 2018-05-25 | 2018-10-19 | 第四范式(北京)技术有限公司 | Method and system for performing feature processing on data |
CN108681426B (en) * | 2018-05-25 | 2020-08-11 | 第四范式(北京)技术有限公司 | Method and system for performing feature processing on data |
CN108985459A (en) * | 2018-05-30 | 2018-12-11 | 华为技术有限公司 | Method and apparatus for training a model |
CN110895718A (en) * | 2018-09-07 | 2020-03-20 | 第四范式(北京)技术有限公司 | Method and system for training machine learning model |
CN109634961A (en) * | 2018-12-05 | 2019-04-16 | 杭州大拿科技股份有限公司 | Test paper sample generation method and device, electronic equipment and storage medium |
CN109634961B (en) * | 2018-12-05 | 2021-06-04 | 杭州大拿科技股份有限公司 | Test paper sample generation method and device, electronic equipment and storage medium |
CN109685583A (en) * | 2019-01-10 | 2019-04-26 | 博拉网络股份有限公司 | Supply chain demand prediction method based on big data |
CN109685583B (en) * | 2019-01-10 | 2020-12-25 | 博拉网络股份有限公司 | Supply chain demand prediction method based on big data |
CN110956272A (en) * | 2019-11-01 | 2020-04-03 | 第四范式(北京)技术有限公司 | Method and system for realizing data processing |
CN110956272B (en) * | 2019-11-01 | 2023-08-08 | 第四范式(北京)技术有限公司 | Method and system for realizing data processing |
CN110851500A (en) * | 2019-11-07 | 2020-02-28 | 北京集奥聚合科技有限公司 | Method for generating expert characteristic dimension required by machine learning modeling |
CN110851500B (en) * | 2019-11-07 | 2022-10-28 | 北京集奥聚合科技有限公司 | Method for generating expert characteristic dimension required by machine learning modeling |
CN111625692A (en) * | 2020-05-27 | 2020-09-04 | 北京字节跳动网络技术有限公司 | Feature extraction method, device, electronic equipment and computer readable medium |
CN111625692B (en) * | 2020-05-27 | 2023-08-22 | 抖音视界有限公司 | Feature extraction method, device, electronic equipment and computer readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN111797998B (en) | 2024-06-11 |
CN107766946B (en) | 2020-06-23 |
CN111797998A (en) | 2020-10-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107766946A (en) | Method and system for generating combined features of machine learning samples | |
CN107844837A (en) | Method and system for tuning algorithm parameters of a machine learning algorithm | |
CN108090516A (en) | Method and system for automatically generating features of machine learning samples | |
Osaba et al. | A tutorial on the design, experimentation and application of metaheuristic algorithms to real-world optimization problems | |
Sedlmair et al. | Visual parameter space analysis: A conceptual framework | |
CN108710949A (en) | Method and system for creating a machine learning modeling template | |
CN108008942A (en) | Method and system for processing data records | |
US20180365557A1 (en) | Information processing method and information processing apparatus | |
CN107908566A (en) | Automatic test management method, device, terminal device and storage medium | |
CN104798043B (en) | Data processing method and computer system | |
CN110045953A (en) | Method and computing device for generating business rule expressions | |
CN108228861A (en) | Method and system for performing feature engineering for machine learning | |
US20210027514A1 (en) | Method and system for creating animal type avatar using human face | |
CN104834479A (en) | Method and system for automatically optimizing configuration of storage system facing cloud platform | |
CN107316082A (en) | Method and system for determining feature importance of machine learning samples | |
CN108960264A (en) | Training method and device for a classification model | |
CN107273979A (en) | Method and system for performing machine learning prediction based on service class | |
CN107578140A (en) | Guide analysis system and method | |
WO2019108371A1 (en) | Training neural networks to detect similar three-dimensional objects using fuzzy identification | |
CN110197004B (en) | Circuit simulation method and device based on mobile terminal, computer medium and equipment | |
Fischer et al. | Towards a survey on static and dynamic hypergraph visualizations | |
CN107909087A (en) | Method and system for generating combined features of machine learning samples | |
CN108108820A (en) | Method and system for selecting features of machine learning samples | |
CN107909141A (en) | Data analysis method and device based on the grey wolf optimization algorithm | |
CN107679549A (en) | Method and system for generating combined features of machine learning samples |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||