CN108108820A - Method and system for selecting features of a machine learning sample - Google Patents
Method and system for selecting features of a machine learning sample
- Publication number: CN108108820A
- Application number: CN201711383339.4A
- Authority
- CN
- China
- Prior art keywords
- feature
- candidate
- machine learning
- candidate feature
- importance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
A method and system for selecting features of a machine learning sample are provided. The method includes: (A) dividing a candidate feature set into multiple candidate feature subsets; (B) for each candidate feature subset, obtaining a corresponding feature-pool machine learning model; (C) determining the importance of each candidate feature within its candidate feature subset according to the difference between the effect of the feature-pool machine learning model on an original test data set and its effect on a transformed test data set; and (D) for each candidate feature subset, screening out, according to the importance of each of its candidate features, at least one candidate feature of higher importance to serve as a target feature of the machine learning sample. With the described method and system, relatively important sample features can be screened out even when computing resources are limited.
Description
Technical field
The present invention relates generally to the field of artificial intelligence, and more particularly to a method and system for selecting features of machine learning samples.
Background art
With the advent of massive data, artificial intelligence technology has developed rapidly. To mine value from massive data, samples suitable for machine learning must be generated from the underlying data records.
Here, each data record can be regarded as a description of an event or object, corresponding to one example or sample. A data record contains items reflecting the performance or properties of the event or object in certain respects; these items may be called "attributes". By applying feature engineering and similar processing to the attribute information of data records, machine learning samples containing various features can be generated.
In practice, the prediction effect of a machine learning model is related to the choice of model, the available data, and the extraction of sample features. In addition, applying machine learning techniques must also confront practical problems such as limited computing resources and insufficient sample data. Therefore, how efficiently the features of machine learning samples are extracted from the attributes of the original data records has a great influence on the effect of the machine learning model. For example, the expected split gain of each feature can be computed from a tree model trained with XGBoost, feature importance can then be calculated, and features screened based on that importance. Although this approach can take interactions between features into account, its training cost is high, and the resulting feature importance is sensitive to the choice of parameters.
In fact, during feature screening, technical staff generally need not only knowledge of machine learning but also a deep understanding of the actual prediction problem, and prediction problems are often bound up with different practical experience in different industries, making a satisfactory result very difficult to achieve.
Content of the invention
Exemplary embodiments of the present invention are intended to overcome the defect in the prior art that features of machine learning samples are difficult to screen out effectively.
According to an exemplary embodiment of the present invention, a method for selecting features of a machine learning sample is provided, including: (A) dividing a candidate feature set into multiple candidate feature subsets; (B) for each candidate feature subset, obtaining a corresponding feature-pool machine learning model, wherein the feature-pool machine learning model corresponds to that candidate feature subset; (C) determining the importance of each candidate feature within its candidate feature subset according to the difference between the effect of the feature-pool machine learning model on an original test data set and its effect on a transformed test data set, wherein the transformed test data set is the data set obtained by replacing, in the original test data set, the original values of the candidate feature whose importance is to be determined with transformed values; and (D) for each candidate feature subset, screening out, according to the importance of each of its candidate features, at least one candidate feature of higher importance to serve as a target feature of the machine learning sample.
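Steps (A)–(D) can be sketched end to end. The sketch below is illustrative only and is not taken from the patent: it assumes scikit-learn's `LogisticRegression` as the feature-pool model, synthetic data, shuffle-based value replacement, and AUC as the effect measure — all of these are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic data: 6 candidate features, only f0 and f3 carry signal.
n = 2000
X = rng.normal(size=(n, 6))
y = ((X[:, 0] + X[:, 3] + 0.3 * rng.normal(size=n)) > 0).astype(int)
X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]

# (A) Divide the candidate feature set into subsets.
subsets = [[0, 1, 2], [3, 4, 5]]

target_features = []
for subset in subsets:
    # (B) Train a feature-pool model restricted to this subset's columns.
    model = LogisticRegression().fit(X_train[:, subset], y_train)
    base_auc = roc_auc_score(y_test, model.predict_proba(X_test[:, subset])[:, 1])

    # (C) Importance = AUC drop when one feature's column is replaced
    # by a shuffled copy (the "transformed test data set").
    importances = {}
    for j, f in enumerate(subset):
        X_perm = X_test[:, subset].copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        perm_auc = roc_auc_score(y_test, model.predict_proba(X_perm)[:, 1])
        importances[f] = base_auc - perm_auc

    # (D) Keep the most important feature of each subset as a target feature.
    target_features.append(max(importances, key=importances.get))

print(sorted(target_features))  # with this synthetic data: [0, 3]
```

Note that each subset's model never sees features outside its subset, which is what keeps the per-model training cost bounded when computing resources are limited.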
Optionally, in the method, the transformed values include at least one of the following: zero, random numbers, and the values obtained by shuffling the order of the original values, in the original test data set, of the candidate feature whose importance is to be determined.
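The three kinds of transformed values just listed can be constructed in a few lines. This is an illustrative sketch (numpy, with a made-up five-row column), not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
col = np.array([3.2, 1.5, 4.8, 2.0, 6.1])  # original test values of one candidate feature

zeroed   = np.zeros_like(col)          # replace with zero
randomed = rng.normal(size=col.shape)  # replace with random numbers
shuffled = rng.permutation(col)        # shuffle the order of the original values

# The shuffled variant preserves the column's marginal distribution,
# which is one reason it is a common choice of perturbation.
assert sorted(shuffled.tolist()) == sorted(col.tolist())
```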
Optionally, the method further includes: (E) removing the target features from the candidate feature set to update the candidate feature set; and, after step (E), performing the method again from step (A) on the updated candidate feature set, until the selection of all target features is completed.
Optionally, in the method, in step (E), new candidate features are added to the candidate feature set at the same time as the target features are removed from it, so as to update the candidate feature set.
Optionally, in the method, the new candidate features are combination features newly generated by performing feature combination between candidate features.
Optionally, in the method, in step (B), the feature-pool machine learning model corresponding to each candidate feature subset is obtained by training multiple feature-pool machine learning models in parallel.
Optionally, the method further includes: (E) taking the screened-out target features as the updated candidate feature set; and, after step (E), performing the method again from step (A) on the updated candidate feature set, until a preset target-feature screening termination condition is met.
According to another exemplary embodiment of the present invention, a system for selecting features of a machine learning sample is provided, including: a candidate feature subset division device for dividing a candidate feature set into multiple candidate feature subsets; a feature-pool machine learning model acquisition device for obtaining, for each candidate feature subset, a corresponding feature-pool machine learning model, wherein the feature-pool machine learning model corresponds to that candidate feature subset; a candidate feature importance determination device for determining the importance of each candidate feature within its candidate feature subset according to the difference between the effect of the feature-pool machine learning model on an original test data set and its effect on a transformed test data set, wherein the transformed test data set is the data set obtained by replacing, in the original test data set, the original values of the candidate feature whose importance is to be determined with transformed values; and a target feature selection device for screening out, for each candidate feature subset and according to the importance of each of its candidate features, at least one candidate feature of higher importance to serve as a target feature of the machine learning sample.
Optionally, in the system, the transformed values include at least one of the following: zero, random numbers, and the values obtained by shuffling the order of the original values, in the original test data set, of the candidate feature whose importance is to be determined.
Optionally, in the system, the candidate feature subset division device also removes the target features from the candidate feature set to update it, and divides the updated candidate feature set into multiple candidate feature subsets, until the selection of all target features is completed.
Optionally, in the system, the candidate feature subset division device also adds new candidate features to the candidate feature set at the same time as it removes the target features, so as to update the candidate feature set.
Optionally, in the system, the new candidate features are combination features newly generated by performing feature combination between candidate features.
Optionally, in the system, the feature-pool machine learning model acquisition device obtains the feature-pool machine learning model corresponding to each candidate feature subset by training multiple feature-pool machine learning models in parallel.
Optionally, in the system, the feature subset division device also takes the screened-out target features as the updated candidate feature set, and divides the updated candidate feature set into multiple candidate feature subsets, until a preset target-feature screening termination condition is met.
According to another exemplary embodiment of the present invention, a computer-readable medium for selecting features of a machine learning sample is provided, on which is recorded a computer program for performing any of the methods described above for selecting features of a machine learning sample.
According to another exemplary embodiment of the present invention, a computing device for selecting features of a machine learning sample is provided, including a storage unit and a processor, wherein a set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, any of the methods described above for selecting features of a machine learning sample is performed.
In the method and system for selecting features of a machine learning sample according to exemplary embodiments of the present invention, the candidate feature set is divided into subsets, and for each candidate feature subset thus obtained, a feature-pool machine learning model together with a specific importance-measurement scheme is used to determine the importance of each candidate feature within it, so that relatively important sample features can be screened out even when computing resources are limited.
Description of the drawings
These and/or other aspects and advantages of the present invention will become clearer and easier to understand from the following detailed description of embodiments of the present invention in conjunction with the accompanying drawings, in which:
Fig. 1 shows a block diagram of a system for selecting features of a machine learning sample according to an exemplary embodiment of the present invention;
Fig. 2 shows a flowchart of a method for selecting features of a machine learning sample according to an exemplary embodiment of the present invention;
Fig. 3 shows a flowchart of a method for training a feature-pool machine learning model according to an exemplary embodiment of the present invention; and
Fig. 4 shows a flowchart of a method for selecting features of a machine learning sample according to another exemplary embodiment of the present invention.
Specific embodiment
In order to enable those skilled in the art to better understand the present invention, exemplary embodiments of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.
In exemplary embodiments of the present invention, machine learning sample features are screened in the following manner: all candidate features are divided into multiple subsets; for each candidate feature subset, the difference in performance of a corresponding feature-pool machine learning model on different test data sets is used to determine the importance of each candidate feature within it; and the more important features are selected from each subset to serve as target features of the machine learning sample.
Here, machine learning is an inevitable product of the development of artificial intelligence research to a certain stage; it aims to improve the performance of a system itself by computational means and by the use of experience. In a computer system, "experience" usually exists in the form of "data", and a machine learning algorithm can generate a "model" from data. That is to say, if empirical data is supplied to a machine learning algorithm, a model can be generated from it, and when faced with a new situation, the model provides a corresponding judgment, i.e., a prediction result. Whether training a machine learning model or making predictions with a trained one, the data must be converted into machine learning samples containing various features. Machine learning may be implemented in the form of "supervised learning", "unsupervised learning", or "semi-supervised learning"; it should be noted that exemplary embodiments of the present invention place no particular limitation on the specific machine learning algorithm. It should further be noted that in training and applying models, other means such as statistical algorithms may also be combined.
Fig. 1 shows a block diagram of a system for selecting features of a machine learning sample according to an exemplary embodiment of the present invention. The feature selection system shown in Fig. 1 includes a candidate feature subset division device 100, a feature-pool machine learning model acquisition device 200, a candidate feature importance determination device 300, and a target feature selection device 400.
Specifically, the candidate feature subset division device 100 is used to divide a candidate feature set into multiple candidate feature subsets. Here, the candidate feature set may include at least one candidate feature, and each candidate feature may be obtained by applying any feature processing to the attribute information of data records. Correspondingly, the candidate feature subset division device 100 may divide the candidate features included in the candidate feature set in any appropriate manner to obtain the multiple candidate feature subsets. As an example, the candidate feature subset division device 100 may divide all candidate features randomly, so that each candidate feature subset contains the same number of candidate features. Alternatively, the candidate feature subset division device 100 may place candidate features that are consistent and/or related in some respect into the same candidate feature subset. For example, a candidate feature subset after division may contain only a group of features of the same value type (that is, discrete features or continuous features, etc.); as another example, a candidate feature subset after division may contain only a group of features with similar business meaning (for example, features about the user himself, or features about the nature of a transaction, etc.). It should be understood that exemplary embodiments of the present invention do not limit the specific manner of dividing the candidate feature subsets.
Here, the candidate feature subset division device 100 may receive the candidate feature set from other components in the system or from outside the system, and divide the received candidate feature set.
Alternatively, the candidate feature subset division device 100 may additionally be responsible for generating candidate features based on the attribute information of data records. To this end, as an example, the candidate feature subset division device 100 may further obtain data records, wherein each data record includes multiple items of attribute information. For example, the candidate feature subset division device 100 may obtain labeled historical data records for supervised machine learning.
The above historical data records may be data generated online, data previously generated and stored, or data received from outside through an input device or a transmission medium. These data may relate to attribute information of individuals, enterprises, or organizations, for example, identity, educational background, occupation, assets, contact information, debts, income, profits, taxes, and the like. Alternatively, these data may relate to attribute information of business-related items, for example, information on the turnover of a transaction contract, the parties to the transaction, the subject matter, and the place of the transaction. It should be noted that the attribute information mentioned in exemplary embodiments of the present invention may relate to the performance or properties of any object or matter in a certain respect, and is not limited to defining or describing individuals, objects, organizations, units, institutions, projects, events, and the like.
The candidate feature subset division device 100 may obtain structured or unstructured data from different sources, for example, text data or numerical data. The obtained data records may be used to form machine learning samples and participate in the training/testing of a machine learning model. These data may come from within the entity that expects to obtain the model prediction results, for example, a bank, enterprise, or school that expects such results; they may also come from outside such an entity, for example, from data providers, the Internet (for example, social networking sites), mobile operators, APP operators, express companies, credit institutions, and the like. Optionally, the above internal and external data may be combined to form machine learning samples carrying more information.
The above data may be input to the candidate feature subset division device 100 through an input device, generated automatically by the candidate feature subset division device 100 from existing data, or obtained by the candidate feature subset division device 100 from a network (for example, a storage medium on a network (for example, a data warehouse)); in addition, an intermediate data exchange device such as a server may help the candidate feature subset division device 100 obtain the corresponding data from an external data source. Here, the obtained data may be converted into an easily processed form by data conversion modules, such as a text analysis module, in the candidate feature subset division device 100.
Here, the candidate feature subset division device 100 may first generate candidate features based on the multiple items of attribute information of the historical data records. In this process, the candidate feature subset division device 100 may use any appropriate feature processing to obtain single first-order candidate features or higher-order (for example, second-order, third-order, etc.) combination candidate features, where the "order" denotes the number of single features participating in a combination.
As an example, the candidate features generated by the candidate feature subset division device 100 may be continuous features, wherein the candidate feature subset division device 100 generates a continuous feature by processing at least one item of continuous-valued attribute information and/or discrete-valued attribute information among the multiple items of attribute information.
Specifically, corresponding continuous features may be generated based on at least part of the attribute information of the historical data records. Here, a continuous feature is a feature opposed to a discrete feature (for example, a categorical feature); its value can be a numerical value with a certain continuity, for example, distance, age, or amount of money. In contrast, as an example, the values of a discrete feature have no continuity; for example, "from Beijing", "from Shanghai", "from Tianjin", "gender is male", and "gender is female" are unordered categorical features.
For example, certain continuous-valued attribute information in the historical data records may be used directly as the corresponding continuous feature; for instance, attribute information such as distance, age, and amount of money may directly serve as corresponding continuous features. That is, a continuous feature may itself be formed by continuous-valued attribute information among the multiple items of attribute information. Alternatively, some attribute information in the historical data records (for example, continuous-valued and/or discrete-valued attribute information) may be processed to obtain corresponding continuous features, for example, taking the ratio of height to weight as a corresponding continuous feature. In particular, a continuous feature may be formed by continuously transforming discrete-valued attribute information among the multiple items of attribute information. As an example, the continuous transformation may indicate computing statistics over the values of the discrete-valued attribute information. For example, a continuous feature may indicate statistical information, with respect to the prediction target of the machine learning model, for certain discrete-valued attribute information. For instance, in an example of predicting purchase probability, the discrete-valued attribute "seller merchant number" may be transformed into a statistical feature giving the probability of historical purchasing behavior associated with the corresponding seller merchant number.
Continuous features as described above may be combined with each other by means such as arithmetic operations, so as to serve as combination candidate features according to exemplary embodiments of the present invention.
As another example, the candidate features generated by the candidate feature subset division device 100 may be discrete features, wherein the candidate feature subset division device 100 generates a discrete feature by processing at least one item of continuous-valued attribute information and/or discrete-valued attribute information among the multiple items of attribute information.
Specifically, corresponding discrete features may be generated based on at least part of the attribute information of the historical data records. For example, certain discrete-valued attribute information in the historical data records may be used directly as the corresponding discrete feature; that is, a discrete feature may itself be formed by discrete-valued attribute information among the multiple items of attribute information. Alternatively, some attribute information in the historical data records (for example, continuous-valued and/or discrete-valued attribute information) may be processed to obtain corresponding discrete features.
Here, a continuous feature (for example, continuous-valued attribute information itself, or a continuous feature formed by continuously transforming discrete-valued attribute information) may be discretized to obtain a corresponding discrete feature. Preferably, when discretizing a continuous feature, the candidate feature subset division device 100 may perform at least one binning operation for each continuous feature to generate a discrete feature composed of at least one binned feature, where each binning operation corresponds to one binned feature, so that multiple discrete features portraying certain attributes of the original data records from different angles and at different scales/granularities can be obtained at the same time.
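Multiple binning operations at different granularities over the same continuous feature can be sketched with numpy's `digitize`. This is an illustrative sketch with made-up bin boundaries, not the patent's binning scheme:

```python
import numpy as np

age = np.array([18, 25, 33, 41, 52, 67])  # one continuous feature

# Two binning operations over the same column, at different granularities;
# each operation yields one binned feature.
coarse = np.digitize(age, bins=[30, 50])              # 3 bins: <30, 30-50, >=50
fine   = np.digitize(age, bins=[20, 30, 40, 50, 60])  # 6 bins

print(coarse.tolist())  # [0, 0, 1, 1, 2, 2]
print(fine.tolist())    # [0, 1, 2, 3, 4, 5]
```

The coarse and fine bin indices would both be kept as candidate features, letting the later importance screening decide which granularity is actually useful.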
After the candidate feature subset division device 100 divides out the candidate feature subsets, the feature-pool machine learning model acquisition device 200 may further obtain, for each candidate feature subset, a corresponding feature-pool machine learning model, wherein the feature-pool machine learning model corresponds to that candidate feature subset.
According to an exemplary embodiment of the present invention, a corresponding feature-pool machine learning model needs to be obtained for each candidate feature subset. Here, the feature-pool machine learning model acquisition device 200 may itself complete the training of the feature-pool machine learning models, or it may obtain trained feature-pool machine learning models from outside. Here, the sample feature part of a feature-pool machine learning model may include the candidate features contained in the corresponding candidate feature subset. Accordingly, as an example, training samples of the feature-pool machine learning model may be constructed, and the feature-pool machine learning model trained on these training samples. It should be noted that exemplary embodiments of the present invention do not limit the algorithm of the feature-pool machine learning model. Preferably, the feature-pool machine learning models corresponding to different candidate feature subsets may be based on the same machine learning algorithm.
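Because the subsets are independent, one feature-pool model per subset can be trained concurrently, as the optional parallel-training embodiment describes. The sketch below is illustrative only — it assumes scikit-learn's `LogisticRegression` on synthetic data and a thread pool; the patent does not prescribe any of these choices:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] - X[:, 6] > 0).astype(int)

subsets = [[0, 1, 2], [3, 4, 5], [6, 7]]

def train_pool_model(subset):
    # Each feature-pool model sees only its own subset's columns.
    return LogisticRegression().fit(X[:, subset], y)

# The subsets share no state, so the models can be trained in parallel.
with ThreadPoolExecutor() as ex:
    models = list(ex.map(train_pool_model, subsets))

assert len(models) == len(subsets)
```

For CPU-bound training, a process pool (or the training library's own parallelism) would usually replace the thread pool; the structure of the loop is the point here.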
The candidate feature importance determination device 300 is used to determine the importance of each candidate feature within its candidate feature subset according to the difference between the effect of the feature-pool machine learning model on an original test data set and its effect on a transformed test data set, where the transformed test data set is the data set obtained by replacing, in the original test data set, the original values of the candidate feature whose importance is to be determined with transformed values.
Here, the candidate feature importance determination device 300 may use the feature-pool machine learning model corresponding to each candidate feature subset to determine the importance of each candidate feature in the corresponding subset. As an example, the effect of the feature-pool machine learning model may include its AUC (Area Under the ROC (Receiver Operating Characteristic) Curve) or its log loss (logistic loss), etc.
As an example, suppose some candidate feature subset contains three features {f1, f3, f5} among all candidate features {f1, f2, …, fn}. Correspondingly, the AUC of the feature-pool machine learning model on the original test data set can reflect the predictive ability of the feature set {f1, f3, f5}. Here, to determine the importance of candidate feature f5, the original values of feature f5 in each test sample included in the original test data set may be processed to obtain the transformed test data set, and the AUC of the feature-pool machine learning model on the transformed test data set then obtained. On this basis, the difference between the two AUCs can be used to reflect the importance of candidate feature f5. As an example, in the transformation, the transformed values may include at least one of the following: zero, random numbers, and the values obtained by shuffling the order of the original values, in the original test data set, of the candidate feature whose importance is to be determined. That is, the original value of feature f5 in each original test sample may be replaced with zero, a random number, or the value obtained after shuffling the order of the original values of f5. Here, when determining the importance of each candidate feature within a candidate feature subset, it is preferable to use the same original test data set and its corresponding transformed test data sets.
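The f5 example above can be made concrete. The sketch below is illustrative only — it assumes a three-column feature pool standing in for {f1, f3, f5} (with the third column playing the role of f5), scikit-learn's `LogisticRegression`, synthetic data, and AUC as the effect measure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Feature pool of three columns; only the last ("f5") is informative.
n = 2000
X = rng.normal(size=(n, 3))
y = (X[:, 2] + 0.5 * rng.normal(size=n) > 0).astype(int)
Xtr, ytr, Xte, yte = X[:1500], y[:1500], X[1500:], y[1500:]

model = LogisticRegression().fit(Xtr, ytr)
base_auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])

def auc_with_replaced_column(j, values):
    """AUC on the transformed test set where column j is replaced."""
    Xt = Xte.copy()
    Xt[:, j] = values
    return roc_auc_score(yte, model.predict_proba(Xt)[:, 1])

# Importance of "f5" (column 2) under two of the described transformations.
drop_zero    = base_auc - auc_with_replaced_column(2, 0.0)
drop_shuffle = base_auc - auc_with_replaced_column(2, rng.permutation(Xte[:, 2]))

# A noise column barely moves the AUC, so its importance is near zero.
drop_noise = base_auc - auc_with_replaced_column(0, rng.permutation(Xte[:, 0]))
assert drop_shuffle > drop_noise
```

The same comparison could be run with log loss instead of AUC; in that case the importance would be the *increase* in loss on the transformed test set rather than a drop in AUC.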
The target feature selection device 400, for each candidate feature subset, screens out therefrom, according to the importance of each of its candidate features, at least one candidate feature of higher importance as a target feature of the machine learning sample. Here, the target feature selection device 400 may select the candidate features of higher importance from each candidate feature subset respectively, as target features. For example, the target feature selection device 400 may perform target feature selection automatically according to a preset rule (e.g., selecting a predetermined number of candidate features with the highest importance from each candidate feature subset); alternatively, the target feature selection device 400 may select target features from the candidate feature subsets according to an instruction of the user. For this purpose, the comparison of the importance of the candidate features in each candidate feature subset may be displayed to the user, and accordingly, the target feature selection device 400 may receive the user's instruction for selecting target features therefrom and select the target features according to the user's instruction.
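The preset-rule variant (selecting a predetermined number of the highest-importance candidates from each subset) might be sketched as follows; the subset names and importance values are hypothetical:

```python
# Hypothetical per-subset importance results: {subset_id: {feature: importance}}
subset_importances = {
    "subset_a": {"f1": 0.12, "f3": 0.00, "f5": 0.07},
    "subset_b": {"f2": 0.01, "f4": 0.09},
}

def select_targets(subset_importances, k=1):
    """Preset rule: pick the k highest-importance candidates from each subset."""
    targets = []
    for feats in subset_importances.values():
        ranked = sorted(feats, key=feats.get, reverse=True)
        targets.extend(ranked[:k])
    return targets

targets = select_targets(subset_importances, k=1)
```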
By the above means, a portion of the relatively important target features can be effectively screened out from the candidate features. Optionally, these target features may, alone or in combination with other features, constitute the final machine learning sample.
The flowchart of a method for selecting features of a machine learning sample according to an exemplary embodiment of the present invention is described below with reference to Fig. 2. Here, as an example, the method shown in Fig. 2 may be performed by the feature selection system shown in Fig. 1, may be implemented entirely by a computer program in software, or may be performed by a specifically configured computing device. For convenience of description, it is assumed that the method shown in Fig. 2 is performed by the feature selection system shown in Fig. 1.
Referring to Fig. 2, in step S100, the candidate feature set is divided into multiple candidate feature subsets by the candidate feature subset division device 100.
As described above, the candidate feature subset division device 100 may divide the candidate feature set in any suitable manner. Here, the candidate feature subset division device 100 may receive a candidate feature set provided by another party and divide the received candidate feature set; alternatively, the candidate feature subset division device 100 may generate the candidate feature set itself, in which case, as an example, the candidate feature subset division device 100 may additionally be responsible for performing processing such as feature extraction on data records to obtain the corresponding candidate feature set.
Next, in step S200, a corresponding feature pool machine learning model is obtained for each candidate feature subset by the feature pool machine learning model acquisition device 200, wherein the feature pool machine learning model corresponds to said each candidate feature subset.
As described above, the feature pool machine learning model acquisition device 200 may itself complete the training of the feature pool machine learning models, or may obtain trained feature pool machine learning models from the outside.
As an example, the flowchart of a method of training a feature pool machine learning model according to an exemplary embodiment of the present invention is described below with reference to Fig. 3. This method may be performed by the feature selection system described in Fig. 1, or by another system or device.
Referring to Fig. 3, in step S110, historical data records may be obtained. Here, the historical data records may be obtained by the candidate feature subset division device 100 shown in Fig. 1, or by another device not shown in the system of Fig. 1 or a device outside the system. As an example, a historical data record may include a label regarding the machine learning problem and at least one piece of attribute information for generating each feature of the machine learning sample.
Here, a historical data record is a true record regarding the machine learning problem to be predicted, and may include two parts: attribute information and a label. Such historical data records may be used to form machine learning samples as the material of machine learning, and an exemplary embodiment of the present invention aims to screen out the comparatively important features of the machine learning samples generated based on the attribute information.
In particular, as an example, historical data may be collected in a manual, semi-automatic or fully automatic manner, or the collected original historical data may be processed so that the processed historical data records have an appropriate format or form. As an example, historical data may be collected in batches.
Here, historical data records manually input by the user may be received via an input device (e.g., a workstation). In addition, historical data records may be fetched from a data source system in a fully automatic manner, for example, by a timer mechanism implemented in software, firmware, hardware or a combination thereof that systematically requests the data source and obtains the requested historical data from the response. The data source may include one or more databases or other servers. The fully automatic manner of obtaining data may be realized via an internal network and/or an external network, which may include transmitting encrypted data over the internet. In the case where the servers, databases, networks, etc. are configured to communicate with one another, data acquisition may be carried out automatically without manual intervention; it should be noted, however, that certain user input operations may still exist in this manner. The semi-automatic manner lies between the manual manner and the fully automatic manner. The semi-automatic manner differs from the fully automatic manner in that a trigger mechanism activated by the user replaces, for example, the timer mechanism. In this case, the request for extracting data is generated only upon receiving a specific user input. Each time data are obtained, preferably, the captured historical data may be stored in a non-volatile memory. As an example, a data warehouse may be used to store both the raw data collected during acquisition and the processed data.
The historical data records obtained above may come from the same or different data sources; that is, each historical data record may also be the splicing result of different historical data records. For example, in addition to the information data record filled in by a client when applying to a bank for opening a credit card (which includes attribute information fields such as income, education, position, and property status), as an example, other data records of the client at the bank may also be obtained, such as loan records and daily transaction data; these obtained data records may be spliced, together with the label of whether the client is a fraudulent client, into a complete historical data record. In addition, data from other private or public sources may also be obtained, for example, data from a data provider, data from the internet (e.g., social networking sites), data from a mobile operator, data from an APP operator, data from an express delivery company, data from a credit institution, etc.
Optionally, the collected data may be stored and/or processed by a hardware cluster (a Hadoop cluster, a Spark cluster, etc.), for example, for storage, classification and other offline operations. In addition, the obtained data may also be processed as an online stream.
As an example, unstructured data such as collected text may be converted, by a data conversion module such as a text analysis model, into structured data that is easier to use, for further processing or reference later. Text-based data may include emails, documents, web pages, graphics, spreadsheets, call center logs, transaction reports, etc.
Next, in step S120, the training samples of each feature pool machine learning model may be generated based on the attribute information of the historical data records. Here, as described above, the sample features of each feature pool machine learning model may correspond to a respective candidate feature subset, wherein the candidate feature subsets are divided from the candidate feature set composed of all candidate features.
As an example, the original candidate feature set may be generated based on the attribute information of the historical data records. For example, the corresponding original candidate features may be obtained by screening, grouping or otherwise further processing the attribute information of the historical data records. According to an exemplary embodiment of the present invention, the original candidate features may be generated according to any appropriate feature processing manner, for example, in consideration of factors such as the content, meaning, value continuity, value range, value space scale, missingness, and importance of the attribute information.
In addition, after the candidate feature set is determined, the candidate feature set may be divided into candidate feature subsets according to a division manner of an exemplary embodiment of the present invention. After each candidate feature subset is determined, the training samples of each feature pool machine learning model may be generated respectively based on the historical data records.
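Generating the per-model training samples then amounts to projecting each full historical record onto one subset's features while keeping its label; a minimal sketch (the record layout is an assumption for illustration):

```python
def project_samples(records, subset):
    """Build training samples for one feature pool model: keep only the
    subset's features from each full record; the label is unchanged."""
    return [({f: attrs[f] for f in subset}, label) for attrs, label in records]

records = [({"f1": 1, "f2": 2, "f3": 3}, 0),
           ({"f1": 4, "f2": 5, "f3": 6}, 1)]
samples = project_samples(records, ["f1", "f3"])
```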
In step S130, the feature pool machine learning models may be trained using the generated training samples. In particular, the feature pool machine learning models may be trained according to a preset machine learning algorithm, and the feature pool machine learning models may be based on the same or different model algorithms. According to an exemplary embodiment of the present invention, the feature pool machine learning model corresponding to each candidate feature subset may be obtained by training multiple feature pool machine learning models in parallel.
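Since the models are independent, one per subset, they can be trained concurrently. A toy sketch using a thread pool, with a deliberately trivial stand-in "trainer" (a per-feature mean-difference score) rather than any real learning algorithm:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy training samples: each record is ({feature: value}, label)
records = [({"f1": 1.0, "f2": 0.0, "f3": 1.0}, 1),
           ({"f1": 0.0, "f2": 1.0, "f3": 0.0}, 0),
           ({"f1": 1.0, "f2": 1.0, "f3": 1.0}, 1)]
subsets = [["f1", "f2"], ["f3"]]

def train_pool_model(subset):
    """Stand-in trainer for one subset: mean(positive) - mean(negative) per feature."""
    model = {}
    for f in subset:
        pos = [r[f] for r, y in records if y == 1]
        neg = [r[f] for r, y in records if y == 0]
        model[f] = sum(pos) / len(pos) - sum(neg) / len(neg)
    return model

with ThreadPoolExecutor() as ex:
    pool_models = list(ex.map(train_pool_model, subsets))  # one model per subset
```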
Exemplary training methods of the feature pool machine learning model are enumerated above; however, it should be understood that exemplary embodiments of the present invention are not limited to the above examples.
Referring back to Fig. 2, after the feature pool machine learning model corresponding to each candidate feature subset is obtained, in step S300 the importance of each candidate feature in the corresponding candidate feature subset is determined by the candidate feature importance determination device 300 according to the difference between the effects of the feature pool machine learning model on the original test data set and on the converted test data set, wherein the converted test data set refers to the data set obtained by replacing, with converted values, the original values, in the original test data set, of the candidate feature whose importance is to be determined.
Here, for each feature pool machine learning model, the candidate feature subset may include at least one candidate feature; accordingly, the prediction effect of the feature pool machine learning model on the original test data set may be obtained. In addition, the prediction effect of the feature pool machine learning model on the converted test data set may be obtained by converting, in turn, the values of each candidate feature on the original test data set. The difference between the above two prediction effects can be used to measure the importance of each candidate feature.
As an example, suppose that the candidate feature subset corresponding to a certain feature pool machine learning model includes candidate features {f1, f2, …, fn}, and the prediction effect of this feature pool machine learning model on the original test data set is denoted AUC_all. In this example, in order to determine the importance of any candidate feature fi among {f1, f2, …, fn} (where 1 ≤ i ≤ n), the original test data set may be processed accordingly to obtain the converted test data set for the target feature fi, for example, by replacing the original value of feature fi in each test sample of the original test data set with another value, e.g., zero, a random number, or a value obtained by shuffling the order of the values of feature fi among the test samples. Accordingly, the test effect AUC_i of the above feature pool machine learning model on the converted test data set may be obtained.
After obtaining the effects of the feature pool machine learning model on the original test data set and on the converted test data set respectively, the difference between the two effects (that is, AUC_all − AUC_i) may be used as a reference for measuring the importance of candidate feature fi.
Next, in step S400, for each candidate feature subset, at least one candidate feature of higher importance is screened out therefrom by the target feature selection device 400 according to the importance of each of its candidate features, as a target feature of the machine learning sample.
Here, the target feature selection device 400 may, for each candidate feature subset, screen out the relatively important target features automatically or according to a user instruction. As an example, the determination results of feature importance may be displayed to the user in a graphical form. For example, the importance of each candidate feature subset may be displayed as a figure or a table, so that the user may select target features therefrom. For this purpose, the system shown in Fig. 1 may further include an input device (not shown) for sensing the input operation performed by the user for selecting target features, etc.
It can be seen that, according to an exemplary embodiment of the present invention, the relatively important candidate features can be effectively screened out for each candidate feature subset by using the corresponding feature pool machine learning model.
As an example, the above process of screening target features may be applied to cases where target features are continually screened over multiple sequential rounds, for example, the case where target features are continually screened out from the original candidate feature set; or the case where combination features are iteratively generated and the important combination features are screened out therefrom; or the case where even more important target features are further iteratively screened out from the target features already screened out. Those skilled in the art will understand that exemplary embodiments of the present invention are not limited to any specific iterative manner.
Fig. 4 shows a flowchart of a method for selecting features of a machine learning sample according to another exemplary embodiment of the present invention. This method may be performed by the feature selection system shown in Fig. 1, may be implemented entirely by a computer program in software, or may be performed by a specifically configured computing device.
Referring to Fig. 4, in step S100, the current candidate feature set may be divided into multiple candidate feature subsets. Here, the current candidate feature set may be the updated result after each round of feature screening.
Next, in step S200, for each candidate feature subset, the corresponding feature pool machine learning model may be obtained, wherein the sample features of the feature pool machine learning model correspond to said each candidate feature subset.
Then, in step S300, the importance of each candidate feature in the corresponding candidate feature subset may be determined according to the difference between the effects of the feature pool machine learning model on the original test data set and on the converted test data set, wherein the converted test data set refers to the data set obtained by replacing, with converted values, the original values, in the original test data set, of the candidate feature whose importance is to be determined.
In step S400, for each candidate feature subset, at least one candidate feature of higher importance may be screened out therefrom according to the importance of each of its candidate features, as a target feature of the machine learning sample.
In step S500, it may be determined whether target feature screening needs to continue. Here, whether the screening of target features needs to continue may be determined according to a preset target feature screening termination condition. As an example, the target feature screening termination condition may be that enough target features have been screened out; alternatively, the target feature screening termination condition may be that sufficiently important target features have been screened out.
As an example, suppose that target feature screening needs to continue because the target features screened out are not yet sufficient; step S550 may then be performed, in which the target features may be removed from the candidate feature set to update the candidate feature set, so that feature screening is subsequently performed again based on the updated candidate feature set, until the selection of all target features is completed. Optionally, while the target features are removed from the candidate feature set, new candidate features may further be added to the updated candidate feature set. For example, the new candidate features may be combination features newly generated by performing feature combination among candidate features. Here, combination features may be generated iteratively in each round, according to a search strategy regarding combination features, as new candidate features of the machine learning sample.
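One hypothetical expansion step of such a combination-feature search strategy: pairwise-combine the current candidates and keep only combinations not generated before (the `&` naming convention is an assumption, not from the patent):

```python
from itertools import combinations

def generate_combinations(candidates, existing):
    """One expansion step of a combination-feature search: pairwise combos
    of current candidates, skipping any combination already generated."""
    new = []
    for a, b in combinations(sorted(candidates), 2):
        combo = f"{a}&{b}"
        if combo not in existing:
            new.append(combo)
    return new

new_candidates = generate_combinations(["f1", "f2", "f3"], existing={"f1&f2"})
```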
As another example, suppose that further screening needs to continue from the target features screened out in the current round; step S550 may then be performed, in which the target features screened out may serve as the updated candidate feature set, so that feature screening is subsequently performed again based on the updated candidate feature set, until sufficiently important target features are screened out.
After step S550, as an example, the method may return to step S100 to divide the updated candidate feature set into subsets. As an example, after the target features are removed from the original candidate feature set, the target features may be correspondingly deleted from each candidate feature subset; alternatively, the updated candidate feature set may be re-divided in an entirely different manner to obtain updated candidate feature subsets. For example, for the new combination candidate features included in the updated candidate feature set, the new combination candidate features may be assigned only to the previously divided candidate feature subsets, or the updated candidate feature set may be repartitioned into subsets.
In step S200, new feature pool machine learning models may be obtained corresponding to the updated candidate feature subsets. Next, steps S300 and S400 may continue to be executed to screen out the target features of the current round. This continues until the preset target feature screening termination condition is met, whereupon it is determined in step S500 that the screening processing no longer needs to continue; the method then ends, and the selection result may be further utilized or processed subsequently.
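The overall loop of Fig. 4, with the "enough target features" termination condition, might be sketched as follows; the scoring function here is a stand-in for the feature-pool-model importance computed in step S300:

```python
def iterative_screening(candidate_set, score, k_per_round=2, target_count=4):
    """Sketch of Fig. 4: screen k features per round, remove them from the
    candidate set (step S550), and repeat until enough target features
    have been selected (the termination condition of step S500)."""
    targets = []
    remaining = list(candidate_set)
    while remaining and len(targets) < target_count:
        ranked = sorted(remaining, key=score, reverse=True)   # steps S300/S400
        picked = ranked[:k_per_round]
        targets.extend(picked)
        remaining = [f for f in remaining if f not in picked]  # step S550
    return targets

# Hypothetical importance scores standing in for the AUC-difference computation
scores = {"f1": 0.9, "f2": 0.1, "f3": 0.7, "f4": 0.3, "f5": 0.5}
selected = iterative_screening(scores, scores.get)
```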
The devices illustrated in Fig. 1 may each be configured as software, hardware, firmware or any combination thereof that performs a specific function. For example, these devices may correspond to dedicated integrated circuits, to pure software code, or to modules combining software and hardware. In addition, one or more of the functions realized by these devices may also be uniformly executed by components in a physical entity device (e.g., a processor, a client or a server, etc.).
The method and system for selecting features of a machine learning sample according to exemplary embodiments of the present invention have been described above with reference to Figs. 1 to 4. It should be understood that the above method may be realized by a program recorded on a computer-readable medium. For example, according to an exemplary embodiment of the present invention, a computer-readable medium for selecting features of a machine learning sample may be provided, on which a computer program for executing the following method steps is recorded: (A) dividing a candidate feature set into multiple candidate feature subsets; (B) for each candidate feature subset, obtaining a corresponding feature pool machine learning model, wherein the feature pool machine learning model corresponds to said each candidate feature subset; (C) determining the importance of each candidate feature in the corresponding candidate feature subset according to the difference between the effects of the feature pool machine learning model on the original test data set and on the converted test data set, wherein the converted test data set refers to the data set obtained by replacing, with converted values, the original values, in the original test data set, of the candidate feature whose importance is to be determined; and (D) for each candidate feature subset, screening out therefrom, according to the importance of each of its candidate features, at least one candidate feature of higher importance as a target feature of the machine learning sample.
The computer program in the above computer-readable medium may run in an environment deployed in computer equipment such as a client, a host, an agent device, a server, etc. It should be noted that the computer program may also be used to perform additional steps beyond the above steps, or to perform more specific processing when performing the above steps; the content of these additional steps and further processing has been described with reference to Figs. 1 to 4 and will not be repeated here.
It should be noted that the feature selection system according to an exemplary embodiment of the present invention may rely entirely on the running of a computer program to realize the corresponding functions; that is, each device corresponds to each step in the functional structure of the computer program, so that the entire system is called via a dedicated software package (e.g., a lib library) to realize the corresponding functions.
On the other hand, each device shown in Fig. 1 may also be realized by hardware, software, firmware, middleware, microcode or any combination thereof. When realized in software, firmware, middleware or microcode, the program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor may perform the corresponding operations by reading and running the corresponding program code or code segments.
For example, an exemplary embodiment of the present invention may also be implemented as a computing device that includes a storage component and a processor, in which a set of computer-executable instructions is stored; when the set of computer-executable instructions is executed by the processor, the feature selection method is performed.
In particular, the computing device may be deployed in a server or a client, or on a node device in a distributed network environment. In addition, the computing device may be a PC computer, a tablet device, a personal digital assistant, a smartphone, a web application, or another device capable of executing the above instruction set.
Here, the computing device need not be a single computing device, and may also be any aggregate of devices or circuits capable of executing the above instructions (or instruction set) alone or in combination. The computing device may also be part of an integrated control system or a system manager, or may be configured as a portable electronic device interconnected with an interface locally or remotely (e.g., via wireless transmission).
In the computing device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller or a microprocessor. As an example and not a limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, etc.
Some operations in the feature selection method according to an exemplary embodiment of the present invention may be realized in software, some operations may be realized in hardware, and, in addition, these operations may also be realized by a combination of software and hardware.
The processor may run instructions or code stored in one of the storage components, and the storage component may also store data. Instructions and data may also be sent and received over a network via a network interface device, which may employ any known transport protocol.
The storage component may be integrated with the processor, for example, with RAM or flash memory arranged within an integrated circuit microprocessor. In addition, the storage component may include an independent device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage component and the processor may be operationally coupled, or may communicate with each other, for example, through an I/O port, a network connection, etc., so that the processor can read files stored in the storage component.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, a touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or a network.
The operations involved in the feature selection method according to an exemplary embodiment of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated according to non-exact boundaries.
For example, as described above, the computing device for selecting features of a machine learning sample according to an embodiment of the present invention may include a storage component and a processor, wherein a set of computer-executable instructions is stored in the storage component, and when the set of computer-executable instructions is executed by the processor, the following steps are performed: (A) dividing a candidate feature set into multiple candidate feature subsets; (B) for each candidate feature subset, obtaining a corresponding feature pool machine learning model, wherein the feature pool machine learning model corresponds to said each candidate feature subset; (C) determining the importance of each candidate feature in the corresponding candidate feature subset according to the difference between the effects of the feature pool machine learning model on the original test data set and on the converted test data set, wherein the converted test data set refers to the data set obtained by replacing, with converted values, the original values, in the original test data set, of the candidate feature whose importance is to be determined; and (D) for each candidate feature subset, screening out therefrom, according to the importance of each of its candidate features, at least one candidate feature of higher importance as a target feature of the machine learning sample.
Each exemplary embodiment of the present invention has been described above. It should be understood that the foregoing description is merely exemplary, not exhaustive, and the present invention is not limited to the disclosed exemplary embodiments. Many modifications and changes will be apparent to those skilled in the art without departing from the scope and spirit of the present invention. Therefore, the protection scope of the present invention shall be subject to the scope of the claims.
Claims (10)
1. A method for selecting features of a machine learning sample, including:
(A) dividing a candidate feature set into multiple candidate feature subsets;
(B) for each candidate feature subset, obtaining a corresponding feature pool machine learning model, wherein the feature pool machine learning model corresponds to said each candidate feature subset;
(C) determining the importance of each candidate feature in the corresponding candidate feature subset according to the difference between the effects of the feature pool machine learning model on the original test data set and on the converted test data set, wherein the converted test data set refers to the data set obtained by replacing, with converted values, the original values, in the original test data set, of the candidate feature whose importance is to be determined; and
(D) for each candidate feature subset, screening out therefrom, according to the importance of each of its candidate features, at least one candidate feature of higher importance as a target feature of the machine learning sample.
2. The method of claim 1, wherein the converted value includes at least one of the following: zero, a random number, or a value obtained by shuffling the order of the original values, in the original test data set, of the candidate feature whose importance is to be determined.
3. The method of claim 1, further including:
(E) removing the target features from the candidate feature set to update the candidate feature set;
and, after step (E), performing the method again from step (A) based on the updated candidate feature set, until the selection of all target features is completed.
4. The method of claim 3, wherein, in step (E), new candidate features are additionally added to update the candidate feature set while the target features are removed from the candidate feature set.
5. The method of claim 4, wherein the new candidate features are combination features newly generated by performing feature combination among candidate features.
6. The method of claim 1, wherein, in step (B), the feature pool machine learning model corresponding to each candidate feature subset is obtained by training multiple feature pool machine learning models in parallel.
7. The method of claim 1, further including:
(E) using the target features screened out as the updated candidate feature set;
and, after step (E), performing the method again from step (A) based on the updated candidate feature set, until the preset target feature screening termination condition is met.
8. A system for selecting features of a machine learning sample, including:
a candidate feature subset division device, for dividing a candidate feature set into multiple candidate feature subsets;
a feature pool machine learning model acquisition device, for obtaining, for each candidate feature subset, a corresponding feature pool machine learning model, wherein the feature pool machine learning model corresponds to said each candidate feature subset;
a candidate feature importance determination device, for determining the importance of each candidate feature in the corresponding candidate feature subset according to the difference between the effects of the feature pool machine learning model on the original test data set and on the converted test data set, wherein the converted test data set refers to the data set obtained by replacing, with converted values, the original values, in the original test data set, of the candidate feature whose importance is to be determined; and
a target feature selection device, for screening out, for each candidate feature subset, according to the importance of each of its candidate features, at least one candidate feature of higher importance therefrom as a target feature of the machine learning sample.
9. A computer-readable medium for selecting features of a machine learning sample, wherein a computer program for performing
the method for selecting features of a machine learning sample according to any one of claims 1 to 7 is recorded on the
computer-readable medium.
10. A computing device for selecting features of a machine learning sample, comprising a storage component and a processor, wherein
a set of computer-executable instructions is stored in the storage component, and when the set of computer-executable instructions is
executed by the processor, the method for selecting features of a machine learning sample according to any one of claims 1 to 7 is performed.
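The importance measure recited in the claims above (the drop in a feature pool model's effect when a candidate feature's original values in the test data are replaced with transformed values) closely resembles permutation feature importance. Below is a minimal illustrative sketch, not the patented implementation: the function name `select_features`, the use of scikit-learn's `LogisticRegression`, AUC as the effect metric, shuffling as the value transformation, and the parameters `n_subsets` and `top_k` are all assumptions chosen for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def select_features(X_train, y_train, X_test, y_test, n_subsets=2, top_k=2, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    # (A) divide the candidate feature set into multiple candidate feature subsets
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    selected = []
    for subset in subsets:
        # (B) obtain a feature pool model trained only on this candidate feature subset
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train[:, subset], y_train)
        base = roc_auc_score(y_test, model.predict_proba(X_test[:, subset])[:, 1])
        importance = {}
        for i, feat in enumerate(subset):
            # (C) build a conversion test data set: replace one feature's original
            # values with transformed values (here: a random shuffle of the column)
            Xc = X_test[:, subset].copy()
            Xc[:, i] = rng.permutation(Xc[:, i])
            conv = roc_auc_score(y_test, model.predict_proba(Xc)[:, 1])
            # importance = difference between effects on original vs conversion set
            importance[feat] = base - conv
        # (D) keep the candidate features of highest importance as target features
        best = sorted(importance, key=importance.get, reverse=True)[:top_k]
        selected.extend(best)
    return sorted(int(f) for f in selected)

X, y = make_classification(n_samples=600, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
print(select_features(X_tr, y_tr, X_te, y_te))
```

Because each feature pool model scores only its own subset, the per-subset loop could be trained in parallel as in claim 6, and repeating the whole procedure after removing or re-pooling the selected target features would correspond to the iterative re-screening of claims 3 and 7.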
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310907091.6A CN116882520A (en) | 2017-12-20 | 2017-12-20 | Prediction method and system for predetermined prediction problem |
CN201711383339.4A CN108108820A (en) | 2017-12-20 | 2017-12-20 | For selecting the method and system of the feature of machine learning sample |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711383339.4A CN108108820A (en) | 2017-12-20 | 2017-12-20 | For selecting the method and system of the feature of machine learning sample |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310907091.6A Division CN116882520A (en) | 2017-12-20 | 2017-12-20 | Prediction method and system for predetermined prediction problem |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108108820A true CN108108820A (en) | 2018-06-01 |
Family
ID=62211434
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310907091.6A Pending CN116882520A (en) | 2017-12-20 | 2017-12-20 | Prediction method and system for predetermined prediction problem |
CN201711383339.4A Pending CN108108820A (en) | 2017-12-20 | 2017-12-20 | For selecting the method and system of the feature of machine learning sample |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310907091.6A Pending CN116882520A (en) | 2017-12-20 | 2017-12-20 | Prediction method and system for predetermined prediction problem |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN116882520A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110689133A (en) * | 2018-06-20 | 2020-01-14 | 深信服科技股份有限公司 | Method, system and related device for training machine learning engine |
CN111079939A (en) * | 2019-11-28 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Machine learning model feature screening method and device based on data privacy protection |
CN111310930A (en) * | 2018-12-11 | 2020-06-19 | 富士通株式会社 | Optimization device, optimization method, and non-transitory computer-readable storage medium |
CN112348043A (en) * | 2019-08-09 | 2021-02-09 | 杭州海康机器人技术有限公司 | Feature screening method and device in machine learning |
CN113191824A (en) * | 2021-05-24 | 2021-07-30 | 北京大米科技有限公司 | Data processing method and device, electronic equipment and readable storage medium |
CN114268625A (en) * | 2020-09-14 | 2022-04-01 | 腾讯科技(深圳)有限公司 | Feature selection method, device, equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169573A (en) * | 2017-05-05 | 2017-09-15 | 第四范式(北京)技术有限公司 | Using composite machine learning model come the method and system of perform prediction |
CN107316082A (en) * | 2017-06-15 | 2017-11-03 | 第四范式(北京)技术有限公司 | For the method and system for the feature importance for determining machine learning sample |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110689133A (en) * | 2018-06-20 | 2020-01-14 | 深信服科技股份有限公司 | Method, system and related device for training machine learning engine |
CN110689133B (en) * | 2018-06-20 | 2023-09-05 | 深信服科技股份有限公司 | Method, system and related device for training machine learning engine |
CN111310930A (en) * | 2018-12-11 | 2020-06-19 | 富士通株式会社 | Optimization device, optimization method, and non-transitory computer-readable storage medium |
CN111310930B (en) * | 2018-12-11 | 2023-07-21 | 富士通株式会社 | Optimizing apparatus, optimizing method, and non-transitory computer-readable storage medium |
CN112348043A (en) * | 2019-08-09 | 2021-02-09 | 杭州海康机器人技术有限公司 | Feature screening method and device in machine learning |
CN112348043B (en) * | 2019-08-09 | 2024-04-02 | 杭州海康机器人股份有限公司 | Feature screening method and device in machine learning |
CN111079939A (en) * | 2019-11-28 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Machine learning model feature screening method and device based on data privacy protection |
CN114268625A (en) * | 2020-09-14 | 2022-04-01 | 腾讯科技(深圳)有限公司 | Feature selection method, device, equipment and storage medium |
CN114268625B (en) * | 2020-09-14 | 2024-01-02 | 腾讯科技(深圳)有限公司 | Feature selection method, device, equipment and storage medium |
CN113191824A (en) * | 2021-05-24 | 2021-07-30 | 北京大米科技有限公司 | Data processing method and device, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116882520A (en) | 2023-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108090570A (en) | For selecting the method and system of the feature of machine learning sample | |
CN108021984A (en) | Determine the method and system of the feature importance of machine learning sample | |
CN108108820A (en) | For selecting the method and system of the feature of machine learning sample | |
CN107704871A (en) | Generate the method and system of the assemblage characteristic of machine learning sample | |
CN106779088B (en) | Execute the method and system of machine learning process | |
JP6541868B2 (en) | Condition-Satisfied Likelihood Prediction Using Recursive Neural Networks | |
CN107392319A (en) | Generate the method and system of the assemblage characteristic of machine learning sample | |
CN105229633B (en) | It is uploaded for realizing data, system, method and apparatus disclosed in processing and predicted query API | |
CN107729915A (en) | For the method and system for the key character for determining machine learning sample | |
US10083263B2 (en) | Automatic modeling farmer | |
US11663839B1 (en) | Polarity semantics engine analytics platform | |
CN107679549A (en) | Generate the method and system of the assemblage characteristic of machine learning sample | |
CN107871166A (en) | For the characteristic processing method and characteristics processing system of machine learning | |
US20200159690A1 (en) | Applying scoring systems using an auto-machine learning classification approach | |
CN107909087A (en) | Generate the method and system of the assemblage characteristic of machine learning sample | |
CN107316082A (en) | For the method and system for the feature importance for determining machine learning sample | |
CN109242040A (en) | Automatically generate the method and system of assemblage characteristic | |
CN107578140A (en) | Guide analysis system and method | |
CN107169574A (en) | Using nested machine learning model come the method and system of perform prediction | |
US11461343B1 (en) | Prescriptive analytics platform and polarity analysis engine | |
CN107273979A (en) | The method and system of machine learning prediction are performed based on service class | |
CN113609193A (en) | Method and device for training prediction model for predicting customer transaction behavior | |
CN115345530A (en) | Market address recommendation method, device and equipment and computer readable storage medium | |
US11295325B2 (en) | Benefit surrender prediction | |
CN113569162A (en) | Data processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2018-06-01