CN109472318A - Method and device for selecting features for a machine learning model to be built - Google Patents

Method and device for selecting features for a machine learning model to be built Download PDF

Info

Publication number
CN109472318A
CN109472318A CN201811427683.3A
Authority
CN
China
Prior art keywords
importance
feature
machine learning
learning model
first feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811427683.3A
Other languages
Chinese (zh)
Other versions
CN109472318B (en)
Inventor
易灿
许辽萨
王维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201811427683.3A priority Critical patent/CN109472318B/en
Publication of CN109472318A publication Critical patent/CN109472318A/en
Application granted granted Critical
Publication of CN109472318B publication Critical patent/CN109472318B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

Embodiments of this specification provide a method and apparatus for selecting features for a machine learning model to be built. According to an embodiment of the method, m sample data pairs are first obtained, and random perturbation is then applied to the m sample data pairs to analyze feature importance. Specifically, on one hand, the machine learning model is trained with the m sample data pairs to obtain a first importance of a first feature from the trained machine learning model; on the other hand, the sample labels of the m sample data pairs are randomly exchanged, and the machine learning model is trained with the m label-shuffled sample data pairs to obtain a second importance of the first feature from the trained machine learning model. Further, the first importance and second importance of each feature are compared, and features are selected for the machine learning model being built according to the comparison result. The embodiment can improve the effectiveness of feature selection.

Description

Method and device for selecting features for a machine learning model to be built
Technical field
One or more embodiments of this specification relate to the field of computer technology, and in particular to a computer-implemented method and apparatus for selecting features for a machine learning model to be built.
Background technique
To build a machine learning model with the best possible performance, features (also called variables) of many dimensions usually need to be selected manually, based on business experience and an understanding of the data. If the features chosen in this process are inappropriate, they may contribute little to the model to be built, or even have a counterproductive effect. Therefore, during model construction, repeated experiments and feature screening are needed to build a better model. For a model that has been running online for some time, new factors may also appear that the model cannot predict correctly, causing model degradation. In that case, new features must be selected and the model retrained.
When the feature screening process above is carried out manually, it is usually very labor-intensive and slows down model construction, so automated approaches are generally used. In conventional techniques, features are usually selected by ratio according to the single-feature AUC value (the area under the ROC curve, reflecting how well the feature ranks samples). When screening features this way, the AUC must be computed feature by feature through a single-feature AUC evaluation module; if there are many features, this approach is inefficient. Moreover, its applicability is limited; for example, feature AUC values cannot be computed for regression problems. It is therefore desirable to provide a scheme that improves the effectiveness of feature screening.
Summary of the invention
One or more embodiments of this specification describe a method and apparatus for selecting features for a machine learning model. By randomly perturbing variables, feature importance is assessed, and features are selected based on their importance, which can improve the effectiveness of feature screening.
According to a first aspect, a method for selecting features for a machine learning model to be built is provided, comprising: obtaining m sample data pairs, each sample data pair including n candidate features extracted for a respective sample and a sample label pre-annotated for that sample, wherein the n candidate features include at least a first feature; training the machine learning model with the m sample data pairs to obtain a first importance of the first feature from the trained machine learning model; randomly exchanging the sample labels of the m sample data pairs, and training the machine learning model with the m label-shuffled sample data pairs to obtain a second importance of the first feature from the trained machine learning model; and determining, at least based on a comparison of the first importance and the second importance, whether to select the first feature as a feature of the machine learning model.
In one embodiment, training the machine learning model with the m sample data pairs to obtain the first importance of the first feature from the trained machine learning model includes: randomly reordering the m sample data pairs k1 times, and after each reordering, training the machine learning model with the m reordered sample data pairs, so as to obtain k1 first evaluation scores of the first feature from the machine learning model after each training; and determining the first importance based on the k1 first evaluation scores.
In a further embodiment, determining the first importance based on the k1 first evaluation scores includes: determining the average of the k1 first evaluation scores as the first importance.
In another embodiment, randomly exchanging the sample labels of the m sample data pairs, and training the machine learning model with the m label-shuffled sample data pairs to obtain the second importance of the first feature from the trained machine learning model, includes: randomly exchanging the sample labels of the m sample data pairs k2 times, and after each label exchange, training the machine learning model with the m label-shuffled sample data pairs, so as to obtain k2 second evaluation scores of the first feature from the machine learning model after each training; and determining the second importance based on the k2 second evaluation scores.
In a further embodiment, determining the second importance based on the k2 second evaluation scores includes: determining the average of the k2 second evaluation scores as the second importance.
In one embodiment, determining, at least based on the comparison of the first importance and the second importance, whether to select the first feature as a feature of the machine learning model includes: when the second importance minus the first importance exceeds a first threshold, excluding the first feature from the features of the machine learning model.
In another embodiment, determining, at least based on the comparison of the first importance and the second importance, whether to select the first feature as a feature of the machine learning model includes: when the second importance minus the first importance exceeds twice the variance of the k2 second evaluation scores, excluding the first feature from the features of the machine learning model.
In one embodiment, determining, at least based on the comparison of the first importance and the second importance, whether to select the first feature as a feature of the machine learning model includes: when the second importance minus the first importance is less than a second threshold, determining to select the first feature as a feature of the machine learning model.
In another embodiment, determining, at least based on the comparison of the first importance and the second importance, whether to select the first feature as a feature of the machine learning model includes: when the second importance minus the first importance is less than the variance of the k2 second evaluation scores, determining to select the first feature as a feature of the machine learning model.
In one embodiment, determining, at least based on the first importance and the second importance, whether to select the first feature as a feature of the machine learning model includes: determining a composite importance of the first feature based on the first importance and the second importance; and determining, according to the composite importance, whether to select the first feature as a feature of the machine learning model.
In one embodiment, the composite importance of the first feature is determined based on a composite index of the first feature obtained by combining the first importance and the second importance, the composite index of the first feature being the sum of the difference between the second importance and the first importance and the ratio of the second importance to the first importance.
In a further embodiment, the composite importance of the first feature is the ratio of the composite index of the first feature to the largest value among the n composite indices corresponding to the n candidate features.
In another further embodiment, the composite importance of the first feature is the ratio of the composite index of the first feature to the largest value among the composite indices of similar candidate features, wherein the similar candidate features are those of the n candidate features that satisfy the same predetermined condition as the first feature.
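The composite-index embodiments above can be sketched numerically. This is a minimal Python sketch: the formula (difference of the two importances plus their ratio, then normalization by the largest index) follows the text, but the importance values themselves are made-up numbers for illustration.

```python
# Composite index per the text: (second importance - first importance)
# plus (second importance / first importance); composite importance then
# normalizes each index by the largest index among the n candidate features.
def composite_index(imp1, imp2):
    # imp1: importance from normally labeled training; imp2: after label shuffling
    return (imp2 - imp1) + imp2 / imp1

first = [0.40, 0.10, 0.25]   # hypothetical first importances of 3 features
second = [0.05, 0.12, 0.30]  # hypothetical second importances after shuffling

indices = [composite_index(a, b) for a, b in zip(first, second)]
max_index = max(indices)
composite_importance = [idx / max_index for idx in indices]
print(composite_importance)
```

With these toy numbers the feature whose importance collapsed under shuffling (0.40 → 0.05) receives a much smaller composite index than the two features that were unaffected.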
According to a second aspect, a device for selecting features for a machine learning model to be built is provided, comprising:
an acquiring unit, configured to obtain m sample data pairs, each sample data pair including n candidate features extracted for a respective sample and a sample label pre-annotated for that sample, wherein the n candidate features include at least a first feature;
a first determination unit, configured to train the machine learning model with the m sample data pairs so as to obtain a first importance of the first feature from the trained machine learning model;
a second determination unit, configured to randomly exchange the sample labels of the m sample data pairs and train the machine learning model with the m label-shuffled sample data pairs, so as to obtain a second importance of the first feature from the trained machine learning model;
a selecting unit, configured to determine, at least based on a comparison of the first importance and the second importance, whether to select the first feature as a feature of the machine learning model.
According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method of the first aspect.
According to a fourth aspect, a computing device is provided, including a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method of the first aspect is implemented.
With the method and apparatus for selecting features for a machine learning model to be built provided by the embodiments of this specification, m sample data pairs are first obtained, and random perturbation is then applied to them to analyze feature importance. Specifically, on one hand, the machine learning model is trained with the m sample data pairs to obtain a first importance of the first feature from the trained model; on the other hand, the sample labels of the m sample data pairs are randomly exchanged, and the model is trained with the m label-shuffled sample data pairs to obtain a second importance of the first feature from the trained model. Further, the first importance and second importance of each feature are compared, and features are selected for the machine learning model being built according to the comparison result, which can improve the effectiveness of feature selection for the machine learning model.
Detailed description of the invention
To explain the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application scenario of an embodiment disclosed in this specification;
Fig. 2 shows a flow chart of a method for selecting features for a machine learning model to be built according to one embodiment;
Fig. 3 shows a specific example of analyzing the first importance and the second importance of each feature;
Fig. 4 shows a schematic block diagram of a device for selecting features for a machine learning model to be built according to one embodiment.
Specific embodiment
The scheme provided by this specification is described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an application scenario of an embodiment of this specification. The scenario mainly includes a data module, a feature selection module, and a model training module. The method provided by this specification for selecting features for a machine learning model to be built mainly applies to the feature selection module in Fig. 1. The data module, feature selection module, and model training module may be deployed on the same computing platform (such as one server or server cluster) or on different computing platforms, which is not limited here. The data module may include, for example, various storage media for storing a training data set.
The basic idea of the embodiments of this specification is built on random perturbation. It can be understood that if a feature is important, its importance may decrease once the sample labels are disturbed. For example, in a binary classification distinguishing the elderly from children, suppose one feature is "age": sample A has the value 85 with label "elderly", sample B has the value 5 with label "children", and so on; clearly, the feature "age" is very important. When the labels are disturbed, suppose the label of sample A becomes "children" while the label of sample B remains "children". Then the feature "age" can no longer effectively distinguish the two classes. In other words, the importance of the feature "age" drops, while the importance of originally less important features, such as living habits, behavioral habits, or hair color, may rise.
Therefore, by randomly perturbing the sample labels, the change in feature importance can be analyzed to determine whether a feature is important. Here, the importance of a feature can be understood as its degree of significance, for example the weight in a linear model, or the proportion of class separation contributed by a feature (corresponding to leaf nodes) in a tree model. For tree models, during training, the model itself can score each feature to characterize its importance in the machine learning model currently being built. Machine learning models such as LightGBM (Light Gradient Boosting Machine), XGBoost (eXtreme Gradient Boosting), and Random Forest can output the importance of each feature during training.
Thus, on one hand, the machine learning model being built can be trained with the sample data in the training data set to determine the first importance of each feature; on the other hand, the sample labels can be randomly perturbed, and the model being built can be trained again with the label-disturbed sample data to obtain the second importance of each feature. Then, for each feature, the second importance obtained after randomly perturbing the sample labels can be compared with the first importance obtained by training on normal samples, so that more important features can be selected for the machine learning model.
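The two-branch comparison just described can be sketched end to end. This is an illustrative sketch only, not the patent's implementation: the embodiments obtain importances from models such as LightGBM or XGBoost during training, while here absolute correlation with the label stands in for that importance score, and all data are synthetic.

```python
import random

# Stand-in importance score: absolute Pearson correlation between a feature
# column and the label column (an assumption for this sketch only).
def importance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (sx * sy))

random.seed(0)
labels = [random.randint(0, 1) for _ in range(200)]
informative = [y + random.gauss(0, 0.3) for y in labels]  # tracks the label

first_importance = importance(informative, labels)        # normal labels

shuffled = labels[:]
random.shuffle(shuffled)                                  # labels randomly exchanged
second_importance = importance(informative, shuffled)

print(first_importance, second_importance)
```

As expected under the scheme, the informative feature's importance is high on normal labels and collapses once the labels are randomly exchanged.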
Specifically, as shown in Fig. 1, the data module in the above scenario may store a training data set containing at least m training samples. The feature selection module can obtain the sample data pairs of the m training samples from the data module. Each sample data pair may include the n candidate features extracted for the respective sample and the sample label pre-annotated for that sample. For example, sample 1, sample 2, ..., sample m correspond to sample data pairs such as [n features, label 1], [n features, label 2], ..., [n features, label m]. Table 1 illustrates the sample data pair of each sample when the n candidate features are feature 1 to feature n.
Table 1: illustration of the sample data pair of each sample
Sample 1  X11, X21, ..., Xn1  Y1
Sample 2  X12, X22, ..., Xn2  Y2
...
Sample m  X1m, X2m, ..., Xnm  Ym
In Table 1, the n features are X1, X2, ..., Xn, and the label is denoted Y. The features and label of each sample carry a subscript consistent with that sample; for example, the features of sample 1 take the suffix 1 and are written X11, X21, ..., Xn1, and its label is written Y1. Note that this is only for convenience of description and does not distinguish the label values themselves. For example, Y1 and Ym may both be "apple", and Y2 and Y1 may both be "pear", and so on. By training the machine learning model being built with these m sample data pairs, the first importance of each feature can be obtained.
On the other hand, if the sample labels of the sample data pairs are randomly perturbed, i.e., the sample labels are randomly exchanged, m sample data pairs such as those shown in Tables 2 and 3 below can be obtained. Tables 2 and 3 illustrate the m sample data pairs after two separate random perturbations of the sample labels.
Table 2: one random perturbation of the sample labels
Sample 1  X11, X21, ..., Xn1  Y2
Sample 2  X12, X22, ..., Xn2  Ym
...
Sample m  X1m, X2m, ..., Xnm  Y1
Table 3: another random perturbation of the sample labels
As can be seen from Tables 2 and 3, the random perturbation of the sample labels changes which label each sample originally corresponded to. The meaning represented by a sample's label may change, or may not. For example, in Table 2 the original label of sample 1 is Y1 "apple", which becomes Y2 "pear" after the perturbation; in Table 3, the original label of sample 1 is Y1 "apple", which becomes Ym after the perturbation and is likewise "pear", and so on. After one random exchange of the sample labels, there are still m sample data pairs. These sample data pairs can likewise be used to train the machine learning model being built to obtain the second importance of each feature.
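The label exchange illustrated in Tables 2 and 3 can be sketched as follows; the feature values and label strings here are made up for illustration. The key property is that only the labels are permuted across samples, while each sample's feature vector stays untouched and the number of pairs remains m.

```python
import random

# m = 3 sample data pairs: (feature vector, pre-annotated label)
random.seed(42)
pairs = [([0.3, 1.2], "apple"), ([0.8, 0.5], "pear"), ([0.1, 0.9], "apple")]

features = [f for f, _ in pairs]
labels = [lab for _, lab in pairs]

shuffled_labels = labels[:]
random.shuffle(shuffled_labels)          # labels randomly exchanged across samples

perturbed = list(zip(features, shuffled_labels))
print(perturbed)
```

After the exchange, the multiset of labels is unchanged, so the class proportions in the training set are preserved even though the feature-label association is broken.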
It can be understood that if a feature's importance drops significantly after the sample labels are randomly perturbed, the feature is likely an effective one. Therefore, whether a feature should be selected for the machine learning model can be further determined according to the comparison of its first importance and second importance. The selected features can then be used by the model training module to train the machine learning model. The feature screening process is detailed below.
Fig. 2 shows a flow chart of a method for selecting features for a machine learning model to be built according to one embodiment. The executing subject of the method may be any system, apparatus, device, platform, or server with computing and processing capabilities.
As shown in Fig. 2, the method includes the following steps. Step 21: obtain m sample data pairs, each including the n candidate features extracted for a respective sample and the sample label pre-annotated for that sample. Step 22: train the machine learning model with the m sample data pairs to obtain the first importance of each feature from the trained machine learning model. Step 23: randomly exchange the sample labels of the m sample data pairs, and train the machine learning model with the m label-shuffled sample data pairs to obtain the second importance of each feature from the trained machine learning model. Step 24: determine, at least based on the comparison of each feature's first importance and second importance, whether to select the corresponding feature as a feature of the machine learning model.
First, in step 21, m sample data pairs are obtained. Here, each sample data pair may include the n candidate features extracted for the respective sample and the sample label pre-annotated for that sample.
The m sample data pairs can be obtained from a training sample set. The training data set may include at least m (for example, 2m) training samples for training the machine learning model. m can be a positive integer of a certain magnitude, such as 1000, to meet the needs of training the machine learning model. In some cases, the training data set may also include test samples for testing whether the model trained on the training samples meets the conditions of use. For convenience, this specification assumes the training data set includes (or can provide) at least m samples.
Taking a credit risk-control model as an example of the machine learning model being built, the samples in the training data set may include the credit records of multiple users, such as borrowing and repayment records. In that case, one user can serve as one sample. Each sample can also correspond to a sample label; for example, user A may correspond to the label "trustworthy user". The sample labels can be annotated in advance manually or by other reliable means, which is not limited here. The m sample data pairs corresponding to m of the samples can be obtained from the training data set. For each of the m samples, n candidate features can be extracted, and together with the pre-annotated label they form a sample data pair, as shown in Table 1. The sample data pairs in Table 1 can also be written in the easily understood form [features, label], where the features include X1 to Xn and the label is denoted Y.
Then, in step 22, the machine learning model can be trained with the m sample data pairs to obtain the first importance of each feature from the trained machine learning model. For convenience, this specification assumes the n candidate features include at least a first feature, which may be any one of the n candidate features. As described above, in the process of training with the m sample data pairs, the importance of each feature can be determined by the machine learning model, which is not repeated here.
It can be understood that with different input orders of the samples, the parameters of the trained model will differ. To make the first importance more accurate, according to one embodiment, the ordering of the m samples can be randomly perturbed, and the first importance of each feature can be obtained by aggregating the feature importances obtained under each perturbation. For convenience of distinction and description, the feature importance obtained under each perturbation is here called a first evaluation score. Specifically, the m sample data pairs can be randomly perturbed k1 times, and each feature's first evaluation score can be recorded under each perturbation. Note that a random perturbation of the m sample data pairs can be understood as a random reordering of them. Tables 4 and 5 illustrate the ordering of the m sample data pairs after two random reorderings.
Table 4: one random reordering of the sample data pairs
Sample 2  X12, X22, ..., Xn2  Y2
Sample m  X1m, X2m, ..., Xnm  Ym
...
Sample 1  X11, X21, ..., Xn1  Y1
Table 5: another random reordering of the sample data pairs
Sample m  X1m, X2m, ..., Xnm  Ym
Sample 1  X11, X21, ..., Xn1  Y1
...
Sample 2  X12, X22, ..., Xn2  Y2
As can be seen from Tables 4 and 5, the random perturbation of the sample data pairs only changes the overall ordering of the pairs; it does not change the sample data pairs themselves.
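The reordering illustrated in Tables 4 and 5 can be sketched as follows; the sample identifiers and labels are placeholders. In contrast to the label exchange, here each pair (features together with its label) travels as a unit, so only the input order changes.

```python
import random

# m = 3 sample data pairs, each kept intact as a (sample, label) unit
random.seed(7)
pairs = [("sample1", "Y1"), ("sample2", "Y2"), ("sample3", "Y3")]

reordered = pairs[:]
random.shuffle(reordered)   # random reordering: pairs move, pairings survive

print(reordered)
```

Because the pairs themselves are untouched, any model trained on the reordered set sees exactly the same data; only order-sensitive aspects of training can differ.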
It can be understood that with different sample input orders, the trained models may differ somewhat, and so may the first evaluation score obtained for the same feature. Taking the first feature (e.g., X1) as an example, by randomly reordering the m sample data pairs k1 times and training the machine learning model with the m reordered sample data pairs after each reordering, k1 first evaluation scores can be obtained.
Suppose the first random ordering is as shown in Table 1. Then, in the first training of the machine learning model being built, the sample data pairs of the m samples are input into the machine learning model in the order sample 1, sample 2, ..., sample m, and the respective importances of the n features, i.e., their first evaluation scores, are determined during training.
Suppose the second random ordering is as shown in Table 4. Then, in the second training of the machine learning model being built, the sample data pairs of the m samples are input into the machine learning model in the order sample 2, sample m, ..., sample 1, and the respective first evaluation scores of the n features are determined again during training.
And so on, until k1 random orderings have been carried out, so that k1 first evaluation scores are obtained for each of the n features (e.g., the first feature), as shown in Table 6.
Table 6: first evaluation scores of the n candidate features under k1 random orderings
As can be seen in Table 6, each of the n candidate features obtains k1 first evaluation scores.
Then, the first importance of each feature can be determined based on its k1 first evaluation scores. Taking the first feature as an example, in some embodiments its first importance can be the average of the k1 first evaluation scores; in other embodiments it can be the median of the k1 first evaluation scores. The first importance of each feature can also be determined from the k1 first evaluation scores in other reasonable ways, which is not limited here. In this way, training the machine learning model on each random ordering yields one first evaluation score for each of the features X1 to Xn; over the k1 random orderings of the sample data pairs, k1 first evaluation scores are obtained for each of X1 to Xn, from which the first importance of each feature is derived.
It can be understood that the first evaluation score of a more important feature will remain relatively stable regardless of the sample input order during training, fluctuating only slightly as the input order changes. Therefore, by randomly reordering the sample data pairs multiple times, interference caused by the input order of the sample data can be avoided, and a more accurate first importance can be obtained for each feature.
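The aggregation of the k1 first evaluation scores into a first importance can be sketched as follows, with hypothetical scores in the shape of Table 6 (the values are made up; the averaging rule is the one named in the embodiments, and the text also mentions a median as an alternative).

```python
# Hypothetical first evaluation scores for three candidate features over
# k1 = 4 random orderings; one row of scores per feature (Table 6 shape).
k1_scores = {
    "X1": [0.41, 0.39, 0.42, 0.40],
    "X2": [0.10, 0.12, 0.09, 0.11],
    "X3": [0.25, 0.27, 0.24, 0.26],
}

# First importance per the averaging embodiment: mean of each feature's scores.
first_importance = {f: sum(s) / len(s) for f, s in k1_scores.items()}
print(first_importance)
```

The small spread within each row reflects the point made above: for a genuinely informative feature, the score varies only slightly with the sample input order, so the mean is a stable summary.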
On the other hand, in step 23, the sample labels of the m sample data pairs are randomly exchanged, and the machine learning model is trained with the m label-shuffled sample data pairs to obtain the second importance of each feature from the trained machine learning model. The process of determining the second importance of each feature from the m label-shuffled sample data pairs, such as those shown in Table 2 or Table 3, is similar to the process of determining the first importance of each feature in step 22 and is not repeated here.
It can be understood that for a single random perturbation of the sample labels, the change in feature importance can vary widely. For example, if one random perturbation happens to leave the positive and negative labels effectively restored — say the label of sample 1 becomes the label of sample 2, but the label content does not change and is still "apple" — then the importance of each feature may hardly differ from its first importance. Conversely, suppose one random perturbation happens to move half of the positive labels onto originally negative samples and turn half of the negative labels positive: samples 1 to m/2 were originally labeled "apple" and samples 1+m/2 to m were labeled "pear", and after the exchange the labels of samples 1 to m/4 become "pear", samples 1+m/4 to m/2 remain "apple", samples 1+m/2 to 3m/4 become "apple", and samples 1+3m/4 to m remain "pear". Then the importance of some important feature (such as blemishes on the skin) may drop considerably.
Therefore, according to one possible design, the sample labels of the m sample data pairs may be perturbed multiple times, e.g. k2 random perturbations, with the machine learning model retrained after each random perturbation of the sample labels. Each training pass yields one second evaluation score for each of the features X1 to Xn; with k2 perturbations, each of X1 to Xn thus obtains k2 second evaluation scores. As with the first evaluation score, the term "second evaluation score" is used only to distinguish it from the second importance; it likewise represents the importance of a feature in a particular model training. The second importance of each candidate feature can then be determined from its k2 second evaluation scores. Taking the aforementioned first feature as an example, its second importance may be the average of its k2 second evaluation scores, or the median of those k2 second evaluation scores; no limitation is imposed here.
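The k2 label-permutation runs can be sketched in the same spirit. The per-feature scorer below (absolute weights of an SGD logistic model) is a hypothetical stand-in for whatever model the method actually trains, and the mean is used for the second importance, though the text notes the median is equally admissible.

```python
import numpy as np

def feature_scores(X, y, lr=0.1, epochs=5):
    # Stand-in scorer: absolute weights of a small SGD logistic model.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + np.exp(-(xi @ w)))
            w -= lr * (p - yi) * xi
    return np.abs(w)

def second_evaluation_scores(X, y, k2=5, seed=0):
    # k2 random permutations of the labels; retrain after each one and
    # record one row of per-feature second evaluation scores per run.
    rng = np.random.default_rng(seed)
    return np.array([feature_scores(X, rng.permutation(y)) for _ in range(k2)])

def second_importance(X, y, k2=5, seed=0):
    # Mean over the k2 runs (the median would also be admissible).
    return second_evaluation_scores(X, y, k2, seed).mean(axis=0)
```

On data where a feature genuinely predicts the label, its second importance (labels permuted) comes out well below its score under the true labels, which is exactly the signal the comparison in step 24 exploits.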
In this way, by randomly swapping the sample labels of the sample data pairs multiple times, interference from a random perturbation that happens to hit a special case can be avoided, and a relatively accurate second importance obtained for each feature.
Next, in step 24, at least based on a comparison of the first importance and the second importance of each feature, it is determined whether each feature is selected as a feature of the machine learning model. It will be appreciated that, for each feature, the first importance is determined by a machine learning model built by training on normal samples, while the second importance is determined with randomly altered sample labels. By jointly analyzing the first importance and the second importance of each feature, the significance of each feature for building the machine learning model can be assessed, so that effective features are selected from the candidate features to train the machine learning model.
Taking the first feature as an example, the joint analysis of the first importance and the second importance proceeds as follows. According to the analysis above, when labels are swapped at random, the importance of a more important feature tends to drop sharply; as feature importance decreases, the importance of a less important feature may rise or remain unchanged. Therefore, in one embodiment, the first feature may be rejected from the features of the machine learning model in the case where the second importance minus the first importance exceeds a first threshold. The first threshold may be predetermined. For example, if the first threshold is 0, the first feature is rejected from the features of the machine learning model whenever the second importance is greater than the first importance — that is, the first feature is not selected as a feature of the constructed machine learning model. On the other hand, per the description of step 23, in some embodiments the second importance is determined from k2 second evaluation scores; in a corresponding alternative embodiment, the first threshold may also be determined from those k2 second evaluation scores, e.g. twice the second variance of the k2 second evaluation scores.
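The rejection rule with the variance-based first threshold might look like the following minimal numpy sketch; `second_scores` holding one row of evaluation scores per label-permutation run is an assumed layout, not mandated by the text.

```python
import numpy as np

def reject_mask(first_imp, second_scores):
    # second_scores: shape (k2, n) — one row per label-permutation run.
    second_imp = second_scores.mean(axis=0)
    # First threshold = twice the second variance of the k2 second scores.
    threshold = 2.0 * second_scores.var(axis=0)
    # True where the feature should be rejected from the model's features.
    return (second_imp - first_imp) > threshold
```

A feature whose importance collapses under label permutation (large negative difference) is kept; one whose importance rises beyond the threshold is rejected.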
Further, if the second importance of the first feature minus the first importance is less than the first threshold, the second importance and the first importance of the first feature may be compared further, and the first feature may still be selected as a feature of the constructed machine learning model.
According to one possible design, in the case where the second importance minus the first importance is less than a second threshold, it is determined that the first feature is selected as a feature of the constructed machine learning model. The second threshold may be predetermined, e.g. half of the negative of the second importance. On the other hand, per the description of step 23, in some embodiments the second importance is determined from k2 second evaluation scores; in a corresponding alternative embodiment, the second threshold may also be determined from those k2 second evaluation scores, e.g. the second variance of the k2 second evaluation scores. In the case where the second importance of the first feature minus the first importance is greater than the second threshold, the first feature may be excluded directly from the features of the constructed machine learning model, or may be judged further; no limitation is imposed here.
It is worth noting that, in determining whether to select the first feature as a feature of the constructed machine learning model via the first threshold and the second threshold, the two threshold judgments may be combined; in that case, the first threshold is greater than the second threshold. When the difference between the second importance and the first importance of the first feature lies between the first threshold and the second threshold, whether to select it as a feature of the constructed machine learning model may be determined by other rules — for example, sorting the n candidate features by the difference between the second importance and the first importance, and selecting a predetermined number of features with the smallest differences. It will be appreciated that, because the importance of an important feature drops sharply after the labels are randomly permuted, the difference between its second importance and first importance is negative; the smaller the difference (the larger its absolute value), the greater the drop in the importance of the corresponding feature.
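The fallback ranking rule just described — sorting candidates by the (typically negative) difference and keeping the features whose importance dropped the most — can be sketched as below; the function name and the fixed `num` parameter are illustrative assumptions.

```python
import numpy as np

def top_features_by_drop(first_imp, second_imp, num):
    # Smallest (most negative) differences first: these are the features
    # whose importance fell the most after label permutation.
    diff = np.asarray(second_imp) - np.asarray(first_imp)
    return np.argsort(diff)[:num]
```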
According to some optional implementations, when the first importance is determined from k1 first evaluation scores and the second importance from k2 second evaluation scores, the first importance and the second importance may also be compared and analyzed in other ways. For example, a feature whose first importance is greater than a third threshold and whose second importance is less than a fourth threshold is selected as a feature of the constructed machine learning model, and so on.
For convenience, each feature selected as a feature of the constructed machine learning model may also be referred to as an effective feature.
According to another possible design, a composite importance of each feature may also be determined based on its first importance and second importance, and the effective features of the machine learning model determined according to the composite importance.
In one embodiment, the first importance and the second importance may first be combined into a composite index of the first feature, and the composite importance of the first feature then determined based on that composite index. In one embodiment, the composite index may be the ratio of the second importance to the first importance. In another embodiment, the composite index may be the sum of the difference between the second importance and the first importance and the ratio of the second importance to the first importance, that is: (second importance − first importance) + second importance / first importance. Further, the composite index itself may be taken as the composite importance of the first feature, or it may be further computed, e.g. mapped into a predetermined interval such as [0, 1], to serve as the composite importance of the first feature.
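The composite index and a simple relative mapping can be sketched as follows. Note this illustrates the stated formula only: normalizing by the maximum index assumes the indices come out positive, which need not hold for strongly important features (whose indices can be negative), and the function names are assumptions.

```python
import numpy as np

def composite_index(first_imp, second_imp):
    # (I2 - I1) + I2 / I1, per the text.
    first_imp = np.asarray(first_imp, float)
    second_imp = np.asarray(second_imp, float)
    return (second_imp - first_imp) + second_imp / first_imp

def composite_importance(first_imp, second_imp):
    # Relative composite importance: divide by the largest index so the
    # top feature scores 1 (assumes positive indices).
    idx = composite_index(first_imp, second_imp)
    return idx / idx.max()
```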
In one implementation, the composite importance of the first feature may be the ratio of the composite index of the first feature to the largest value among the n composite indices corresponding to the n candidate features. In this way, the relative composite importance of each of the n features can be determined.
In another implementation, the composite importance of the first feature may be the ratio of the composite index of the first feature to the largest value among the composite indices of similar candidate features, where a similar candidate feature is a feature among the n candidate features that satisfies the same predetermined condition as the first feature. For example, if the first feature satisfies the predetermined condition that the difference between the second mean and the first mean is less than the first threshold, then every candidate feature among the n candidate features whose second mean minus first mean is less than the first threshold is a similar candidate feature. In this specification, the similar candidate features of the first feature include the first feature itself. In this way, the relative composite importance of the features within each category can be determined, the largest composite importance in each category being 1.
Referring to FIG. 3, a specific example is given below to make the above description clearer and to illustrate the implementation of this step. In this specific example, each of the n candidate features has a first importance determined from k1 first evaluation scores in step 22 and a second importance determined from k2 second evaluation scores in step 23. For each feature, the first importance, the second importance, the first variance of the k1 first evaluation scores, and the second variance of the k2 second evaluation scores may be computed first. The n candidate features are then roughly classified according to the predetermined conditions their means and variances satisfy. Specifically:
The first class satisfies the condition: the second importance minus the first importance is greater than or equal to 2 times the second variance;
The second class satisfies the condition: the second importance minus the first importance is greater than or equal to the second variance, and less than 2 times the second variance;
The third class satisfies the condition: the second importance minus the first importance is less than the second variance.
According to the above grouping: features of the first class show little or no drop in importance after the sample labels are randomly perturbed; they are of low importance to the machine learning model being built and are invalid features (reject class). Features of the third class show a large drop in importance after the sample labels are randomly perturbed; they are important to the machine learning model being built and are effective features (keep class). Features of the second class fall between the first and third classes and are intermediate features (tentative class). In general, an effective feature contributes substantially to distinguishing samples, yet its information content may be limited: if such a feature is weak or missing for the object being predicted, the model cannot predict well. Therefore, in this example, the effective features and the intermediate features may together be selected as features of the machine learning model being built.
Further, in practical applications, a more specific ranking of feature importance may be needed, for the user's reference or to give more room for feature selection — e.g. when features are numerous, some features may be rejected from the intermediate (second-class) features. In that case, as shown in FIG. 3, the composite index of each feature in each category may also be computed, and the composite importance of the feature determined from the composite index. One specific composite importance computation is: divide the composite index of each feature by the largest composite index within its category. Thus each feature in a category corresponds to a score between 0 and 1 indicating its relative importance within that category. If it is desired to reduce the computation of the model and further cull candidate features, the features with lower composite importance may be rejected from the intermediate features.
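The three-way grouping of FIG. 3 can be sketched as a small numpy function; the class codes (0 = reject, 1 = tentative, 2 = keep) are arbitrary labels introduced here, and `second_scores` with one row per label-permutation run is an assumed layout.

```python
import numpy as np

def classify_features(first_imp, second_scores):
    # Apply the variance-based conditions from the text:
    # diff >= 2*var  -> first class (reject)
    # var <= diff < 2*var -> second class (tentative)
    # diff < var -> third class (keep)
    second_imp = second_scores.mean(axis=0)
    var2 = second_scores.var(axis=0)
    diff = second_imp - first_imp
    return np.where(diff >= 2 * var2, 0,
                    np.where(diff >= var2, 1, 2))
```

A feature whose importance collapses after permutation lands in the keep class; one whose importance holds or rises lands in the reject class.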
Reviewing the above process, random perturbation is used in selecting features for the machine learning model to be built. On the one hand, the first importance of each feature is determined by a machine learning model built by training normally on the samples; on the other hand, the sample labels are randomly swapped and the second importance of each feature determined, so that feature selection is performed by comparing the second importance with the first importance, which can improve the effectiveness of feature screening in the model-building process. Further, in determining the first importance, randomly shuffling the order of the sample data pairs multiple times makes the first importance more accurate. In addition, the second importance is determined by perturbing the sample labels multiple times, which reduces interference from a random perturbation that hits a special case and further improves the accuracy of the second importance.
According to an embodiment of another aspect, an apparatus for selecting features for a machine learning model to be built is also provided. FIG. 4 shows a schematic block diagram of an apparatus for selecting features for a machine learning model to be built according to one embodiment. As shown in FIG. 4, the apparatus 400 for selecting features for a machine learning model to be built includes: an acquiring unit 41, configured to obtain m sample data pairs, each sample data pair including n candidate features extracted for a respective sample and a sample label pre-annotated for the sample, where the n candidate features include at least a first feature; a first determination unit 42, configured to train the machine learning model using the m sample data pairs, so as to obtain a first importance of the first feature from the trained machine learning model; a second determination unit 43, configured to randomly swap the sample labels of the m sample data pairs and train the machine learning model using the m sample data pairs after the random label swapping, so as to obtain a second importance of the first feature from the trained machine learning model; and a selecting unit 44, configured to determine, at least based on a comparison of the first importance and the second importance, whether to select the first feature as a feature of the machine learning model.
In an embodiment of one aspect, the first determination unit 42 may further be configured to:
randomly shuffle the m sample data pairs k1 times, and after each shuffle, train the machine learning model being built using the m shuffled sample data pairs, so as to obtain k1 first evaluation scores of the first feature from the machine learning model of each training pass;
determine the first importance based on the k1 first evaluation scores.
Further, the first determination unit 42 may determine the average of the k1 first evaluation scores as the first importance of the first feature.
In an embodiment of another aspect, the second determination unit 43 may further be configured to:
randomly swap the sample labels of the m sample data pairs k2 times, and after each swap, train the machine learning model being built using the m label-swapped sample data pairs, so as to obtain k2 second evaluation scores of the first feature from the machine learning model of each training pass;
determine the second importance of the first feature based on the k2 second evaluation scores.
Further, the second determination unit 43 may determine the average of the k2 second evaluation scores as the second importance of the first feature.
According to one possible design, the selecting unit 44 may further be configured to:
reject the first feature from the features of the constructed machine learning model in the case where the second importance minus the first importance exceeds a first threshold.
In some embodiments, when the second importance is determined from k2 second evaluation scores, the selecting unit 44 may reject the first feature from the features of the constructed machine learning model in the case where the second importance minus the first importance exceeds twice the second variance of the k2 second evaluation scores. That is, the first threshold above is twice the second variance of the k2 second evaluation scores.
According to another possible design, the selecting unit 44 may further be configured to:
determine to select the first feature as a feature of the constructed machine learning model in the case where the second importance minus the first importance is less than a second threshold.
In some embodiments, when the second importance is determined from k2 second evaluation scores, the selecting unit 44 may also be configured to determine to select the first feature as a feature of the constructed machine learning model in the case where the second importance minus the first importance is less than the second variance of the k2 second evaluation scores. That is, the second threshold above may be the second variance of the k2 second evaluation scores.
According to one embodiment, the selecting unit 44 may further be configured to:
determine a composite importance of the first feature based on the first importance and the second importance;
determine, according to the composite importance, whether to select the first feature as a feature of the constructed machine learning model.
In one embodiment, the composite importance of the first feature is determined based on a composite index of the first feature obtained by combining the first importance and the second importance, the composite index of the first feature being the sum of the difference between the second importance and the first importance and the ratio of the second importance to the first importance.
In a further embodiment, the composite importance of the first feature is the ratio of the composite index of the first feature to the largest value among the n composite indices corresponding to the n candidate features.
In another further embodiment, the composite importance of the first feature is the ratio of the composite index of the first feature to the largest value among the composite indices of similar candidate features, where a similar candidate feature is a feature among the n candidate features that satisfies the same predetermined condition as the first feature.
It is worth noting that the apparatus 400 shown in FIG. 4 is an apparatus embodiment corresponding to the method embodiment shown in FIG. 2; the corresponding descriptions in the method embodiment of FIG. 2 apply equally to the apparatus 400 and are not repeated here.
With the above apparatus, random perturbation is used in selecting features for the machine learning model to be built: on the one hand, the first importance of each feature is determined by a machine learning model built by training normally on the samples; on the other hand, the sample labels are randomly swapped and the second importance of each feature determined, so that feature selection is performed by comparing the second importance with the first importance, which can improve the effectiveness of feature screening in the model-building process.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with FIG. 2.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, the memory storing executable code; when the processor executes the executable code, the method described in conjunction with FIG. 2 is implemented.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further detail the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is merely specific embodiments of the present invention and is not intended to limit the scope of protection of the present invention; any modification, equivalent substitution, improvement, etc. made on the basis of the technical solution of the present invention shall fall within the scope of protection of the present invention.

Claims (28)

1. A method for selecting features for a machine learning model to be built, the method comprising:
obtaining m sample data pairs, each sample data pair including n candidate features extracted for a respective sample and a sample label pre-annotated for the sample, wherein the n candidate features include at least a first feature;
training the machine learning model using the m sample data pairs, so as to obtain a first importance of the first feature from the trained machine learning model;
randomly swapping the sample labels of the m sample data pairs, and training the machine learning model using the m sample data pairs after the random swapping of the sample labels, so as to obtain a second importance of the first feature from the trained machine learning model;
at least based on a comparison of the first importance and the second importance, determining whether to select the first feature as a feature of the machine learning model.
2. The method according to claim 1, wherein the training the machine learning model using the m sample data pairs, so as to obtain the first importance of the first feature from the trained machine learning model, comprises:
randomly shuffling the order of the m sample data pairs k1 times, and after each shuffle, training the machine learning model using the m shuffled sample data pairs, so as to obtain k1 first evaluation scores of the first feature from the machine learning model of each training pass;
determining the first importance based on the k1 first evaluation scores.
3. The method according to claim 2, wherein the determining the first importance based on the k1 first evaluation scores comprises:
determining the average of the k1 first evaluation scores as the first importance.
4. The method according to claim 1, wherein the randomly swapping the sample labels of the m sample data pairs, and training the machine learning model using the m sample data pairs after the random swapping of the sample labels, so as to obtain the second importance of the first feature from the trained machine learning model, comprises:
randomly swapping the sample labels of the m sample data pairs k2 times, and after each swap, training the machine learning model using the m label-swapped sample data pairs, so as to obtain k2 second evaluation scores of the first feature from the machine learning model of each training pass;
determining the second importance based on the k2 second evaluation scores.
5. The method according to claim 4, wherein the determining the second importance based on the k2 second evaluation scores comprises:
determining the average of the k2 second evaluation scores as the second importance.
6. The method according to any one of claims 1-5, wherein the at least based on the comparison of the first importance and the second importance, determining whether to select the first feature as a feature of the machine learning model comprises:
rejecting the first feature from the features of the machine learning model in the case where the second importance minus the first importance exceeds a first threshold.
7. The method according to claim 4 or 5, wherein the at least based on the comparison of the first importance and the second importance, determining whether to select the first feature as a feature of the machine learning model comprises:
rejecting the first feature from the features of the machine learning model in the case where the second importance minus the first importance exceeds twice the second variance of the k2 second evaluation scores.
8. The method according to any one of claims 1-5, wherein the at least based on the comparison of the first importance and the second importance, determining whether to select the first feature as a feature of the machine learning model comprises:
determining to select the first feature as a feature of the machine learning model in the case where the second importance minus the first importance is less than a second threshold.
9. The method according to claim 4 or 5, wherein the at least based on the comparison of the first importance and the second importance, determining whether to select the first feature as a feature of the machine learning model comprises:
determining to select the first feature as a feature of the machine learning model in the case where the second importance minus the first importance is less than the second variance of the k2 second evaluation scores.
10. The method according to claim 1, wherein the at least based on the first importance and the second importance, determining whether to select the first feature as a feature of the machine learning model comprises:
determining a composite importance of the first feature based on the first importance and the second importance;
determining, according to the composite importance, whether to select the first feature as a feature of the machine learning model.
11. The method according to claim 10, wherein the composite importance of the first feature is determined based on a composite index of the first feature obtained by combining the first importance and the second importance, the composite index of the first feature being the sum of the difference between the second importance and the first importance and the ratio of the second importance to the first importance.
12. The method according to claim 11, wherein the composite importance of the first feature is the ratio of the composite index of the first feature to the largest value among the n composite indices corresponding to the n candidate features.
13. The method according to claim 11, wherein the composite importance of the first feature is the ratio of the composite index of the first feature to the largest value among the composite indices of similar candidate features, wherein a similar candidate feature is a feature among the n candidate features that satisfies the same predetermined condition as the first feature.
14. An apparatus for selecting features for a machine learning model to be built, the apparatus comprising:
an acquiring unit, configured to obtain m sample data pairs, each sample data pair including n candidate features extracted for a respective sample and a sample label pre-annotated for the sample, wherein the n candidate features include at least a first feature;
a first determination unit, configured to train the machine learning model using the m sample data pairs, so as to obtain a first importance of the first feature from the trained machine learning model;
a second determination unit, configured to randomly swap the sample labels of the m sample data pairs, and train the machine learning model using the m sample data pairs after the random swapping of the sample labels, so as to obtain a second importance of the first feature from the trained machine learning model;
a selecting unit, configured to determine, at least based on a comparison of the first importance and the second importance, whether to select the first feature as a feature of the machine learning model.
15. The apparatus according to claim 14, wherein the first determination unit is further configured to:
randomly shuffle the order of the m sample data pairs k1 times, and after each shuffle, train the machine learning model using the m shuffled sample data pairs, so as to obtain k1 first evaluation scores of the first feature from the machine learning model of each training pass;
determine the first importance based on the k1 first evaluation scores.
16. The apparatus according to claim 15, wherein the first determination unit is further configured to:
determine the average of the k1 first evaluation scores as the first importance.
17. The apparatus according to claim 14, wherein the second determination unit is further configured to:
randomly swap the sample labels of the m sample data pairs k2 times, and after each swap, train the machine learning model using the m label-swapped sample data pairs, so as to obtain k2 second evaluation scores of the first feature from the machine learning model of each training pass;
determine the second importance based on the k2 second evaluation scores.
18. The apparatus according to claim 17, wherein the second determination unit is further configured to:
determine the average of the k2 second evaluation scores as the second importance.
19. The apparatus according to any one of claims 14-18, wherein the selecting unit is further configured to:
reject the first feature from the features of the machine learning model in the case where the second importance minus the first importance exceeds a first threshold.
20. The apparatus according to claim 17 or 18, wherein the selecting unit is further configured to: reject the first feature from the features of the machine learning model in the case where the second importance minus the first importance exceeds twice the second variance of the k2 second evaluation scores.
21. The device according to any one of claims 14-18, wherein the selecting unit is further configured to:
determine to select the first feature as a feature of the machine learning model in the case where the difference of the second importance minus the first importance is less than a second threshold.
22. The device according to claim 17 or 18, wherein the selecting unit is further configured to: determine to select the first feature as a feature of the machine learning model in the case where the difference of the second importance minus the first importance is less than the second variance of the k2 second evaluation scores.
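The decision rules of claims 19-22 all compare the gap between the two importances against a bound: a fixed threshold (claims 19 and 21), or a multiple of the variance of the k2 second evaluation scores (claims 20 and 22). A plain-Python sketch; the `"undecided"` fallback and the function name are illustrative additions, not part of the claims:

```python
def select_feature(first_imp, second_imp, second_var,
                   first_threshold=None, second_threshold=None):
    """Apply the selection rules of claims 19-22 to one candidate feature.
    Returns 'reject', 'select', or 'undecided'."""
    diff = second_imp - first_imp
    # Claims 19/20: reject when the difference exceeds a fixed first
    # threshold, or twice the variance of the k2 second evaluation scores.
    reject_bound = (first_threshold if first_threshold is not None
                    else 2.0 * second_var)
    if diff > reject_bound:
        return "reject"
    # Claims 21/22: select when the difference is below a second threshold,
    # or below the variance itself.
    select_bound = (second_threshold if second_threshold is not None
                    else second_var)
    if diff < select_bound:
        return "select"
    return "undecided"
```

Intuitively, a feature whose label-shuffled importance is much higher than its normal importance is scoring on noise and gets rejected; one whose gap is within the shuffle-to-shuffle variance is kept.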
23. The device according to claim 14, wherein the selecting unit is further configured to:
determine a comprehensive importance of the first feature based on the first importance and the second importance;
determine, according to the comprehensive importance, whether to select the first feature as a feature of the machine learning model.
24. The device according to claim 23, wherein the comprehensive importance of the first feature is determined based on a comprehensive index of the first feature that integrates the first importance and the second importance, the comprehensive index of the first feature being the sum of the difference of the second importance minus the first importance and the ratio of the second importance to the first importance.
25. The device according to claim 24, wherein the comprehensive importance of the first feature is the ratio of the comprehensive index of the first feature to the greatest value among the n comprehensive indexes corresponding to the n candidate features.
26. The device according to claim 24, wherein the comprehensive importance of the first feature is the ratio of the comprehensive index of the first feature to the greatest value among the comprehensive indexes of the similar candidate features, wherein the similar candidate features are the features among the n candidate features that satisfy the same predetermined condition as the first feature.
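Claims 24-26 combine the two importances into a comprehensive index, (second − first) + second/first, and then normalize it by the largest index in a comparison group (all n candidates in claim 25, or only the similar candidates in claim 26). A minimal sketch, assuming the first importance is positive so the ratio is defined:

```python
def comprehensive_index(first_imp, second_imp):
    """Claim 24: comprehensive index = (second - first) + second / first.
    Assumes first_imp > 0."""
    return (second_imp - first_imp) + second_imp / first_imp

def comprehensive_importance(indexes, i):
    """Claims 25-26: ratio of feature i's comprehensive index to the largest
    index in its comparison group (all candidates, or similar ones only)."""
    return indexes[i] / max(indexes)
```

The normalization caps the comprehensive importance at 1 for the group's worst-behaved feature, so a single cutoff can be applied across features of different scales.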
27. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to perform the method of any one of claims 1-13.
28. A computing device comprising a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method of any one of claims 1-13 is implemented.
CN201811427683.3A 2018-11-27 2018-11-27 Method and device for selecting features for constructed machine learning model Active CN109472318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811427683.3A CN109472318B (en) 2018-11-27 2018-11-27 Method and device for selecting features for constructed machine learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811427683.3A CN109472318B (en) 2018-11-27 2018-11-27 Method and device for selecting features for constructed machine learning model

Publications (2)

Publication Number Publication Date
CN109472318A true CN109472318A (en) 2019-03-15
CN109472318B CN109472318B (en) 2021-06-04

Family

ID=65674382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811427683.3A Active CN109472318B (en) 2018-11-27 2018-11-27 Method and device for selecting features for constructed machine learning model

Country Status (1)

Country Link
CN (1) CN109472318B (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130304732A1 (en) * 2012-05-11 2013-11-14 Sony Corporation Information processing apparatus, information processing method, and program
CN103473366A (en) * 2013-09-27 2013-12-25 浙江大学 Classification method and device for content identification of multi-view cross data field image
CN103487832A (en) * 2013-09-12 2014-01-01 电子科技大学 Method for classifying supervised waveforms in three-dimensional seismic signal
CN103530321A (en) * 2013-09-18 2014-01-22 上海交通大学 Sequencing system based on machine learning
CN105893380A (en) * 2014-12-11 2016-08-24 成都网安科技发展有限公司 Improved text classification characteristic selection method
CN106897918A (en) * 2017-02-24 2017-06-27 上海易贷网金融信息服务有限公司 A kind of hybrid machine learning credit scoring model construction method
CN107392456A (en) * 2017-07-14 2017-11-24 武汉理工大学 A kind of multi-angle rating business credit modeling method for merging internet information
CN107437417A (en) * 2017-08-02 2017-12-05 中国科学院自动化研究所 Based on speech data Enhancement Method and device in Recognition with Recurrent Neural Network speech recognition
CN108231201A (en) * 2018-01-25 2018-06-29 华中科技大学 A kind of construction method, system and the application of disease data analyzing and processing model


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110318327A (en) * 2019-06-10 2019-10-11 长安大学 A kind of surface evenness prediction technique based on random forest
CN112183758A (en) * 2019-07-04 2021-01-05 华为技术有限公司 Method and device for realizing model training and computer storage medium
WO2021000958A1 (en) * 2019-07-04 2021-01-07 华为技术有限公司 Method and apparatus for realizing model training, and computer storage medium
CN110443346B (en) * 2019-08-12 2023-05-02 腾讯科技(深圳)有限公司 Model interpretation method and device based on importance of input features
CN110443346A (en) * 2019-08-12 2019-11-12 腾讯科技(深圳)有限公司 A kind of model explanation method and device based on input feature vector importance
CN110674178A (en) * 2019-08-30 2020-01-10 阿里巴巴集团控股有限公司 Method and system for constructing user portrait label
CN110674178B (en) * 2019-08-30 2023-09-05 创新先进技术有限公司 Method and system for constructing user portrait tag
CN110956278A (en) * 2019-11-26 2020-04-03 支付宝(杭州)信息技术有限公司 Method and system for retraining machine learning models
CN110909005A (en) * 2019-11-29 2020-03-24 广州市百果园信息技术有限公司 Model feature analysis method, device, equipment and medium
CN110909005B (en) * 2019-11-29 2023-03-28 广州市百果园信息技术有限公司 Model feature analysis method, device, equipment and medium
GB2610775A (en) * 2020-05-29 2023-03-15 Ibm Machine learning model error detection
WO2021240300A1 (en) * 2020-05-29 2021-12-02 International Business Machines Corporation Machine learning model error detection
US11720819B2 (en) 2020-05-29 2023-08-08 International Business Machines, Incorporated Machine learning model error detection
CN112381426A (en) * 2020-11-18 2021-02-19 中国林业科学研究院资源信息研究所 Forest degradation remote sensing monitoring method and system based on staged time trend characteristics
CN116012849A (en) * 2023-01-19 2023-04-25 北京百度网讯科技有限公司 Feature screening method and device and electronic equipment

Also Published As

Publication number Publication date
CN109472318B (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN109472318A (en) For the method and device of the machine learning model selected characteristic of building
US11645541B2 (en) Machine learning model interpretation
CN108351985A Method and apparatus for large-scale machine learning
CN107230108A Method and device for processing business data
Kumar et al. A benchmark to select data mining based classification algorithms for business intelligence and decision support systems
Intisar et al. Classification of online judge programmers based on rule extraction from self organizing feature map
US20190311258A1 (en) Data dependent model initialization
CN111008898B (en) Method and apparatus for evaluating model interpretation tools
Barry-Straume et al. An evaluation of training size impact on validation accuracy for optimized convolutional neural networks
Verma et al. An ensemble approach to identifying the student gender towards information and communication technology awareness in european schools using machine learning
Tiwari Supervised learning: From theory to applications
CN109635010A Method and system for extracting and querying user features and feature factors
Isljamovic et al. Predicting students’ academic performance using artificial neural network: a case study from faculty of organizational sciences
Deepika et al. Relief-F and Budget Tree Random Forest Based Feature Selection for Student Academic Performance Prediction.
Zaffar et al. Role of FCBF feature selection in educational data mining
Pérez-Lemonche et al. Analysing event transitions to discover student roles and predict grades in MOOCs
CN110458600A (en) Portrait model training method, device, computer equipment and storage medium
Drăgulescu et al. Predicting assignment submissions in a multi-class classification problem
Arifin et al. Automatic essay scoring for Indonesian short answers using siamese Manhattan long short-term memory
CN109657710A (en) Data screening method, apparatus, server and storage medium
Lubis et al. KNN method on credit risk classification with binary particle swarm optimization based feature selection
CN113627522A (en) Image classification method, device and equipment based on relational network and storage medium
CN113554099A (en) Method and device for identifying abnormal commercial tenant
Rong et al. Exploring network behavior using cluster analysis
Leteno et al. An investigation of structures responsible for gender bias in BERT and DistilBERT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200928

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200928

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, Capital Building, Grand Cayman, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant