CN110378391A - Feature selection method and apparatus for a computational model, electronic device, and storage medium - Google Patents

Feature selection method and apparatus for a computational model, electronic device, and storage medium

Info

Publication number
CN110378391A
Authority
CN
China
Prior art keywords
feature
feature set
first
sample data
computation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910554932.3A
Other languages
Chinese (zh)
Inventor
刘扬
陈金辉
陈鹏程
朱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201910554932.3A
Publication of CN110378391A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application discloses a feature selection method and apparatus for a computational model, an electronic device, and a storage medium. The method includes: determining sample data for the computational model; extracting candidate features from the sample data; dividing the candidate features into a first feature set and a second feature set; screening, based on a conditional mutual information metric, features to be moved from the second feature set into the first feature set; and taking each feature in the finally obtained first feature set as an in-model feature of the computational model. The scheme provides an automated feature selection procedure: with conditional mutual information as the screening metric, it simultaneously accounts for the correlation between each feature and the target and the redundancy among features, measuring the information gain a newly admitted feature contributes given the existing features and the target. It thereby considers both a feature's power to discriminate the label value and its overlap with already-selected features, building the model from as many dimensions as possible. It reduces manual involvement, enables rapid feature screening, and offers strong robustness.

Description

Feature selection method and apparatus for a computational model, electronic device, and storage medium
Technical field
This application relates to the field of machine learning, and in particular to a feature selection method and apparatus for a computational model, an electronic device, and a storage medium.
Background art
In the field of machine learning, the features of a computational model are a primary object of study. When selecting features, it is not only necessary to consider whether they can improve the model's accuracy; in traditional fields such as finance, the features admitted into models for businesses such as credit score cards must also be strongly interpretable to the business, which poses a challenge. The prior art generally requires substantial manual intervention: for example, reviewers must manually examine and justify candidate features, which consumes labor, and different institutions and approvers apply different criteria and understandings, making standardization difficult. A method that can automate feature selection is therefore needed.
Summary of the invention
In view of the above problems, the present application is proposed in order to provide a feature selection method and apparatus for a computational model, an electronic device, and a storage medium that overcome the above problems or at least partially solve them.
According to one aspect of the application, a feature selection method for a computational model is provided, the method comprising:
determining sample data for the computational model;
extracting candidate features from the sample data;
dividing the candidate features into a first feature set and a second feature set;
screening, based on a conditional mutual information metric, features to be moved from the second feature set into the first feature set;
taking each feature in the finally obtained first feature set as an in-model feature of the computational model.
Optionally, dividing the candidate features into the first feature set and the second feature set comprises:
computing the mutual information between each candidate feature and the target value, and dividing the candidate features into the first feature set and the second feature set according to the computed results.
Optionally, screening, based on the conditional mutual information metric, the features to be moved from the second feature set into the first feature set comprises:
performing several rounds of feature screening on the features in the second feature set;
in each round, computing the conditional mutual information between each feature in the current second feature set, each feature in the current first feature set, and the target value, and screening out, according to the computed results, the features to be moved this round from the second feature set into the first feature set.
Optionally, screening out, according to the computed results, the features to be moved this round from the second feature set into the first feature set comprises:
ranking by the computed conditional mutual information, and selecting several transfer candidates according to the ranking;
moving the transfer candidates that satisfy a transfer condition from the second feature set into the first feature set.
Optionally, moving the transfer candidates that satisfy the transfer condition from the second feature set into the first feature set comprises:
computing the variance inflation factor (VIF) of each qualifying transfer candidate against each feature in the first feature set;
a transfer candidate satisfies the transfer condition if its VIF is greater than a preset value.
Optionally, screening, based on the conditional mutual information metric, the features to be moved from the second feature set into the first feature set comprises:
stopping feature screening if the number of features in the first feature set reaches a preset quantity;
alternatively,
stopping feature screening if the number of features screened out this round is 0.
Optionally, determining the sample data of the computational model comprises:
performing several rounds of random sampling on the overall sample data to obtain several corresponding sample sets;
the method further comprising:
for the in-model features obtained separately from each sample set, recording the order in which each in-model feature was added to its corresponding first feature set;
determining the final in-model features according to that order.
Optionally, performing several rounds of random sampling on the overall sample data comprises:
sampling the overall sample data by bootstrap until a preset number of bootstrap rounds is reached.
Optionally, the computational model is a logistic regression model.
According to another aspect of the application, a feature selection apparatus for a computational model is provided, the apparatus comprising:
a sample data unit for determining sample data for the computational model;
a feature extraction unit for extracting candidate features from the sample data;
a feature screening unit for dividing the candidate features into a first feature set and a second feature set; screening, based on a conditional mutual information metric, features to be moved from the second feature set into the first feature set; and taking each feature in the finally obtained first feature set as an in-model feature of the computational model.
Optionally, the feature screening unit is configured to compute the mutual information between each candidate feature and the target value, and to divide the candidate features into the first feature set and the second feature set according to the computed results.
Optionally, the feature screening unit is configured to perform several rounds of feature screening on the features in the second feature set; in each round, to compute the conditional mutual information between each feature in the current second feature set, each feature in the current first feature set, and the target value, and to screen out, according to the computed results, the features to be moved this round from the second feature set into the first feature set.
Optionally, the feature screening unit is configured to rank by the computed conditional mutual information, to select several transfer candidates according to the ranking, and to move the transfer candidates that satisfy a transfer condition from the second feature set into the first feature set.
Optionally, the feature screening unit is configured to compute the variance inflation factor (VIF) of each qualifying transfer candidate against each feature in the first feature set; a transfer candidate satisfies the transfer condition if its VIF is greater than a preset value.
Optionally, the feature screening unit stops feature screening if the number of features in the first feature set reaches a preset quantity, or stops feature screening if the number of features screened out this round is 0.
Optionally, the sample data unit is configured to perform several rounds of random sampling on the overall sample data to obtain several corresponding sample sets;
the feature screening unit is further configured, for the in-model features obtained separately from each sample set, to record the order in which each in-model feature was added to its corresponding first feature set, and to determine the final in-model features according to that order.
Optionally, the sample data unit is configured to sample the overall sample data by bootstrap until a preset number of bootstrap rounds is reached.
Optionally, the computational model is a logistic regression model.
According to another aspect of the application, an electronic device is provided, comprising: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform a method as described in any of the above.
According to yet another aspect of the application, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement a method as described in any of the above.
As can be seen from the above, in the technical solution of the application, after the sample data of the computational model is determined, candidate features are first extracted from the sample data; the candidate features are then divided into a first feature set and a second feature set; features to be moved from the second feature set into the first feature set are screened based on a conditional mutual information metric; and each feature in the finally obtained first feature set is taken as an in-model feature of the computational model. The scheme provides an automated feature selection procedure: with conditional mutual information as the screening metric, it simultaneously accounts for the correlation between each feature and the target and the redundancy among features, measuring the information gain a newly admitted feature contributes given the existing features and the target. It thereby considers both a feature's power to discriminate the label value and its overlap with already-selected features, building the model from as many dimensions as possible. It reduces manual involvement, enables rapid feature screening, and offers strong robustness.
The above description is only an overview of the technical solution of the application. To make the technical means of the application clearer so that it can be implemented in accordance with the contents of the specification, and to make the above and other objects, features, and advantages of the application more comprehensible, specific embodiments of the application are set forth below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the application. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a schematic flowchart of a feature selection method for a computational model according to an embodiment of the application;
Fig. 2 shows a schematic structural diagram of a feature selection apparatus for a computational model according to an embodiment of the application;
Fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the application;
Fig. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the application.
Detailed description of the embodiments
Exemplary embodiments of the application are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the application are shown in the drawings, it should be understood that the application may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the application can be thoroughly understood and its scope fully conveyed to those skilled in the art.
In the prior art, taking credit score card models as an example, to improve modeling performance, the variables with strong discrimination on the label are typically admitted into the model by computing the IV (Information Value) metric and ranking by it, with multicollinearity then removed manually. On one hand, this still requires a high degree of manual involvement and makes it difficult to automate credit card scoring; on the other hand, it only considers the correlation between a feature and the label and does not account for redundancy among features, making it hard to characterize a user's credit risk from multiple dimensions. Here a label can be used to identify whether a sample is a good user or a bad user (in the business sense).
The following embodiments explain how the technical solution proposed by the application addresses the above problems.
Fig. 1 shows a schematic flowchart of a feature selection method for a computational model according to an embodiment of the application. As shown in Fig. 1, the method comprises:
Step S110: determine the sample data of the computational model. For example, the sample data can be credit reference data carrying a label that identifies the quality of the user.
Step S120: extract candidate features from the sample data. The candidate features here may include whether the user is an active user, certain transaction features, geographic location, and so on.
Step S130: divide the candidate features into a first feature set and a second feature set. The first feature set can be regarded as the set of features already in the model, and the second feature set, correspondingly, as the set of features awaiting admission.
Step S140: screen, based on a conditional mutual information metric, the features to be moved from the second feature set into the first feature set. The conditional mutual information metric reflects the information gain on the target contributed by a candidate feature given the features already in the model; in this way it also accounts for redundancy among features. For example, a feature is admitted into the model only when the information gain it brings is sufficiently large.
Step S150: take each feature in the finally obtained first feature set as an in-model feature of the computational model.
It can be seen that the method shown in Fig. 1 provides an automated feature selection procedure: with conditional mutual information as the screening metric, it simultaneously accounts for the correlation between each feature and the target and the redundancy among features, measuring the information gain a newly admitted feature contributes given the existing features and the target. It thereby considers both a feature's power to discriminate the label value and its overlap with already-selected features, building the model from as many dimensions as possible. It reduces manual involvement, enables rapid feature screening, and offers strong robustness.
In an embodiment of the application, in the above method, dividing the candidate features into the first feature set and the second feature set comprises: computing the mutual information between each candidate feature and the target value, and dividing the candidate features into the first feature set and the second feature set according to the computed results.
Computing the mutual information between each candidate feature and the target value assesses the correlation of the feature with the target. By ranking the computed results, several features with the largest mutual information with the label value can be chosen as in-model features and placed in the first feature set, with the other candidate features placed in the second feature set. In a specific example, the feature with the largest mutual information with the label value can be placed in the first feature set as the first in-model feature, and all other candidate features placed in the second feature set.
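As an illustration, the initial split can be sketched as below. This is a minimal sketch under stated assumptions, not the patent's implementation: the toy data, the feature names, and the choice to seed the first set with a single top-ranked feature are all assumptions; mutual information is estimated from empirical counts of discrete values.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) for two discrete sequences, from empirical counts."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over (x,y): p(x,y) * log2( p(x,y) / (p(x) p(y)) ); the 1/n factors cancel
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

# Hypothetical toy data: two candidate features and a binary label.
label = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "f_informative": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the label
    "f_noise":       [0, 1, 0, 1, 0, 1, 0, 1],  # weakly related to the label
}

# Seed the first (in-model) set with the feature of highest MI with the label;
# everything else starts in the second (candidate) set.
ranked = sorted(features, key=lambda f: mutual_information(features[f], label), reverse=True)
first_set, second_set = [ranked[0]], ranked[1:]
```

Here `first_set` starts with the single feature most correlated with the label, matching the specific example in the text.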
In an embodiment of the application, in the above method, screening, based on the conditional mutual information metric, the features to be moved from the second feature set into the first feature set comprises: performing several rounds of feature screening on the features in the second feature set; in each round, computing the conditional mutual information between each feature in the current second feature set, each feature in the current first feature set, and the target value, and screening out, according to the computed results, the features to be moved this round from the second feature set into the first feature set.
For example, in the first round of feature screening the first feature set contains only one in-model feature; for each feature Xn in the second feature set, the conditional mutual information I(Y; Xn | Xv(1)) with that in-model feature Xv(1) and the target value Y is computed, and the features to transfer this round are screened according to the results. In the same way, each subsequent round computes, for every feature Xn in the second feature set, the conditional mutual information I(Y; Xn | Xv(k)) with each in-model feature Xv(k) and the target value Y, and screens the features to transfer this round according to the results.
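A plug-in estimator of the per-round quantity I(Y; Xn | Xv) can be sketched as follows for discrete-valued features; the helper name and the toy sequences are assumptions for illustration, not part of the patent.

```python
from collections import Counter
from math import log2

def conditional_mutual_information(ys, xn, xv):
    """Plug-in estimate of I(Y; Xn | Xv) from empirical counts of discrete values."""
    total = len(ys)
    c_v = Counter(xv)
    c_yv = Counter(zip(ys, xv))
    c_nv = Counter(zip(xn, xv))
    c_ynv = Counter(zip(ys, xn, xv))
    cmi = 0.0
    for (y, a, v), c in c_ynv.items():
        # p(y,xn,xv) * log2( p(xv) p(y,xn,xv) / (p(y,xv) p(xn,xv)) ); 1/n factors cancel
        cmi += (c / total) * log2(c * c_v[v] / (c_yv[(y, v)] * c_nv[(a, v)]))
    return cmi

y  = [0, 0, 1, 1]
xv = [0, 1, 0, 1]          # already in the model, independent of y here
# A candidate identical to the label adds a full bit of information given xv;
# a candidate identical to xv is fully redundant and adds none.
gain_new = conditional_mutual_information(y, y,  xv)
gain_dup = conditional_mutual_information(y, xv, xv)
```

The two toy calls illustrate why this metric also captures redundancy: a duplicate of an in-model feature scores zero gain however correlated it is with the target.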
In an embodiment of the application, in the above method, screening out, according to the computed results, the features to be moved this round from the second feature set into the first feature set comprises: ranking by the computed conditional mutual information, and selecting several transfer candidates according to the ranking; and moving the transfer candidates that satisfy a transfer condition from the second feature set into the first feature set.
With the example computation given above, the conditional mutual information between each feature in the second feature set and each feature in the first feature set is available. Since what is of interest is the information gain on the target that a candidate feature contributes given the features already in the model, for each feature in the second feature set the minimum of the conditional mutual information values computed this round can be examined; these minima are compared across candidates, and the several features whose minimum conditional mutual information ranks highest are selected, which may in particular be a single feature.
In addition, some further transfer conditions can be chosen to decide whether the features screened out this round may enter the model. Specifically, in an embodiment of the application, in the above method, moving the transfer candidates that satisfy the transfer condition from the second feature set into the first feature set comprises: computing the variance inflation factor (VIF) of each qualifying transfer candidate against each feature in the first feature set; a transfer candidate satisfies the transfer condition if its VIF is greater than a preset value. The variance inflation factor is the ratio between the variance of an explanatory variable when multicollinearity is present and its variance when multicollinearity is absent; it can address the problem of multicollinearity, with features selected in the manner of stepwise regression.
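The VIF computation itself can be sketched as below. The function name and the synthetic data are assumptions; the form VIF = 1/(1 - R^2), with R^2 obtained by regressing the candidate on the in-model feature columns plus an intercept, is the conventional definition of the variance inflation factor.

```python
import numpy as np

def vif(candidate, in_model):
    """Variance inflation factor: 1/(1-R^2), with R^2 from regressing
    `candidate` on the in-model feature columns (with an intercept)."""
    X = np.column_stack([np.ones(len(candidate))] + list(in_model))
    beta, *_ = np.linalg.lstsq(X, candidate, rcond=None)
    resid = candidate - X @ beta
    r2 = 1.0 - float(resid @ resid) / float(((candidate - candidate.mean()) ** 2).sum())
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
low_vif = vif(a, [b])              # independent features: VIF near 1
high_vif = vif(a + 0.01 * b, [a])  # nearly collinear: VIF very large
```

Conventionally a small VIF indicates little collinearity with the features already selected; the threshold direction stated in the text is kept as written above.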
In an embodiment of the application, in the above method, screening, based on the conditional mutual information metric, the features to be moved from the second feature set into the first feature set comprises: stopping feature screening if the number of features in the first feature set reaches a preset quantity; or stopping feature screening if the number of features screened out this round is 0. That is, once the in-model features meet the quantity requirement, continuing to add features often brings no further benefit; and if no feature was screened out this round, the next round's computation would give the same result and likewise screen out no feature. Two example conditions for stopping feature screening are therefore given here.
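Putting the rounds and both stopping conditions together, the greedy loop can be sketched abstractly as follows. The gain function is deliberately left pluggable (in the scheme described above it would rank candidates by conditional mutual information and apply the VIF check); all names, the seed feature, and the fixed toy scores are illustrative assumptions.

```python
def stepwise_select(seed_feature, candidates, gain, max_features):
    """Greedy rounds: each round moves the best-scoring candidate from the
    second (candidate) set into the first (in-model) set. Stops when the
    first set reaches `max_features` or no candidate scores above zero."""
    first, second = [seed_feature], list(candidates)
    while second and len(first) < max_features:
        best_gain, best = max((gain(f, first), f) for f in second)
        if best_gain <= 0:        # this round screened out nothing: stop
            break
        second.remove(best)
        first.append(best)
    return first

# Toy gain: fixed scores standing in for a conditional-mutual-information ranking.
scores = {"a": 3.0, "b": 2.0, "c": -1.0}
selected = stepwise_select("seed", ["a", "b", "c"], lambda f, _: scores[f], max_features=3)
```

With these toy scores the loop admits "a" then "b" and stops once the preset quantity of three features is reached, never reaching the non-contributing "c".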
In an embodiment of the application, in the above method, determining the sample data of the computational model comprises: performing several rounds of random sampling on the overall sample data to obtain several corresponding sample sets. The method further comprises: for the in-model features obtained separately from each sample set, recording the order in which each in-model feature was added to its corresponding first feature set; and determining the final in-model features according to that order.
Using several rounds of random sampling reduces the influence of the sample data on feature selection. Specifically, in an embodiment of the application, in the above method, performing several rounds of random sampling on the overall sample data comprises: sampling the overall sample data by bootstrap until a preset number of bootstrap rounds is reached.
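Bootstrap resampling of the overall sample can be sketched as follows; the function name, the use of numpy, and the fixed seed are assumptions for illustration.

```python
import numpy as np

def bootstrap_rounds(samples, n_rounds, seed=0):
    """Draw `n_rounds` bootstrap resamples: each is the same size as
    `samples`, drawn uniformly with replacement."""
    rng = np.random.default_rng(seed)
    n = len(samples)
    return [samples[rng.integers(0, n, size=n)] for _ in range(n_rounds)]

data = np.arange(100)  # stand-in for the overall sample data
resamples = bootstrap_rounds(data, n_rounds=5)
```

Feature selection would then run once per resample, yielding one first feature set per round.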
The order in which each in-model feature is added to its corresponding first feature set reflects the strength of its information gain. Finally, the first feature sets obtained from the different sample sets are combined, and the final in-model features are determined by means such as voting.
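One simple way to combine the per-resample selections by voting is sketched below; the vote threshold and the feature names are illustrative assumptions, and ties in the count ordering fall back to first-encountered order.

```python
from collections import Counter

def vote_final_features(per_round_selections, min_votes):
    """Keep features chosen in at least `min_votes` of the bootstrap rounds,
    ordered by vote count."""
    votes = Counter(f for sel in per_round_selections for f in sel)
    return [f for f, c in votes.most_common() if c >= min_votes]

rounds = [["age", "income"], ["age", "region"], ["age", "income"]]
final = vote_final_features(rounds, min_votes=2)
```

With these toy rounds, "age" (3 votes) and "income" (2 votes) survive while "region" (1 vote) is dropped.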
In an embodiment of the application, in the above method, the computational model is a logistic regression model. The computational model in the embodiments of the application can specifically be a credit score card model, or a computational model for other businesses; it is especially suitable for scenarios in which the in-model features must be strongly interpretable to the business.
Fig. 2 shows a schematic structural diagram of a feature selection apparatus for a computational model according to an embodiment of the application. As shown in Fig. 2, the feature selection apparatus 200 for a computational model comprises:
a sample data unit 210 for determining the sample data of the computational model. For example, the sample data can be credit reference data carrying a label that identifies the quality of the user;
a feature extraction unit 220 for extracting candidate features from the sample data. The candidate features here may include whether the user is an active user, certain transaction features, geographic location, and so on;
a feature screening unit 230 for dividing the candidate features into a first feature set and a second feature set, screening, based on a conditional mutual information metric, the features to be moved from the second feature set into the first feature set, and taking each feature in the finally obtained first feature set as an in-model feature of the computational model. The first feature set can be regarded as the set of features already in the model, and the second feature set, correspondingly, as the set of features awaiting admission. The conditional mutual information metric reflects the information gain on the target contributed by a candidate feature given the features already in the model; in this way it also accounts for redundancy among features. For example, a feature is admitted into the model only when the information gain it brings is sufficiently large.
It can be seen that the apparatus shown in Fig. 2 provides an automated feature selection procedure: with conditional mutual information as the screening metric, it simultaneously accounts for the correlation between each feature and the target and the redundancy among features, measuring the information gain a newly admitted feature contributes given the existing features and the target. It thereby considers both a feature's power to discriminate the label value and its overlap with already-selected features, building the model from as many dimensions as possible. It reduces manual involvement, enables rapid feature screening, and offers strong robustness.
In an embodiment of the application, in the above apparatus, the feature screening unit 230 is configured to compute the mutual information between each candidate feature and the target value, and to divide the candidate features into the first feature set and the second feature set according to the computed results.
In an embodiment of the application, in the above apparatus, the feature screening unit 230 is configured to perform several rounds of feature screening on the features in the second feature set; in each round, to compute the conditional mutual information between each feature in the current second feature set, each feature in the current first feature set, and the target value, and to screen out, according to the computed results, the features to be moved this round from the second feature set into the first feature set.
In an embodiment of the application, in the above apparatus, the feature screening unit 230 is configured to rank by the computed conditional mutual information, to select several transfer candidates according to the ranking, and to move the transfer candidates that satisfy a transfer condition from the second feature set into the first feature set.
In an embodiment of the application, in the above apparatus, the feature screening unit 230 is configured to compute the variance inflation factor (VIF) of each qualifying transfer candidate against each feature in the first feature set; a transfer candidate satisfies the transfer condition if its VIF is greater than a preset value.
In an embodiment of the application, in the above apparatus, the feature screening unit 230 stops feature screening if the number of features in the first feature set reaches a preset quantity, or stops feature screening if the number of features screened out this round is 0.
In an embodiment of the application, in the above apparatus, the sample data unit 210 is configured to perform several rounds of random sampling on the overall sample data to obtain several corresponding sample sets; the feature screening unit 230 is further configured, for the in-model features obtained separately from each sample set, to record the order in which each in-model feature was added to its corresponding first feature set, and to determine the final in-model features according to that order.
In an embodiment of the application, in the above apparatus, the sample data unit 210 is configured to sample the overall sample data by bootstrap until a preset number of bootstrap rounds is reached.
In an embodiment of the application, in the above apparatus, the computational model is a logistic regression model.
It should be noted that the specific implementations of the above apparatus embodiments can be carried out with reference to the specific implementations of the corresponding method embodiments described above, and are not repeated here.
In summary, in the technical solution of the application, after the sample data of the computational model is determined, candidate features are first extracted from the sample data; the candidate features are then divided into a first feature set and a second feature set; features to be moved from the second feature set into the first feature set are screened based on a conditional mutual information metric; and each feature in the finally obtained first feature set is taken as an in-model feature of the computational model. The scheme provides an automated feature selection procedure: with conditional mutual information as the screening metric, it simultaneously accounts for the correlation between each feature and the target and the redundancy among features, measuring the information gain a newly admitted feature contributes given the existing features and the target. It thereby considers both a feature's power to discriminate the label value and its overlap with already-selected features, building the model from as many dimensions as possible. It reduces manual involvement, enables rapid feature screening, and offers strong robustness.
It should be understood that
Algorithm and display be not inherently related to any certain computer, virtual bench or other equipment provided herein. Various fexible units can also be used together with teachings based herein.As described above, it constructs required by this kind of device Structure be obvious.In addition, the application is also not for any particular programming language.It should be understood that can use various Programming language realizes present context described herein, and the description done above to language-specific is to disclose this Shen Preferred forms please.
In the specification provided here, numerous specific details are set forth. It should be understood, however, that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the application and aid the understanding of one or more of the various inventive aspects, the features of the application are sometimes grouped together in a single embodiment, figure, or description thereof in the foregoing description of exemplary embodiments of the application. However, the disclosed method is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the application.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and they may moreover be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or apparatus so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the application and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the feature screening device of the computation model according to the embodiments of the application. The application may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing some or all of the methods described herein. Such programs implementing the application may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Fig. 3 shows a structural schematic diagram of an electronic device according to one embodiment of the application. The electronic device 300 includes a processor 310 and a memory 320 arranged to store computer-executable instructions (computer-readable program code). The memory 320 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 320 has a storage space 330 storing computer-readable program code 331 for performing any of the method steps described above. For example, the storage space 330 for storing computer-readable program code may include individual pieces of computer-readable program code 331, each used to implement one of the various steps in the above methods. The computer-readable program code 331 may be read from, or written to, one or more computer program products. These computer program products include program code carriers such as a hard disk, a compact disc (CD), a memory card, or a floppy disk. Such a computer program product is generally a computer-readable storage medium such as described with reference to Fig. 4. Fig. 4 shows a structural schematic diagram of a computer-readable storage medium according to one embodiment of the application. The computer-readable storage medium 400 stores computer-readable program code 331 for performing the method steps according to the application, which may be read by the processor 310 of the electronic device 300; when the computer-readable program code 331 is run by the electronic device 300, the electronic device 300 is caused to perform each step of the methods described above. Specifically, the computer-readable program code 331 stored on the computer-readable storage medium may perform the method shown in any of the above embodiments. The computer-readable program code 331 may be compressed in an appropriate form.
It should be noted that the above embodiments illustrate rather than limit the application, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.

Claims (12)

1. A feature screening method for a computation model, characterized in that the method comprises:
determining sample data of the computation model;
extracting features to be screened from the sample data;
dividing the features to be screened into a first feature set and a second feature set;
screening out, based on a conditional mutual information index, features to be moved from the second feature set into the first feature set;
using each feature in the finally obtained first feature set as a model-input feature of the computation model.
2. The method according to claim 1, characterized in that dividing the features to be screened into a first feature set and a second feature set comprises:
calculating the mutual information between each feature to be screened and a target value, and dividing the features to be screened into the first feature set and the second feature set according to the calculation result.
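For illustration only, the split in claim 2 can be sketched with a plain empirical (plug-in) mutual-information estimate over discrete features. The 0.05 threshold and the helper names `empirical_mi` and `split_by_mutual_info` are assumptions of this sketch; the claim does not fix a threshold. scikit-learn's `mutual_info_classif` could serve the same purpose for mixed feature types.

```python
import numpy as np
from collections import Counter

def empirical_mi(x, y):
    """Empirical mutual information (in nats) of two discrete sequences:
    sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b)))."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * np.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def split_by_mutual_info(X, y, threshold=0.05):
    """Split candidate columns of X into a first (high-MI) set and a
    second (low-MI) set by their mutual information with the target y.
    The threshold value is illustrative only."""
    mi = [empirical_mi(X[:, j], y) for j in range(X.shape[1])]
    first = [j for j, m in enumerate(mi) if m >= threshold]
    second = [j for j, m in enumerate(mi) if m < threshold]
    return first, second
```

A feature identical to the target lands in the first set, while a feature exactly independent of it lands in the second.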
3. The method according to claim 1, characterized in that screening out, based on the conditional mutual information index, features to be moved from the second feature set into the first feature set comprises:
performing several rounds of feature screening on the features in the second feature set;
in each round of screening, calculating the conditional mutual information of each feature in the current second feature set with each feature in the current first feature set and the target value, and screening out, according to the calculation result, the features to be moved in this round from the second feature set into the first feature set.
4. The method according to claim 3, characterized in that screening out, according to the calculation result, the features to be moved in this round from the second feature set into the first feature set comprises:
sorting by the calculated conditional mutual information, and selecting several features to be transferred according to the sorting result;
moving the features to be transferred that meet a transfer condition from the second feature set into the first feature set.
5. The method according to claim 4, characterized in that moving the features to be transferred that meet the transfer condition from the second feature set into the first feature set comprises:
calculating a variance inflation factor (VIF) from each qualifying feature to be transferred and each feature in the first feature set;
if the VIF is greater than a preset value, the corresponding feature to be transferred meets the transfer condition.
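The VIF used in claim 5 admits a compact sketch from its standard regression definition, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing column j on the remaining columns. The function name, the NumPy least-squares formulation, and the intercept column below are illustrative choices; statsmodels' `variance_inflation_factor` offers an equivalent off-the-shelf computation.

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of design matrix X:
    regress X[:, j] on the other columns (plus an intercept) and
    return 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # intercept + other columns
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 / (1.0 - (1.0 - ss_res / ss_tot))     # = 1 / (1 - R^2)
```

Uncorrelated columns give a VIF near 1; a near-duplicate column drives the VIF toward infinity, which is why the claim gates the transfer on a VIF comparison against a preset value.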
6. The method according to claim 3, characterized in that screening out, based on the conditional mutual information index, features to be moved from the second feature set into the first feature set comprises:
if the features in the first feature set reach a preset quantity, stopping the feature screening;
or,
if the number of features screened out in this round is 0, stopping the feature screening.
7. The method according to claim 1, characterized in that determining the sample data of the computation model comprises:
performing several rounds of random sampling on population sample data to obtain several corresponding pieces of sample data;
the method further comprising:
for the model-input features respectively obtained from each piece of sample data, recording the order in which each model-input feature was added to the corresponding first feature set;
determining the final model-input features according to the order.
8. the method for claim 7, which is characterized in that described to carry out several wheel random sampling packets to population sample data It includes:
The population sample data are sampled by bootstrap mode, until reaching preset bootstrap wheel number.
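As a non-limiting sketch of claims 7 and 8, the bootstrap rounds and the order-based aggregation can be written as follows. Aggregating by average entry position (with a penalty rank for features a round never selected) is an assumption of this sketch: the claims only state that the final model-input features are determined "according to the order", without fixing a rule. The names `bootstrap_rounds` and `aggregate_entry_order` are likewise hypothetical.

```python
import numpy as np

def bootstrap_rounds(n_samples, n_rounds, seed=0):
    """Yield `n_rounds` bootstrap resamples: index arrays drawn with
    replacement, each the same size as the population."""
    rng = np.random.default_rng(seed)
    for _ in range(n_rounds):
        yield rng.integers(0, n_samples, size=n_samples)

def aggregate_entry_order(orders):
    """Given, per round, the ordered list of features that entered the
    first feature set, rank features by average entry position; features
    a round never selected receive a penalty rank (assumed rule)."""
    all_feats = sorted({f for order in orders for f in order})
    penalty = max(len(order) for order in orders) + 1
    def avg_rank(f):
        return np.mean([order.index(f) if f in order else penalty
                        for order in orders])
    return sorted(all_feats, key=avg_rank)
```

A feature that consistently enters early across bootstrap rounds ranks first, which is one plausible way to read "determining the final model-input features according to the order".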
9. The method according to any one of claims 1-8, characterized in that the computation model is a logistic regression model.
10. A feature screening device for a computation model, characterized in that the device comprises:
a sample data unit, configured to determine sample data of the computation model;
a feature extraction unit, configured to extract features to be screened from the sample data;
a feature screening unit, configured to divide the features to be screened into a first feature set and a second feature set; screen out, based on a conditional mutual information index, features to be moved from the second feature set into the first feature set; and use each feature in the finally obtained first feature set as a model-input feature of the computation model.
11. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the method according to any one of claims 1-9.
12. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the method according to any one of claims 1-9.
CN201910554932.3A 2019-06-25 2019-06-25 Feature screening method and device for a computation model, electronic device, and storage medium Pending CN110378391A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910554932.3A CN110378391A (en) 2019-06-25 2019-06-25 Feature screening method and device for a computation model, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN110378391A true CN110378391A (en) 2019-10-25

Family

ID=68249284

Country Status (1)

Country Link
CN (1) CN110378391A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778861A (en) * 2016-12-12 2017-05-31 齐鲁工业大学 A screening method for key features
CN108898479A (en) * 2018-06-28 2018-11-27 中国农业银行股份有限公司 Construction method and device of a credit evaluation model
US20190034461A1 (en) * 2016-01-28 2019-01-31 Koninklijke Philips N.V. Data reduction for reducing a data set
CN109598275A (en) * 2017-09-30 2019-04-09 富士通株式会社 Feature selecting device, method and electronic equipment


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079939A (en) * 2019-11-28 2020-04-28 支付宝(杭州)信息技术有限公司 Machine learning model feature screening method and device based on data privacy protection
CN111079939B (en) * 2019-11-28 2021-04-20 支付宝(杭州)信息技术有限公司 Machine learning model feature screening method and device based on data privacy protection
CN111861704A (en) * 2020-07-10 2020-10-30 深圳无域科技技术有限公司 Wind control feature generation method and system
CN111931848A (en) * 2020-08-10 2020-11-13 中国平安人寿保险股份有限公司 Data feature extraction method and device, computer equipment and storage medium
CN111931848B (en) * 2020-08-10 2024-06-14 中国平安人寿保险股份有限公司 Data feature extraction method and device, computer equipment and storage medium
CN112364012A (en) * 2021-01-14 2021-02-12 上海冰鉴信息科技有限公司 Data feature determination method and device and electronic equipment
CN112364012B (en) * 2021-01-14 2021-04-09 上海冰鉴信息科技有限公司 Data feature determination method and device and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191025