CN110390400A - Feature generation method and apparatus for a computation model, electronic device, and storage medium - Google Patents
Feature generation method and apparatus for a computation model, electronic device, and storage medium
- Publication number
- CN110390400A CN110390400A CN201910596683.4A CN201910596683A CN110390400A CN 110390400 A CN110390400 A CN 110390400A CN 201910596683 A CN201910596683 A CN 201910596683A CN 110390400 A CN110390400 A CN 110390400A
- Authority
- CN
- China
- Prior art keywords
- model
- feature
- base feature
- base
- forest model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
This application discloses a feature generation method and apparatus for a computation model, an electronic device, and a storage medium. The method includes: obtaining a sample data set and a base feature set; constructing sequential forest models according to the sample data set and the base feature set, wherein each forest model is assigned a different base feature, and the root nodes of all tree models in the same forest model use the base feature assigned to that forest model as the split feature; and obtaining combined features and/or composite features from the constructed sequential forest models. The beneficial effects of this technical solution are that feature crossing and derivation are performed on the basis of sequential forest models, providing a new feature generation approach: the selection of base features is more controllable than in a random forest, the generated features are more interpretable, the likelihood of producing duplicate or useless features is reduced, and the forest models can be constructed in parallel, so the method is highly efficient.
Description
Technical field
This application relates to the field of machine learning, and in particular to a feature generation method and apparatus for a computation model, an electronic device, and a storage medium.
Background technique
Machine learning applications are often built around computation models, and different fields call for different models. For example, in the field of financial risk control, credit scorecards are commonly used to model and assess user credit risk. To balance the interpretability of the computation model against algorithmic complexity, credit scorecards are often implemented on top of linear models. A linear model, however, cannot capture interaction information between features, so the feature engineering stage that precedes model construction usually needs to perform feature crossing and derivation to obtain combined features (also called cross features, crossed-combination features, and so on).
Existing feature derivation approaches all have drawbacks. For example: linear discriminant analysis (LDA) cannot capture nonlinear interaction information between features; brute-force feature crossing has excessive complexity and poor interpretability; and gradient boosted trees (GBDT) and random forests can fall into local optima. A new feature generation approach is therefore needed.
Summary of the invention
In view of the above problems, this application proposes a feature generation method and apparatus for a computation model, an electronic device, and a storage medium that overcome, or at least partially solve, the above problems.
According to one aspect of this application, a feature generation method for a computation model is provided, including:
obtaining a sample data set and a base feature set;
constructing sequential forest models according to the sample data set and the base feature set, wherein each forest model is assigned a different base feature, and the root nodes of all tree models in the same forest model use the base feature assigned to that forest model as the split feature; and
obtaining combined features and/or composite features from the constructed sequential forest models.
Optionally, assigning a different base feature to each forest model includes: performing a preset number of rounds of information gain computation on the base features in the base feature set according to the sample data set; after each round, extracting the base feature with the maximum information gain in that round from the base feature set and assigning it to a forest model that has not yet been assigned a base feature.
Optionally, constructing the sequential forest models according to the sample data set and the base feature set includes: for each tree model in a forest model, determining the split feature used at each layer, wherein, when determining the split feature for a target layer, a target base feature list is determined according to the split features already used at each layer of each tree model and the base feature set, information gain is computed according to the sample data set and the target base feature list, and the base feature with the maximum information gain is taken as the split feature of the target layer.
Optionally, determining the target base feature list according to the base features already used at each layer of each tree model and the base feature set includes:
if the target layer is the second layer, selecting the base features that belong to the base feature set but are used neither by the root node nor by the second layer of any other tree model in the same forest model, and putting them into the target base feature list;
if the target layer is a layer below the second layer, selecting the base features that belong to the base feature set and have not been used by this tree model, and putting them into the target base feature list.
Optionally, the number of tree models in each forest model and/or the depth of the tree models are predetermined.
Optionally, obtaining combined features and/or composite features from the constructed sequential forest models includes: taking each feature combination path determined by each tree model as a corresponding one-dimensional combined feature.
Optionally, obtaining combined features and/or composite features from the constructed sequential forest models includes: outputting each tree model as a corresponding composite feature.
According to another aspect of this application, a feature generation apparatus for a computation model is provided, including:
an acquiring unit, configured to obtain a sample data set and a base feature set;
a model construction unit, configured to construct sequential forest models according to the sample data set and the base feature set, wherein each forest model is assigned a different base feature, and the root nodes of all tree models in the same forest model use the base feature assigned to that forest model as the split feature; and
a feature generation unit, configured to obtain combined features and/or composite features from the constructed sequential forest models.
Optionally, the model construction unit is configured to perform a preset number of rounds of information gain computation on the base features in the base feature set according to the sample data set, and, after each round, to extract the base feature with the maximum information gain in that round from the base feature set and assign it to a forest model that has not yet been assigned a base feature.
Optionally, the model construction unit is configured to determine, for each tree model in a forest model, the split feature used at each layer, wherein, when determining the split feature for a target layer, a target base feature list is determined according to the split features already used at each layer of each tree model and the base feature set, information gain is computed according to the sample data set and the target base feature list, and the base feature with the maximum information gain is taken as the split feature of the target layer.
Optionally, the model construction unit is configured to: if the target layer is the second layer, select the base features that belong to the base feature set but are used neither by the root node nor by the second layer of any other tree model in the same forest model, and put them into the target base feature list; and, if the target layer is a layer below the second layer, select the base features that belong to the base feature set and have not been used by this tree model, and put them into the target base feature list.
Optionally, the number of tree models in each forest model and/or the depth of the tree models are predetermined.
Optionally, the feature generation unit is configured to take each feature combination path determined by each tree model as a corresponding one-dimensional combined feature.
Optionally, the feature generation unit is configured to output each tree model as a corresponding composite feature.
According to another aspect of this application, an electronic device is provided, including: a processor; and a memory configured to store computer-executable instructions which, when executed, cause the processor to perform the method described in any of the above.
According to yet another aspect of this application, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the method described in any of the above.
As can be seen from the above, in the technical solution of this application, after a sample data set and a base feature set are obtained, sequential forest models are constructed according to the sample data set and the base feature set, wherein each forest model is assigned a different base feature, and the root nodes of all tree models in the same forest model use the base feature assigned to that forest model as the split feature; finally, combined features and/or composite features are obtained from the constructed sequential forest models. The beneficial effects of this technical solution are that feature crossing and derivation are performed on the basis of sequential forest models, providing a new feature generation approach: the selection of base features is more controllable than in a random forest, the generated features are more interpretable, the likelihood of producing duplicate or useless features is reduced, and the forest models can be constructed in parallel, so the method is highly efficient.
The above description is only an overview of the technical solution of this application. In order that the technical means of this application may be understood more clearly and implemented in accordance with the contents of the specification, and in order that the above and other objects, features, and advantages of this application may be more readily apparent, specific embodiments of this application are set forth below.
Detailed description of the invention
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of this application. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a schematic flowchart of a feature generation method for a computation model according to an embodiment of this application;
Fig. 2 shows a schematic structural diagram of a feature generation apparatus for a computation model according to an embodiment of this application;
Fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of this application;
Fig. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of this application.
Specific embodiment
Exemplary embodiments of this application are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of this application, it should be understood that this application may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that this application will be thoroughly understood and so that the scope of this application will be fully conveyed to those skilled in the art.
The computation model in the embodiments of this application can be applied, according to business demand, in a corresponding field, for example in financial risk control, specifically as a credit scorecard model.
Before proposing the sequential forest model, we analyze the existing feature derivation approaches. Specifically: brute-force crossing has excessive algorithmic complexity and produces a large number of useless combined features; linear feature combination algorithms such as LDA cannot capture nonlinear interaction information between features; nonlinear combination approaches such as GBDT output combined features that are the result of features being continually stacked together, so the resulting combined features are overly complex and poorly interpretable, making them unsuitable for business scenarios that require strong interpretability; random forests select features too randomly, the feature crossing process is uncontrollable, and duplicate and useless features can also be produced; in addition, both GBDT and random forests construct each tree greedily, so they may fall into local optima and thereby lose the crossing information between some slightly weaker features.
The sequential forest model proposed on the basis of the above analysis assigns a different base feature to each forest model, and the root nodes of all tree models in the same forest model use the base feature assigned to that forest model as the split feature, so that the selection of base features is controllable, the generated features are more interpretable, and falling into local optima is avoided as far as possible in the construction of each individual tree. The embodiments are described in detail below.
Fig. 1 shows a schematic flowchart of a feature generation method for a computation model according to an embodiment of this application. As shown in Fig. 1, the method includes:
Step S110: obtain a sample data set and a base feature set. For example, the sample data may be credit investigation data carrying labels that identify the quality of users. Accordingly, the base features may include whether the user is an active user, certain transaction features, geographical location, and so on; that is, the sample data set and the base feature set can be determined according to the business scenario. Preprocessing such as feature binning can additionally be performed.
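Feature binning is mentioned only in passing, so the following is a minimal sketch of one possible preprocessing step, equal-width binning; the function name and example values are hypothetical, and real implementations often use quantile or supervised binning instead:

```python
def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant columns
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical raw feature values (e.g. user ages) reduced to 4 bins.
ages = [18, 22, 35, 41, 58, 63, 70]
binned = equal_width_bins(ages, 4)
# binned == [0, 0, 1, 1, 3, 3, 3]
```

After binning, each base feature takes a small number of discrete values, which keeps the later information gain computation and path matching simple.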
Step S120: construct sequential forest models according to the sample data set and the base feature set, wherein each forest model is assigned a different base feature, and the root nodes of all tree models in the same forest model use the base feature assigned to that forest model as the split feature.
Step S130: obtain combined features and/or composite features from the constructed sequential forest models. Feature selection, for example a filter, wrapper, or embedded implementation, can additionally be performed on the obtained features.
As it can be seen that method shown in FIG. 1, the intersection that feature is carried out based on sequential forest model is derivative, proposes a kind of new
Feature generating mode, the more random forest of the selection of foundation characteristic have more controllability, the feature of generation is explanatory stronger, reduce
The possibility of repeated characteristic and useless feature is generated, and each forest model can construct parallel, efficiency is very high.
In an embodiment of this application, in the above method, assigning a different base feature to each forest model includes: performing a preset number of rounds of information gain computation on the base features in the base feature set according to the sample data set; after each round, extracting the base feature with the maximum information gain in that round from the base feature set and assigning it to a forest model that has not yet been assigned a base feature.
For example, if n rounds of information gain computation are performed, then n base features are finally selected and n forest models are constructed accordingly. The value of n can be determined according to actual needs: if n is too large, the information gain of the base features used by the later forest models is low and their practical effect is poor; if n is too small, a sufficient number of features cannot be obtained. As can be seen, selecting base features for assignment in this way is far more controllable than a random forest.
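The round-based assignment described above can be sketched as follows; the helper `gain_fn` and the toy static gain scores stand in for a real information gain computation on the sample data (which the method would recompute each round), so all names and values here are illustrative only:

```python
def assign_base_features(feature_pool, gain_fn, n_rounds):
    """Run n_rounds of selection; each round removes the highest-gain
    base feature from the pool and assigns it to the next unassigned
    forest model."""
    assignments = {}  # forest model index -> base feature
    pool = list(feature_pool)
    for forest_idx in range(n_rounds):
        if not pool:
            break
        best = max(pool, key=gain_fn)   # one round of information gain scoring
        pool.remove(best)               # extract from the base feature set
        assignments[forest_idx] = best  # assign to an unassigned forest model
    return assignments

# Toy gain scores standing in for real information gain on the sample data.
toy_gains = {"x1": 0.42, "x2": 0.31, "x3": 0.27, "x4": 0.05}
forests = assign_base_features(toy_gains, toy_gains.get, n_rounds=3)
# forests == {0: "x1", 1: "x2", 2: "x3"}
```

Once the assignment is fixed, each forest model's root feature is known, so the forests can be built independently and in parallel.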
In an embodiment of this application, in the above method, constructing the sequential forest models according to the sample data set and the base feature set includes: for each tree model in a forest model, determining the split feature used at each layer, wherein, when determining the split feature for a target layer, a target base feature list is determined according to the split features already used at each layer of each tree model and the base feature set, information gain is computed according to the sample data set and the target base feature list, and the base feature with the maximum information gain is taken as the split feature of the target layer.
That is, the selection is made from the base feature set, and, when choosing the split feature for a given layer, the base features already used at each layer are taken into account; in this way the target base feature list can be determined. When a new split feature is chosen, information gain is computed, and the base feature with the maximum information gain is taken as the split feature of the target layer. The information gain computation used in the embodiments of this application can be implemented with reference to related techniques in the prior art, and this application does not limit it.
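Since the patent does not fix a particular information gain formulation, one common prior-art choice (assumed here) is the entropy-based form IG(D, x) = H(D) − Σ_v (|D_v| / |D|) · H(D_v). A self-contained sketch for discrete features:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(D) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature_idx):
    """IG(D, x) = H(D) - sum_v |D_v|/|D| * H(D_v) for a split on one feature."""
    n = len(labels)
    splits = {}
    for row, y in zip(rows, labels):
        splits.setdefault(row[feature_idx], []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in splits.values())
    return entropy(labels) - remainder

# Tiny illustration: feature 0 separates the labels perfectly, feature 1 does not.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
# information_gain(rows, labels, 0) == 1.0; information_gain(rows, labels, 1) == 0.0
```

Any alternative purity criterion (e.g. Gini impurity or gain ratio) could be substituted without changing the rest of the construction procedure.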
In an embodiment of this application, in the above method, determining the target base feature list according to the base features already used at each layer of each tree model and the base feature set includes: if the target layer is the second layer, selecting the base features that belong to the base feature set but are used neither by the root node nor by the second layer of any other tree model in the same forest model, and putting them into the target base feature list; if the target layer is a layer below the second layer, selecting the base features that belong to the base feature set and have not been used by this tree model, and putting them into the target base feature list.
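These two selection rules can be sketched directly as set filters; the function name and argument layout are illustrative, not from the patent:

```python
def target_base_feature_list(base_set, root_feature, other_second_layer,
                             used_by_this_tree, target_layer):
    """Candidate split features for a target layer of one tree model.

    Layer 2: exclude the root feature and the features already used at the
    second layer of the other tree models in the same forest model.
    Layers below 2: exclude everything this tree model has already used.
    """
    if target_layer == 2:
        return [x for x in base_set
                if x != root_feature and x not in other_second_layer]
    return [x for x in base_set if x not in used_by_this_tree]

candidates = target_base_feature_list(
    base_set=["x1", "x2", "x3", "x4", "x5", "x6"],
    root_feature="x1",
    other_second_layer={"x2"},
    used_by_this_tree={"x1", "x2"},
    target_layer=2,
)
# candidates == ["x3", "x4", "x5", "x6"]
```

The information gain computation is then restricted to this candidate list, which is what prevents duplicate paths across sibling trees.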
This is introduced below with a specific example. A sample data set D and a base feature set L_all = {x1, x2, …, x6} are obtained. It will be appreciated that this is a simplification of a real scenario; in a concrete implementation, the number of base features is often much larger.
First, L_level1 = ∅ is initialized; this set records the split features used by the root nodes of the tree models in one forest model. L_level2 = ∅ is initialized; this set records the split features used at the second layer of the tree models in that forest model.
The information gain of every feature in L_all is computed on D. The feature with the maximum information gain, x_level1 = x1, is obtained; it is used as the root node feature of all tree models of this forest model and is put into L_level1, so that L_level1 = {x1}.
L_diff2 = {x_i | x_i ∈ L_all, x_i ≠ x_level1, x_i ∉ L_level2} is computed; the result is {x2, x3, …, x6}. The feature with the maximum information gain, x_level2 = x2, is computed, so that L_level2 = {x2}.
L_diff3 = {x_i | x_i ∈ L_all, x_i ≠ x_level1, x_i ≠ x_level2} is computed; the result is {x3, x4, …, x6}. The feature with the maximum information gain, x_level3 = x3, is computed.
The output feature combination path is then x_new = x1 → x2 → x3; this considers the case where the depth is 3.
Similarly, for the next tree model, L_diff2 = {x_i | x_i ∈ L_all, x_i ≠ x_level1, x_i ∉ L_level2} = {x3, x4, …, x6} is computed again; the feature with the maximum information gain is x_level2 = x3, so that L_level2 = {x2, x3}.
L_diff3 = {x_i | x_i ∈ L_all, x_i ≠ x_level1, x_i ≠ x_level2} = {x2, x4, …, x6} is computed; the feature with the maximum information gain is x_level3 = x2.
The output feature combination path is then x_new = x1 → x3 → x2.
The specific computation processes can be implemented in parallel. x1 → x2 → x3 and x1 → x3 → x2 are two different cross features: because the features are chosen at different layers, the split points differ, which addresses the greediness problem of tree models.
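The set computations of this example can be reproduced mechanically; the fixed feature ranking below stands in for the information gain ranking implied by the example (it is assumed from the example's results, not computed from real data):

```python
L_all = ["x1", "x2", "x3", "x4", "x5", "x6"]
# Assumed gain ranking taken from the example: x1 > x2 > x3 > ...
rank = {f: i for i, f in enumerate(L_all)}
best = lambda candidates: min(candidates, key=rank.get)

x_level1 = best(L_all)                       # "x1", root of every tree model
L_level2 = []
paths = []
for _ in range(2):                           # two tree models, depth 3
    # L_diff2: exclude the root and features used by other trees' second layer.
    L_diff2 = [x for x in L_all if x != x_level1 and x not in L_level2]
    x_level2 = best(L_diff2)
    L_level2.append(x_level2)
    # L_diff3: exclude everything this tree has already used.
    L_diff3 = [x for x in L_all if x not in (x_level1, x_level2)]
    x_level3 = best(L_diff3)
    paths.append((x_level1, x_level2, x_level3))
# paths == [("x1", "x2", "x3"), ("x1", "x3", "x2")]
```

Note how the second tree is forced off x2 at layer 2 but may pick it up again at layer 3, which is exactly what produces two distinct cross features from the same base feature.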
In an embodiment of this application, in the above method, the number of tree models in each forest model and/or the depth of the tree models are predetermined. The greater the depth, the more complex the final combined features, the larger the amount of computation, and the lower the efficiency, so a balance needs to be struck in the concrete implementation. The number of tree models bl can be set analogously to the number of forest models: because feature selection is likewise tied to information gain, the features generated earlier are the more important ones, so bl does not need to be too large.
In an embodiment of this application, in the above method, obtaining combined features and/or composite features from the constructed sequential forest models includes: taking each feature combination path determined by each tree model as a corresponding one-dimensional combined feature.
For example, the feature combination paths x_new = x1 → x2 → x3 and x_new = x1 → x3 → x2 output above are two different one-dimensional combined features. Accordingly, considering a sample together with a combined feature, x_new takes the value 1 if the sample passes through that path, and x_new takes the value 0 if the sample does not pass through that path.
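A minimal sketch of such a 0/1 path indicator follows; it assumes, as a simplification not fixed by the patent, that the features are already binned and that a path records the bin each split must take:

```python
def combined_feature(sample, path):
    """1 if the sample passes through the feature combination path, else 0.

    `sample` maps feature name -> bin value; `path` is a list of
    (feature name, required bin) pairs, one per tree layer.
    """
    return int(all(sample.get(feat) == required for feat, required in path))

# Path x1 -> x2 -> x3, each split requiring bin 1 (hypothetical bins).
path = [("x1", 1), ("x2", 1), ("x3", 1)]
sample_on_path = {"x1": 1, "x2": 1, "x3": 1, "x4": 0}
sample_off_path = {"x1": 1, "x2": 0, "x3": 1, "x4": 0}
# combined_feature(sample_on_path, path) == 1
# combined_feature(sample_off_path, path) == 0
```

Each path thus contributes one binary column to the design matrix of the downstream linear model, which is what keeps the derived features interpretable.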
In an embodiment of this application, in the above method, obtaining combined features and/or composite features from the constructed sequential forest models includes: outputting each tree model as a corresponding composite feature, where each feature combination path is one bin of that composite feature.
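Continuing the same simplified representation, a tree model can then be output as one categorical composite feature whose value is the index of the bin (path) the sample falls into; the encoding of the unmatched case as -1 is an assumption for illustration:

```python
def composite_feature(sample, tree_paths):
    """Output one tree model as a composite feature: the value is the index
    of the feature combination path (bin) the sample follows, or -1 if the
    sample matches none of the listed paths."""
    for bin_idx, path in enumerate(tree_paths):
        if all(sample.get(feat) == required for feat, required in path):
            return bin_idx
    return -1

# Two hypothetical root-to-leaf paths of one tree (binned feature values).
tree_paths = [
    [("x1", 0), ("x2", 0)],
    [("x1", 0), ("x2", 1)],
]
# composite_feature({"x1": 0, "x2": 1}, tree_paths) == 1
```

In this view a combined feature is a single 0/1 column per path, while a composite feature is one categorical column per tree with one category per path.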
Fig. 2 shows a schematic structural diagram of a feature generation apparatus for a computation model according to an embodiment of this application. As shown in Fig. 2, the feature generation apparatus 200 for a computation model includes:
an acquiring unit 210, configured to obtain a sample data set and a base feature set. For example, the sample data may be credit investigation data carrying labels that identify the quality of users. Accordingly, the base features may include whether the user is an active user, certain transaction features, geographical location, and so on; that is, the sample data set and the base feature set can be determined according to the business scenario. Preprocessing such as feature binning can additionally be performed.
a model construction unit 220, configured to construct sequential forest models according to the sample data set and the base feature set, wherein each forest model is assigned a different base feature, and the root nodes of all tree models in the same forest model use the base feature assigned to that forest model as the split feature; and
a feature generation unit 230, configured to obtain combined features and/or composite features from the constructed sequential forest models.
Feature selection, for example a filter, wrapper, or embedded implementation, can additionally be performed on the obtained features.
As it can be seen that device shown in Fig. 2, the intersection that feature is carried out based on sequential forest model is derivative, proposes a kind of new
Feature generating mode, the more random forest of the selection of foundation characteristic have more controllability, the feature of generation is explanatory stronger, reduce
The possibility of repeated characteristic and useless feature is generated, and each forest model can construct parallel, efficiency is very high.
In an embodiment of this application, in the above apparatus, the model construction unit 220 is configured to perform a preset number of rounds of information gain computation on the base features in the base feature set according to the sample data set, and, after each round, to extract the base feature with the maximum information gain in that round from the base feature set and assign it to a forest model that has not yet been assigned a base feature.
In an embodiment of this application, in the above apparatus, the model construction unit 220 is configured to determine, for each tree model in a forest model, the split feature used at each layer, wherein, when determining the split feature for a target layer, a target base feature list is determined according to the split features already used at each layer of each tree model and the base feature set, information gain is computed according to the sample data set and the target base feature list, and the base feature with the maximum information gain is taken as the split feature of the target layer.
In an embodiment of this application, in the above apparatus, the model construction unit 220 is configured to: if the target layer is the second layer, select the base features that belong to the base feature set but are used neither by the root node nor by the second layer of any other tree model in the same forest model, and put them into the target base feature list; and, if the target layer is a layer below the second layer, select the base features that belong to the base feature set and have not been used by this tree model, and put them into the target base feature list.
In an embodiment of this application, in the above apparatus, the number of tree models in each forest model and/or the depth of the tree models are predetermined.
In an embodiment of this application, in the above apparatus, the feature generation unit 230 is configured to take each feature combination path determined by each tree model as a corresponding one-dimensional combined feature.
In an embodiment of this application, in the above apparatus, the feature generation unit 230 is configured to output each tree model as a corresponding composite feature.
It should be noted that the specific implementation of each of the above apparatus embodiments can be carried out with reference to the specific implementation of the corresponding method embodiment described above, and the details are not repeated here.
In conclusion the technical solution of the application, after obtaining sample data set and foundation characteristic collection, according to sample data
Collection and foundation characteristic collection construct sequential forest model;Wherein, different foundation characteristics, same forest mould are distributed for each forest model
The root node of each tree-model in type uses the same foundation feature for forest model distribution as division feature, last root
Assemblage characteristic and/or compound characteristics are obtained according to the sequential forest model that building is completed.The beneficial effect of the technical solution is, base
It is derivative come the intersection for carrying out feature in sequential forest model, propose a kind of new feature generating mode, the selection of foundation characteristic
More random forest has more controllability, and the feature of generation is explanatory stronger, reduce generate repeated characteristic and useless feature can
Can, and each forest model can construct parallel, and efficiency is very high, the interactive information between feature can be excavated, in the structure of tree-model
Local optimum problem caused by greedy thought can also be avoided when building.
It should be understood that the algorithms and displays provided herein are not inherently related to any particular computer, virtual device, or other equipment. Various general-purpose devices may also be used with the teachings herein. From the description above, the structure required to construct such devices is apparent. In addition, this application is not directed to any particular programming language. It should be understood that the content of this application described herein may be implemented using various programming languages, and the description above of specific languages is given in order to disclose the best mode of this application.
Numerous specific details are set forth in the specification provided here. It should be understood, however, that the embodiments of this application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline this application and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of this application, various features of this application are sometimes grouped together in a single embodiment, figure, or description thereof. However, the disclosed method is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. The claims following the specific embodiments are hereby expressly incorporated into the specific embodiments, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will understand that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or apparatus so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the application and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the feature generating apparatus for a computation model according to the embodiments of the application. The application may also be implemented as device programs (for example, computer programs and computer program products) for performing some or all of the methods described herein. Such programs implementing the application may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the application. The electronic device 300 includes a processor 310 and a memory 320 arranged to store computer-executable instructions (computer-readable program code). The memory 320 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or a ROM. The memory 320 has a storage space 330 storing computer-readable program code 331 for performing any of the method steps described above. For example, the storage space 330 may include individual pieces of computer-readable program code 331 for implementing the respective steps of the above methods. The computer-readable program code 331 may be read from, or written to, one or more computer program products. These computer program products comprise program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is usually a computer-readable storage medium as described with reference to Fig. 4. Fig. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the application. The computer-readable storage medium 400 stores computer-readable program code 331 for performing the method steps according to the application, which can be read by the processor 310 of the electronic device 300. When the computer-readable program code 331 is run by the electronic device 300, it causes the electronic device 300 to perform the steps of the methods described above; specifically, the computer-readable program code 331 stored in the computer-readable storage medium can perform the method shown in any of the embodiments above. The computer-readable program code 331 may be compressed in an appropriate form.
It should be noted that the above embodiments illustrate rather than limit the application, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.
Claims (10)
1. A feature generation method for a computation model, characterized by comprising:
obtaining a sample data set and a base feature set;
constructing sequential forest models according to the sample data set and the base feature set, wherein a different base feature is allocated to each forest model, and the root node of every tree model in the same forest model uses the base feature allocated to that forest model as its split feature; and
obtaining combination features and/or composite features according to the constructed sequential forest models.
2. The method of claim 1, characterized in that allocating a different base feature to each forest model comprises:
performing a preset number of rounds of information gain calculation on each base feature in the base feature set according to the sample data set; and, after each round of calculation, extracting from the base feature set the base feature with the largest information gain obtained in that round, and allocating it to a forest model that has not yet been allocated a base feature.
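The allocation in claim 2 can be sketched as follows. Each round extracts the currently highest-scoring base feature from the pool and hands it to the next unassigned forest; `gain` stands in for the information gain computation over the sample data set, and all names here are illustrative assumptions rather than the patent's own code:

```python
def allocate_root_features(base_features, n_forests, gain):
    """One round per forest: remove the max-information-gain feature
    from the pool and assign it as that forest's root split feature."""
    pool = list(base_features)
    assignment = {}                    # forest index -> its root feature
    for forest in range(n_forests):
        best = max(pool, key=gain)     # this round's highest-gain feature
        pool.remove(best)              # extract it from the base feature set
        assignment[forest] = best
    return assignment

# Hypothetical precomputed gains standing in for the per-round calculation:
scores = {"age": 0.9, "income": 0.7, "city": 0.3, "gender": 0.1}
print(allocate_root_features(scores, 2, scores.get))
# → {0: 'age', 1: 'income'}
```

Because each forest receives a distinct root feature, the forests explore different regions of the feature-interaction space, which is what makes the later cross features non-redundant.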
3. The method of claim 2, characterized in that constructing sequential forest models according to the sample data set and the base feature set comprises:
for each tree model in a forest model, determining the split feature used at each layer respectively;
wherein, when determining the split feature used at a target layer, a target base feature list is determined according to the split features already used at each layer of each tree model and the base feature set; information gain calculation is then performed according to the sample data set and the target base feature list, and the base feature with the largest information gain is taken as the split feature used at the target layer.
4. The method of claim 3, characterized in that determining the target base feature list according to the base features already used at each layer of each tree model and the base feature set comprises:
if the target layer is the second layer, selecting base features that belong to the base feature set but are used neither by the root node nor by the second layer of any other tree model in the same forest model, and putting them into the target base feature list;
if the target layer is a layer below the second layer, selecting base features that belong to the base feature set and have not been used by the current tree model, and putting them into the target base feature list.
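The candidate-list rules of claims 3-4 can be sketched as a single filter. At layer 2 the root feature and the layer-2 choices of sibling trees in the same forest are excluded; at deeper layers only the current tree's own history is excluded. The function and variable names are assumptions for illustration:

```python
def target_feature_list(base_features, layer, current_tree_used, sibling_layer2_used):
    """Candidate split features for the target layer of one tree model."""
    if layer == 2:
        # Exclude the root feature and siblings' layer-2 split features.
        banned = set(current_tree_used) | set(sibling_layer2_used)
    else:
        # Below layer 2, only this tree's own used features are excluded.
        banned = set(current_tree_used)
    return [f for f in base_features if f not in banned]

base = ["age", "income", "city", "gender"]
# Current tree: root = "age"; a sibling tree already split layer 2 on "income".
print(target_feature_list(base, 2, ["age"], ["income"]))   # → ['city', 'gender']
print(target_feature_list(base, 3, ["age", "city"], []))   # → ['income', 'gender']
```

Information gain would then be computed over this list only, so each tree in a forest starts from the same root but diverges at layer 2, yielding distinct feature combination paths.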
5. The method of claim 1, characterized in that the number of tree models in a forest model and/or the depth of the tree models are predetermined.
6. The method of claim 1, characterized in that obtaining combination features and/or composite features according to the constructed sequential forest models comprises:
identifying each feature combination path determined by each tree model as a corresponding one-dimensional combination feature.
7. The method of claim 1, characterized in that obtaining combination features and/or composite features according to the constructed sequential forest models comprises:
outputting each tree model as a corresponding composite feature.
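Claims 6 and 7 describe the two extraction modes. A hedged sketch, using an assumed nested-dict tree representation (the patent does not specify one): each root-to-leaf path becomes a one-dimensional combination feature, and each whole tree can be output as one composite feature.

```python
def combination_features(tree, path=()):
    """Claim 6 (sketch): enumerate every root-to-leaf feature combination path."""
    if not tree.get("children"):                 # leaf: the path is one combination
        return [path + (tree["feature"],)]
    paths = []
    for child in tree["children"]:
        paths += combination_features(child, path + (tree["feature"],))
    return paths

def composite_features(forests):
    """Claim 7 (sketch): output each tree model, as a whole, as one composite feature."""
    return [tree for forest in forests for tree in forest]

tree = {"feature": "age", "children": [
    {"feature": "income", "children": []},
    {"feature": "city", "children": []},
]}
print(combination_features(tree))
# → [('age', 'income'), ('age', 'city')]
```

Each returned tuple such as `('age', 'income')` would then be materialized as a single derived column, which is what makes the generated cross features directly interpretable.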
8. A feature generating apparatus for a computation model, characterized by comprising:
an acquiring unit for obtaining a sample data set and a base feature set;
a model construction unit for constructing sequential forest models according to the sample data set and the base feature set, wherein a different base feature is allocated to each forest model, and the root node of every tree model in the same forest model uses the base feature allocated to that forest model as its split feature; and
a feature generation unit for obtaining combination features and/or composite features according to the constructed sequential forest models.
9. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the method of any one of claims 1-7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910596683.4A CN110390400B (en) | 2019-07-02 | 2019-07-02 | Feature generation method and device of computing model, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110390400A true CN110390400A (en) | 2019-10-29 |
CN110390400B CN110390400B (en) | 2023-07-14 |
Family
ID=68286179
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910596683.4A Active CN110390400B (en) | 2019-07-02 | 2019-07-02 | Feature generation method and device of computing model, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110390400B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160210435A1 (en) * | 2013-08-28 | 2016-07-21 | Siemens Aktiengesellschaft | Systems and methods for estimating physiological heart measurements from medical images and clinical data |
CN107894827A (en) * | 2017-10-31 | 2018-04-10 | 广东欧珀移动通信有限公司 | Using method for cleaning, device, storage medium and electronic equipment |
WO2018107492A1 (en) * | 2016-12-16 | 2018-06-21 | 深圳大学 | Intuitionistic fuzzy random forest-based method and device for target tracking |
CN109145959A (en) * | 2018-07-27 | 2019-01-04 | 东软集团股份有限公司 | A kind of feature selection approach, device and equipment |
CN109255393A (en) * | 2018-09-30 | 2019-01-22 | 上海机电工程研究所 | Infrared Imaging Seeker anti-jamming performance evaluation method based on random forest |
CN109284382A (en) * | 2018-09-30 | 2019-01-29 | 武汉斗鱼网络科技有限公司 | A kind of file classification method and computing device |
CN109408531A (en) * | 2018-09-25 | 2019-03-01 | 平安科技(深圳)有限公司 | The detection method and device of slow drop type data, electronic equipment, storage medium |
CN109697447A (en) * | 2017-10-20 | 2019-04-30 | 富士通株式会社 | Disaggregated model construction device, method and electronic equipment based on random forest |
CN109947498A (en) * | 2017-12-20 | 2019-06-28 | 广东欧珀移动通信有限公司 | Application program preloads method, apparatus, storage medium and mobile terminal |
CN109947497A (en) * | 2017-12-20 | 2019-06-28 | 广东欧珀移动通信有限公司 | Application program preloads method, apparatus, storage medium and mobile terminal |
Non-Patent Citations (2)
Title |
---|
JINLEE LEE ET AL.: "Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier", KSII Transactions on Internet and Information Systems, pp. 5132-5148 |
ZHANG XIQIAN: "Research on Multiple Decision Trees and Their Distributed Computing Theory for Imbalanced Big Data Applications", China Master's Theses Full-text Database, Information Science and Technology, No. 01 |
Also Published As
Publication number | Publication date |
---|---|
CN110390400B (en) | 2023-07-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||