CN110390400A - Feature generation method and device for a computing model, electronic device, and storage medium - Google Patents

Feature generation method and device for a computing model, electronic device, and storage medium

Info

Publication number
CN110390400A
CN110390400A (application CN201910596683.4A)
Authority
CN
China
Prior art keywords
model
feature
base feature
forest model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910596683.4A
Other languages
Chinese (zh)
Other versions
CN110390400B (en)
Inventor
李京昊
陈鹏程
陈金辉
朱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201910596683.4A
Publication of CN110390400A
Application granted
Publication of CN110390400B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application discloses a feature generation method and device for a computing model, an electronic device, and a storage medium. The method includes: obtaining a sample data set and a base feature set; constructing sequential forest models according to the sample data set and the base feature set, where each forest model is assigned a different base feature and the root node of each tree model in the same forest model uses the base feature assigned to that forest model as the split feature; and obtaining combined features and/or composite features from the constructed sequential forest models. The benefits of this technical solution are that feature cross-derivation is performed based on sequential forest models, providing a new feature generation approach: the selection of base features is more controllable than in a random forest, the generated features are more interpretable, the likelihood of producing duplicate and useless features is reduced, and the forest models can be built in parallel, which is highly efficient.

Description

Feature generation method and device for a computing model, electronic device, and storage medium
Technical field
This application relates to the field of machine learning, and in particular to a feature generation method and device for a computing model, an electronic device, and a storage medium.
Background technique
The construction of machine learning computing models often differs by field: different computing models are chosen and built for different domains. For example, in the field of financial risk control, credit scorecards are often used to model and assess user credit risk. To balance the interpretability and the algorithmic complexity of the computing model, credit scorecards are usually implemented on linear models. A linear model, however, cannot capture the interaction information between features, so in the feature engineering stage before the computing model is built it is usually necessary to perform cross-derivation of features to obtain combined features (also called cross features, cross-combined features, and so on).
In the prior art, feature derivation approaches have several problems, for example: linear discriminant analysis (LDA) cannot capture the nonlinear interaction information between features; brute-force feature crossing has excessive complexity and poor interpretability; gradient boosted trees (GBDT) and random forests tend to fall into local optima; and so on. A new feature generation approach is therefore needed.
Summary of the invention
In view of the above problems, this application is proposed to provide a feature generation method and device for a computing model, an electronic device, and a storage medium that overcome the above problems or at least partially solve them.
According to one aspect of this application, a feature generation method for a computing model is provided, comprising:
obtaining a sample data set and a base feature set;
constructing sequential forest models according to the sample data set and the base feature set; wherein each forest model is assigned a different base feature, and the root node of each tree model in the same forest model uses the base feature assigned to that forest model as the split feature;
obtaining combined features and/or composite features from the constructed sequential forest models.
Optionally, assigning a different base feature to each forest model includes:
performing a preset number of rounds of information gain computation on each base feature in the base feature set according to the sample data set, and after each round, extracting the base feature with the maximum information gain in that round from the base feature set and assigning it to a forest model that has not yet been assigned a base feature.
Optionally, constructing the sequential forest models according to the sample data set and the base feature set includes:
for each tree model in a forest model, respectively determining the split feature used at each layer; wherein, when determining the split feature used at a target layer, a target base feature list is determined according to the split features already used at each layer of each tree model and the base feature set, information gain computation is performed according to the sample data set and the target base feature list, and the base feature with the maximum information gain is taken as the split feature used at the target layer.
Optionally, determining the target base feature list according to the base features already used at each layer of each tree model and the base feature set includes:
if the target layer is the second layer, selecting base features that belong to the base feature set but are used neither by the root node nor by the second layer of other tree models in the same forest model, and putting them into the target base feature list;
if the target layer is a layer below the second layer, selecting base features that belong to the base feature set and have not been used by the current tree model, and putting them into the target base feature list.
Optionally, the number of tree models in a forest model and/or the depth of the tree models are predetermined.
Optionally, obtaining combined features and/or composite features from the constructed sequential forest models includes:
taking each feature combination path determined by each tree model as a corresponding one-dimensional combined feature.
Optionally, obtaining combined features and/or composite features from the constructed sequential forest models includes:
outputting each tree model as a corresponding composite feature.
According to another aspect of this application, a feature generation device for a computing model is provided, comprising:
an acquiring unit, configured to obtain a sample data set and a base feature set;
a model construction unit, configured to construct sequential forest models according to the sample data set and the base feature set; wherein each forest model is assigned a different base feature, and the root node of each tree model in the same forest model uses the base feature assigned to that forest model as the split feature;
a feature generation unit, configured to obtain combined features and/or composite features from the constructed sequential forest models.
Optionally, the model construction unit is configured to perform a preset number of rounds of information gain computation on each base feature in the base feature set according to the sample data set, and after each round, to extract the base feature with the maximum information gain in that round from the base feature set and assign it to a forest model that has not yet been assigned a base feature.
Optionally, the model construction unit is configured to determine, for each tree model in a forest model, the split feature used at each layer; wherein, when determining the split feature used at a target layer, a target base feature list is determined according to the split features already used at each layer of each tree model and the base feature set, information gain computation is performed according to the sample data set and the target base feature list, and the base feature with the maximum information gain is taken as the split feature used at the target layer.
Optionally, the model construction unit is configured to: if the target layer is the second layer, select base features that belong to the base feature set but are used neither by the root node nor by the second layer of other tree models in the same forest model, and put them into the target base feature list; if the target layer is a layer below the second layer, select base features that belong to the base feature set and have not been used by the current tree model, and put them into the target base feature list.
Optionally, the number of tree models in a forest model and/or the depth of the tree models are predetermined.
Optionally, the feature generation unit is configured to take each feature combination path determined by each tree model as a corresponding one-dimensional combined feature.
Optionally, the feature generation unit is configured to output each tree model as a corresponding composite feature.
According to yet another aspect of this application, an electronic device is provided, comprising: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method described in any of the above.
According to still another aspect of this application, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores one or more programs that, when executed by a processor, implement the method described in any of the above.
It can be seen from the above that, in the technical solution of this application, after a sample data set and a base feature set are obtained, sequential forest models are constructed according to the sample data set and the base feature set; each forest model is assigned a different base feature, the root node of each tree model in the same forest model uses the base feature assigned to that forest model as the split feature, and finally combined features and/or composite features are obtained from the constructed sequential forest models. The benefits of this technical solution are that feature cross-derivation is performed based on sequential forest models, providing a new feature generation approach: the selection of base features is more controllable than in a random forest, the generated features are more interpretable, the likelihood of producing duplicate and useless features is reduced, and the forest models can be built in parallel, which is highly efficient.
The above description is only an overview of the technical solutions of this application. In order that the technical means of this application can be understood more clearly and implemented in accordance with the contents of the specification, and in order to make the above and other objects, features, and advantages of this application more comprehensible, specific embodiments of this application are set forth below.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of this application. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a schematic flowchart of a feature generation method for a computing model according to an embodiment of this application;
Fig. 2 shows a schematic structural diagram of a feature generation device for a computing model according to an embodiment of this application;
Fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of this application;
Fig. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of this application.
Detailed description
Exemplary embodiments of this application are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of this application, it should be understood that this application can be implemented in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that this application will be thoroughly understood and its scope fully conveyed to those skilled in the art.
The computing model in the embodiments of this application can be applied, according to business requirements, in a corresponding field, for example the field of financial risk control, specifically for use as a credit scorecard model.
Before proposing the sequential forest model, we analyzed existing feature derivation approaches. Specifically: brute-force cross-derivation algorithms have excessive complexity and generate a large number of useless combined features; linear feature combination algorithms such as LDA cannot capture the nonlinear interaction information between features; the combined features output by nonlinear approaches such as GBDT are the result of features being continuously stacked together, so the resulting combined features are overly complex and poorly interpretable, which makes them unsuitable for business scenarios requiring strong interpretability; random forests select features too randomly, the feature crossing process is uncontrollable, and they also produce duplicate and useless features; in addition, both GBDT and random forests build each tree on a greedy strategy and may therefore fall into local optima, losing the cross information between somewhat weaker features.
Based on the above analysis, the proposed sequential forest model assigns a different base feature to each forest model, and the root node of each tree model in the same forest model uses the base feature assigned to that forest model as the split feature. This makes the selection of base features controllable and the generated features more interpretable, and it also avoids, as far as possible, falling into local optima when building a single tree. Details are introduced below with reference to the embodiments.
Fig. 1 shows a schematic flowchart of a feature generation method for a computing model according to an embodiment of this application. As shown in Fig. 1, the method comprises:
Step S110: obtain a sample data set and a base feature set. For example, the sample data may be credit reporting data carrying labels that identify user quality. Correspondingly, the base features may include whether the user is an active user, certain transaction features, geographic location, and so on; that is, the sample data set and the base feature set can be determined according to the business scenario. Preprocessing such as feature binning can also be performed, as in the sketch below.
Step S120: construct sequential forest models according to the sample data set and the base feature set; wherein each forest model is assigned a different base feature, and the root node of each tree model in the same forest model uses the base feature assigned to that forest model as the split feature.
Step S130: obtain combined features and/or composite features from the constructed sequential forest models. Feature selection can also be performed on the obtained features, for example with filter, wrapper, or embedded methods.
It can be seen that the method shown in Fig. 1 performs feature cross-derivation based on sequential forest models, providing a new feature generation approach: the selection of base features is more controllable than in a random forest, the generated features are more interpretable, the likelihood of producing duplicate and useless features is reduced, and the forest models can be built in parallel, which is highly efficient.
In one embodiment of this application, in the above method, assigning a different base feature to each forest model includes: performing a preset number of rounds of information gain computation on each base feature in the base feature set according to the sample data set, and after each round, extracting the base feature with the maximum information gain in that round from the base feature set and assigning it to a forest model that has not yet been assigned a base feature.
For example, if n rounds of information gain computation are performed, then n base features are finally selected and, correspondingly, n forest models are built. The value of n can be determined according to actual needs: if it is too large, the information gain of the base features used by the later forest models is low and the actual effect is poor; if it is too small, a sufficient number of features cannot be obtained. It can be seen that this way of selecting the base features to be assigned is far more controllable than a random forest.
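As a sketch of the round-by-round assignment just described: each round scores the remaining base features by information gain, extracts the best one, and assigns it to the next forest model. The `info_gain` callable is a placeholder assumption (any standard implementation, such as the entropy-based one sketched further below, would do); this is illustrative, not code mandated by the application:

```python
def allocate_base_features(dataset, base_features, n_forests, info_gain):
    """Run n_forests rounds; each round extracts the base feature with the
    maximum information gain from the base feature set and assigns it as
    the shared root split feature of one not-yet-assigned forest model."""
    remaining = list(base_features)
    root_feature_per_forest = []
    for _ in range(n_forests):
        if not remaining:
            break
        best = max(remaining, key=lambda f: info_gain(dataset, f))
        remaining.remove(best)                 # extracted from the base feature set
        root_feature_per_forest.append(best)   # assigned to one forest model
    return root_feature_per_forest
```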
In one embodiment of this application, in the above method, constructing the sequential forest models according to the sample data set and the base feature set includes: for each tree model in a forest model, respectively determining the split feature used at each layer; wherein, when determining the split feature used at a target layer, a target base feature list is determined according to the split features already used at each layer of each tree model and the base feature set, information gain computation is performed according to the sample data set and the target base feature list, and the base feature with the maximum information gain is taken as the split feature used at the target layer.
That is, selection is based on the base feature set, and the base features already used at each layer are taken into account when choosing the split feature of a given layer; this makes it possible to determine the target base feature list. When a new split feature is chosen, information gain is computed and the base feature with the maximum information gain is taken as the split feature used at the target layer. The information gain computation used in the embodiments of this application can be implemented with reference to related existing techniques, which this application does not restrict.
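The application leaves the concrete information gain computation to known techniques. One minimal sketch of the standard entropy-based definition, IG(D, x) = H(D) - sum_v (|D_v|/|D|) H(D_v), is given below; the discrete (binned) feature values and the dict-per-row representation are assumptions of this sketch, not requirements of the application:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(D) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    """IG(D, x): entropy reduction obtained by partitioning the rows
    on the (discrete, binned) values of one feature."""
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys)
                    for ys in partitions.values())
    return entropy(labels) - remainder
```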
In one embodiment of this application, in the above method, determining the target base feature list according to the base features already used at each layer of each tree model and the base feature set includes: if the target layer is the second layer, selecting base features that belong to the base feature set but are used neither by the root node nor by the second layer of other tree models in the same forest model, and putting them into the target base feature list; if the target layer is a layer below the second layer, selecting base features that belong to the base feature set and have not been used by the current tree model, and putting them into the target base feature list.
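The rule above reduces to a small amount of set bookkeeping. A sketch under the assumption that features are plain strings (all names here are illustrative, not taken from the application):

```python
def target_base_features(L_all, layer, root_feature,
                         second_layer_used, current_tree_used):
    """Return the target base feature list for one target layer.

    L_all:             the full base feature set
    root_feature:      the split feature shared by every root node in this forest
    second_layer_used: second-layer split features already taken by sibling trees
    current_tree_used: every feature already used by the tree being built
                       (includes the root feature and its own earlier layers)
    """
    if layer == 2:
        return [f for f in L_all
                if f != root_feature and f not in second_layer_used]
    # layers below the second
    return [f for f in L_all if f not in current_tree_used]
```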
This is introduced below with a specific example. Obtain a sample data set D and a base feature set L_all = {x1, x2, …, x6}. It should be understood that this is a simplified version of the actual scenario; in a concrete implementation the number of base features is often much larger.
First, initialize L_level1 = {} (the empty set); this set records the split feature used by the root nodes of the tree models in a forest model. Initialize L_level2 = {}; this set records the split features used at the second layer of the tree models in the forest model.
Compute the information gain of each feature in L_all over D; the feature with the maximum information gain is x_level1 = x1. It is used as the root-node split feature of all tree models of this forest model and put into L_level1, so that L_level1 = {x1}.
Compute L_diff2 = {x_i | x_i ∈ L_all, x_i ≠ x_level1, x_i ∉ L_level2}, which gives {x2, x3, …, x6}; the feature with the maximum information gain is x_level2 = x2, so that at this point L_level2 = {x2}.
Compute L_diff3 = {x_i | x_i ∈ L_all, x_i ≠ x_level1, x_i ≠ x_level2}, which gives {x3, …, x6}; the feature with the maximum information gain is x_level3 = x3.
The output feature combination path is then x_new = x1 -> x2 -> x3; this considers the case where the depth is 3.
Similarly, for the next tree model, compute L_diff2 = {x_i | x_i ∈ L_all, x_i ≠ x_level1, x_i ∉ L_level2} = {x3, …, x6}; the feature with the maximum gain is x_level2 = x3, so that at this point L_level2 = {x2, x3};
L_diff3 = {x_i | x_i ∈ L_all, x_i ≠ x_level1, x_i ≠ x_level2} = {x2, x4, …, x6}; the feature with the maximum gain is x_level3 = x2.
The output feature combination path is then x_new = x1 -> x3 -> x2.
The specific computation can be implemented in parallel. x1 -> x2 -> x3 and x1 -> x3 -> x2 are two different cross features: because the features are chosen into different layers, the split points differ, which alleviates the greediness problem of tree models. A self-contained sketch of this whole procedure follows.
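Pulling the steps together, the following sketch reproduces the shape of this example: one forest model with a fixed root feature, two tree models of depth 3, and one feature combination path emitted per tree (a simplification, since a real tree branches per bin and yields several paths). The stand-in gain table replaces the information gain that would in practice be computed from the sample data set D, so the concrete numbers are illustrative only:

```python
def build_forest_paths(L_all, root_feature, gain, n_trees=2, depth=3):
    """Build n_trees tree models for one forest model and return one
    feature combination path per tree; gain maps feature -> a stand-in
    information gain (in practice recomputed from the sample data set)."""
    second_layer_used = set()            # L_level2 in the example above
    paths = []
    for _ in range(n_trees):
        used = {root_feature}            # features already used by this tree
        path = [root_feature]            # every root uses the shared feature
        for layer in range(2, depth + 1):
            if layer == 2:               # exclude root and siblings' 2nd layer
                candidates = [f for f in L_all
                              if f != root_feature and f not in second_layer_used]
            else:                        # exclude this tree's own used features
                candidates = [f for f in L_all if f not in used]
            best = max(candidates, key=gain.get)
            if layer == 2:
                second_layer_used.add(best)
            used.add(best)
            path.append(best)
        paths.append(path)
    return paths

# Stand-in gains chosen so the example's two paths come out.
gain = {"x1": 0.9, "x2": 0.6, "x3": 0.5, "x4": 0.3, "x5": 0.2, "x6": 0.1}
L_all = ["x1", "x2", "x3", "x4", "x5", "x6"]
print(build_forest_paths(L_all, "x1", gain))
# [['x1', 'x2', 'x3'], ['x1', 'x3', 'x2']]
```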
In one embodiment of this application, in the above method, the number of tree models in a forest model and/or the depth of the tree models are predetermined. The greater the depth, the more complex the final combined features, the larger the amount of computation, and the lower the efficiency, so a balance needs to be struck in concrete implementations. The number of tree models, bl, can be set analogously to the number of forest models: since feature selection is likewise driven by information gain, the features generated earlier are the more important ones, so bl does not need to be large.
In one embodiment of this application, in the above method, obtaining combined features and/or composite features from the constructed sequential forest models includes: taking each feature combination path determined by each tree model as a corresponding one-dimensional combined feature.
For example, the feature combination path x_new = x1 -> x2 -> x3 output earlier and the feature combination path x_new = x1 -> x3 -> x2 are two different one-dimensional combined features. Correspondingly, given a sample and a combined feature, x_new = 1 indicates that the sample passes through that path, and x_new = 0 indicates that it does not.
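A sketch of this one-dimensional encoding, under the assumption (a concrete form this application does not fix) that after binning each path step is a (feature, bin value) pair: x_new is 1 exactly when the sample matches every step of the path:

```python
def combined_feature(sample, path):
    """x_new = 1 if the sample passes through this feature combination
    path, else 0; `path` is a list of (feature, bin value) pairs."""
    return int(all(sample.get(f) == v for f, v in path))

# Illustrative: the path x1 -> x2 -> x3 with assumed bin values.
path = [("x1", 1), ("x2", 0), ("x3", 2)]
print(combined_feature({"x1": 1, "x2": 0, "x3": 2, "x4": 5}, path))  # 1
print(combined_feature({"x1": 1, "x2": 1, "x3": 2, "x4": 5}, path))  # 0
```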
In one embodiment of this application, in the above method, obtaining combined features and/or composite features from the constructed sequential forest models includes: outputting each tree model as a corresponding composite feature, where each feature combination path is one bin of that composite feature.
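Correspondingly, a sketch of the composite encoding, with the same assumed (feature, bin value) path representation: the whole tree model becomes a single feature whose value is the index of the bin, that is, of the feature combination path the sample falls into:

```python
def composite_feature(sample, tree_paths):
    """Output one tree model as one composite feature: the value is the
    index of the feature combination path (the bin) matching the sample,
    or -1 if no path matches."""
    for bin_index, path in enumerate(tree_paths):
        if all(sample.get(f) == v for f, v in path):
            return bin_index
    return -1

# Two illustrative paths (bins) of one tree model, with assumed bin values.
tree_paths = [
    [("x1", 0), ("x2", 0), ("x3", 1)],
    [("x1", 1), ("x2", 0), ("x3", 2)],
]
print(composite_feature({"x1": 1, "x2": 0, "x3": 2}, tree_paths))  # 1
```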
Fig. 2 shows a schematic structural diagram of a feature generation device for a computing model according to an embodiment of this application. As shown in Fig. 2, the feature generation device 200 of the computing model comprises:
an acquiring unit 210, configured to obtain a sample data set and a base feature set. For example, the sample data may be credit reporting data carrying labels that identify user quality. Correspondingly, the base features may include whether the user is an active user, certain transaction features, geographic location, and so on; that is, the sample data set and the base feature set can be determined according to the business scenario. Preprocessing such as feature binning can also be performed;
a model construction unit 220, configured to construct sequential forest models according to the sample data set and the base feature set; wherein each forest model is assigned a different base feature, and the root node of each tree model in the same forest model uses the base feature assigned to that forest model as the split feature;
a feature generation unit 230, configured to obtain combined features and/or composite features from the constructed sequential forest models.
Feature selection can also be performed on the obtained features, for example with filter, wrapper, or embedded methods.
It can be seen that the device shown in Fig. 2 performs feature cross-derivation based on sequential forest models, providing a new feature generation approach: the selection of base features is more controllable than in a random forest, the generated features are more interpretable, the likelihood of producing duplicate and useless features is reduced, and the forest models can be built in parallel, which is highly efficient.
In one embodiment of this application, in the above device, the model construction unit 220 is configured to perform a preset number of rounds of information gain computation on each base feature in the base feature set according to the sample data set, and after each round, to extract the base feature with the maximum information gain in that round from the base feature set and assign it to a forest model that has not yet been assigned a base feature.
In one embodiment of this application, in the above device, the model construction unit 220 is configured to determine, for each tree model in a forest model, the split feature used at each layer; wherein, when determining the split feature used at a target layer, a target base feature list is determined according to the split features already used at each layer of each tree model and the base feature set, information gain computation is performed according to the sample data set and the target base feature list, and the base feature with the maximum information gain is taken as the split feature used at the target layer.
In one embodiment of this application, in the above device, the model construction unit 220 is configured to: if the target layer is the second layer, select base features that belong to the base feature set but are used neither by the root node nor by the second layer of other tree models in the same forest model, and put them into the target base feature list; if the target layer is a layer below the second layer, select base features that belong to the base feature set and have not been used by the current tree model, and put them into the target base feature list.
In one embodiment of this application, in the above device, the number of tree models in a forest model and/or the depth of the tree models are predetermined.
In one embodiment of this application, in the above device, the feature generation unit 230 is configured to take each feature combination path determined by each tree model as a corresponding one-dimensional combined feature.
In one embodiment of this application, in the above device, the feature generation unit 230 is configured to output each tree model as a corresponding composite feature.
It should be noted that for the concrete implementation of each of the above device embodiments, reference may be made to the concrete implementation of the corresponding method embodiment described above; details are not repeated here.
In summary, in the technical solution of this application, after a sample data set and a base feature set are obtained, sequential forest models are constructed according to the sample data set and the base feature set; each forest model is assigned a different base feature, the root node of each tree model in the same forest model uses the base feature assigned to that forest model as the split feature, and finally combined features and/or composite features are obtained from the constructed sequential forest models. The benefits of this technical solution are that feature cross-derivation is performed based on sequential forest models, providing a new feature generation approach: the selection of base features is more controllable than in a random forest, the generated features are more interpretable, the likelihood of producing duplicate and useless features is reduced, and the forest models can be built in parallel, which is highly efficient; the interaction information between features can be mined, and the local-optimum problem caused by greedy strategies in tree-model construction can also be avoided.
It should be understood that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual device, or other equipment. Various general-purpose devices can also be used together with the teachings herein. The structure required to construct such devices is apparent from the description above. Moreover, this application is not directed to any particular programming language; it should be understood that various programming languages can be used to implement the content of this application described herein, and the above description of specific languages is intended to disclose the best mode of carrying out this application.
Numerous specific details are set forth in the specification provided here. It should be understood, however, that the embodiments of this application can be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline this application and aid the understanding of one or more of the various inventive aspects, the features of this application are sometimes grouped together into a single embodiment, figure, or description thereof in the description of exemplary embodiments above. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art can understand that the modules in a device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment can be combined into one module, unit, or component, and they can furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed can be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) can be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
In addition, those skilled in the art can understand that although some embodiments described herein include certain features that are included in other embodiments rather than others, combinations of features of different embodiments are meant to be within the scope of this application and to form different embodiments. For example, in the following claims, any one of the claimed embodiments can be used in any combination.
The component embodiments of this application can be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that in practice a microprocessor or a digital signal processor (DSP) can be used to implement some or all of the functions of some or all of the components of the feature generation device of the computing model according to the embodiments of this application. This application can also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing this application can be stored on a computer-readable medium, or can have the form of one or more signals. Such a signal can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of this application. The electronic device 300 includes a processor 310 and a memory 320 arranged to store computer-executable instructions (computer-readable program code). The memory 320 can be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or a ROM. The memory 320 has a storage space 330 storing computer-readable program code 331 for performing any of the method steps in the above methods. For example, the storage space 330 can include pieces of computer-readable program code 331 respectively used to implement the various steps in the above method. The computer-readable program code 331 can be read from, or written into, one or more computer program products. These computer program products include program code carriers such as a hard disk, a compact disc (CD), a memory card, or a floppy disk. Such a computer program product is usually a computer-readable storage medium such as the one described with reference to Fig. 4. Fig. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of this application. The computer-readable storage medium 400 stores computer-readable program code 331 for performing the method steps according to this application, which can be read by the processor 310 of the electronic device 300; when the computer-readable program code 331 is run by the electronic device 300, the electronic device 300 is caused to perform the steps of the method described above. Specifically, the computer-readable program code 331 stored by the computer-readable storage medium can perform the method shown in any of the above embodiments. The computer-readable program code 331 can be compressed in an appropriate form.
It should be noted that the above embodiments illustrate rather than limit this application, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. This application can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words can be interpreted as names.

Claims (10)

1. A feature generation method for a computing model, characterized by comprising:
obtaining a sample data set and a base feature set;
constructing sequential forest models according to the sample data set and the base feature set; wherein each forest model is assigned a different base feature, and the root node of each tree model in the same forest model uses the base feature assigned to that forest model as the split feature;
obtaining combined features and/or composite features from the constructed sequential forest models.
2. The method according to claim 1, characterized in that assigning a different base feature to each forest model comprises:
performing a preset number of rounds of information gain computation on each base feature in the base feature set according to the sample data set, and after each round, extracting the base feature with the maximum information gain in that round from the base feature set and assigning it to a forest model that has not yet been assigned a base feature.
3. The method according to claim 2, characterized in that constructing the sequential forest models according to the sample data set and the base feature set comprises:
for each tree model in a forest model, respectively determining the split feature used at each layer;
wherein, when determining the split feature used at a target layer, a target base feature list is determined according to the split features already used at each layer of each tree model and the base feature set, information gain computation is performed according to the sample data set and the target base feature list, and the base feature with the maximum information gain is taken as the split feature used at the target layer.
4. The method according to claim 3, characterized in that determining the target base feature list according to the base features already used at each layer of each tree model and the base feature set comprises:
if the target layer is the second layer, selecting base features that belong to the base feature set but are used neither by the root node nor by the second layer of other tree models in the same forest model, and putting them into the target base feature list;
if the target layer is a layer below the second layer, selecting base features that belong to the base feature set and have not been used by the current tree model, and putting them into the target base feature list.
5. The method according to claim 1, characterized in that the number of tree models in a forest model and/or the depth of the tree models are predetermined.
6. The method according to claim 1, characterized in that obtaining combined features and/or composite features from the constructed sequential forest models comprises:
taking each feature combination path determined by each tree model as a corresponding one-dimensional combined feature.
7. The method according to claim 1, characterized in that obtaining combined features and/or composite features from the constructed sequential forest models comprises:
outputting each tree model as a corresponding composite feature.
8. A feature generation device for a computing model, characterized by comprising:
an acquiring unit, configured to obtain a sample data set and a base feature set;
a model construction unit, configured to construct sequential forest models according to the sample data set and the base feature set; wherein each forest model is assigned a different base feature, and the root node of each tree model in the same forest model uses the base feature assigned to that forest model as the split feature;
a feature generation unit, configured to obtain combined features and/or composite features from the constructed sequential forest models.
9. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method according to any one of claims 1-7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs that, when executed by a processor, implement the method according to any one of claims 1-7.
CN201910596683.4A 2019-07-02 2019-07-02 Feature generation method and device of computing model, electronic equipment and storage medium Active CN110390400B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910596683.4A CN110390400B (en) 2019-07-02 2019-07-02 Feature generation method and device of computing model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910596683.4A CN110390400B (en) 2019-07-02 2019-07-02 Feature generation method and device of computing model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110390400A (en) 2019-10-29
CN110390400B CN110390400B (en) 2023-07-14

Family

ID=68286179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910596683.4A Active CN110390400B (en) 2019-07-02 2019-07-02 Feature generation method and device of computing model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110390400B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160210435A1 (en) * 2013-08-28 2016-07-21 Siemens Aktiengesellschaft Systems and methods for estimating physiological heart measurements from medical images and clinical data
WO2018107492A1 (en) * 2016-12-16 2018-06-21 深圳大学 Intuitionistic fuzzy random forest-based method and device for target tracking
CN109697447A (en) * 2017-10-20 2019-04-30 富士通株式会社 Disaggregated model construction device, method and electronic equipment based on random forest
CN107894827A (en) * 2017-10-31 2018-04-10 广东欧珀移动通信有限公司 Using method for cleaning, device, storage medium and electronic equipment
CN109947498A (en) * 2017-12-20 2019-06-28 广东欧珀移动通信有限公司 Application program preloads method, apparatus, storage medium and mobile terminal
CN109947497A (en) * 2017-12-20 2019-06-28 广东欧珀移动通信有限公司 Application program preloads method, apparatus, storage medium and mobile terminal
CN109145959A (en) * 2018-07-27 2019-01-04 东软集团股份有限公司 A kind of feature selection approach, device and equipment
CN109408531A (en) * 2018-09-25 2019-03-01 平安科技(深圳)有限公司 The detection method and device of slow drop type data, electronic equipment, storage medium
CN109255393A (en) * 2018-09-30 2019-01-22 上海机电工程研究所 Infrared Imaging Seeker anti-jamming performance evaluation method based on random forest
CN109284382A (en) * 2018-09-30 2019-01-29 武汉斗鱼网络科技有限公司 A kind of file classification method and computing device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JINLEE LEE ET AL.: "Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier", KSII Transactions on Internet and Information Systems, pages 5132-5148 *
张翕茜: "Research on Multiple Decision Trees and Distributed Computing Theory for Imbalanced Big Data Application Fields", China Master's Theses Full-text Database, Information Science and Technology, no. 01 *

Also Published As

Publication number Publication date
CN110390400B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
CN109857667A (en) Automatic interface testing method, test device, test equipment and storage medium
CN104050078B (en) Test script generates system
CN109345388A (en) Block chain intelligence contract verification method, device and storage medium
CN108282539A (en) Decentralization storage system based on double-layer network
CN107292560A (en) Material storage management method and system
CN109815228A (en) Creation method, device, computer equipment and the readable storage medium storing program for executing of database table
CN107274023A (en) Flow of insuring generation method, insure request processing method and device and electronic equipment
CN107015853A (en) The implementation method and device of phased mission system
CN106502889A (en) The method and apparatus of prediction cloud software performance
Brummitt et al. Contagious disruptions and complexity traps in economic development
CN106095563A (en) Physical function and virtual functions map flexibly
CN110019111A (en) Data processing method, device, storage medium and processor
CN108268615A (en) A kind of data processing method, device and system
CN106599291B (en) Data grouping method and device
CN108304482A (en) The recognition methods and device of broker, electronic equipment and readable storage medium storing program for executing
CN104950833A (en) Production plan creation support method and production plan creation support apparatus
CN116932008B (en) Method, device, equipment and medium for updating component data of virtual society simulation
CN113362162A (en) Wind control identification method and device based on network behavior data, electronic equipment and medium
CN110390400A (en) Feature generation method, device, electronic equipment and the storage medium of computation model
US20150134704A1 (en) Real Time Analysis of Big Data
CN109584091B (en) Generation method and device of insurance image file
CN116308826A (en) Insurance product online method, apparatus, equipment and storage medium
CN105528718A (en) Interaction method and device based on user grade
CN112988403B (en) Integrated circuit simulation multithread management parallel method and device with security function
CN115660814A (en) Risk prediction method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant