CN110334720A - Feature extraction method, apparatus, server and storage medium for business data - Google Patents
- Publication number
- CN110334720A (application CN201810289688.8A)
- Authority
- CN
- China
- Prior art keywords
- rule
- business data
- target
- feature
- dimensionality reduction
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The embodiments of the present invention disclose a feature extraction method, apparatus, server and storage medium for business data. The method comprises two steps. First, at least one of a target encoding rule, a target normalization rule and a target dimensionality reduction rule is determined for the business data. The target encoding rule is chosen from candidate encoding rules provided in advance, the target normalization rule from candidate feature normalization rules provided in advance, and the target dimensionality reduction rule from candidate dimensionality reduction rules provided in advance. Second, a feature vector of the business data is determined according to at least one of these three target rules. In the embodiments of the present invention, the feature vector for the business data is generated automatically once the feature engineering configuration parameters are modified. This makes feature engineering modular, automated and reusable, and improves the efficiency and accuracy of feature vector generation.
Description
Technical field
The embodiments of the present invention relate to the field of machine learning technology. In particular, they relate to a feature extraction method, apparatus, server and storage medium for business data.
Background
Computer technology and big data applications keep developing, and more and more technical fields now build machine learning models on big data. These models imitate human patterns of thought so that electronic products can offer a more user-friendly experience. Machine learning modeling starts with processing the business data into a simplified feature vector that still fully represents the characteristics of the data. Building a machine learning model on such feature vectors improves both the efficiency and the accuracy of model building.

Existing machine learning modeling platforms provide a graphical operation interface, so developers no longer need to write large amounts of code to process business data. Feature engineering is the process of turning business data into feature vectors. During this process, operations such as extracting field features, feature encoding and dimensionality reduction must still be performed manually, one by one, according to business concepts. In other words, feature encoding, normalization and dimensionality reduction are still done by hand.
Manual feature engineering, however, is severely limited. Business data with few feature dimensions can still be handled by hand. Once the number of dimensions grows, manual feature engineering consumes a great deal of manpower and time, and users must repeatedly try different feature engineering methods to optimize the model. Unbalanced or abnormal sample data can also hurt the modeling result. As a result, developers spend much of their time on repetitive feature engineering and sample analysis. Models take a long time to go online, and business requirements and model iteration cannot be met quickly.
Summary of the invention
The embodiments of the present invention provide a feature extraction method, apparatus, server and storage medium for business data. They make feature engineering modular, automated and reusable, and improve the efficiency and accuracy of feature vector generation.
In a first aspect, an embodiment of the present invention provides a feature extraction method for business data, comprising:
determining at least one of a target encoding rule, a target normalization rule and a target dimensionality reduction rule for business data; and
determining a feature vector of the business data according to at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data.
In a second aspect, an embodiment of the present invention provides a feature extraction apparatus for business data, comprising:
a rule configuration module, configured to determine at least one of a target encoding rule, a target normalization rule and a target dimensionality reduction rule for business data; and
a feature generation module, configured to determine a feature vector of the business data according to at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data.
In a third aspect, an embodiment of the present invention provides a server, comprising:
one or more processors; and
a memory configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the feature extraction method for business data according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the feature extraction method for business data according to any embodiment of the present invention.
In the embodiments of the present invention, the feature engineering configuration parameters are modified according to the characteristics of the business data. This determines at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule, and the feature vector of the business data is generated according to the determined rules. Developers only need to do two things. First, they modify the feature engineering configuration parameters, either from a business perspective or starting from the optimal configuration generated by the system. Second, they associate the business data with the feature engineering. The feature vector for the business data is then generated automatically. This makes feature engineering modular, automated and reusable, and improves the efficiency and accuracy of feature vector generation.
Brief description of the drawings
Fig. 1 is a flowchart of a feature extraction method for business data according to Embodiment one of the present invention;
Fig. 2 is a flowchart of a feature extraction method for business data according to Embodiment two of the present invention;
Fig. 3 is an example diagram of the rules that can be configured at each feature processing stage according to Embodiment two of the present invention;
Fig. 4 is a flowchart of the model training mode based on the feature engineering automation platform according to Embodiment two of the present invention;
Fig. 5 is a schematic structural diagram of a feature extraction apparatus for business data according to Embodiment three of the present invention;
Fig. 6 is a schematic structural diagram of a server according to Embodiment four of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain the present invention and do not limit it. For ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of a feature extraction method for business data according to Embodiment one of the present invention. This embodiment applies when feature engineering is performed on business data to generate feature vectors. The method may be executed by a feature extraction apparatus for business data and includes the following steps:
S110: determine at least one of a target encoding rule, a target normalization rule and a target dimensionality reduction rule for the business data.
In this embodiment, business data means the data to be analyzed in order to build a machine learning model. An example is the registered-user information of a website, which contains each user's age, occupation, income and similar data. The required business data can be obtained in various ways, such as web crawling or database access.
The target encoding rule, target normalization rule and target dimensionality reduction rule are data processing rules configured specifically for the business data.

The target encoding rule defines the feature encoding method for the business data. It converts the attribute features of the original data into numerical identifiers that a computer can process. Examples include One-Hot encoding, numerical mapping and interval mapping.

The target normalization rule defines the feature normalization method. It scales the feature data proportionally so that the values fall into a small, specific interval. This removes the units of the data and turns the values into dimensionless numbers, so that indicators with different units or magnitudes can be compared and weighted. The most typical normalization maps the data uniformly onto the interval [0, 1].

The target dimensionality reduction rule defines the feature dimensionality reduction method. It recombines or deletes original features to reduce the feature dimension. This limits the adverse effects that too many or redundant features have on the machine learning model. For example, principal component analysis (PCA) converts many indicators into a few composite indicators.
In one embodiment, the target encoding rule is determined from candidate encoding rules provided in advance. Likewise, the target normalization rule is determined from candidate feature normalization rules, and the target dimensionality reduction rule from candidate dimensionality reduction rules, all provided in advance. These candidate rules can be packaged ahead of time for developers to choose from. The packaging is based on the business field and/or business scenario to which the business data belongs.
Note that the system sets a default feature engineering configuration based on historical modeling experience. When feature engineering rules are configured for the business data, at least one of the target encoding rule, target normalization rule and target dimensionality reduction rule must be determined. The feature engineering then processes the data according to the configured rules.
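The passage above combines three ideas: candidate rules prepared in advance, a system default configuration, and a check that at least one target rule is set. The sketch below shows one possible way to merge developer overrides into the defaults and check every result against the candidates. The rule names, the per-engine (rather than per-feature) layout and `resolve_target_rules` are all hypothetical.

```python
# Hypothetical candidate rules per processing engine, packaged in advance.
CANDIDATES = {
    "encoding": ["one_hot", "numeric_mapping", "interval_mapping", "woe"],
    "normalization": ["min_max", "z_score", "binarize"],
    "dim_reduction": ["pca", "factor_analysis", "feature_importance"],
}

# Hypothetical system default derived from "historical modeling experience".
DEFAULTS = {"encoding": "interval_mapping", "normalization": "z_score"}


def resolve_target_rules(overrides=None, defaults=DEFAULTS, candidates=CANDIDATES):
    """Start from the defaults, apply developer overrides, then validate.

    At least one target rule must be present, and every rule must come
    from the candidate list of its engine.
    """
    rules = {**defaults, **(overrides or {})}
    if not rules:
        raise ValueError("at least one target rule is required")
    for engine, rule in rules.items():
        if rule not in candidates.get(engine, []):
            raise ValueError(f"{rule!r} is not a candidate rule for {engine!r}")
    return rules


print(resolve_target_rules({"normalization": "min_max"}))
# {'encoding': 'interval_mapping', 'normalization': 'min_max'}
```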
For example, suppose the system's default configuration, based on historical modeling experience, looks like this:
- Age uses interval mapping, which converts each age interval into a scalar value: 1 for [0, 18), 2 for [18, 30), 3 for [30, 40), and so on.
- Occupation uses One-Hot encoding. With three occupations [teacher, doctor, police officer], [1, 0, 0] represents teacher, [0, 1, 0] doctor and [0, 0, 1] police officer.
- Income and deposits are normalized with the Z-score model.

Developers can modify this configuration for their own business when modeling, and the system can also "try" other configurations automatically. For example:
- The age mapping can be changed to 1 for [0, 25), 2 for [25, 45), 3 for [45, 65), and so on.
- Occupation can switch to Weight of Evidence (WOE) encoding.
- Income and deposits can use Min-Max normalization.
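WOE encoding replaces each category with a number computed against a binary target. The sketch below assumes label 1 means "bad" (for example, a loan default) and uses WOE = ln(share of goods in the category / share of bads in the category). Some references use the inverted ratio, so the sign convention here is an assumption. The additive smoothing term, the function name `woe_table` and the sample data are also hypothetical.

```python
import math
from collections import Counter


def woe_table(categories, labels, smoothing=0.5):
    """Return {category: WOE} for a binary target (1 = bad, 0 = good).

    `smoothing` is added to every cell count so that a category with no
    goods or no bads does not produce log(0) or a division by zero.
    """
    cats = sorted(set(categories))
    good = Counter(c for c, y in zip(categories, labels) if y == 0)
    bad = Counter(c for c, y in zip(categories, labels) if y == 1)
    total_good = sum(good.values()) + smoothing * len(cats)
    total_bad = sum(bad.values()) + smoothing * len(cats)
    return {
        c: math.log(((good[c] + smoothing) / total_good)
                    / ((bad[c] + smoothing) / total_bad))
        for c in cats
    }


occupations = ["teacher", "teacher", "doctor", "police", "police", "police"]
defaulted = [0, 0, 0, 1, 1, 0]  # hypothetical labels
table = woe_table(occupations, defaulted)
encoded = [table[o] for o in occupations]  # occupation column as WOE values
```

A positive WOE marks a category with proportionally more "good" samples under this convention.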
S120: determine the feature vector of the business data according to at least one of the target encoding rule, target normalization rule and target dimensionality reduction rule of the business data.
In this embodiment, once the feature engineering rules have been configured, the system processes the features of the business data automatically according to those rules. Automated feature processing generally has the following four steps.
Step 1 is data preprocessing. Preprocessing rules are configured in the same way as the rules above, and a data preprocessing engine filters the rows and columns of the business data and cleans it. Rows can represent feature categories, with different rows representing different data attributes. Columns can then represent features, which differ between data attributes because of individual characteristics; the roles of rows and columns can also be reversed. The engine reads and screens all feature data against the configured row and column filter conditions, removing abnormal data and junk data.
As an example, consider building a risk control model from a bank's 2017 user data. The user business data include gender, age, education, occupation, income, consumption, real estate, debt and so on. Developers can configure a valid range for each field. For example, age must lie within [0, 120], gender must be male or female, and income must not be negative. Data cleaning then removes two kinds of bad records. Abnormal data are records such as an age of 200 or an income of -50,000. Junk data are records such as duplicates or rows with an empty gender field.
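The cleaning example above can be sketched as a list of per-field validity checks plus duplicate removal. This covers only the cleaning half of the preprocessing step; row and column selection is omitted. The field names, `FIELD_RULES` and `clean_records` are hypothetical.

```python
# Hypothetical per-field validity rules mirroring the example ranges.
FIELD_RULES = {
    "age": lambda v: v is not None and 0 <= v <= 120,
    "gender": lambda v: v in ("male", "female"),
    "income": lambda v: v is not None and v >= 0,
}


def clean_records(records, field_rules=FIELD_RULES):
    """Drop abnormal rows, rows with empty required fields, and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Abnormal or junk data: a field fails its check.
        # An empty gender (None) also fails the gender check.
        if not all(check(rec.get(name)) for name, check in field_rules.items()):
            continue
        key = tuple(sorted(rec.items()))
        if key in seen:  # repeated record
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


rows = [
    {"age": 35, "gender": "male", "income": 12000},
    {"age": 200, "gender": "female", "income": 8000},   # abnormal age
    {"age": 28, "gender": "female", "income": -50000},  # negative income
    {"age": 35, "gender": "male", "income": 12000},     # duplicate
    {"age": 41, "gender": None, "income": 9000},        # empty gender
]
print(clean_records(rows))  # [{'age': 35, 'gender': 'male', 'income': 12000}]
```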
Step 2 is feature encoding. A feature encoding engine applies the configured target encoding rule and converts the attribute features of the original business data into numerical identifiers that a computer can process. For example:
- Gender uses One-Hot encoding. With two genders, male is encoded as 10 and female as 01.
- Education uses numerical mapping. The five levels [high school or below, training, bachelor's, master's, doctorate] are encoded in order as the scalar values [1, 2, 3, 4, 5].
- Age uses interval mapping: 1 for [0, 18), 2 for [18, 30), 3 for [30, 40). An age of 25 is therefore encoded as 2.
Step 3 is feature normalization. A feature normalization engine applies the configured target normalization rule, scaling the feature data proportionally and turning it into dimensionless values. For example, the income and deposit features are normalized with the Z-score model.
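Z-score standardization maps each value x to (x - mean) / std. The result has zero mean and unit standard deviation but no fixed range. This sketch uses the population standard deviation from Python's `statistics` module. Whether the described system uses the population or the sample deviation is not stated, so that choice is an assumption.

```python
import statistics


def z_score(values):
    """Standardize values to zero mean and unit population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Constant column: every value equals the mean.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


deposits = [10, 20, 30]  # hypothetical deposit amounts
print(z_score(deposits))
```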
Step 4 is feature dimensionality reduction. A feature dimensionality reduction and selection engine applies the configured target dimensionality reduction rule, recombining or deleting original features to reduce the feature dimension. For example:
- With PCA and factor analysis, the system can be configured to represent 10 features by 4 composite features, turning the business data into 4-dimensional features.
- Multi-feature combination can merge occupation and income into a single feature, such as "high-income occupation".
- Features with little influence on the model can be deleted according to feature importance.
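The "10 features to 4" example can be sketched as PCA computed through the singular value decomposition of the centered data, using NumPy. The random data and the function name are illustrative. Factor analysis, multi-feature combination and importance-based deletion are not shown.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top `n_components` principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # The rows of vt are the principal axes,
    # ordered by decreasing singular value (explained variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # hypothetical: 100 records with 10 features
Z = pca_reduce(X, 4)
print(Z.shape)  # (100, 4)
```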
Through these four steps, this embodiment processes the features of the business data automatically according to the configured feature engineering rules.
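The four steps can be sketched as a fixed-order pipeline of optional callables. A step left as `None` is skipped, reflecting the "at least one of" wording. The function and step names are hypothetical, and each step would be backed by whichever target rule was configured.

```python
from typing import Callable, Optional

Step = Callable[[list], list]  # one processing engine: rows in, rows out


def run_feature_pipeline(rows: list,
                         preprocess: Optional[Step] = None,
                         encode: Optional[Step] = None,
                         normalize: Optional[Step] = None,
                         dim_reduce: Optional[Step] = None) -> list:
    """Apply the configured steps in the fixed four-step order, skipping None."""
    if encode is None and normalize is None and dim_reduce is None:
        raise ValueError("configure at least one of encode/normalize/dim_reduce")
    for step in (preprocess, encode, normalize, dim_reduce):
        if step is not None:
            rows = step(rows)
    return rows


def drop_negative_income(rows):
    return [r for r in rows if r["income"] >= 0]


def scale_income(rows):
    top = max(r["income"] for r in rows)
    return [{**r, "income": r["income"] / top} for r in rows]


data = [{"income": 5000.0}, {"income": -1.0}, {"income": 10000.0}]
result = run_feature_pipeline(data, preprocess=drop_negative_income, normalize=scale_income)
print(result)  # [{'income': 0.5}, {'income': 1.0}]
```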
In the technical solution of this embodiment, the feature engineering configuration parameters are modified according to the characteristics of the business data. This determines at least one of the target encoding rule, target normalization rule and target dimensionality reduction rule, and the feature vector is generated according to the determined rules. Developers only need to modify the configuration parameters, from a business perspective or starting from the optimal configuration generated by the system, and associate the business data with the feature engineering. The feature vector for the business data is then generated automatically. This makes feature engineering modular, automated and reusable. It also improves the efficiency and accuracy of feature vector generation and greatly reduces the manpower and time spent on feature engineering.
Embodiment two
Building on Embodiment one, this embodiment provides a preferred implementation of the feature extraction method for business data. Here the automatically generated feature vectors are used to build a machine learning model. Fig. 2 is a flowchart of a feature extraction method for business data according to Embodiment two of the present invention. As shown in Fig. 2, the method includes the following steps:
S210: according to the business field and/or business scenario of the business data, provide at least one of candidate encoding rules, candidate feature normalization rules and candidate dimensionality reduction rules for the business data.
In this embodiment, the business field and business scenario are attributes of the business data. The business field describes the nature of the business itself, and the business scenario describes where the business is applied. The characteristics of the business data are summarized from its business field and/or business scenario. On that basis, at least one of candidate encoding rules, candidate feature normalization rules and candidate dimensionality reduction rules is provided for the data.

Following the feature processing flow, the following rules can be configured for each engine:
- Data preprocessing engine: preprocessing rules, such as row/column filtering rules and data cleaning rules.
- Feature encoding engine: encoding rules, such as One-Hot encoding, numerical mapping, interval mapping, WOE, logistic regression (LOG) and numerical discretization.
- Feature normalization engine: normalization rules, such as Min-Max scaling, Z-score, data standardization and binarization.
- Feature dimensionality reduction and selection engine: dimensionality reduction rules, such as PCA, factor analysis, multi-feature combination and feature importance.

These candidate rules can be packaged in advance for developers to choose from, based on the business field and/or business scenario of the business data.
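One possible way to provide candidate rules "according to the business field and/or business scenario" is a catalogue keyed by (field, scenario), with `None` acting as a wildcard and the most specific match winning. The catalogue contents, keys and lookup order are all hypothetical.

```python
# Hypothetical catalogue of candidate rules per engine,
# keyed by (business field, business scenario); None is a wildcard.
CATALOGUE = {
    ("finance", "risk_control"): {
        "encoding": ["woe", "interval_mapping", "one_hot"],
        "normalization": ["z_score", "min_max"],
        "dim_reduction": ["feature_importance", "pca"],
    },
    ("finance", None): {
        "encoding": ["one_hot", "numeric_mapping"],
        "normalization": ["min_max"],
    },
    (None, None): {
        "encoding": ["one_hot"],
        "normalization": ["min_max"],
        "dim_reduction": ["pca"],
    },
}


def candidate_rules(field=None, scenario=None, catalogue=CATALOGUE):
    """Return the most specific candidate-rule set for a field and/or scenario."""
    for key in ((field, scenario), (field, None), (None, scenario), (None, None)):
        if key in catalogue:
            return catalogue[key]
    raise KeyError("no candidate rules configured")


print(candidate_rules("finance", "risk_control")["encoding"])
```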
S220: determine at least one of the target encoding rule, target normalization rule and target dimensionality reduction rule of the business data.
In this embodiment, the feature engineering configuration may include the business data field definitions, the business scenario, the engine rule parameters and the model score.
- The field definitions and business scenario guide the choice of rules. Each feature processing engine gets its rule parameters from the candidate rules associated with the field definitions and business scenario.
- The model score is model quality data fed back by historical machine learning models. It lets the system select the configuration parameters that gave the best modeling results.

The system also sets a default feature engineering configuration based on historical modeling experience. When feature engineering rules are configured for the business data, at least one of the target encoding rule, target normalization rule and target dimensionality reduction rule must be chosen from the candidate rules. The feature engineering then processes the data according to the configured rules.
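The configuration items listed above can be represented as a small record type. The model score is filled in once a model has been trained and evaluated. This dataclass and `best_config` are a hypothetical sketch, assuming a higher score means a better model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FeatureEngineeringConfig:
    """Hypothetical container for one feature engineering configuration."""
    field_definitions: Dict[str, str]   # field name -> data type
    business_scenario: str
    engine_rules: Dict[str, str]        # engine -> selected rule
    model_score: Optional[float] = None  # fed back after training


def best_config(history: List[FeatureEngineeringConfig]) -> FeatureEngineeringConfig:
    """Pick the scored configuration with the highest model score."""
    scored = [c for c in history if c.model_score is not None]
    if not scored:
        raise ValueError("no scored configuration in history")
    return max(scored, key=lambda c: c.model_score)
```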
S230: determine the feature vector of the business data according to at least one of the target encoding rule, target normalization rule and target dimensionality reduction rule of the business data.
In this embodiment, once the feature engineering rules have been configured, the system processes the features of the business data automatically according to those rules. Automated feature processing has four steps: data preprocessing, feature encoding, feature normalization and feature dimensionality reduction. The business data are processed according to the rules configured by the developers. The result is feature vectors that fully represent the business data for later modeling.
S240: build a machine learning model using the feature vector of the business data.
In this embodiment, modular, automated and reusable feature engineering configures the feature processing rules for the business data. The feature vector is then generated efficiently, and with higher accuracy, from at least one of the determined target encoding rule, target normalization rule and target dimensionality reduction rule. Finally, the automatically generated feature vectors are used to build the machine learning model.
Preferably, a target sample balancing rule for the business data is determined from candidate sample balancing rules provided in advance. The business data are screened with the target sample balancing rule, and a machine learning model is built from the feature vectors of the screened business data.
In this embodiment, unbalanced samples must also be handled before the model is built, in addition to the four feature processing steps above. Sample adjustment rules are configured in the same way as the feature processing rules, and a sample adjustment engine screens the unbalanced business data.

Sample imbalance means that one or more categories contain far more samples than the others. The sample counts per category then differ so much that the data cannot meet the requirements for building the model. To build a better machine learning model, the imbalance must therefore be handled first.

Fig. 3 shows the rules that can be configured at each stage of processing business data before model building. The candidate rules for data preprocessing, feature encoding, feature normalization and feature dimensionality reduction are as described above and are not repeated here. To handle sample imbalance, either random sampling or one or a combination of the Synthetic Minority Oversampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) can be configured to balance the samples. Finally, the machine learning model is built from the feature vectors of the screened business data, that is, the balanced sample data.
S250: if the machine learning model is of higher quality than the historical machine learning model of the business data, set at least one of the target encoding rule, target normalization rule and target dimensionality reduction rule associated with the new model as the default configuration rule of the business data.
In this embodiment, the business data produced by the five processing steps above are used to train and optimize the model. The modeling results are fed back to the feature automation platform, which updates the default configuration rules of each engine. After the model is built, the platform checks whether the feature vectors from the new configuration rules trained a better model, that is, whether the newly built machine learning model is of higher quality than the historical one. If so, at least one of the target encoding rule, target normalization rule and target dimensionality reduction rule associated with the new model becomes the default configuration rule of the business data. The next time new data arrive for this business, feature engineering can follow the default configuration rules directly. This improves the efficiency and accuracy of feature vector generation.
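The update condition in S250 amounts to a compare-and-replace on a stored (rules, score) pair. The sketch below assumes a higher score means a better model and that defaults are stored per business key. Both the storage layout and the function name are hypothetical.

```python
def maybe_update_default(defaults, business_key, new_rules, new_score):
    """Replace the stored default rules if the new model scores higher.

    `defaults` maps a business key to a (rules, score) pair.
    Returns True when the default configuration was updated.
    """
    current = defaults.get(business_key)
    if current is None or new_score > current[1]:
        defaults[business_key] = (dict(new_rules), new_score)
        return True
    return False


defaults = {"loan": ({"encoding": "one_hot"}, 0.72)}
print(maybe_update_default(defaults, "loan", {"encoding": "woe"}, 0.75))  # True
print(defaults["loan"])  # ({'encoding': 'woe'}, 0.75)
```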
In conclusion the process of the model training mode based on Feature Engineering automation platform is as shown in Figure 4.Research staff
Business datum and its service fields and/business scenario and Feature Engineering need to only be bound, Feature Engineering is in candidate configuration rule
On the basis of, it carries out artificial setting configuration parameter or carries out automatic setting configuration parameter according to default configuration rule, so that feature
Engineering automates platform and is handled according to the characteristic that automation can be realized in the data processing rule of business datum and configuration, from
And the corresponding feature vector of business datum for being able to carry out model training is efficiently obtained, and the accuracy of feature vector is higher.
The final training and assessment that model is carried out according to feature vector, and modelling effect is fed back into feature automation platform, it obtains most
The corresponding configuration parameter of excellent model is updated to the default configuration rule of each engine.
In the technical solution of this embodiment, several candidate rules for each feature processing stage are provided in advance, based on the business field and business scenario of the business data. When feature vectors are generated automatically, rules are selected from these candidates and configured. The feature engineering configuration parameters are thus adapted to the characteristics of the business data, and the feature vector is generated from the chosen rules. The sample data are then balanced, and the machine learning model is built from the feature vectors of the balanced samples. Finally, the modeling results are fed back to the feature engineering automation platform. The configuration rules of the highest-quality model become the default configuration rules for this type of business data.
The embodiments of the present invention lower the machine learning expertise required of modeling staff, so business staff can also mine and model data. They only need to modify the feature engineering configuration parameters, from a business perspective or starting from the optimal configuration generated by the system, and associate the business data with the feature engineering. The feature vector for the business data is then generated automatically. The configuration parameters of feature vectors that produce better models become the default feature engineering configuration. This makes feature engineering modular, automated and reusable, and improves the efficiency and accuracy of feature vector generation. It also greatly reduces the manpower and time spent on feature engineering and improves the quality and efficiency of model building.
Embodiment three
Fig. 5 is a schematic structural diagram of a feature extraction apparatus for business data according to Embodiment three of the present invention. This embodiment applies when feature engineering is performed on business data to generate feature vectors. The apparatus can implement the feature extraction method for business data according to any embodiment of the present invention and specifically includes:
A rule configuration module 510, configured to determine at least one of a target encoding rule, a target normalization rule and a target dimensionality reduction rule for business data. The target encoding rule is determined from candidate encoding rules provided in advance, the target normalization rule from candidate feature normalization rules provided in advance, and the target dimensionality reduction rule from candidate dimensionality reduction rules provided in advance;
A feature generation module 520, configured to determine the feature vector of the business data according to at least one of the target encoding rule, target normalization rule and target dimensionality reduction rule of the business data.
Further, the apparatus includes:
Rule supply module 530, configured to, before at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data is determined, provide for the business data at least one of candidate encoding rules, candidate feature normalization rules and candidate dimensionality reduction rules according to the business field and/or business scenario to which the business data belongs.
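How rule supply module 530 maps a business field or scenario to candidate rules is not detailed in the source. One possible sketch, assuming a hypothetical catalogue keyed by `(field, scenario)` in which the most specific matching entry wins for each rule kind and a generic candidate set is the fallback:

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical generic candidates, offered when no field/scenario entry applies.
GENERIC: Dict[str, Set[str]] = {
    "encoding": {"one_hot", "label"},
    "normalization": {"none", "min_max", "z_score"},
    "reduction": {"none", "top2_variance"},
}

# Hypothetical catalogue keyed by (business field, business scenario); None = unspecified.
CATALOGUE: Dict[Tuple[Optional[str], Optional[str]], Dict[str, Set[str]]] = {
    ("finance", None): {"normalization": {"z_score"}},
    ("finance", "anti_fraud"): {"reduction": {"top2_variance"}},
    (None, "recommendation"): {"encoding": {"one_hot"}},
}

def provide_candidate_rules(field: Optional[str] = None,
                            scenario: Optional[str] = None) -> Dict[str, Set[str]]:
    """Role of rule supply module 530: for each rule kind, use the most specific
    catalogue entry (field + scenario, then scenario only, then field only),
    falling back to the generic candidates. Returns fresh copies of the sets."""
    lookup_order = [(field, scenario), (None, scenario), (field, None)]
    result: Dict[str, Set[str]] = {}
    for kind, generic in GENERIC.items():
        for key in lookup_order:
            entry = CATALOGUE.get(key, {})
            if kind in entry:
                result[kind] = set(entry[kind])
                break
        else:  # no catalogue entry mentions this rule kind
            result[kind] = set(generic)
    return result
```

With this catalogue, `provide_candidate_rules("finance", "anti_fraud")` offers only `z_score` normalization (from the field-level entry) and only `top2_variance` reduction (from the scenario-level entry), while encoding falls back to the generic candidates. A union of all matching entries would be an equally plausible design; the source does not choose between them.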
Further, the apparatus further includes:
Model construction module 540, configured to construct a machine learning model using the feature vector of the business data after that feature vector has been determined according to at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data;
Default rule update module 550, configured to, if the quality of the machine learning model is higher than that of the historical machine learning model of the business data, update at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule associated with the machine learning model to be the default configuration rule of the business data.
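The source leaves open how model quality is measured and where the historical model is kept. A minimal sketch of default rule update module 550, assuming a single higher-is-better quality score per model (for example a validation metric) and an in-memory store keyed by a hypothetical dataset identifier:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ModelRecord:
    rules: Dict[str, str]   # target rules used to build this model's feature vectors
    quality: float          # assumed higher-is-better score, e.g. a validation metric

@dataclass
class DefaultRuleStore:
    """Hypothetical store playing the role of default rule update module 550."""
    defaults: Dict[str, Dict[str, str]] = field(default_factory=dict)
    best: Dict[str, ModelRecord] = field(default_factory=dict)

    def report(self, dataset_id: str, record: ModelRecord) -> bool:
        """Promote record.rules to the default configuration of dataset_id if the new
        model beats the historical one. With no history yet, the first model is
        treated as an improvement (an assumption; the source does not say)."""
        historical = self.best.get(dataset_id)
        if historical is not None and record.quality <= historical.quality:
            return False  # not better than the historical model: keep the defaults
        self.best[dataset_id] = record
        self.defaults[dataset_id] = dict(record.rules)
        return True
```

A model with an equal or worse score leaves the stored defaults unchanged; only a strictly better model promotes its rules, so the defaults always reflect the best model seen so far.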
Preferably, the model construction module 540 includes:
A balancing rule determination unit, configured to determine a target sample balancing rule for the business data from candidate sample balancing rules provided in advance;
A data screening unit, configured to screen the business data using the target sample balancing rule;
A model construction unit, configured to construct the machine learning model using the feature vectors corresponding to the screened business data.
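The candidate sample balancing rules are likewise not enumerated in the source. The sketch below assumes three hypothetical candidates (no balancing, random undersampling and random oversampling) for labelled samples, and treats "screening" as selecting, and for oversampling also duplicating, sample indices before the model is built. A fixed seed keeps the selection reproducible.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Rule = Callable[[Sequence[int], random.Random], List[int]]

def _indices_by_class(labels: Sequence[int]) -> Dict[int, List[int]]:
    """Group sample indices by their class label."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        groups[y].append(i)
    return groups

def no_balancing(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Keep every sample unchanged."""
    return list(range(len(labels)))

def random_undersample(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Keep as many samples of each class as the smallest class has."""
    groups = _indices_by_class(labels)
    if not groups:
        return []
    n = min(len(ix) for ix in groups.values())
    kept: List[int] = []
    for ix in groups.values():
        kept.extend(rng.sample(ix, n))
    return sorted(kept)

def random_oversample(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Duplicate randomly drawn samples until every class matches the largest one."""
    groups = _indices_by_class(labels)
    if not groups:
        return []
    n = max(len(ix) for ix in groups.values())
    kept: List[int] = []
    for ix in groups.values():
        kept.extend(ix)
        kept.extend(rng.choices(ix, k=n - len(ix)))
    return sorted(kept)

CANDIDATE_BALANCING: Dict[str, Rule] = {
    "none": no_balancing,
    "random_undersample": random_undersample,
    "random_oversample": random_oversample,
}

def screen(features: List[List[float]], labels: List[int], rule_name: str,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Roles of the balancing rule determination unit (look up the target rule)
    and the data screening unit (apply it to features and labels together)."""
    rng = random.Random(seed)
    indices = CANDIDATE_BALANCING[rule_name](labels, rng)
    return [features[i] for i in indices], [labels[i] for i in indices]
```

Because the same index list is applied to features and labels, every screened feature vector stays paired with its own label. Libraries such as imbalanced-learn offer fuller versions of these resamplers; the hand-written form here only keeps the example dependency-free.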
Through the cooperation among its functional modules, the technical solution of this embodiment provides feature engineering parameter configuration, association of business data with feature engineering, automated generation of feature vectors, construction of machine learning models, feedback on model performance, and updating of the default feature engineering configuration. With the embodiments of the present invention, developers only need to modify the configuration parameters of the feature engineering from a business perspective, or according to the optimal configuration generated by the system, and associate the business data with the feature engineering; the feature vectors corresponding to the business data are then generated automatically. The configuration parameters that produced the feature vectors with the better modeling results are then set as the default configuration parameters of the feature engineering. This makes feature engineering modular, automated and reusable, improves the efficiency and accuracy of feature vector generation, greatly reduces the labor and time cost of feature engineering, and improves the effectiveness and efficiency of model building.
Embodiment four
Fig. 6 is a schematic structural diagram of a server provided by Embodiment 4 of the present invention. Fig. 6 shows a block diagram of an exemplary server suitable for implementing embodiments of the present invention. The server shown in Fig. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 6, server 12 takes the form of a general-purpose computing device. The components of server 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including system memory 28 and processing unit 16).
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Server 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by server 12, including volatile and non-volatile media, and removable and non-removable media.
System memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 6, commonly referred to as a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM or other optical medium) may be provided. In these cases, each drive may be connected to bus 18 through one or more data media interfaces. Memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a networking environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
Server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device or a display 24), with one or more devices that enable a user to interact with server 12, and/or with any device (such as a network card or a modem) that enables server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Server 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through network adapter 20. As shown, network adapter 20 communicates with the other modules of server 12 through bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
Processing unit 16 executes various functional applications and data processing by running the programs stored in system memory 28, for example implementing the feature extraction method for business data provided by the embodiments of the present invention.
Embodiment five
Embodiment 5 of the present invention further provides a computer-readable storage medium on which a computer program (or computer-executable instructions) is stored. When executed by a processor, the program performs a feature extraction method for business data, the method including:
determining at least one of a target encoding rule, a target normalization rule and a target dimensionality reduction rule for business data; and
determining a feature vector of the business data according to at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data.
The computer storage medium of the embodiments of the present invention may use any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the embodiments of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the embodiments of the present invention are not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the scope of protection of the embodiments of the present invention. Therefore, although the embodiments of the present invention have been described in some detail through the above embodiments, they are not limited to those embodiments and may include other equivalent embodiments without departing from the concept of the embodiments of the present invention; the scope of the embodiments of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A feature extraction method for business data, characterized by comprising:
determining at least one of a target encoding rule, a target normalization rule and a target dimensionality reduction rule for business data; and
determining a feature vector of the business data according to at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data.
2. The method according to claim 1, wherein, before determining at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data, the method further comprises:
providing, for the business data, at least one of candidate encoding rules, candidate feature normalization rules and candidate dimensionality reduction rules according to the business field and/or business scenario to which the business data belongs.
3. The method according to claim 1, wherein, after determining the feature vector of the business data according to at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data, the method further comprises:
constructing a machine learning model using the feature vector of the business data; and
if the quality of the machine learning model is higher than that of a historical machine learning model of the business data, updating at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule associated with the machine learning model to be the default configuration rule of the business data.
4. The method according to claim 3, wherein constructing a machine learning model using the feature vector of the business data comprises:
determining a target sample balancing rule for the business data from candidate sample balancing rules provided in advance;
screening the business data using the target sample balancing rule; and
constructing the machine learning model using the feature vectors corresponding to the screened business data.
5. A feature extraction apparatus for business data, characterized by comprising:
a rule configuration module, configured to determine at least one of a target encoding rule, a target normalization rule and a target dimensionality reduction rule for business data; and
a feature generation module, configured to determine a feature vector of the business data according to at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data.
6. The apparatus according to claim 5, wherein the apparatus further comprises:
a rule supply module, configured to, before at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data is determined, provide for the business data at least one of candidate encoding rules, candidate feature normalization rules and candidate dimensionality reduction rules according to the business field and/or business scenario to which the business data belongs.
7. The apparatus according to claim 5, wherein the apparatus further comprises:
a model construction module, configured to construct a machine learning model using the feature vector of the business data after the feature vector has been determined according to at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule of the business data; and
a default rule update module, configured to, if the quality of the machine learning model is higher than that of a historical machine learning model of the business data, update at least one of the target encoding rule, the target normalization rule and the target dimensionality reduction rule associated with the machine learning model to be the default configuration rule of the business data.
8. The apparatus according to claim 7, wherein the model construction module comprises:
a balancing rule determination unit, configured to determine a target sample balancing rule for the business data from candidate sample balancing rules provided in advance;
a data screening unit, configured to screen the business data using the target sample balancing rule; and
a model construction unit, configured to construct the machine learning model using the feature vectors corresponding to the screened business data.
9. A server, characterized by comprising:
one or more processors; and
a memory, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the feature extraction method for business data according to any one of claims 1 to 4.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the feature extraction method for business data according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810289688.8A CN110334720A (en) | 2018-03-30 | 2018-03-30 | Feature extracting method, device, server and the storage medium of business datum |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110334720A true CN110334720A (en) | 2019-10-15 |
Family
ID=68139901
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111522797A (en) * | 2020-04-27 | 2020-08-11 | 支付宝(杭州)信息技术有限公司 | Method and device for building business model based on business database |
CN111581305A (en) * | 2020-05-18 | 2020-08-25 | 北京字节跳动网络技术有限公司 | Feature processing method, feature processing device, electronic device, and medium |
CN113010510A (en) * | 2019-12-20 | 2021-06-22 | 中国移动通信集团安徽有限公司 | Service identification method, device and system and computing equipment |
CN113158022A (en) * | 2021-01-29 | 2021-07-23 | 北京达佳互联信息技术有限公司 | Service recommendation method, device, server and storage medium |
RU2785764C1 (en) * | 2019-10-31 | 2022-12-13 | Биго Текнолоджи Пте. Лтд. | Information recommendation method, device, recommendation server and storage device |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103176983A (en) * | 2011-12-20 | 2013-06-26 | 中国科学院计算机网络信息中心 | Event warning method based on Internet information |
CN103743486A (en) * | 2014-01-02 | 2014-04-23 | 上海大学 | Automatic grading system and method based on mass tobacco leaf data |
CN103854063A (en) * | 2012-11-29 | 2014-06-11 | 中国科学院计算机网络信息中心 | Internet open information-based event occurrence risk prediction and early-warning method |
CN104156562A (en) * | 2014-07-15 | 2014-11-19 | 清华大学 | Failure predication system and failure predication method for background operation and maintenance system of bank |
CN104239856A (en) * | 2014-09-04 | 2014-12-24 | 电子科技大学 | Face recognition method based on Gabor characteristics and self-adaptive linear regression |
CN104268595A (en) * | 2014-09-24 | 2015-01-07 | 深圳市华尊科技有限公司 | General object detecting method and system |
CN104468711A (en) * | 2014-10-31 | 2015-03-25 | 上海融军科技有限公司 | Universal data management coding method and system for internet of things |
CN104933075A (en) * | 2014-03-20 | 2015-09-23 | 百度在线网络技术(北京)有限公司 | User attribute predicting platform and method |
CN105302911A (en) * | 2015-11-10 | 2016-02-03 | 珠海多玩信息技术有限公司 | Data screening engine establishing method and data screening engine |
CN105426356A (en) * | 2015-10-29 | 2016-03-23 | 杭州九言科技股份有限公司 | Target information identification method and apparatus |
CN106682067A (en) * | 2016-11-08 | 2017-05-17 | 浙江邦盛科技有限公司 | Machine learning anti-fraud monitoring system based on transaction data |
CN106779087A (en) * | 2016-11-30 | 2017-05-31 | 福建亿榕信息技术有限公司 | A kind of general-purpose machinery learning data analysis platform |
CN107025141A (en) * | 2017-05-18 | 2017-08-08 | 成都海天数联科技有限公司 | A kind of dispatching method based on big data mixture operation model |
CN107423442A (en) * | 2017-08-07 | 2017-12-01 | 火烈鸟网络(广州)股份有限公司 | Method and system, storage medium and computer equipment are recommended in application based on user's portrait behavioural analysis |
CN107463703A (en) * | 2017-08-16 | 2017-12-12 | 电子科技大学 | English social media account number classification method based on information gain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191015 |