CN109408583A - Data processing method and device, computer readable storage medium, electronic equipment - Google Patents
- Publication number
- CN109408583A (application CN201811117037.7A)
- Authority
- CN
- China
- Prior art keywords
- binning
- data
- target
- data processing
- processing method
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The disclosure belongs to the field of big data technology and relates to a data processing method and device, a computer-readable storage medium, and electronic equipment. The data processing method includes: obtaining a plurality of sample data, each sample data including sub-sample data of one or more dimensions; dividing the sub-sample data of each dimension into multiple groups of bins, and forming a plurality of univariate binning decision trees according to the bins; obtaining a target binning corresponding to each dimension according to the plurality of univariate binning decision trees; and inputting the target binning into a prediction model to perform machine training on the prediction model. On the one hand, the method can eliminate noisy data and improve the stability of the model; on the other hand, the binning method is simple and does not require data mining personnel to have extensive business background knowledge. In addition, binning the data reduces the large number of distinct values, which increases the speed of the algorithm.
Description
Technical field
This disclosure relates to the field of big data technology, and in particular to a data processing method, a data processing device, a computer-readable storage medium, and electronic equipment.
Background art
With economic growth and social progress, intelligent terminals such as computers and smartphones are used more and more widely. To obtain valuable information, data usually needs to be mined and analyzed.
Measured data may contain numerical noise such as random errors, outliers, and extreme values, and this noise affects the accuracy of a model. In addition, measured data may contain a large number of distinct values, which slows down an algorithm if the data is used directly, and some algorithms do not support continuous variables. The data therefore needs to be preprocessed. Binning is commonly used to discretize data, which eliminates numerical noise and reduces the number of distinct values at the same time. However, the common binning methods are mainly equal-frequency and equal-width binning. These methods are limited, their frequency and width are hard to determine, and they require data mining personnel to have sufficient business background knowledge of the data; otherwise the data cannot be binned effectively and the accuracy of the model is low.
Therefore, a new data processing method and device are needed in this field.
It should be noted that the information disclosed in the above background section is only intended to enhance understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to persons of ordinary skill in the art.
Summary of the invention
The purpose of the disclosure is to provide a data processing method, a data processing device, a computer-readable storage medium, and electronic equipment that overcome, at least to a certain extent, the influence of numerical noise on model stability and the inability of data mining personnel who lack business background knowledge to discretize data effectively, thereby improving the flexibility and calculation speed of the model.
According to one aspect of the disclosure, a data processing method is provided, characterized by comprising:
Obtaining a plurality of sample data, each sample data including sub-sample data of one or more dimensions;
Dividing the sub-sample data of each dimension into multiple groups of bins, and forming a plurality of univariate binning decision trees according to the bins;
Obtaining a target binning corresponding to each dimension according to the plurality of univariate binning decision trees;
Inputting the target binning into a prediction model to perform machine training on the prediction model.
In an exemplary embodiment of the disclosure, dividing the sub-sample data of each dimension into multiple groups of bins comprises:
Dividing the sub-sample data into multiple groups of bins according to different frequencies; or
Dividing the sub-sample data into multiple groups of bins according to a preset number of nodes (cut points).
In an exemplary embodiment of the disclosure, each sample data includes target data, and forming the plurality of univariate binning decision trees according to the bins comprises:
Taking the sub-sample data as the root node, the bins as non-leaf nodes, and the target data as leaf nodes to form the univariate binning decision tree.
In an exemplary embodiment of the disclosure, obtaining the target binning corresponding to each dimension according to the plurality of univariate binning decision trees comprises:
Calculating a sub-information value of each leaf node in each univariate binning decision tree;
Calculating an information value of each univariate binning decision tree according to the sub-information values;
Comparing the information values of the univariate binning decision trees, and taking the bins corresponding to the univariate binning decision tree with the minimum information value as the target binning.
In an exemplary embodiment of the disclosure, calculating the information value of each univariate binning decision tree according to the sub-information values comprises:
Adding the sub-information values of all leaf nodes in each univariate binning decision tree to obtain the information value.
In an exemplary embodiment of the disclosure, each sample data further includes target data, and inputting the target binning into the prediction model to perform machine training on the prediction model comprises:
Inputting the target binning as an input vector and the target data as an output vector into the prediction model to perform machine training on the prediction model.
In an exemplary embodiment of the disclosure, the method further comprises:
Obtaining data to be analyzed, the data to be analyzed having the same dimensions as the sample data;
Inputting the data to be analyzed into the prediction model to obtain a prediction result.
According to one aspect of the disclosure, a data processing device is provided, characterized by comprising:
A first obtaining module, configured to obtain a plurality of sample data, each sample data including sub-sample data of one or more dimensions;
A decision tree forming module, configured to divide the sub-sample data of each dimension into multiple groups of bins and form a plurality of univariate binning decision trees according to the bins;
A target binning obtaining module, configured to obtain a target binning corresponding to each dimension according to the plurality of univariate binning decision trees;
A model training module, configured to input the target binning into a prediction model to perform machine training on the prediction model.
According to one aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the data processing method described in any one of the above.
According to one aspect of the disclosure, electronic equipment is provided, comprising:
A processor; and
A memory for storing instructions executable by the processor;
Wherein the processor is configured to perform the data processing method described in any one of the above by executing the executable instructions.
In the data processing method of the disclosure, the sub-sample data of each dimension is divided into multiple groups of bins, and univariate binning decision trees are formed from the bins. The information value of each group of bins is obtained by calculating the sub-information values of the leaf nodes of its univariate binning decision tree and summing them. The information values of the groups are then compared, and the group with the minimum information value is taken as the target binning. After the target binning of each dimension is obtained, the target binning and the target data are input into the prediction model to train it; after training, data to be analyzed can be input into the prediction model to obtain a prediction result. On the one hand, the data processing method of the disclosure can eliminate noisy data and improve the stability of the model; on the other hand, the binning method is simple and does not require data mining personnel to have extensive business background knowledge. In addition, binning the data reduces the large number of distinct values, which increases the speed of the algorithm.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the disclosure.
Description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the specification, serve to explain the principles of the disclosure. Evidently, the drawings in the following description show only some embodiments of the disclosure, and persons of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 schematically shows a flowchart of a data processing method;
Fig. 2 schematically shows an example application scenario of a data processing method;
Figs. 3A-3C schematically show the structure of univariate binning decision trees;
Fig. 4 schematically shows a block diagram of a data processing device;
Fig. 5 schematically shows an example block diagram of electronic equipment for implementing the data processing method;
Fig. 6 schematically shows a computer-readable storage medium for implementing the data processing method.
Detailed description of embodiments
Example embodiments will now be described more fully with reference to the drawings. However, example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the disclosure. However, those skilled in the art will recognize that the technical solutions of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known solutions are not shown or described in detail, so that they do not overshadow and obscure aspects of the disclosure.
In addition, the drawings are merely schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated descriptions of them are omitted. Some block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, when data mining personnel mine data, the data may contain numerical noise such as random errors, outliers, and extreme values. This noise affects the accuracy of the model. For example, extreme values can make model parameters too high or too low, or "confuse" the model with spurious values so that it learns relationships that do not actually exist as important patterns. To eliminate numerical noise during data mining, equal-frequency and equal-width binning are usually used to discretize the data. However, these binning methods are limited, their frequency and width are hard to determine, and they require sufficient business background knowledge of the data, so the stability of models in the related art is poor. In addition, the large number of distinct values in the data also reduces the calculation speed of the model.
In view of the problems in the related art, this example embodiment first provides a data processing method. The method can run on a server, a server cluster, a cloud server, or the like. Those skilled in the art may also run the method of the disclosure on other platforms as required, and this example embodiment places no particular limitation on this. Referring to Fig. 1, the data processing method may include the following steps:
Step S110: obtain a plurality of sample data, each sample data including sub-sample data of one or more dimensions;
Step S120: divide the sub-sample data of each dimension into multiple groups of bins, and form a plurality of univariate binning decision trees according to the bins;
Step S130: obtain a target binning corresponding to each dimension according to the plurality of univariate binning decision trees;
Step S140: input the target binning into a prediction model to perform machine training on the prediction model.
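Steps S110 to S140 can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the disclosure's implementation. Candidate groups are generated only by equal-width cuts, and the widths list is an illustrative choice. Values are assumed to be integers and the target binary. The sub-information value is taken to be the base-10 entropy of a leaf's target counts, which is the reading that reproduces the leaf values in the worked example of step S130. All function names are hypothetical, and the prediction model itself is left out: the sketch stops at producing the training rows that step S140 would feed to it.

```python
import math
from bisect import bisect_right


def equal_width_edges(lo, hi, width):
    """S120 helper: cut points [lo, lo+width, ..., hi] (integer data assumed)."""
    return list(range(lo, hi, width)) + [hi]


def bin_index(value, edges):
    """Index of the bin [e_i, e_{i+1}) holding value; the last bin is closed."""
    if value >= edges[-1]:
        return len(edges) - 2
    return max(bisect_right(edges, value) - 1, 0)


def sub_information_value(m, n):
    """Assumed leaf score: base-10 entropy of m positive / n negative targets."""
    total = m + n
    return -sum(c / total * math.log10(c / total) for c in (m, n) if c)


def information_value(values, labels, edges):
    """Sum of leaf sub-information values for one univariate binning tree."""
    counts = [[0, 0] for _ in range(len(edges) - 1)]
    for value, label in zip(values, labels):
        counts[bin_index(value, edges)][0 if label else 1] += 1
    return sum(sub_information_value(m, n) for m, n in counts)


def select_target_bins(samples, labels, widths):
    """S120 + S130: per dimension, keep the candidate edges with minimum score."""
    chosen = {}
    for dim, values in samples.items():
        lo, hi = min(values), max(values)
        candidates = [equal_width_edges(lo, hi, w) for w in widths]
        chosen[dim] = min(
            candidates, key=lambda e: information_value(values, labels, e)
        )
    return chosen


def training_rows(samples, chosen):
    """S140 input: every sample becomes its bin index in each dimension."""
    dims = sorted(chosen)
    size = len(samples[dims[0]])
    return [[bin_index(samples[d][i], chosen[d]) for d in dims] for i in range(size)]


# S110: the age column and supervisor flags (Y -> True) from Table 1.
samples = {"age": [20, 28, 30, 35, 37, 42, 45, 50, 55, 60]}
labels = [False, True, False, True, True, True, False, True, False, False]
chosen = select_target_bins(samples, labels, widths=[8, 10, 20])
rows = training_rows(samples, chosen)
```

With these inputs, the width-8 group scores lowest (about 0.5775, against about 1.1549 for width 10 and 0.5846 for width 20), so `chosen["age"]` is `[20, 28, 36, 44, 52, 60]`.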
In the above data processing method, the sub-sample data of each dimension is divided into multiple groups of bins, and a plurality of univariate binning decision trees are formed from the bins. A target binning corresponding to each dimension is then obtained according to the univariate binning decision trees. Finally, the target binning is input into a prediction model to train the prediction model. On the one hand, the data processing method of the disclosure can eliminate noisy data and improve the stability of the model; on the other hand, the binning method is simple and does not require data mining personnel to have extensive business background knowledge. In addition, binning the data reduces the large number of distinct values, which increases the speed of the algorithm.
Each step of the above data processing method in this example embodiment is explained in detail below with reference to Fig. 2.
In step S110, a plurality of sample data is obtained, each sample data including sub-sample data of one or more dimensions.
In an exemplary embodiment of the disclosure, the plurality of sample data can be obtained from the data warehouse of a server 201 or a terminal device 202. Specifically, the sample data includes behavioral data and attribute data. Take the evaluation of an insurance company's salespeople as an example, where the goal is to judge the likelihood that a salesperson becomes a supervisor. The behavioral data in the sample data can be features such as the salesperson's monthly income, monthly sales, quarterly sales, and attendance count, and the attribute data can be features such as the salesperson's age or skill grade.
In an exemplary embodiment of the disclosure, the sample data may include sub-sample data of a single dimension or of multiple dimensions. For example, the likelihood that a salesperson becomes a supervisor may be predicted from age alone, or from data of several dimensions such as age, quarterly sales, and monthly income. To make a more comprehensive assessment and obtain an accurate prediction result, using data of multiple dimensions is preferable.
In step S120, the sub-sample data of each dimension is divided into multiple groups of bins, and a plurality of univariate binning decision trees are formed according to the bins.
In an exemplary embodiment of the disclosure, the sub-sample data of each dimension can be divided into multiple groups of bins. When binning, the sub-sample data can be divided into groups of bins according to different frequencies, where a frequency refers to the amount of data contained in each data interval of a group; correspondingly, different frequencies mean that the data intervals of different groups contain different amounts of data. The sub-sample data can also be divided into groups of bins according to a preset number of nodes. Other binning rules may also be used, and the disclosure places no particular limitation on this. For example, if the ages of the salespeople range from 20 to 60, frequencies of 8, 10, and 20 (in this example, the width of each interval) can be set to divide the age data into three groups of bins. When the frequency is 8, the bins are [20,28), [28,36), [36,44), [44,52), [52,60]; when the frequency is 10, the bins are [20,30), [30,40), [40,50), [50,60]; when the frequency is 20, the bins are [20,40), [40,60]. Alternatively, 3 nodes can be set to split the age data at arbitrary positions and form multiple groups of bins. For example, setting 3 nodes forms three groups of bins, each containing 4 bins: the first group is [20,30), [30,40), [40,50), [50,60]; the second group is [20,35), [35,45), [45,55), [55,60]; and the third group is [20,25), [25,45), [45,55), [55,60].
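Both ways of generating candidate groups in the example above can be sketched in Python. This minimal sketch assumes integer-valued data and reads each "frequency" in the age example as an interval width, which is how that example's intervals are built. The function names are hypothetical. Each group is represented by its list of cut points, and every interval is half-open except the last, which is closed, matching the intervals listed above.

```python
def equal_width_edges(lo, hi, width):
    """Cut points for intervals of a fixed width; the last may be shorter."""
    return list(range(lo, hi, width)) + [hi]


def node_edges(lo, hi, nodes):
    """Cut points from a preset list of interior nodes (split positions)."""
    return [lo] + sorted(nodes) + [hi]


def describe(edges):
    """Render cut points as intervals; all half-open except the closed last one."""
    last = len(edges) - 2
    return [
        f"[{edges[i]},{edges[i + 1]}{']' if i == last else ')'}"
        for i in range(len(edges) - 1)
    ]


# Groups from "frequencies" (interval widths) 8, 10 and 20 over ages 20-60.
width_groups = [describe(equal_width_edges(20, 60, w)) for w in (8, 10, 20)]

# Groups from three preset nodes each, as in the example above.
node_groups = [
    describe(node_edges(20, 60, nodes))
    for nodes in ((30, 40, 50), (35, 45, 55), (25, 45, 55))
]

for group in width_groups + node_groups:
    print(", ".join(group))
```

The first printed line is `[20,28), [28,36), [36,44), [44,52), [52,60]` and the last is `[20,25), [25,45), [45,55), [55,60]`.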
In an exemplary embodiment of the disclosure, the sample data further includes target data; for example, the target data can be whether the salesperson is a supervisor. Table 1 shows sample data containing sub-sample data of one dimension:
No. | Age | Supervisor (Y/N)
1 | 20 | N
2 | 28 | Y
3 | 30 | N
4 | 35 | Y
5 | 37 | Y
6 | 42 | Y
7 | 45 | N
8 | 50 | Y
9 | 55 | N
10 | 60 | N
Table 1
Table 1 shows the sample data of the age dimension and the target data corresponding to each sample. In the disclosure, a univariate binning decision tree is formed with the sub-sample data as the root node, the bins as non-leaf nodes, and the target data as leaf nodes. Figs. 3A-3C show the structure of the univariate binning decision trees formed from the first, second, and third groups of bins, respectively.
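A univariate binning decision tree can be held as a small nested structure. The root names the dimension, each child is one bin, and each child's leaf stores the counts of target values that fall into that bin. The sketch below uses one hypothetical representation, not a format prescribed by the disclosure. It builds the trees for the three node-based groups from Table 1; the names `build_tree` and `bin_index` are illustrative.

```python
from bisect import bisect_right

# Table 1: ages and whether each salesperson is a supervisor.
AGES = [20, 28, 30, 35, 37, 42, 45, 50, 55, 60]
SUPERVISOR = ["N", "Y", "N", "Y", "Y", "Y", "N", "Y", "N", "N"]


def bin_index(value, edges):
    """Bin [e_i, e_{i+1}) containing value; the last bin also includes e_k."""
    if value >= edges[-1]:
        return len(edges) - 2
    return max(bisect_right(edges, value) - 1, 0)


def build_tree(dimension, values, targets, edges):
    """Root = dimension, non-leaf nodes = bins, leaves = target counts."""
    leaves = [{"Y": 0, "N": 0} for _ in range(len(edges) - 1)]
    for value, target in zip(values, targets):
        leaves[bin_index(value, edges)][target] += 1
    return {
        "root": dimension,
        "bins": [
            {"interval": (edges[i], edges[i + 1]), "leaf": leaves[i]}
            for i in range(len(edges) - 1)
        ],
    }


groups = {
    "first": [20, 30, 40, 50, 60],
    "second": [20, 35, 45, 55, 60],
    "third": [20, 25, 45, 55, 60],
}
trees = {name: build_tree("age", AGES, SUPERVISOR, e) for name, e in groups.items()}
```

For the third group, for instance, the leaves hold (Y, N) counts of (0, 1), (4, 1), (1, 1) and (0, 2).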
In step S130, a target binning corresponding to each dimension is obtained according to the plurality of univariate binning decision trees.
In an exemplary embodiment of the disclosure, after the univariate binning decision trees are formed, the sub-information value of each leaf node can be calculated, and the information value of each group of bins is obtained from the sub-information values of its leaf nodes. Finally, the information values of the univariate binning decision trees are compared, and the bins corresponding to the tree with the minimum information value are taken as the target binning.
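In an exemplary embodiment of the disclosure, the sub-information value I of a leaf node is calculated by formula (1), in which a term with a zero count is taken as 0:
$$I = -\frac{m}{m+n}\log_{10}\frac{m}{m+n} - \frac{n}{m+n}\log_{10}\frac{n}{m+n} \qquad (1)$$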
where m is the number of salespeople in the leaf node's target data who became supervisors, and n is the number who did not.
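Formula (1) can be checked numerically with a short Python function. This sketch assumes that the sub-information value is the base-10 entropy of the leaf's target counts, which is the reading that reproduces the leaf values quoted in the next paragraph, and it treats a zero count as contributing 0. The name `sub_information_value` is hypothetical.

```python
import math


def sub_information_value(m, n):
    """Base-10 entropy of a leaf holding m supervisors and n non-supervisors."""
    total = m + n
    value = 0.0
    for count in (m, n):
        if count:  # a zero count contributes 0 (the limit of p * log p)
            p = count / total
            value -= p * math.log10(p)
    return value


for m, n in [(1, 1), (2, 1), (4, 1), (3, 0)]:
    print(m, n, f"{sub_information_value(m, n):.4f}")
```

The last column reads 0.3010, 0.2764, 0.2173 and 0.0000, which are the leaf values that appear in the worked example.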
For the single argument branch mailbox decision tree shown in Fig. 3 A-3C, calculated according to formula (1) it is found that in first group of branch mailbox
The sub-information value of each leaf node is respectively 0.3010,0.2764,0.3010,0.2764;The son of each leaf node in second group of branch mailbox
The value of information is respectively 0.2764,0,0.3010,0;The sub-information value of each leaf node is respectively 0,0.2173 in third component case,
0.3010,0.Then the leaf node sub-information value in three groups of branch mailbox is added, the value of information of each group branch mailbox, each branch mailbox can be obtained
The value of information be respectively: 1.1548,0.5774,0.5183.Compared the value of information minimum it is found that third component case, then target
Branch mailbox is third branch mailbox.
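The comparison in this paragraph can be reproduced in Python. This sketch makes the same base-10 entropy assumption about formula (1), uses the Table 1 data and the three node-based groups, and uses hypothetical helper names. Computed without intermediate rounding, the three information values come out as about 1.1549, 0.5775 and 0.5184. The figures in the text (1.1548, 0.5774, 0.5183) are sums of leaf values that were already rounded. The ranking is the same either way, with the third group lowest.

```python
import math
from bisect import bisect_right

AGES = [20, 28, 30, 35, 37, 42, 45, 50, 55, 60]
SUPERVISOR = ["N", "Y", "N", "Y", "Y", "Y", "N", "Y", "N", "N"]
CANDIDATE_GROUPS = {
    "first": [20, 30, 40, 50, 60],
    "second": [20, 35, 45, 55, 60],
    "third": [20, 25, 45, 55, 60],
}


def bin_index(value, edges):
    """Bin [e_i, e_{i+1}) containing value; the last bin is closed."""
    if value >= edges[-1]:
        return len(edges) - 2
    return max(bisect_right(edges, value) - 1, 0)


def sub_information_value(m, n):
    """Formula (1) read as base-10 entropy; zero counts contribute 0."""
    total = m + n
    return -sum(c / total * math.log10(c / total) for c in (m, n) if c)


def information_value(values, targets, edges):
    """Sum of the sub-information values of all leaves of one tree."""
    counts = [[0, 0] for _ in range(len(edges) - 1)]
    for value, target in zip(values, targets):
        counts[bin_index(value, edges)][0 if target == "Y" else 1] += 1
    return sum(sub_information_value(m, n) for m, n in counts)


scores = {
    name: information_value(AGES, SUPERVISOR, edges)
    for name, edges in CANDIDATE_GROUPS.items()
}
target_group = min(scores, key=scores.get)  # minimum information value wins

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
print("target binning:", target_group, CANDIDATE_GROUPS[target_group])
```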
In step S140, the target binning is input into the prediction model to perform machine training on the prediction model.
In an exemplary embodiment of the disclosure, the target binning can be input into the prediction model as the input vector, and the target data as the output vector, to perform machine training on the prediction model. The prediction model can be a neural network model, a decision tree model, or another model; the disclosure places no particular limitation on this. When multiple target binnings are obtained from the sub-sample data of multiple dimensions, the multiple target binnings and the corresponding target data can be input into the prediction model to train it.
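Training on the target binning can be sketched with any classifier. The disclosure names neural networks and decision trees; purely to keep the example dependency-free, the sketch below swaps in a plain logistic regression. The model is fitted by batch gradient descent on one-hot encodings of the target bins, using the third group from Table 1. The learning rate, epoch count and L2 strength are illustrative assumptions, and all names are hypothetical.

```python
import math
from bisect import bisect_right

AGES = [20, 28, 30, 35, 37, 42, 45, 50, 55, 60]
IS_SUPERVISOR = [0, 1, 0, 1, 1, 1, 0, 1, 0, 0]  # Table 1, Y -> 1, N -> 0
TARGET_EDGES = [20, 25, 45, 55, 60]  # third group, selected in step S130


def bin_index(value, edges):
    """Bin [e_i, e_{i+1}) containing value; the last bin is closed."""
    if value >= edges[-1]:
        return len(edges) - 2
    return max(bisect_right(edges, value) - 1, 0)


def one_hot(value, edges):
    """Input vector: 1.0 in the position of the value's target bin."""
    idx = bin_index(value, edges)
    return [1.0 if i == idx else 0.0 for i in range(len(edges) - 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Batch gradient descent on L2-regularised log loss.

    No bias term: with one-hot inputs, each bin's weight is its own intercept.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * (g / len(X) + l2 * wj) for wj, g in zip(w, grad)]
    return w


def predict(w, value):
    """Predicted likelihood of becoming a supervisor for a raw age."""
    x = one_hot(value, TARGET_EDGES)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


X = [one_hot(age, TARGET_EDGES) for age in AGES]  # input vectors
weights = train_logistic(X, IS_SUPERVISOR)  # target data as output vector
```

After training, bins with more supervisors in Table 1 score higher. `predict(weights, 30)` (bin [25,45), 4 of 5 supervisors) is above 0.5, `predict(weights, 50)` (bin [45,55), 1 of 2) is exactly 0.5, and `predict(weights, 60)` (bin [55,60], 0 of 2) is well below 0.5.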
On the one hand, the data processing method of the disclosure can eliminate noisy data and improve the stability of the model; on the other hand, the binning method is simple and does not require data mining personnel to have extensive business background knowledge. In addition, binning the data reduces the large number of distinct values, which increases the speed of the algorithm.
In an exemplary embodiment of the disclosure, after the model is trained, data to be analyzed can be obtained and input into the prediction model to obtain the prediction result that the model outputs. The data to be analyzed has the same dimensions as the sample data. For example, it can be the age of a salesperson, or the age, quarterly sales, skill grade, and so on of a salesperson. Inputting the age, or the age, quarterly sales, and skill grade, into the prediction model yields the likelihood that the salesperson becomes a supervisor.
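Before data to be analyzed is fed to the trained model, each of its dimensions has to be mapped through the same target binning that was selected during training. The sketch below shows one hypothetical way to do this for several dimensions at once. The `skill_grade` cut points are made up for illustration. Clipping values outside the training range into the first or last bin is an assumption; the disclosure does not address it.

```python
from bisect import bisect_right

# Target bins per dimension. "age" is the third group from Table 1; the
# "skill_grade" cut points are hypothetical values for illustration only.
TARGET_EDGES = {
    "age": [20, 25, 45, 55, 60],
    "skill_grade": [1, 3, 5],
}


def bin_index(value, edges):
    """Bin [e_i, e_{i+1}) holding value, clipping out-of-range values."""
    if value < edges[0]:
        return 0
    if value >= edges[-1]:
        return len(edges) - 2
    return bisect_right(edges, value) - 1


def encode(record, target_edges):
    """Concatenate one-hot bin vectors over dimensions in a fixed order."""
    vector = []
    for dim in sorted(target_edges):
        edges = target_edges[dim]
        idx = bin_index(record[dim], edges)
        vector.extend(1.0 if i == idx else 0.0 for i in range(len(edges) - 1))
    return vector


# A salesperson to be analyzed: age 30, skill grade 4.
features = encode({"age": 30, "skill_grade": 4}, TARGET_EDGES)
```

The fixed (sorted) dimension order matters, because the encoded vector must line up position by position with the input vectors used during training.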
The disclosure further provides a data processing device. Fig. 4 shows the structure of the data processing device. As shown in Fig. 4, the data processing device may include a data obtaining module 410, a decision tree forming module 420, a target binning obtaining module 430, and a model training module 440, where:
The data obtaining module 410 is configured to obtain a plurality of sample data, each sample data including sub-sample data of one or more dimensions;
The decision tree forming module 420 is configured to divide the sub-sample data of each dimension into multiple groups of bins and form a plurality of univariate binning decision trees according to the bins;
The target binning obtaining module 430 is configured to obtain a target binning corresponding to each dimension according to the plurality of univariate binning decision trees;
The model training module 440 is configured to input the target binning into a prediction model to perform machine training on the prediction model.
The specific details of each module in the above data processing device have been described in detail in the corresponding data processing method, so they are not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
In addition, although the steps of the method in the disclosure are described in the drawings in a particular order, this does not require or imply that the steps must be performed in that order, or that all of the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
From the above description of the embodiments, those skilled in the art can readily understand that the example embodiments described herein may be implemented in software, or in software combined with necessary hardware. Therefore, the technical solutions according to the embodiments of the disclosure may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and includes instructions that cause a computing device (such as a personal computer, server, mobile terminal, or network device) to perform the method according to the embodiments of the disclosure.
In an exemplary embodiment of the disclosure, electronic equipment capable of implementing the above method is further provided.
Those skilled in the art will understand that aspects of the disclosure may be implemented as a system, a method, or a program product. Therefore, aspects of the disclosure may take the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, and so on), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit", "module", or "system".
Electronic equipment 500 according to this embodiment of the disclosure is described below with reference to Fig. 5. The electronic equipment 500 shown in Fig. 5 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the disclosure.
As shown in Fig. 5, the electronic equipment 500 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, and a bus 530 connecting the different system components (including the storage unit 520 and the processing unit 510).
The storage unit stores program code that can be executed by the processing unit 510, so that the processing unit 510 performs the steps of the various exemplary embodiments of the disclosure described in the "exemplary methods" section of this specification. For example, the processing unit 510 can perform the steps shown in Fig. 1. Step S110: obtain a plurality of sample data, each sample data including sub-sample data of one or more dimensions. Step S120: divide the sub-sample data of each dimension into multiple groups of bins, and form a plurality of univariate binning decision trees according to the bins. Step S130: obtain a target binning corresponding to each dimension according to the plurality of univariate binning decision trees. Step S140: input the target binning into a prediction model to perform machine training on the prediction model.
The storage unit 520 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 5201 and/or a cache memory unit 5202, and may further include a read-only memory (ROM) unit 5203.
The storage unit 520 may also include a program/utility 5204 having a set of (at least one) program modules 5205. Such program modules 5205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 530 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus structures.
The electronic equipment 500 may also communicate with one or more external devices 700 (such as a keyboard, pointing device, or Bluetooth device), with one or more devices that enable a user to interact with the electronic equipment 500, and/or with any device (such as a router or modem) that enables the electronic equipment 500 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 550. The electronic equipment 500 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic equipment 500 through the bus 530. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the electronic equipment 500, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the above description of the embodiments, those skilled in the art can readily understand that the example embodiments described herein may be implemented in software, or in software combined with necessary hardware. Therefore, the technical solutions according to the embodiments of the disclosure may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and includes instructions that cause a computing device (such as a personal computer, server, terminal device, or network device) to perform the method according to the embodiments of the disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is further provided, on which a program product capable of implementing the above method of this specification is stored. In some possible implementations, aspects of the present disclosure may also be implemented in the form of a program product including program code. When the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.
Referring to Fig. 6, a program product 600 for implementing the above method according to an embodiment of the present disclosure is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above drawings are merely schematic illustrations of the processing included in the method according to exemplary embodiments of the present disclosure and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of that processing. It is also easy to understand that the processing may be executed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or conventional technical means in the art that are not disclosed herein. The specification and examples are to be regarded as exemplary only, and the true scope and spirit of the disclosure are indicated by the claims.
Claims (10)
1. A data processing method, comprising:
obtaining a plurality of sample data, each sample data comprising sub-sample data of one or more dimensions;
dividing the sub-sample data of each dimension into multiple groups of bins, and forming a plurality of univariate binning decision trees according to the bins;
obtaining a target binning corresponding to each dimension according to the plurality of univariate binning decision trees;
inputting the target binning into a prediction model to perform machine training on the prediction model.
2. The data processing method according to claim 1, wherein dividing the sub-sample data of each dimension into multiple groups of bins comprises:
dividing the sub-sample data into multiple groups of bins according to different frequencies; or
dividing the sub-sample data into multiple groups of bins according to a preset number of nodes.
3. The data processing method according to claim 2, wherein each sample data comprises target data, and forming the plurality of univariate binning decision trees according to the bins comprises:
forming each univariate binning decision tree with the sub-sample data as the root node, the bins as non-leaf nodes, and the target data as leaf nodes.
4. The data processing method according to claim 1, wherein obtaining the target binning corresponding to each dimension according to the plurality of univariate binning decision trees comprises:
calculating a sub-information value of each leaf node in each univariate binning decision tree;
calculating an information value of each univariate binning decision tree according to the sub-information values;
comparing the information values of the univariate binning decision trees, and taking the bins corresponding to the univariate binning decision tree with the minimum information value as the target binning.
5. The data processing method according to claim 4, wherein calculating the information value of each univariate binning decision tree according to the sub-information values comprises:
adding the sub-information values of the leaf nodes in each univariate binning decision tree to obtain the information value.
6. The data processing method according to claim 1, wherein each sample data further comprises target data, and inputting the target binning into the prediction model to perform machine training on the prediction model comprises:
inputting the target binning into the prediction model as an input vector and the target data as an output vector, to perform machine training on the prediction model.
7. The data processing method according to claim 1, further comprising:
obtaining data to be analyzed, the data to be analyzed having the same dimensions as the sample data;
inputting the data to be analyzed into the prediction model to obtain a prediction result.
8. A data processing apparatus, comprising:
a first obtaining module, configured to obtain a plurality of sample data, each sample data comprising sub-sample data of one or more dimensions;
a decision tree forming module, configured to divide the sub-sample data of each dimension into multiple groups of bins and to form a plurality of univariate binning decision trees according to the bins;
a target binning obtaining module, configured to obtain a target binning corresponding to each dimension according to the plurality of univariate binning decision trees;
a model training module, configured to input the target binning into a prediction model to perform machine training on the prediction model.
9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the data processing method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the data processing method according to any one of claims 1 to 7 by executing the executable instructions.
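The claims do not fix the binning rule, the formula for the sub-information value, the type of target data, or the prediction model. The following Python sketch illustrates one possible reading of claims 1 to 7 under stated assumptions: the target data is binary (0/1); "different frequencies" is read as equal-frequency binning with several candidate bin counts, each candidate giving one univariate binning decision tree; the sub-information value of a leaf is assumed to be the conventional information-value term (Dp - Dn) * ln(Dp / Dn), where Dp and Dn are the leaf's shares of all positives and negatives, smoothed by a small constant eps; and the prediction model is assumed to be a plain logistic regression trained on one-hot bin indices. All class and function names are hypothetical; this is not presented as the patentee's implementation.

```python
import bisect
import math
from dataclasses import dataclass


@dataclass
class BinningTree:
    """One univariate binning tree: the root is a single dimension, the
    non-leaf nodes are its bins, and each leaf holds the (positive, negative)
    target counts of one bin."""
    edges: list   # sorted cut points; bin i covers (edges[i-1], edges[i]]
    leaves: list  # one (positives, negatives) tuple per bin

    def bin_index(self, value):
        # A value equal to a cut point falls into the lower bin.
        return bisect.bisect_left(self.edges, value)

    def information_value(self, eps=0.5):
        # Sub-value of each leaf: conventional IV term (assumption), with eps
        # added to every count so that empty bins do not divide by zero.
        pos_total = sum(p for p, _ in self.leaves) + eps * len(self.leaves)
        neg_total = sum(n for _, n in self.leaves) + eps * len(self.leaves)
        iv = 0.0
        for p, n in self.leaves:
            dp, dn = (p + eps) / pos_total, (n + eps) / neg_total
            iv += (dp - dn) * math.log(dp / dn)
        return iv  # claim 5: the tree's value is the sum of leaf sub-values


def equal_frequency_edges(values, k):
    """Cut points that split the sorted values into k groups of similar size."""
    ordered = sorted(values)
    cuts = {ordered[max(len(ordered) * i // k - 1, 0)] for i in range(1, k)}
    return sorted(cuts)


def build_tree(values, targets, k):
    """Claims 2-3: bin one dimension into k equal-frequency bins and attach
    each bin's target counts as its leaf (targets assumed to be 0/1)."""
    edges = equal_frequency_edges(values, k)
    leaves = [[0, 0] for _ in range(len(edges) + 1)]
    for v, y in zip(values, targets):
        leaves[bisect.bisect_left(edges, v)][0 if y else 1] += 1
    return BinningTree(edges, [tuple(leaf) for leaf in leaves])


def target_binning(values, targets, candidate_ks):
    """Claim 4: one tree per candidate bin count; keep the tree with the
    smallest information value, as the translated claim states."""
    trees = [build_tree(values, targets, k) for k in candidate_ks]
    return min(trees, key=lambda tree: tree.information_value())


def one_hot(trees, row):
    """Claim 6 input vector: concatenated one-hot bin indices of a sample."""
    vec = []
    for tree, value in zip(trees, row):
        block = [0.0] * len(tree.leaves)
        block[tree.bin_index(value)] = 1.0
        vec.extend(block)
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(rows, labels, candidate_ks=(2, 3, 4), lr=0.5, epochs=300):
    """Claims 1 and 6: choose a target binning per dimension (candidate bin
    counts are hypothetical), then train a logistic regression (hypothetical
    model choice) by batch gradient descent."""
    trees = [target_binning(list(col), labels, candidate_ks) for col in zip(*rows)]
    X = [one_hot(trees, row) for row in rows]
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return trees, (w, b)


def predict(trees, model, row):
    """Claim 7: bin new data with the learned trees and score it."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, one_hot(trees, row))) + b)
```

For example, with a single dimension `rows = [(v,) for v in range(1, 11)]` and `labels = [1] * 5 + [0] * 5`, `trees, model = fit(rows, labels)` selects one binning for that dimension, and `predict(trees, model, (2,))` scores a new sample having the same dimension as the training data. Selecting the minimum information value follows the translated claim 4; in conventional scorecard practice a larger information value usually signals a more predictive binning, so the `min` in `target_binning` is the line to revisit if the claim is read differently.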
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811117037.7A CN109408583B (en) | 2018-09-25 | 2018-09-25 | Data processing method and device, computer readable storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109408583A true CN109408583A (en) | 2019-03-01 |
CN109408583B CN109408583B (en) | 2023-04-07 |
Family
ID=65465141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811117037.7A Active CN109408583B (en) | 2018-09-25 | 2018-09-25 | Data processing method and device, computer readable storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109408583B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245688A (en) * | 2019-05-21 | 2019-09-17 | 中国平安财产保险股份有限公司 | A kind of method and relevant apparatus of data processing |
CN110532266A (en) * | 2019-08-28 | 2019-12-03 | 京东数字科技控股有限公司 | A kind of method and apparatus of data processing |
CN110708285A (en) * | 2019-08-30 | 2020-01-17 | 中国平安人寿保险股份有限公司 | Flow monitoring method, device, medium and electronic equipment |
CN110798227A (en) * | 2019-09-19 | 2020-02-14 | 平安科技(深圳)有限公司 | Model prediction optimization method, device and equipment and readable storage medium |
CN111507479A (en) * | 2020-04-15 | 2020-08-07 | 深圳前海微众银行股份有限公司 | Feature binning method, device, equipment and computer-readable storage medium |
CN111782900A (en) * | 2020-08-06 | 2020-10-16 | 平安银行股份有限公司 | Abnormal service detection method and device, electronic equipment and storage medium |
CN112667741A (en) * | 2020-04-13 | 2021-04-16 | 华控清交信息科技(北京)有限公司 | Data processing method and device and data processing device |
CN113495906A (en) * | 2020-03-20 | 2021-10-12 | 北京京东振世信息技术有限公司 | Data processing method and device, computer readable storage medium and electronic equipment |
CN113837865A (en) * | 2021-09-29 | 2021-12-24 | 重庆富民银行股份有限公司 | Method for extracting multi-dimensional risk feature strategy |
CN111782900B (en) * | 2020-08-06 | 2024-03-19 | 平安银行股份有限公司 | Abnormal service detection method and device, electronic equipment and storage medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070185896A1 (en) * | 2006-02-01 | 2007-08-09 | Oracle International Corporation | Binning predictors using per-predictor trees and MDL pruning |
US20070192341A1 (en) * | 2006-02-01 | 2007-08-16 | Oracle International Corporation | System and method for building decision tree classifiers using bitmap techniques |
US7792770B1 (en) * | 2007-08-24 | 2010-09-07 | Louisiana Tech Research Foundation; A Division Of Louisiana Tech University Foundation, Inc. | Method to indentify anomalous data using cascaded K-Means clustering and an ID3 decision tree |
US20140122381A1 (en) * | 2012-10-25 | 2014-05-01 | Microsoft Corporation | Decision tree training in machine learning |
CN103942604A (en) * | 2013-01-18 | 2014-07-23 | 上海安迪泰信息技术有限公司 | Prediction method and system based on forest discrimination model |
US20150379430A1 (en) * | 2014-06-30 | 2015-12-31 | Amazon Technologies, Inc. | Efficient duplicate detection for machine learning data sets |
CN106250986A (en) * | 2015-06-04 | 2016-12-21 | 波音公司 | Advanced analysis base frame for machine learning |
CN106707060A (en) * | 2016-12-16 | 2017-05-24 | 中国电力科学研究院 | Method for acquiring discrete state parameters of power transformer |
CN107633265A (en) * | 2017-09-04 | 2018-01-26 | 深圳市华傲数据技术有限公司 | For optimizing the data processing method and device of credit evaluation model |
CN108021984A (en) * | 2016-11-01 | 2018-05-11 | 第四范式(北京)技术有限公司 | Determine the method and system of the feature importance of machine learning sample |
CN108182634A (en) * | 2018-01-31 | 2018-06-19 | 国信优易数据有限公司 | A kind of training method for borrowing or lending money prediction model, debt-credit Forecasting Methodology and device |
Also Published As
Publication number | Publication date |
---|---|
CN109408583B (en) | 2023-04-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||