CN109408583A - Data processing method and device, computer readable storage medium, electronic equipment - Google Patents

Publication number
CN109408583A
Authority
CN
China
Prior art keywords
binning
data
target
data processing
processing method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811117037.7A
Other languages
Chinese (zh)
Other versions
CN109408583B (en)
Inventor
郭继昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811117037.7A
Publication of CN109408583A
Application granted
Publication of CN109408583B
Legal status: Active
Anticipated expiration

Abstract

The disclosure belongs to the field of big data technology and relates to a data processing method and device, a computer-readable storage medium, and an electronic device. The data processing method includes: obtaining multiple sample data, each sample data including sub-sample data of one or more dimensions; dividing the sub-sample data of each dimension into multiple groups of bins, and forming multiple univariate binning decision trees from the bins; obtaining a target binning corresponding to each dimension from the univariate binning decision trees; and inputting the target binning into a prediction model to train the prediction model. On the one hand, the method can eliminate noisy data and improve the stability of the model; on the other hand, the binning method is simple and does not require data mining personnel to have rich business background knowledge. In addition, binning the data reduces the large number of duplicate values and improves the speed of the algorithm.

Description

Data processing method and device, computer readable storage medium, electronic equipment
Technical field
The present disclosure relates to the field of big data technology, and in particular to a data processing method, a data processing device, a computer-readable storage medium, and an electronic device.
Background
With economic growth and social progress, intelligent terminals such as computers and smartphones are used more and more widely. To obtain valuable information, data usually needs to be mined and analyzed.
Measured data can contain numerical noise such as random errors, outliers, and extreme values, and this noise affects the accuracy of a model. Measured data can also contain a large number of non-repeating values, which slows the algorithm if the data is used directly, and some algorithms do not support continuous variables. The data therefore needs to be preprocessed. Binning is generally used to discretize the data while eliminating numerical noise and reducing duplicate values. However, the common binning methods are mainly equal-frequency and equal-distance binning. These methods are limited: the frequency and the distance are hard to determine, and data mining personnel need sufficient business background knowledge of the data. Otherwise the data cannot be binned effectively, and the accuracy of the model is low.
Therefore, a new data processing method and device are needed in the art.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The purpose of the present disclosure is to provide a data processing method, a data processing device, a computer-readable storage medium, and an electronic device, so as to overcome, at least to some extent, the influence of numerical noise on model stability, and to avoid the situation in which data mining personnel who lack business background knowledge cannot discretize data effectively, thereby improving the flexibility and computation speed of the model.
According to one aspect of the disclosure, a data processing method is provided, comprising:
obtaining multiple sample data, each sample data including sub-sample data of one or more dimensions;
dividing the sub-sample data of each dimension into multiple groups of bins, and forming multiple univariate binning decision trees from the bins;
obtaining a target binning corresponding to each dimension from the multiple univariate binning decision trees;
inputting the target binning into a prediction model to train the prediction model.
In an exemplary embodiment of the disclosure, dividing the sub-sample data of each dimension into multiple groups of bins comprises:
dividing the sub-sample data into multiple groups of bins according to different frequencies; or
dividing the sub-sample data into multiple groups of bins according to a preset number of nodes.
In an exemplary embodiment of the disclosure, each sample data includes target data, and forming multiple univariate binning decision trees from the bins comprises:
forming the univariate binning decision tree with the sub-sample data as the root node, the bins as non-leaf nodes, and the target data as leaf nodes.
In an exemplary embodiment of the disclosure, obtaining the target binning corresponding to each dimension from the multiple univariate binning decision trees comprises:
calculating the sub-information value of each leaf node in each univariate binning decision tree;
calculating the information value of each univariate binning decision tree from the sub-information values;
comparing the information values of the univariate binning decision trees, and taking the bins corresponding to the univariate binning decision tree with the smallest information value as the target binning.
In an exemplary embodiment of the disclosure, calculating the information value of each univariate binning decision tree from the sub-information values comprises:
adding up the sub-information values of the leaf nodes in each univariate binning decision tree to obtain the information value.
In an exemplary embodiment of the disclosure, each sample data further includes target data, and inputting the target binning into the prediction model to train the prediction model comprises:
inputting the target binning as the input vector and the target data as the output vector into the prediction model to train the prediction model.
In an exemplary embodiment of the disclosure, the method further comprises:
obtaining data to be analyzed, the data to be analyzed having data of the same dimensions as the sample data;
inputting the data to be analyzed into the prediction model to obtain a prediction result.
According to one aspect of the disclosure, a data processing device is provided, comprising:
a first obtaining module, configured to obtain multiple sample data, each sample data including sub-sample data of one or more dimensions;
a decision tree forming module, configured to divide the sub-sample data of each dimension into multiple groups of bins, and to form multiple univariate binning decision trees from the bins;
a target binning obtaining module, configured to obtain a target binning corresponding to each dimension from the multiple univariate binning decision trees;
a model training module, configured to input the target binning into a prediction model to train the prediction model.
According to one aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the data processing method described in any one of the above.
According to one aspect of the disclosure, an electronic device is provided, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the data processing method described in any one of the above by executing the executable instructions.
In the data processing method of the disclosure, the sub-sample data of each dimension is divided into multiple groups of bins, and univariate binning decision trees are formed from the bins. The information value of each leaf node of each univariate binning decision tree is calculated and summed to obtain the information value of each group of bins. The information values of the groups are then compared, and the group with the smallest information value is taken as the target binning. After the target binning for each dimension is obtained, the target binning and the target data are input into the prediction model to train it. After training, a prediction result can be obtained by inputting the data to be analyzed into the prediction model. On the one hand, the data processing method of the disclosure can eliminate noisy data and improve the stability of the model; on the other hand, the binning method is simple and does not require data mining personnel to have rich business background knowledge. In addition, binning the data reduces the large number of duplicate values and improves the speed of the algorithm.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings are incorporated into and form part of this specification. They show embodiments consistent with the disclosure and, together with the specification, serve to explain the principles of the disclosure. Obviously, the drawings described below are only some embodiments of the disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 schematically shows a flow chart of a data processing method;
Fig. 2 schematically shows an example application scenario of a data processing method;
Figs. 3A-3C schematically show the structure of univariate binning decision trees;
Fig. 4 schematically shows a block diagram of a data processing device;
Fig. 5 schematically shows an example block diagram of an electronic device for implementing the data processing method;
Fig. 6 schematically shows a computer-readable storage medium for implementing the data processing method.
Detailed description
Example embodiments are now described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be understood as limited to the examples set forth here. Rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are given to provide a full understanding of the embodiments of the disclosure. Those skilled in the art will appreciate, however, that the technical solution of the disclosure may be practiced while omitting one or more of these specific details, or that other methods, components, devices, steps, and so on may be used. In other cases, well-known solutions are not shown or described in detail, so that they do not distract from and obscure aspects of the disclosure.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. Identical reference numerals in the figures denote identical or similar parts, so their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, the data that data mining personnel work with can contain numerical noise such as random errors, outliers, and extreme values, and this noise affects the accuracy of the model. For example, extreme values can make model parameters too high or too low, or can mislead the model with spurious patterns so that it learns a relationship that does not actually exist as an important pattern. To eliminate numerical noise during data mining, equal-frequency or equal-distance binning is generally used to discretize the data. However, these binning methods are limited: the frequency and the distance are hard to determine, and sufficient business background knowledge of the data is required. The stability of models in the related art is therefore poor, and the large number of duplicate values in the data also reduces the computation speed of the model.
In view of the problems in the related art, this example embodiment first provides a data processing method. The data processing method may run on a server, a server cluster, or a cloud server. Of course, those skilled in the art may also run the method of the disclosure on other platforms as needed, which is not specifically limited in this exemplary embodiment. As shown in Fig. 1, the data processing method may include the following steps:
Step S110: obtain multiple sample data, each sample data including sub-sample data of one or more dimensions;
Step S120: divide the sub-sample data of each dimension into multiple groups of bins, and form multiple univariate binning decision trees from the bins;
Step S130: obtain a target binning corresponding to each dimension from the multiple univariate binning decision trees;
Step S140: input the target binning into a prediction model to train the prediction model.
In the above data processing method, the sub-sample data of each dimension is divided into multiple groups of bins, and multiple univariate binning decision trees are formed from the bins. A target binning corresponding to each dimension is then obtained from the univariate binning decision trees. Finally, the target binning is input into the prediction model to train it. On the one hand, the data processing method of the disclosure can eliminate noisy data and improve the stability of the model; on the other hand, the binning method is simple and does not require data mining personnel to have rich business background knowledge. In addition, binning the data reduces the large number of duplicate values and improves the speed of the algorithm.
Each step of the above data processing method in this example embodiment is explained in detail below with reference to Fig. 2.
In step S110, multiple sample data are obtained, each sample data including sub-sample data of one or more dimensions.
In an exemplary embodiment of the disclosure, multiple sample data can be obtained from the data warehouse of the server 201 or the terminal device 202. Specifically, the sample data includes behavioral data and attribute data. Taking the evaluation of an insurance company's salespeople as an example, where the goal is to judge how likely a salesperson is to become a supervisor, the behavioral data in the sample data may be features such as the salesperson's monthly income, monthly sales volume, quarterly sales volume, and attendance count, and the attribute data may be features such as the salesperson's age or skill level.
In an exemplary embodiment of the disclosure, the sample data may include sub-sample data of one dimension or of multiple dimensions. For example, the likelihood that a salesperson becomes a supervisor may be predicted from age alone, or from the data of several dimensions such as age, quarterly sales volume, and monthly income. Of course, to assess more comprehensively and obtain an accurate prediction result, predicting from multi-dimensional data is preferable.
In step S120, the sub-sample data of each dimension is divided into multiple groups of bins, and multiple univariate binning decision trees are formed from the bins.
In an exemplary embodiment of the disclosure, the sub-sample data of each dimension can be divided into multiple groups of bins. The sub-sample data can be divided into multiple groups of bins according to different frequencies, where the frequency refers to the amount of data covered by each data interval in a group of bins; accordingly, different frequencies mean that the intervals of different groups cover different amounts of data. In the example below, the frequency is the number of years of age covered by each interval. The sub-sample data can also be divided into multiple groups of bins according to a preset number of nodes. Other binning rules can of course be used as well, and the disclosure does not specifically limit this. For example, if the ages of the salespeople range from 20 to 60, the frequency can be set to 8, 10, and 20 to divide the age data into three groups of bins. When the frequency is 8, the bins are [20,28), [28,36), [36,44), [44,52), [52,60]; when the frequency is 10, the bins are [20,30), [30,40), [40,50), [50,60]; when the frequency is 20, the bins are [20,40), [40,60]. Alternatively, 3 nodes can be set and the age data divided at arbitrarily chosen nodes to form multiple groups of bins. For example, setting 3 nodes forms three groups of bins, each containing 4 bins. The first group is [20,30), [30,40), [40,50), [50,60]; the second group is [20,35), [35,45), [45,55), [55,60]; the third group is [20,25), [25,45), [45,55), [55,60].
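The two binning rules in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `bins_by_width` and `bins_by_nodes` are hypothetical and do not come from the disclosure, and the "frequency" is read as the interval width in years, as in the age example. Each bin is returned as a `(lower, upper)` pair, with every bin half-open except the last.

```python
def bins_by_width(lo, hi, width):
    """Split [lo, hi] into consecutive intervals of a fixed width.

    Every interval is half-open [a, b) except the last, which is closed
    at hi. If width does not divide hi - lo, the last interval is shorter.
    """
    edges = list(range(lo, hi, width)) + [hi]
    return list(zip(edges[:-1], edges[1:]))


def bins_by_nodes(lo, hi, nodes):
    """Split [lo, hi] at the given interior cut points (the preset nodes)."""
    edges = [lo] + sorted(nodes) + [hi]
    return list(zip(edges[:-1], edges[1:]))


# Three groups of bins from three different widths (8, 10 and 20 years).
width_groups = [bins_by_width(20, 60, w) for w in (8, 10, 20)]

# Three groups of bins, each built from 3 cut points (hence 4 bins each).
node_groups = [bins_by_nodes(20, 60, n)
               for n in ([30, 40, 50], [35, 45, 55], [25, 45, 55])]

for group in width_groups + node_groups:
    print(group)
```

Keeping each group as a plain list of interval bounds makes it easy to pass the same group to later steps, such as building a tree per group and scoring it.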
In an exemplary embodiment of the disclosure, the sample data further includes target data; for example, the target data may be whether the salesperson is a supervisor. Table 1 shows sample data containing sub-sample data of one dimension:
No.   Age   Supervisor
1     20    N
2     28    Y
3     30    N
4     35    Y
5     37    Y
6     42    Y
7     45    N
8     50    Y
9     55    N
10    60    N
Table 1
Table 1 shows the sample data of the age dimension and the target data corresponding to each sample. In the disclosure, a univariate binning decision tree is formed with the sub-sample data as the root node, the bins as non-leaf nodes, and the target data as leaf nodes. Figs. 3A-3C show the structure of the univariate binning decision trees formed from the first, second, and third groups of bins, respectively.
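As a rough sketch of this tree structure, the Python below groups the target values of Table 1 under the bin that contains each age. The nested-dictionary layout and the names `bin_index` and `build_tree` are assumptions made for illustration; the disclosure does not prescribe a data structure. The bin edges used are those of the first group, [20,30), [30,40), [40,50), [50,60].

```python
from bisect import bisect_right

# Table 1: age (sub-sample data) and whether the person is a supervisor.
AGES = [20, 28, 30, 35, 37, 42, 45, 50, 55, 60]
IS_SUPERVISOR = ["N", "Y", "N", "Y", "Y", "Y", "N", "Y", "N", "N"]


def bin_index(value, edges):
    """Return the index of the bin holding value.

    Bins are [edges[i], edges[i+1]); the last bin also includes its
    upper edge, matching the [50, 60] style intervals in the text.
    """
    if not edges[0] <= value <= edges[-1]:
        raise ValueError(f"{value} is outside [{edges[0]}, {edges[-1]}]")
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def build_tree(dimension, values, targets, edges):
    """Build one tree: root = dimension, children = bins, leaves = targets."""
    children = [{"bin": (lo, hi), "leaves": []}
                for lo, hi in zip(edges[:-1], edges[1:])]
    for value, target in zip(values, targets):
        children[bin_index(value, edges)]["leaves"].append(target)
    return {"root": dimension, "children": children}


first_tree = build_tree("age", AGES, IS_SUPERVISOR, [20, 30, 40, 50, 60])
for child in first_tree["children"]:
    print(child["bin"], child["leaves"])
```

Calling `build_tree` once per group of bins yields one tree per group, which is the input the selection step needs.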
In step S130, a target binning corresponding to each dimension is obtained from the multiple univariate binning decision trees.
In an exemplary embodiment of the disclosure, after the univariate binning decision trees are formed, the sub-information value of each leaf node can be calculated, and the information value of each group of bins is then obtained from the sub-information values of its leaf nodes. Finally, the information values of the univariate binning decision trees are compared, and the bins corresponding to the univariate binning decision tree with the smallest information value are taken as the target binning.
In an exemplary embodiment of the disclosure, the sub-information value of a leaf node is calculated as:
$$I = -\frac{m}{m+n}\log_{10}\frac{m}{m+n} - \frac{n}{m+n}\log_{10}\frac{n}{m+n} \qquad (1)$$
where m is the number of salespeople in the target data who became supervisors and n is the number of salespeople in the target data who did not become supervisors. A term whose count is zero is taken as 0.
For the univariate binning decision trees shown in Figs. 3A-3C, formula (1) gives the following results. The sub-information values of the leaf nodes in the first group of bins are 0.3010, 0.2764, 0.3010, and 0.2764; those in the second group are 0.2764, 0, 0.3010, and 0; those in the third group are 0, 0.2173, 0.3010, and 0. Adding up the leaf-node sub-information values within each group gives the information value of each group: 1.1548, 0.5774, and 0.5183, respectively. The comparison shows that the third group of bins has the smallest information value, so the target binning is the third group of bins.
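The selection step can be checked numerically with the sketch below. It assumes that formula (1) is the base-10 entropy of the supervisor and non-supervisor counts in each bin, with a zero count contributing nothing; this assumption reproduces the sub-information values listed above. The names `sub_information`, `information_value`, and `GROUPS` are hypothetical.

```python
import math
from bisect import bisect_right

# Table 1: age and whether the person is a supervisor.
AGES = [20, 28, 30, 35, 37, 42, 45, 50, 55, 60]
IS_SUPERVISOR = ["N", "Y", "N", "Y", "Y", "Y", "N", "Y", "N", "N"]

# Bin edges of the three groups formed from 3 nodes each.
GROUPS = {
    "first": [20, 30, 40, 50, 60],
    "second": [20, 35, 45, 55, 60],
    "third": [20, 25, 45, 55, 60],
}


def sub_information(m, n):
    """Base-10 entropy of one leaf with m supervisors and n non-supervisors."""
    total = m + n
    value = 0.0
    for count in (m, n):
        if count:  # a zero count contributes nothing (0 * log 0 := 0)
            p = count / total
            value -= p * math.log10(p)
    return value


def information_value(values, targets, edges):
    """Sum the sub-information values over all bins of one group."""
    counts = [[0, 0] for _ in range(len(edges) - 1)]  # [m, n] per bin
    for value, target in zip(values, targets):
        # Bins are half-open; the last bin also includes its upper edge.
        idx = min(bisect_right(edges, value) - 1, len(edges) - 2)
        counts[idx][0 if target == "Y" else 1] += 1
    return sum(sub_information(m, n) for m, n in counts)


scores = {name: information_value(AGES, IS_SUPERVISOR, edges)
          for name, edges in GROUPS.items()}
target_group = min(scores, key=scores.get)  # group with the smallest value
print(scores, target_group)
```

The computed totals agree with the values in the text to about three decimal places; the small difference in the fourth decimal arises because the text adds up sub-information values that were already rounded.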
In step S140, the target binning is input into the prediction model to train the prediction model.
In an exemplary embodiment of the disclosure, the target binning can be input into the prediction model as the input vector, with the target data as the output vector, to train the prediction model. The prediction model may be a neural network model, a decision tree model, or another model, and the disclosure does not specifically limit this. When multiple target binnings have been obtained from the sub-sample data of multiple dimensions, the multiple target binnings and the corresponding target data can be input into the prediction model to obtain a prediction result.
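One possible way to turn a target binning into input vectors is one-hot encoding of the bin that each value falls into. The disclosure does not say how the input vector is encoded, so the encoding, the 0/1 target mapping, and the names `one_hot_bin` and `make_training_pairs` below are all assumptions for illustration. The bin edges are those of the third group, which the age example selects as the target binning.

```python
from bisect import bisect_right

# Table 1: age and whether the person is a supervisor.
AGES = [20, 28, 30, 35, 37, 42, 45, 50, 55, 60]
IS_SUPERVISOR = ["N", "Y", "N", "Y", "Y", "Y", "N", "Y", "N", "N"]

# Edges of the target binning: [20,25), [25,45), [45,55), [55,60].
TARGET_EDGES = [20, 25, 45, 55, 60]


def one_hot_bin(value, edges):
    """Encode value as a one-hot vector over the bins defined by edges."""
    if not edges[0] <= value <= edges[-1]:
        raise ValueError(f"{value} is outside [{edges[0]}, {edges[-1]}]")
    # Bins are half-open; the last bin also includes its upper edge.
    idx = min(bisect_right(edges, value) - 1, len(edges) - 2)
    vector = [0] * (len(edges) - 1)
    vector[idx] = 1
    return vector


def make_training_pairs(values, targets, edges):
    """Pair each encoded input vector with a 0/1 output value."""
    return [(one_hot_bin(v, edges), 1 if t == "Y" else 0)
            for v, t in zip(values, targets)]


pairs = make_training_pairs(AGES, IS_SUPERVISOR, TARGET_EDGES)
for x, y in pairs:
    print(x, y)
```

With several dimensions, the one-hot vectors of each dimension could be concatenated into a single input vector before being passed to whichever model is chosen.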
On the one hand, the data processing method of the disclosure can eliminate noisy data and improve the stability of the model; on the other hand, the binning method is simple and does not require data mining personnel to have rich business background knowledge. In addition, binning the data reduces the large number of duplicate values and improves the speed of the algorithm.
In an exemplary embodiment of the disclosure, after the model is trained, data to be analyzed can be obtained and input into the prediction model to obtain the prediction result output by the model. The data to be analyzed has data of the same dimensions as the sample data. For example, the data to be analyzed may be the age of a salesperson, or the age, quarterly sales volume, and skill level of a salesperson. Inputting the age, or the age, quarterly sales volume, and skill level, into the prediction model yields the likelihood that the salesperson becomes a supervisor.
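To show how data to be analyzed passes through the same bins at prediction time, the sketch below uses a deliberately simple stand-in for the prediction model: the share of supervisors observed in each bin of the training data. This stand-in is an assumption for illustration only; the disclosure names neural network and decision tree models and does not describe this estimator. The names `fit_bin_rates` and `predict` are hypothetical.

```python
from bisect import bisect_right

# Table 1: age and whether the person is a supervisor.
AGES = [20, 28, 30, 35, 37, 42, 45, 50, 55, 60]
IS_SUPERVISOR = ["N", "Y", "N", "Y", "Y", "Y", "N", "Y", "N", "N"]

# Edges of the target binning: [20,25), [25,45), [45,55), [55,60].
TARGET_EDGES = [20, 25, 45, 55, 60]


def bin_index(value, edges):
    """Index of the bin holding value; the last bin includes its upper edge."""
    if not edges[0] <= value <= edges[-1]:
        raise ValueError(f"{value} is outside [{edges[0]}, {edges[-1]}]")
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def fit_bin_rates(values, targets, edges):
    """Stand-in model: share of supervisors among the samples in each bin."""
    counts = [[0, 0] for _ in range(len(edges) - 1)]  # [supervisors, total]
    for value, target in zip(values, targets):
        entry = counts[bin_index(value, edges)]
        entry[0] += 1 if target == "Y" else 0
        entry[1] += 1
    return [m / total if total else 0.0 for m, total in counts]


def predict(value, rates, edges):
    """Map new data to its bin and return that bin's supervisor share."""
    return rates[bin_index(value, edges)]


rates = fit_bin_rates(AGES, IS_SUPERVISOR, TARGET_EDGES)
for age in (33, 47, 58):
    print(age, predict(age, rates, TARGET_EDGES))
```

The point of the sketch is that new data must be mapped through the same bin edges that were selected during training; any value outside the training range is rejected here rather than silently assigned to an edge bin.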
The disclosure also provides a data processing device. Fig. 4 shows a structural diagram of the data processing device. As shown in Fig. 4, the data processing device may include a data acquisition module 410, a decision tree forming module 420, a target binning obtaining module 430, and a model training module 440, where:
the data acquisition module 410 is configured to obtain multiple sample data, each sample data including sub-sample data of one or more dimensions;
the decision tree forming module 420 is configured to divide the sub-sample data of each dimension into multiple groups of bins, and to form multiple univariate binning decision trees from the bins;
the target binning obtaining module 430 is configured to obtain a target binning corresponding to each dimension from the multiple univariate binning decision trees;
the model training module 440 is configured to input the target binning into a prediction model to train the prediction model.
The details of each module in the above data processing device have already been described in the corresponding data processing method, and are therefore not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
In addition, although the steps of the method of the disclosure are described in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the steps shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be merged into one step, and/or one step may be split into multiple steps.
From the description of the embodiments above, those skilled in the art will readily understand that the example embodiments described here can be implemented in software, or in software combined with the necessary hardware. The technical solution according to embodiments of the disclosure can therefore be embodied in the form of a software product, which can be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, server, mobile terminal, or network device) to perform the method according to embodiments of the disclosure.
In an exemplary embodiment of the disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will understand that aspects of the disclosure can be implemented as a system, a method, or a program product. Aspects of the disclosure can therefore take the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, and so on), or an embodiment combining hardware and software aspects, which may be collectively referred to here as a "circuit", "module", or "system".
An electronic device 500 according to this embodiment of the disclosure is described below with reference to Fig. 5. The electronic device 500 shown in Fig. 5 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the disclosure.
As shown in Fig. 5, the electronic device 500 takes the form of a general-purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, and a bus 530 connecting the different system components (including the storage unit 520 and the processing unit 510).
The storage unit stores program code that can be executed by the processing unit 510, so that the processing unit 510 performs the steps according to the various exemplary embodiments of the disclosure described in the "Exemplary methods" section of this specification. For example, the processing unit 510 can perform the steps shown in Fig. 1. Step S110: obtain multiple sample data, each sample data including sub-sample data of one or more dimensions. Step S120: divide the sub-sample data of each dimension into multiple groups of bins, and form multiple univariate binning decision trees from the bins. Step S130: obtain a target binning corresponding to each dimension from the multiple univariate binning decision trees. Step S140: input the target binning into a prediction model to train the prediction model.
The storage unit 520 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 5201 and/or a cache memory unit 5202, and may further include a read-only memory (ROM) unit 5203.
The storage unit 520 may also include a program/utility 5204 having a set of (at least one) program modules 5205. Such program modules 5205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 530 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 500 may also communicate with one or more external devices 700 (such as a keyboard, pointing device, or Bluetooth device), with one or more devices that enable a user to interact with the electronic device 500, and/or with any device (such as a router or modem) that enables the electronic device 500 to communicate with one or more other computing devices. This communication can take place through an input/output (I/O) interface 550. The electronic device 500 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 through the bus 530. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the description of the embodiments above, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solution according to the embodiments of the disclosure may be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiments of the disclosure.
In an exemplary embodiment of the disclosure, a computer readable storage medium is also provided, on which is stored a program product capable of implementing the method described above in this specification. In some possible embodiments, aspects of the disclosure may also be implemented as a program product comprising program code. When the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to the various exemplary embodiments of the disclosure described in the "Exemplary methods" section of this specification.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of these. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for carrying out the operations of the disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the drawings above are only schematic illustrations of the processing included in the method according to the exemplary embodiments of the disclosure, and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
After considering the specification and practicing the invention disclosed here, those skilled in the art will readily conceive of other embodiments of the disclosure. The disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed here. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure indicated by the claims.

Claims (10)

1. A data processing method, characterized by comprising:
obtaining multiple pieces of sample data, each piece of sample data including sub-sample data of one or more dimensions;
dividing the sub-sample data of each dimension into multiple groups of bins, and forming multiple univariate binning decision trees according to the bins;
obtaining target bins corresponding to each dimension according to the multiple univariate binning decision trees;
inputting the target bins into a prediction model, so as to perform machine training on the prediction model.
2. The data processing method according to claim 1, characterized in that dividing the sub-sample data of each dimension into multiple groups of bins comprises:
dividing the sub-sample data into multiple groups of bins according to different frequencies; or
dividing the sub-sample data into multiple groups of bins according to a preset number of nodes.
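The binning step of claim 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that "different frequencies" means equal-frequency binning repeated with several candidate bin counts, and the function names and the default counts `(2, 3, 4)` are hypothetical.

```python
from bisect import bisect_right


def equal_frequency_edges(values, n_bins):
    """Return interior cut points that split the sorted values into
    roughly equal-sized groups (at most n_bins - 1 cut points)."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for k in range(1, n_bins):
        cut = ordered[(k * n) // n_bins]
        if not edges or cut > edges[-1]:  # skip repeated cut points
            edges.append(cut)
    return edges


def assign_bins(values, edges):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


def candidate_binnings(values, bin_counts=(2, 3, 4)):
    """Build one group of bins per candidate bin count (assumed counts)."""
    return {n: equal_frequency_edges(values, n) for n in bin_counts}


ages = [22, 25, 31, 35, 38, 41, 47, 52, 58, 63, 67, 71]
groups = candidate_binnings(ages)
print(groups[3])                     # cut points for the three-bin grouping
print(assign_bins(ages, groups[3]))  # bin index of every age
```

Each entry of `groups` is one candidate grouping of the same dimension. The later steps of the method compare such candidates and keep one of them as the target bins.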
3. The data processing method according to claim 2, characterized in that each piece of sample data includes target data, and forming multiple univariate binning decision trees according to the bins comprises:
forming the univariate binning decision tree with the sub-sample data as the root node, the bins as non-leaf nodes, and the target data as leaf nodes.
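One possible in-memory shape for the tree of claim 3 is sketched below, assuming the target data are class labels that can be counted per bin. The class names `BinNode` and `BinningTree` and the helper `build_tree` are hypothetical and do not come from the source.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BinNode:
    """Non-leaf node: one bin of the feature. Its leaves are stored
    as counts of the target labels observed in that bin."""
    index: int
    leaves: Counter = field(default_factory=Counter)


@dataclass
class BinningTree:
    """Root node: the feature (dimension) whose values were binned."""
    feature: str
    edges: list
    bins: list


def build_tree(feature, values, targets, edges):
    """Route every (value, target) pair to its bin and count the targets."""
    bins = [BinNode(i) for i in range(len(edges) + 1)]
    for value, target in zip(values, targets):
        bins[bisect_right(edges, value)].leaves[target] += 1
    return BinningTree(feature, list(edges), bins)


tree = build_tree("age", [22, 35, 41, 58, 63, 71], [1, 0, 1, 0, 0, 0], [38, 58])
for node in tree.bins:
    print(node.index, dict(node.leaves))
```

Building one such tree per candidate grouping of a dimension gives the set of univariate trees that the method compares.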
4. The data processing method according to claim 1, characterized in that obtaining target bins corresponding to each dimension according to the multiple univariate binning decision trees comprises:
calculating the sub-information value of each leaf node in each univariate binning decision tree;
calculating the information value of each univariate binning decision tree according to the sub-information values;
comparing the information values of the univariate binning decision trees, and taking the bins corresponding to the univariate binning decision tree with the smallest information value as the target bins.
5. The data processing method according to claim 4, characterized in that calculating the information value of each univariate binning decision tree according to the sub-information values comprises:
adding the sub-information values of the leaf nodes in each univariate binning decision tree to obtain the information value.
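The selection in claims 4 and 5 can be sketched as below. The claims do not give the formula for the sub-information value, so this sketch assumes the common weight-of-evidence form, (good share − bad share) × ln(good share / bad share), computed per bin from counts of a binary target. The function names and the sample counts are hypothetical.

```python
import math


def sub_information_value(good, bad, total_good, total_bad):
    """Weight-of-evidence style contribution of one bin (assumed formula).
    Requires non-zero good and bad counts."""
    good_share = good / total_good
    bad_share = bad / total_bad
    return (good_share - bad_share) * math.log(good_share / bad_share)


def information_value(bin_counts):
    """Add up the sub-information values of all bins of one tree.
    bin_counts is a list of (good, bad) count pairs, one per bin."""
    total_good = sum(good for good, _ in bin_counts)
    total_bad = sum(bad for _, bad in bin_counts)
    return sum(
        sub_information_value(good, bad, total_good, total_bad)
        for good, bad in bin_counts
    )


def select_target_binning(candidates):
    """Return the name of the candidate with the smallest information
    value, following the wording of claim 4."""
    return min(candidates, key=lambda name: information_value(candidates[name]))


candidates = {
    "two_bins": [(30, 20), (20, 30)],
    "three_bins": [(25, 5), (15, 15), (10, 30)],
}
for name, counts in candidates.items():
    print(name, round(information_value(counts), 4))
print(select_target_binning(candidates))
```

With these counts the two-bin grouping has the smaller information value and is therefore selected. A bin with a zero count would need smoothing before the logarithm is taken, which this sketch leaves out.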
6. The data processing method according to claim 1, characterized in that each piece of sample data further includes target data, and inputting the target bins into a prediction model so as to perform machine training on the prediction model comprises:
inputting the target bins into the prediction model as the input vector, with the target data as the output vector, so as to perform machine training on the prediction model.
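The training step of claim 6 can be sketched as follows. The claim does not name a model type, so this example assumes a logistic regression fitted by batch gradient descent on one-hot encoded bin indices. The encoding, the hyperparameters, and the sample rows are all illustrative assumptions.

```python
import math


def one_hot(bin_index, n_bins):
    return [1.0 if i == bin_index else 0.0 for i in range(n_bins)]


def encode(row, bins_per_feature):
    """Concatenate the one-hot encoding of each feature's bin index."""
    vector = []
    for index, n_bins in zip(row, bins_per_feature):
        vector.extend(one_hot(index, n_bins))
    return vector


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(inputs, targets, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(inputs, targets):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Each row holds the target-bin index of every feature; targets are labels.
binned_rows = [(0, 1), (0, 0), (1, 1), (2, 0), (2, 1), (1, 0)]
targets = [1, 1, 0, 0, 0, 1]
bins_per_feature = (3, 2)
inputs = [encode(row, bins_per_feature) for row in binned_rows]
model = train_logistic(inputs, targets)
print(predict(model, encode((0, 1), bins_per_feature)))
```

Because the model sees only bin indices, many raw values collapse into the same input vector, which is consistent with the abstract's point that binning reduces repeated values.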
7. The data processing method according to claim 1, characterized in that the method further comprises:
obtaining data to be analyzed, the data to be analyzed having data of the same dimensions as the sample data;
inputting the data to be analyzed into the prediction model, so as to obtain a prediction result.
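The prediction step of claim 7 can be sketched as below, assuming the data to be analyzed are first mapped onto the target bins chosen during training. The per-bin score table stands in for a trained prediction model; it, the edge values, and the function names are hypothetical.

```python
from bisect import bisect_right


def to_bin_indices(record, target_edges):
    """Map each raw feature value to its bin index, using the cut
    points of the target bins chosen during training."""
    missing = set(target_edges) - set(record)
    if missing:
        raise KeyError(f"record lacks dimensions: {sorted(missing)}")
    return {
        name: bisect_right(edges, record[name])
        for name, edges in target_edges.items()
    }


def predict(record, target_edges, score_table):
    """Average per-bin scores as a stand-in for the trained model."""
    indices = to_bin_indices(record, target_edges)
    scores = [score_table[name][index] for name, index in indices.items()]
    return sum(scores) / len(scores)


target_edges = {"age": [38, 58], "income": [5000]}
score_table = {"age": [0.8, 0.5, 0.2], "income": [0.6, 0.3]}  # hypothetical

result = predict({"age": 41, "income": 7200}, target_edges, score_table)
print(result)
```

The check for missing dimensions reflects the claim's requirement that the data to be analyzed share the dimensions of the sample data.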
8. A data processing device, characterized by comprising:
a first obtaining module, for obtaining multiple pieces of sample data, each piece of sample data including sub-sample data of one or more dimensions;
a decision tree forming module, for dividing the sub-sample data of each dimension into multiple groups of bins, and forming multiple univariate binning decision trees according to the bins;
a target bin obtaining module, for obtaining target bins corresponding to each dimension according to the multiple univariate binning decision trees;
a model training module, for inputting the target bins into a prediction model, so as to perform machine training on the prediction model.
9. A computer readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the computer program implements the data processing method according to any one of claims 1-7.
10. An electronic equipment, characterized by comprising:
a processor; and
a memory, for storing executable instructions of the processor;
wherein the processor is configured to perform the data processing method according to any one of claims 1-7 by executing the executable instructions.
CN201811117037.7A 2018-09-25 2018-09-25 Data processing method and device, computer readable storage medium and electronic equipment Active CN109408583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811117037.7A CN109408583B (en) 2018-09-25 2018-09-25 Data processing method and device, computer readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109408583A true CN109408583A (en) 2019-03-01
CN109408583B CN109408583B (en) 2023-04-07

Family

ID=65465141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811117037.7A Active CN109408583B (en) 2018-09-25 2018-09-25 Data processing method and device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109408583B (en)

Also Published As

Publication number Publication date
CN109408583B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant