CN108984790A - Data binning method and device - Google Patents

Data binning method and device

Info

Publication number
CN108984790A
Authority
CN
China
Prior art keywords
binning
sample data
positive sample
attribute value
bin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810858624.5A
Other languages
Chinese (zh)
Inventor
曾伟雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Joint digital technology (Beijing) Co., Ltd
Original Assignee
Bee Wisdom (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bee Wisdom (beijing) Technology Co Ltd
Priority to CN201810858624.5A
Publication of CN108984790A

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a data binning method and device. The method comprises: sorting the positive sample data according to a preset ordering rule and the attribute value of each positive sample in the full sample data; dividing the sorted positive sample data into multiple groups according to the target number of bins, where each group of positive sample data belongs to one bin and the number of groups equals the target number; for each bin, determining the target attribute value interval of the bin according to the attribute values of the positive sample data in the bin; and binning the negative sample data according to the determined target attribute value interval of each bin and the attribute value of each negative sample in the full sample data. The invention thereby provides a binning scheme that keeps the number of positive samples in each bin roughly the same and avoids large differences in the number of positive samples across bins.

Description

Data binning method and device
Technical field
The present invention relates to the field of big data science, and in particular to a data binning method and device.
Background
Data binning is a common data preprocessing method. Binning divides the values of some attribute of the data into subintervals, for example by age or by height. If the attribute value of a data item falls within a subinterval, the item is placed in the bin represented by that subinterval. Binned data are commonly used to smooth the data or to eliminate large numbers of distinct attribute values. For example, for each bin, the attribute value of every data item in the bin can be replaced by the mean, the median, or a boundary value of the attribute values of all data in the bin.
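The smoothing use described above can be illustrated with a minimal Python sketch that replaces each value with the mean of its bin. The function name `smooth_by_bin_mean`, the cut points, and the sample ages are hypothetical and chosen only for this illustration; they do not come from the patent.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean


def smooth_by_bin_mean(values, cut_points):
    """Replace every value with the mean of the values sharing its bin.

    cut_points must be ascending; bin i covers [cut_points[i-1], cut_points[i]).
    """
    members = defaultdict(list)
    for v in values:
        members[bisect_right(cut_points, v)].append(v)
    bin_means = {idx: mean(vs) for idx, vs in members.items()}
    return [bin_means[bisect_right(cut_points, v)] for v in values]


# Hypothetical ages, binned as (-inf, 30), [30, 50), [50, +inf).
ages = [21, 23, 25, 34, 36, 58]
smoothed = smooth_by_bin_mean(ages, cut_points=[30, 50])
```

The median or a boundary value could be substituted for `mean` in the same way.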
Binning methods commonly used in the prior art either divide the value range of the attribute into k equal-width intervals, each serving as one bin, or cluster the data into k classes according to their attribute values, with the k classes serving as k bins. During clustering the order of the bins must be preserved, that is, the attribute value ranges of the bins must not overlap. In practice, however, there are many low-probability events. Transaction fraud, for example, occurs with a probability of about 2 in 10,000 (2 BP). When a low-probability event is modeled, the full sample data used for modeling often need to be binned. In modeling, the samples in the full sample data in which the low-probability event to be predicted occurs are usually called positive sample data, such as the transaction fraud data above, and the samples in which the event does not occur are called negative sample data, such as normal transaction data. When the full sample data are binned with existing methods, it easily happens that some bins contain many positive samples while others contain very few or even none. The number of positive samples then differs greatly across bins, which makes a model built on the bins unstable.
Summary of the invention
The present invention provides a data binning method and device to solve the prior-art problem that the number of positive samples differs greatly across bins, which makes a model built on the bins unstable.
In a first aspect, the invention discloses a data binning method, the method comprising:
sorting the positive sample data according to a preset ordering rule and the attribute value of each positive sample in the full sample data;
dividing the sorted positive sample data into multiple groups according to the target number of bins, where each group of positive sample data belongs to one bin and the number of groups equals the target number;
for each bin, determining the target attribute value interval of the bin according to the attribute values of the positive sample data in the bin;
binning the negative sample data according to the determined target attribute value interval of each bin and the attribute value of each negative sample in the full sample data.
Further, dividing the sorted positive sample data into multiple groups according to the target number of bins comprises:
determining the group to which each positive sample belongs according to tagy_bin = 1 + int(mybinnum * (tagx_count / (fnum + beta))), where tagy_bin is the serial number of the group to which the positive sample belongs, mybinnum is the target number of bins, tagx_count is the rank of the positive sample after sorting, fnum is the total number of positive samples, beta is a real number not greater than 0 and less than 1, and int is the round-down function.
Further, after the sorted positive sample data are divided into multiple groups according to the target number of bins, and before the target attribute value interval of each bin is determined according to the attribute values of the positive sample data in the bin, the method further comprises:
for any two adjacent bins, identifying whether the maximum attribute values of the positive sample data in the two adjacent bins are the same;
if so, merging the two adjacent bins into one bin; if not, proceeding to the subsequent step.
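The merging step above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `merge_equal_max` and the representation of each bin as a list of positive-sample attribute values are assumptions made for the example.

```python
def merge_equal_max(groups):
    """Merge adjacent bins whose positive samples share the same maximum value.

    groups: list of non-empty lists of attribute values, in bin order.
    """
    merged = []
    for group in groups:
        if merged and max(merged[-1]) == max(group):
            # Same maximum as the previous bin: fold this bin into it.
            merged[-1] = merged[-1] + list(group)
        else:
            merged.append(list(group))
    return merged


# Hypothetical groups: the first two bins both have maximum 23.
bins = merge_equal_max([[17, 23], [23, 23], [24, 25]])
```

Because merged bins keep their shared maximum, a run of three or more bins with the same maximum collapses into a single bin.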
Further, if the preset ordering rule is ascending order, determining, for each bin, the target attribute value interval of the bin according to the attribute values of the positive sample data in the bin comprises:
for each bin, identifying whether the bin is the first bin or the last bin;
if not, taking the interval from the maximum attribute value of the positive sample data in the preceding adjacent bin to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin;
if so, judging whether the bin is the first bin; if the bin is the first bin, taking the interval from negative infinity to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin; if the bin is not the first bin, taking the interval from the maximum attribute value of the positive sample data in the preceding adjacent bin to positive infinity as the target attribute value interval of the bin.
Further, if the preset ordering rule is descending order, determining, for each bin, the target attribute value interval of the bin according to the attribute values of the positive sample data in the bin comprises:
for each bin, identifying whether the bin is the first bin or the last bin;
if not, taking the interval from the maximum attribute value of the positive sample data in the following adjacent bin to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin;
if so, judging whether the bin is the first bin; if the bin is the first bin, taking the interval from the maximum attribute value of the positive sample data in the following adjacent bin to positive infinity as the target attribute value interval of the bin; if the bin is not the first bin, taking the interval from negative infinity to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin.
In a second aspect, the invention discloses a data binning device, the device comprising:
a sorting module, configured to sort the positive sample data according to a preset ordering rule and the attribute value of each positive sample in the full sample data;
a first binning module, configured to divide the sorted positive sample data into multiple groups according to the target number of bins, where each group of positive sample data belongs to one bin and the number of groups equals the target number;
a determining module, configured to determine, for each bin, the target attribute value interval of the bin according to the attribute values of the positive sample data in the bin;
a second binning module, configured to bin the negative sample data according to the determined target attribute value interval of each bin and the attribute value of each negative sample in the full sample data.
Further, the first binning module is specifically configured to determine the group to which each positive sample belongs according to tagy_bin = 1 + int(mybinnum * (tagx_count / (fnum + beta))), where tagy_bin is the serial number of the group to which the positive sample belongs, mybinnum is the target number of bins, tagx_count is the rank of the positive sample after sorting, fnum is the total number of positive samples, beta is a real number not greater than 0 and less than 1, and int is the round-down function.
Further, the device also comprises:
a merging module, configured to identify, for any two adjacent bins, whether the maximum attribute values of the positive sample data in the two adjacent bins are the same; if so, to merge the two adjacent bins into one bin and trigger the determining module; if not, to trigger the determining module directly.
Further, the determining module is specifically configured, if the preset ordering rule is ascending order, to identify for each bin whether the bin is the first bin or the last bin; if not, to take the interval from the maximum attribute value of the positive sample data in the preceding adjacent bin to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin; if so, to judge whether the bin is the first bin; if the bin is the first bin, to take the interval from negative infinity to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin; and if the bin is not the first bin, to take the interval from the maximum attribute value of the positive sample data in the preceding adjacent bin to positive infinity as the target attribute value interval of the bin.
Further, the determining module is specifically configured, if the preset ordering rule is descending order, to identify for each bin whether the bin is the first bin or the last bin; if not, to take the interval from the maximum attribute value of the positive sample data in the following adjacent bin to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin; if so, to judge whether the bin is the first bin; if the bin is the first bin, to take the interval from the maximum attribute value of the positive sample data in the following adjacent bin to positive infinity as the target attribute value interval of the bin; and if the bin is not the first bin, to take the interval from negative infinity to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin.
The invention discloses a data binning method and device. The method comprises: sorting the positive sample data according to a preset ordering rule and the attribute value of each positive sample in the full sample data; dividing the sorted positive sample data into multiple groups according to the target number of bins, where each group of positive sample data belongs to one bin and the number of groups equals the target number; for each bin, determining the target attribute value interval of the bin according to the attribute values of the positive sample data in the bin; and binning the negative sample data according to the determined target attribute value interval of each bin and the attribute value of each negative sample in the full sample data. In the embodiments of the present invention, the positive sample data sorted by attribute value are divided into multiple groups according to the target number of bins, each group belongs to one bin, and the target attribute value interval of each bin is determined from the attribute values of the positive sample data in that bin. This keeps the number of positive samples within the target attribute value interval of each bin roughly the same, and so avoids the problem that the number of positive samples differs greatly across bins and makes a model built on the bins unstable.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the first schematic diagram of a data binning process provided by an embodiment of the present invention;
Fig. 2 is the second schematic diagram of a data binning process provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data binning device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In an existing binning method, the value range of the attribute is divided into k equal-width intervals. Taking k = 8 as an example, the intervals are (-∞, 10), [10, 20), [20, 30), [30, 40), [40, 50), [50, 60), [60, 70), [70, +∞), and they serve in turn as bin 1 through bin 8. Suppose, for illustration, that the positive sample data and their attribute values are: A, 24; B, 25; C, 17; D, 28; E, 35; F, 23. After binning, bin 1 contains 0 positive samples, bin 2 contains 1, bin 3 contains 4, bin 4 contains 1, and bins 5 through 8 each contain 0. Some bins thus hold many positive samples while others hold very few or even none, so the number of positive samples differs greatly across bins. The present application aims to solve this prior-art problem, which makes a model built on the bins unstable, and is described below with reference to the following embodiments.
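The imbalance in the equal-width example above can be reproduced with a short Python sketch. The sample values and cut points are taken from the example; the helper name `equal_width_bin` is hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Positive samples and attribute values from the example above.
positives = {"A": 24, "B": 25, "C": 17, "D": 28, "E": 35, "F": 23}
# Cut points of (-inf, 10), [10, 20), ..., [60, 70), [70, +inf).
cut_points = [10, 20, 30, 40, 50, 60, 70]


def equal_width_bin(value, cut_points):
    """Return the 1-based bin number of value for half-open bins [lo, hi)."""
    return bisect_right(cut_points, value) + 1


counts = Counter(equal_width_bin(v, cut_points) for v in positives.values())
per_bin = [counts.get(b, 0) for b in range(1, len(cut_points) + 2)]
```

Here `per_bin` lists the positive-sample count of bins 1 through 8, and four of the six positives land in bin 3.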
Embodiment 1:
Fig. 1 is a schematic diagram of a data binning process provided by an embodiment of the present invention. The process comprises:
S101: sorting the positive sample data according to a preset ordering rule and the attribute value of each positive sample in the full sample data.
The data binning method provided by the embodiments of the present invention is applied to an electronic device, which may be a device such as a mobile phone, a personal computer (PC), or a tablet computer, or a device such as a server or a server cluster.
In the embodiments of the present invention, the attribute value of a data item may be the value of any attribute of the data, such as age, number of transactions, transaction amount, or transaction date. A preset ordering rule, such as ascending order or descending order, is also stored in the electronic device in advance. Preferably, the electronic device also provides an operation interface for selecting or entering the ordering rule and the attribute. Through this interface the user can select or enter the ordering rule and the attribute on which binning is based. This is not described further here.
Specifically, the electronic device sorts the positive sample data according to the preset ordering rule and the attribute value of each positive sample in the full sample data. For example, suppose the preset ordering rule is ascending order and the positive sample data in the full data and their attribute values are: A, 24; B, 25; C, 17; D, 28; E, 35; F, 23. After sorting, the order of the positive samples is: C, F, A, B, D, E.
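A minimal Python sketch of step S101 follows, using the sample values from the example above. The function name `sort_positives` and the `ascending` flag are hypothetical stand-ins for the preset ordering rule.

```python
# Positive samples as (name, attribute value) pairs, from the example above.
positives = [("A", 24), ("B", 25), ("C", 17), ("D", 28), ("E", 35), ("F", 23)]


def sort_positives(samples, ascending=True):
    """Sort (name, value) pairs by attribute value under the chosen rule."""
    return sorted(samples, key=lambda s: s[1], reverse=not ascending)


ascending_order = [name for name, _ in sort_positives(positives)]
descending_order = [name for name, _ in sort_positives(positives, ascending=False)]
```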
S102: dividing the sorted positive sample data into multiple groups according to the target number of bins, where each group of positive sample data belongs to one bin and the number of groups equals the target number.
In the embodiments of the present invention, the electronic device may also provide an operation interface through which the user sets the target number of bins. Specifically, the electronic device divides the sorted positive sample data into multiple groups according to the target number of bins; each group of positive sample data belongs to one bin, and the number of groups equals the target number of bins.
For example, the target number of bins is 3 and the sorted positive samples are C, F, A, B, D, E. Dividing the sorted positive samples into 3 groups gives group 1: C, F; group 2: A, B; group 3: D, E, where group 1 corresponds to bin 1, group 2 to bin 2, and group 3 to bin 3.
S103: for each bin, determining the target attribute value interval of the bin according to the attribute values of the positive sample data in the bin.
Specifically, for each bin, the electronic device may take the interval from the minimum attribute value to the maximum attribute value of the positive sample data in the bin as the target attribute value interval of the bin. For example, if the minimum attribute value of the positive sample data in bin 4 is 20 and the maximum attribute value is 25, the target attribute value interval of the bin is [20, 25].
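The minimum-to-maximum rule above can be sketched in Python as follows. The groups reuse the attribute values of the earlier three-bin example; the helper names `min_max_intervals` and `find_bin` are hypothetical. The sketch also shows that closed intervals built this way can leave gaps between adjacent bins.

```python
def min_max_intervals(groups):
    """Closed interval [min, max] of the positive samples in each bin."""
    return [(min(g), max(g)) for g in groups]


def find_bin(value, intervals):
    """Return the 1-based bin whose closed interval contains value, else None."""
    for idx, (low, high) in enumerate(intervals, start=1):
        if low <= value <= high:
            return idx
    return None


# Attribute values of groups (C, F), (A, B), (D, E) from the earlier example.
intervals = min_max_intervals([[17, 23], [24, 25], [28, 35]])
```

With these intervals a value of 26 matches no bin, because it lies between the maximum of bin 2 and the minimum of bin 3.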
However, determining the target attribute value interval of a bin directly as the interval from the minimum to the maximum attribute value of the positive sample data in the bin may leave the target attribute value intervals of two adjacent bins discontinuous. Preferably, therefore, in the embodiments of the present invention, determining, for each bin, the target attribute value interval of the bin according to the attribute values of the positive sample data in the bin comprises: determining the target attribute value interval of the bin according to the maximum attribute value of the positive sample data in the bin and the maximum attribute value of the positive sample data in the bin adjacent to it. Specifically:
If the preset ordering rule is ascending order, determining, for each bin, the target attribute value interval of the bin according to the attribute values of the positive sample data in the bin comprises:
for each bin, identifying whether the bin is the first bin or the last bin;
if not, taking the interval from the maximum attribute value of the positive sample data in the preceding adjacent bin to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin;
if so, judging whether the bin is the first bin; if the bin is the first bin, taking the interval from negative infinity to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin; if the bin is not the first bin, taking the interval from the maximum attribute value of the positive sample data in the preceding adjacent bin to positive infinity as the target attribute value interval of the bin.
Specifically, if the preset ordering rule is ascending order, then for each bin: if the bin is the first bin, no bin precedes it, and its target attribute value interval should cover the range from negative infinity to the maximum attribute value of the positive sample data in the bin. If the bin is neither the first nor the last bin, bins exist both before and after it, and the interval from the maximum attribute value of the positive sample data in the preceding adjacent bin (exclusive) to the maximum attribute value of the positive sample data in this bin is taken as its target attribute value interval. If the bin is the last bin, no bin follows it, and its target attribute value interval should cover the range from the maximum attribute value of the positive sample data in the preceding adjacent bin (exclusive) to positive infinity.
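The ascending-order rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each bin is represented as a list of positive-sample attribute values, each interval is returned as a pair (lower bound exclusive, upper bound inclusive), and the function name `ascending_intervals` is hypothetical. A single bin is treated as covering the whole number line, which is a choice of this sketch rather than something the text specifies.

```python
import math


def ascending_intervals(groups):
    """Target intervals (lower exclusive, upper inclusive) for ascending bins.

    groups: non-empty lists of attribute values, in ascending bin order.
    """
    maxima = [max(g) for g in groups]
    last = len(groups) - 1
    intervals = []
    for i in range(len(groups)):
        # First bin starts at -inf; others start just above the previous maximum.
        lower = -math.inf if i == 0 else maxima[i - 1]
        # Last bin extends to +inf; others end at their own maximum.
        upper = math.inf if i == last else maxima[i]
        intervals.append((lower, upper))
    return intervals


# Attribute values of groups (C, F), (A, B), (D, E) from the earlier example.
asc = ascending_intervals([[17, 23], [24, 25], [28, 35]])
```

The result corresponds to (-∞, 23], (23, 25], and (25, +∞), so adjacent intervals meet without gaps or overlap.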
If the preset ordering rule is descending order, determining, for each bin, the target attribute value interval of the bin according to the attribute values of the positive sample data in the bin comprises:
for each bin, identifying whether the bin is the first bin or the last bin;
if not, taking the interval from the maximum attribute value of the positive sample data in the following adjacent bin to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin;
if so, judging whether the bin is the first bin; if the bin is the first bin, taking the interval from the maximum attribute value of the positive sample data in the following adjacent bin to positive infinity as the target attribute value interval of the bin; if the bin is not the first bin, taking the interval from negative infinity to the maximum attribute value of the positive sample data in this bin as the target attribute value interval of the bin.
Specifically, if the preset ordering rule is descending order, then for each bin: if the bin is the first bin, no bin precedes it, and its target attribute value interval should cover the range from the maximum attribute value of the positive sample data in the following adjacent bin (exclusive) to positive infinity. If the bin is neither the first nor the last bin, bins exist both before and after it, and the interval from the maximum attribute value of the positive sample data in the following adjacent bin (exclusive) to the maximum attribute value of the positive sample data in this bin is taken as its target attribute value interval. If the bin is the last bin, no bin follows it, and its target attribute value interval should cover the range from negative infinity to the maximum attribute value of the positive sample data in the bin.
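The descending-order rule above can be sketched in the same way. As before, the representation of bins as lists of attribute values, the (lower exclusive, upper inclusive) pair convention, and the function name `descending_intervals` are assumptions of this illustration, and a single bin is treated as covering the whole number line.

```python
import math


def descending_intervals(groups):
    """Target intervals (lower exclusive, upper inclusive) for descending bins.

    groups: non-empty lists of attribute values, in descending bin order.
    """
    maxima = [max(g) for g in groups]
    last = len(groups) - 1
    intervals = []
    for i in range(len(groups)):
        # Last bin extends down to -inf; others start just above the next maximum.
        lower = -math.inf if i == last else maxima[i + 1]
        # First bin extends up to +inf; others end at their own maximum.
        upper = math.inf if i == 0 else maxima[i]
        intervals.append((lower, upper))
    return intervals


# Groups (E, D), (B, A), (F, C): the earlier example sorted in descending order.
desc = descending_intervals([[35, 28], [25, 24], [23, 17]])
```

The result corresponds to (25, +∞), (23, 25], and (-∞, 23], the same boundaries as in the ascending case with the bin numbering reversed.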
S104: binning the negative sample data according to the determined target attribute value interval of each bin and the attribute value of each negative sample in the full sample data.
Specifically, after the target attribute value interval of each bin is determined, each negative sample in the full sample data is placed, according to its attribute value, into the bin whose target attribute value interval contains that value, which completes the binning of the full sample data. Of course, the full sample data may also be binned again from scratch according to the target attribute value interval of each bin.
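Step S104 can be sketched in Python as follows. The intervals are those of the ascending three-bin example, written as (lower exclusive, upper inclusive) pairs; the function name `assign_bin` and the negative-sample values are hypothetical.

```python
import math

# Intervals of the ascending example: (-inf, 23], (23, 25], (25, +inf).
intervals = [(-math.inf, 23), (23, 25), (25, math.inf)]


def assign_bin(value, intervals):
    """Return the 1-based bin whose interval (lower, upper] contains value."""
    for idx, (lower, upper) in enumerate(intervals, start=1):
        if lower < value <= upper:
            return idx
    raise ValueError(f"no interval contains {value!r}")


# Hypothetical attribute values of negative samples.
negatives = [5, 23, 24, 26, 90]
negative_bins = [assign_bin(v, intervals) for v in negatives]
```

Because the intervals tile the whole number line, every finite attribute value falls into exactly one bin.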
In the embodiments of the present invention, the positive sample data sorted by attribute value are divided into multiple groups according to the target number of bins, each group of positive sample data belongs to one bin, and the target attribute value interval of each bin is determined from the attribute values of the positive sample data in that bin. This keeps the number of positive samples within the target attribute value interval of each bin roughly the same, and so avoids the problem that the number of positive samples differs greatly across bins and makes a model built on the bins unstable.
In addition, because in the embodiments of the present invention the target attribute value interval of each bin is determined from the positive sample data only, the binning process is simple. For low-probability event data, the positive sample data make up only a very small part of the full sample data, so the efficiency of data binning can be further improved.
Embodiment 2:
To ensure that each bin contains substantially the same number of positive samples, on the basis of the above embodiment, in this embodiment of the present invention, dividing the sorted positive samples evenly into multiple groups according to the target number of bins includes:
Determining the group to which each positive sample belongs according to tagy_bin = 1 + int(mybinnum * (tagx_count / (fnum + beta))), where tagy_bin is the serial number of the group to which the positive sample belongs, mybinnum is the target number of bins, tagx_count is the sorting serial number of the positive sample after sorting, fnum is the total number of positive samples, beta is a real number that lies between 0 and 1 and is less than 1, and int is the round-down function.
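The grouping formula can be tried out with a short Python sketch. Two things are assumed here and are not stated in the original text: the sorting serial numbers start at 1, and beta is set to 0.5. With serial numbers starting at 1, a positive beta keeps the last sample (tagx_count = fnum) in group mybinnum rather than in a group numbered mybinnum + 1.

```python
from collections import Counter

def group_index(tagx_count, fnum, mybinnum, beta=0.5):
    """Group serial number for the positive sample with sorting serial
    number tagx_count (assumed to start at 1)."""
    # int() truncates toward zero, which equals rounding down for
    # the non-negative values that occur here.
    return 1 + int(mybinnum * (tagx_count / (fnum + beta)))

fnum, mybinnum = 10, 5
groups = [group_index(rank, fnum, mybinnum) for rank in range(1, fnum + 1)]
sizes = Counter(groups)
```

For 10 positive samples and 5 target bins, the samples receive group numbers 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, so each group holds two positive samples.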
Embodiment 3:
To prevent the target attribute value intervals of different bins from overlapping, on the basis of the above embodiments, in this embodiment of the present invention, after the sorted positive samples are divided evenly into multiple groups according to the target number of bins, and before the target attribute value interval of each bin is determined from the attribute values of the positive samples in that bin, the method further includes:
For any two adjacent bins, identifying whether the maximum attribute values of the positive samples in the two adjacent bins are the same;
If so, merging the two adjacent bins into one bin; if not, proceeding to the subsequent step.
Specifically, in the embodiments of the present invention the positive samples are sorted by attribute value according to the preset sorting rule and then divided evenly into multiple groups, each of which belongs to one bin, so the maximum attribute values of the positive samples in the successive bins also follow the preset sorting rule, whether ascending or descending. Therefore, if the maximum attribute values of the positive samples in two adjacent bins are the same, then in one of the two bins all positive samples have the same attribute value, and that value equals the attribute value of at least one positive sample in the other bin, so the two adjacent bins need to be merged into one bin. For example, with ascending order: the attribute values of the positive samples in bin 5 are 31, 31, 32, 33 in order, and those in bin 6 are 33, 33, 33, 33 in order; the maximum attribute values of the positive samples in bin 5 and bin 6 are the same, so bin 5 and bin 6 are merged into one bin.
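The merging step can be sketched in Python as below. The function name `merge_equal_max_bins` and the representation of each bin as a non-empty list of positive-sample attribute values are assumptions for illustration. The values for the two middle bins follow the bin 5 and bin 6 example above, while the first and last bins are made-up neighbours.

```python
def merge_equal_max_bins(bins):
    """Hypothetical helper: bins is a list of non-empty lists of attribute
    values, given in sort order. Adjacent bins whose maximum attribute
    values are equal are merged into one bin."""
    merged = []
    for current in bins:
        if merged and max(merged[-1]) == max(current):
            # Same maximum as the previous bin: merge into it.
            merged[-1] = merged[-1] + list(current)
        else:
            merged.append(list(current))
    return merged

bins = [[28, 29, 30, 30], [31, 31, 32, 33], [33, 33, 33, 33], [34, 35]]
merged_bins = merge_equal_max_bins(bins)
```

Since the comparison is always made against the most recently merged bin, a run of three or more adjacent bins with the same maximum also collapses into a single bin.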
Fig. 2 is a schematic diagram of a binning process provided by an embodiment of the present invention. The process includes:
S201: Sort the positive samples according to the preset sorting rule and the attribute value of each positive sample in the full sample data.
S202: Divide the sorted positive samples evenly into multiple groups according to the target number of bins, where each group of positive samples belongs to one bin and the number of groups is the same as the target number.
S203: For any two adjacent bins, identify whether the maximum attribute values of the positive samples in the two adjacent bins are the same; if so, merge the two adjacent bins into one bin.
S204: For each bin, determine the target attribute value interval of the bin according to the attribute value of each positive sample in the bin.
S205: Bin the negative samples according to the determined target attribute value interval of each bin and the attribute value of each negative sample in the full sample data.
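Steps S201 to S205 can be combined into one short Python sketch for the ascending-order case. Several choices here are assumptions rather than part of the original text: the function name `bin_samples`, the input format of `(attribute_value, is_positive)` pairs, 1-based sorting serial numbers, beta = 0.5, and the decision to assign every sample (positive and negative) by interval, which corresponds to the option of binning the full sample data again.

```python
import bisect

def bin_samples(samples, target_bins, beta=0.5):
    """Hypothetical end-to-end sketch for ascending order.
    Returns (cut_points, bin_index_per_sample)."""
    # S201: sort the positive samples by attribute value (ascending).
    positives = sorted(value for value, is_positive in samples if is_positive)
    fnum = len(positives)
    # S202: assign each positive sample to a group with the grouping formula.
    groups = {}
    for rank, value in enumerate(positives, start=1):
        group = 1 + int(target_bins * (rank / (fnum + beta)))
        groups.setdefault(group, []).append(value)
    # S203: merge adjacent groups with equal maxima by keeping each
    # distinct maximum only once.
    maxima = []
    for group in sorted(groups):
        group_max = groups[group][-1]  # last element is largest (sorted)
        if not maxima or maxima[-1] != group_max:
            maxima.append(group_max)
    # S204: intervals are (-inf, m0], (m0, m1], ..., (m_last-1, +inf),
    # so only the maxima of all bins except the last act as cut points.
    cut_points = maxima[:-1]
    # S205: bisect_left counts the cut points strictly below the value,
    # which is the index of the interval containing it.
    indices = [bisect.bisect_left(cut_points, value) for value, _ in samples]
    return cut_points, indices

samples = [(v, True) for v in [1, 2, 3, 4, 5, 6]]
samples += [(v, False) for v in [0, 2.5, 4, 10]]
cut_points, indices = bin_samples(samples, target_bins=3)
```

In this example the six positive samples form three groups with maxima 2, 4 and 6, so the cut points are 2 and 4, and the negative values 0, 2.5, 4 and 10 fall into bins 0, 1, 1 and 2 respectively.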
Embodiment 4:
Fig. 3 is a schematic structural diagram of a data binning device provided by an embodiment of the present invention. The device includes:
A sorting module 31, configured to sort the positive samples according to the preset sorting rule and the attribute value of each positive sample in the full sample data;
A first binning module 32, configured to divide the sorted positive samples evenly into multiple groups according to the target number of bins, where each group of positive samples belongs to one bin and the number of groups is the same as the target number;
A determining module 33, configured to determine, for each bin, the target attribute value interval of the bin according to the attribute value of each positive sample in the bin;
A second binning module 34, configured to bin the negative samples according to the determined target attribute value interval of each bin and the attribute value of each negative sample in the full sample data.
The first binning module 32 is specifically configured to determine the group to which each positive sample belongs according to tagy_bin = 1 + int(mybinnum * (tagx_count / (fnum + beta))), where tagy_bin is the serial number of the group to which the positive sample belongs, mybinnum is the target number of bins, tagx_count is the sorting serial number of the positive sample after sorting, fnum is the total number of positive samples, beta is a real number that lies between 0 and 1 and is less than 1, and int is the round-down function.
The device further includes:
A merging module 35, configured to identify, for any two adjacent bins, whether the maximum attribute values of the positive samples in the two adjacent bins are the same; if so, to merge the two adjacent bins into one bin and trigger the determining module; if not, to trigger the determining module directly.
The determining module 33 is specifically configured to, if the preset sorting rule is ascending order, identify for each bin whether the bin is the first bin or the last bin; if not, take the interval from the maximum attribute value of the positive samples in the adjacent previous bin to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin; if so, judge whether the bin is the first bin; if the bin is the first bin, take the interval from negative infinity to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin; if the bin is not the first bin, take the interval from the maximum attribute value of the positive samples in the adjacent previous bin to positive infinity as the target attribute value interval of the bin.
The determining module 33 is specifically configured to, if the preset sorting rule is descending order, identify for each bin whether the bin is the first bin or the last bin; if not, take the interval from the maximum attribute value of the positive samples in the adjacent next bin to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin; if so, judge whether the bin is the first bin; if the bin is the first bin, take the interval from the maximum attribute value of the positive samples in the adjacent next bin to positive infinity as the target attribute value interval of the bin; if the bin is not the first bin, take the interval from negative infinity to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin.
The invention discloses a data binning method and device. The method includes: sorting the positive samples according to the preset sorting rule and the attribute value of each positive sample in the full sample data; dividing the sorted positive samples evenly into multiple groups according to the target number of bins, where each group of positive samples belongs to one bin and the number of groups is the same as the target number; for each bin, determining the target attribute value interval of the bin according to the attribute value of each positive sample in the bin; and binning the negative samples according to the determined target attribute value interval of each bin and the attribute value of each negative sample in the full sample data. Because the positive samples, sorted by attribute value, are divided evenly into multiple groups according to the target number of bins, each group belongs to one bin, and the target attribute value interval of each bin is determined from the attribute values of the positive samples in that bin, the number of positive samples within the target attribute value interval of each bin is substantially the same. This avoids the problem that a model built on the bins is unstable because the numbers of positive samples in different bins differ too much.
Because the system/device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant parts, refer to the description of the method embodiments.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A data binning method, characterized in that the method comprises:
Sorting the positive samples according to a preset sorting rule and the attribute value of each positive sample in the full sample data;
Dividing the sorted positive samples evenly into multiple groups according to a target number of bins, wherein each group of positive samples belongs to one bin and the number of groups is the same as the target number;
For each bin, determining the target attribute value interval of the bin according to the attribute value of each positive sample in the bin;
Binning the negative samples according to the determined target attribute value interval of each bin and the attribute value of each negative sample in the full sample data.
2. The method according to claim 1, characterized in that dividing the sorted positive samples evenly into multiple groups according to the target number of bins comprises:
Determining the group to which each positive sample belongs according to tagy_bin = 1 + int(mybinnum * (tagx_count / (fnum + beta))), wherein tagy_bin is the serial number of the group to which the positive sample belongs, mybinnum is the target number of bins, tagx_count is the sorting serial number of the positive sample after sorting, fnum is the total number of positive samples, beta is a real number that lies between 0 and 1 and is less than 1, and int is the round-down function.
3. The method according to claim 1, characterized in that, after dividing the sorted positive samples evenly into multiple groups according to the target number of bins, and before determining, for each bin, the target attribute value interval of the bin according to the attribute value of each positive sample in the bin, the method further comprises:
For any two adjacent bins, identifying whether the maximum attribute values of the positive samples in the two adjacent bins are the same;
If so, merging the two adjacent bins into one bin; if not, proceeding to the subsequent step.
4. The method according to claim 1, characterized in that, if the preset sorting rule is ascending order, determining, for each bin, the target attribute value interval of the bin according to the attribute value of each positive sample in the bin comprises:
For each bin, identifying whether the bin is the first bin or the last bin;
If not, taking the interval from the maximum attribute value of the positive samples in the adjacent previous bin to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin;
If so, judging whether the bin is the first bin; if the bin is the first bin, taking the interval from negative infinity to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin; if the bin is not the first bin, taking the interval from the maximum attribute value of the positive samples in the adjacent previous bin to positive infinity as the target attribute value interval of the bin.
5. The method according to claim 1, characterized in that, if the preset sorting rule is descending order, determining, for each bin, the target attribute value interval of the bin according to the attribute value of each positive sample in the bin comprises:
For each bin, identifying whether the bin is the first bin or the last bin;
If not, taking the interval from the maximum attribute value of the positive samples in the adjacent next bin to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin;
If so, judging whether the bin is the first bin; if the bin is the first bin, taking the interval from the maximum attribute value of the positive samples in the adjacent next bin to positive infinity as the target attribute value interval of the bin; if the bin is not the first bin, taking the interval from negative infinity to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin.
6. A data binning device, characterized in that the device comprises:
A sorting module, configured to sort the positive samples according to a preset sorting rule and the attribute value of each positive sample in the full sample data;
A first binning module, configured to divide the sorted positive samples evenly into multiple groups according to a target number of bins, wherein each group of positive samples belongs to one bin and the number of groups is the same as the target number;
A determining module, configured to determine, for each bin, the target attribute value interval of the bin according to the attribute value of each positive sample in the bin;
A second binning module, configured to bin the negative samples according to the determined target attribute value interval of each bin and the attribute value of each negative sample in the full sample data.
7. The device according to claim 6, characterized in that the first binning module is specifically configured to determine the group to which each positive sample belongs according to tagy_bin = 1 + int(mybinnum * (tagx_count / (fnum + beta))), wherein tagy_bin is the serial number of the group to which the positive sample belongs, mybinnum is the target number of bins, tagx_count is the sorting serial number of the positive sample after sorting, fnum is the total number of positive samples, beta is a real number that lies between 0 and 1 and is less than 1, and int is the round-down function.
8. The device according to claim 6, characterized in that the device further comprises:
A merging module, configured to identify, for any two adjacent bins, whether the maximum attribute values of the positive samples in the two adjacent bins are the same; if so, to merge the two adjacent bins into one bin and trigger the determining module; if not, to trigger the determining module directly.
9. The device according to claim 6, characterized in that the determining module is specifically configured to, if the preset sorting rule is ascending order, identify for each bin whether the bin is the first bin or the last bin; if not, take the interval from the maximum attribute value of the positive samples in the adjacent previous bin to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin; if so, judge whether the bin is the first bin; if the bin is the first bin, take the interval from negative infinity to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin; if the bin is not the first bin, take the interval from the maximum attribute value of the positive samples in the adjacent previous bin to positive infinity as the target attribute value interval of the bin.
10. The device according to claim 6, characterized in that the determining module is specifically configured to, if the preset sorting rule is descending order, identify for each bin whether the bin is the first bin or the last bin; if not, take the interval from the maximum attribute value of the positive samples in the adjacent next bin to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin; if so, judge whether the bin is the first bin; if the bin is the first bin, take the interval from the maximum attribute value of the positive samples in the adjacent next bin to positive infinity as the target attribute value interval of the bin; if the bin is not the first bin, take the interval from negative infinity to the maximum attribute value of the positive samples in this bin as the target attribute value interval of the bin.
CN201810858624.5A 2018-07-31 2018-07-31 A kind of data branch mailbox method and device Pending CN108984790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810858624.5A CN108984790A (en) 2018-07-31 2018-07-31 A kind of data branch mailbox method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810858624.5A CN108984790A (en) 2018-07-31 2018-07-31 A kind of data branch mailbox method and device

Publications (1)

Publication Number Publication Date
CN108984790A true CN108984790A (en) 2018-12-11

Family

ID=64552389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810858624.5A Pending CN108984790A (en) 2018-07-31 2018-07-31 A kind of data branch mailbox method and device

Country Status (1)

Country Link
CN (1) CN108984790A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084376A (en) * 2019-04-30 2019-08-02 成都四方伟业软件股份有限公司 To the method and device of the automatic branch mailbox of data
CN110084376B (en) * 2019-04-30 2021-05-14 成都四方伟业软件股份有限公司 Method and device for automatically separating data into boxes
CN111429003A (en) * 2020-03-23 2020-07-17 北京互金新融科技有限公司 Data processing method and device
CN111429003B (en) * 2020-03-23 2023-11-03 北京互金新融科技有限公司 Data processing method and device
CN112270377A (en) * 2020-11-11 2021-01-26 北京百度网讯科技有限公司 Target image extraction method, neural network training method and device
CN112270377B (en) * 2020-11-11 2024-03-15 北京百度网讯科技有限公司 Target image extraction method, neural network training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200821

Address after: 501, 5 / F, block B, No. 28, xinjiekouwei street, Xicheng District, Beijing 100032

Applicant after: Joint digital technology (Beijing) Co., Ltd

Address before: 100082 No. 508, 5th floor, Block B, 28 Xinjiekouwai Street, Xicheng District, Beijing

Applicant before: MIXIAOFENG WISDOM (BEIJING) TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20181211