Summary of the Invention
The embodiments of this specification provide a sample label processing method and apparatus, and a community division method and apparatus.
In a first aspect, an embodiment of this specification provides a sample label processing method, comprising:
Obtaining a sample set, where some of the samples in the sample set have a default label;
Dividing the sample set into H groups according to the association relationships between the samples in the sample set, where H is a positive integer;
Performing L rounds of iterative processing on the H groups until a convergence condition is met, and taking the label information of each sample after the last round of iterative processing as the processing result, where the label information of each sample indicates whether that sample has the default label, and L is a positive integer;
Wherein each round of iterative processing includes:
Determining the group characteristics of each group according to the current label information of each sample;
Determining target groups and non-target groups according to the group characteristics, where a target group is a group in which samples having the default label are concentrated, and the non-target groups are the groups among the H groups other than the target groups;
Adding the default label to the samples in the target groups that do not have the default label;
Deleting the default label from the samples in the non-target groups that have the default label.
In a second aspect, an embodiment of this specification provides a community division method, comprising:
Generating, according to the association relationships between the samples in a sample set, a relational network graph in which each single sample is a node;
Calculating the degree of each node of the relational network graph;
Visiting the nodes of the relational network graph one by one in descending order of node degree;
Wherein visiting each node of the relational network graph includes:
Determining whether the current node has already joined any group;
If the current node has not joined any group, generating a new group centered on the current node;
Determining one or more expanding nodes according to the current node, where an expanding node is a node associated with the current node through N edges, and N is a positive integer;
Adding the one or more expanding nodes to the new group.
In a third aspect, an embodiment of this specification provides another community division method, comprising:
Generating, according to the association relationships between the samples in a sample set, a relational network graph in which each single sample is a node;
Calculating the degree of each node of the relational network graph;
Visiting the nodes of the relational network graph one by one in descending order of node degree;
Wherein visiting each node of the relational network graph includes:
Determining whether the current node has already joined any group;
If the current node has not joined any group, generating a new group centered on the current node;
Determining one or more expanding nodes according to the current node, where an expanding node is a node associated with the current node through N edges, and N is a positive integer;
Performing group-joining processing on each expanding node;
Wherein the group-joining processing includes:
Determining whether the number of groups the expanding node has already joined is less than a first preset threshold;
If the number of groups the expanding node has already joined is less than the first preset threshold, adding the expanding node to the new group.
In a fourth aspect, an embodiment of this specification provides a sample label processing apparatus, comprising:
A sample set obtaining module, configured to obtain a sample set, where some of the samples in the sample set have a default label;
A sample set division module, configured to divide the sample set into H groups according to the association relationships between the samples in the sample set, where H is a positive integer;
An iterative processing module, configured to perform L rounds of iterative processing on the H groups until a convergence condition is met, and to take the label information of each sample after the last round of iterative processing as the processing result, where the label information of each sample indicates whether that sample has the default label, and L is a positive integer;
Wherein the iterative processing module includes:
A characteristic determination module, configured to determine the group characteristics of each group according to the current label information of each sample;
A group determination module, configured to determine target groups and non-target groups according to the group characteristics, where a target group is a group in which samples having the default label are concentrated, and the non-target groups are the groups among the H groups other than the target groups;
A label adding module, configured to add the default label to the samples in the target groups that do not have the default label;
A label deleting module, configured to delete the default label from the samples in the non-target groups that have the default label.
In a fifth aspect, an embodiment of this specification provides a community division apparatus, comprising:
A network generation module, configured to generate, according to the association relationships between the samples in a sample set, a relational network graph in which each single sample is a node;
A node degree calculation module, configured to calculate the degree of each node of the relational network graph;
An access module, configured to visit the nodes of the relational network graph one by one in descending order of node degree;
Wherein the access module includes:
A first judgment module, configured to determine whether the current node has already joined any group;
A new group generation module, configured to generate a new group centered on the current node when the current node has not joined any group;
An expanding node determination module, configured to determine one or more expanding nodes according to the current node, where an expanding node is a node associated with the current node through N edges, and N is a positive integer;
A first joining module, configured to add the one or more expanding nodes to the new group.
In a sixth aspect, an embodiment of this specification provides another community division apparatus, comprising:
A network generation module, configured to generate, according to the association relationships between the samples in a sample set, a relational network graph in which each single sample is a node;
A node degree calculation module, configured to calculate the degree of each node of the relational network graph;
An access module, configured to visit the nodes of the relational network graph one by one in descending order of node degree;
Wherein the access module includes:
A first judgment module, configured to determine whether the current node has already joined any group;
A new group generation module, configured to generate a new group centered on the current node when the current node has not joined any group;
An expanding node determination module, configured to determine one or more expanding nodes according to the current node, where an expanding node is a node associated with the current node through N edges, and N is a positive integer;
A group-joining processing module, configured to perform group-joining processing on each expanding node;
Wherein the group-joining processing module includes:
A second judgment module, configured to determine whether the number of groups the expanding node has already joined is less than a first preset threshold;
A second joining module, configured to add the expanding node to the new group when the number of groups the expanding node has already joined is less than the first preset threshold.
In a seventh aspect, an embodiment of this specification provides a server, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the above sample label processing method and community division methods when executing the computer program.
In an eighth aspect, an embodiment of this specification provides a computer-readable storage medium storing a computer program, where the computer program implements the above sample label processing method and community division methods when executed by a processor.
The beneficial effects of the embodiments of this specification are as follows:
In the embodiments of this specification, a sample set in which some samples have a default label is divided into H groups according to the association relationships between the samples. Target groups, in which samples having the default label are concentrated, and non-target groups, which are the remaining groups, are determined according to the group characteristics of each group. Diffusion is achieved by adding the default label to the samples in the target groups that do not have it, purification is achieved by deleting the default label from the samples in the non-target groups that have it, and diffusion and purification are repeated over multiple rounds of iterative processing, which improves the accuracy and recall of the samples. The sample label processing method provided by the embodiments of this specification adjusts the characterization of individuals by first characterizing groups directly, and all computation takes place within groups, which reduces computational complexity. Each round of iterative processing only needs the group characteristics of each group and does not need to compute an update for each sample, so the computational cost is very low. The convergence condition is easy to reach; generally the processing can exit after four to five rounds. Compared with the existing LPA (Label Propagation Algorithm), the accuracy and recall of the samples are higher; starting from a single label or a few labels is supported, and diffusion and purification for single-label, few-label and multi-label cases can all proceed synchronously.
Detailed Description of the Embodiments
For a better understanding of the above technical solutions, the technical solutions of the embodiments of this specification are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of this specification and the specific features in the embodiments are a detailed description of the technical solutions of the embodiments of this specification, not a limitation on the technical solutions of this specification. Where no conflict arises, the embodiments of this specification and the technical features in the embodiments may be combined with each other.
Refer to Fig. 1, which is a schematic diagram of an application scenario of sample label processing according to an embodiment of this specification. The sample label processing apparatus 100 applies a specific iterative algorithm to perform label processing on samples whose accuracy and recall are too low to support policy application and model training, obtains output samples with high accuracy and recall, and provides the output samples to the model training apparatus 200. The model training apparatus 200 performs model training on the output samples to obtain a supervised or semi-supervised learning model capable of solving a particular problem.
In a first aspect, an embodiment of this specification provides a sample label processing method. Fig. 2 is a flowchart of the sample label processing method, which includes steps S201 to S207.
S201: Obtain a sample set, where some of the samples in the sample set have a default label.
In the field of machine learning, a sample is a particular instance of data, and a collection of samples is a sample set. The form a sample takes depends on the specific problem that machine learning is used to solve. For example, if machine learning is used to identify risky users in online transactions, a sample corresponds to a user; if machine learning is used to classify text, a sample corresponds to a text. Samples can be divided into labeled samples and unlabeled samples. A label is the data to be predicted and can be any data, such as the future price of a commodity, the type of goods shown in a picture, or the meaning of an audio clip. In the embodiments of this specification, the samples in the sample set are samples whose accuracy and recall are too low to support policy application and model training. Taking the identification of risky users in online transactions as an example, low accuracy and recall manifest as risk-free users having been given a risk label and risky users not having been given a risk label.
Some of the samples in the sample set have the default label, and some do not. The default label can be one specific label, or a class of labels to which several specific labels belong. For example, risk behaviors in online transactions include, but are not limited to, fraud-type risk behavior, baseline-type risk behavior, business-operation-type risk behavior and financial-type risk behavior. If it is only necessary to identify users who engage in risk behavior, regardless of which kind of risk behavior they engage in, the default label is the risk label. If it is necessary to identify users who engage in one specific kind of risk behavior, the default label is the label of that specific risk, such as a fraud risk label. It should be noted that the total number of samples in the sample set and the number of samples having the default label are determined by the practical application, and the embodiments of this specification do not limit them. The samples in the sample set can be obtained through web crawler technology, extracted from a database, or obtained from other systems or channels, and the embodiments of this specification do not limit this either.
S202: Divide the sample set into H groups according to the association relationships between the samples in the sample set, where H is a positive integer.
The association relationships can be device relationships, network relationships, social relationships, community relationships and the like. If natural groups exist in the sample set, such as chat groups, particular community groups, particular network groups or particular device-environment groups, each natural group is taken as one group; otherwise, groups are identified using a community segmentation or community discovery algorithm. Referring to Fig. 3, an embodiment of this specification provides a specific way of dividing the sample set into H groups, including steps S301 and S302.
S301: Generate, according to the association relationships between the samples in the sample set, a relational network graph in which each single sample is a node.
The relational network graph is a graph structure made up of a number of nodes. Each node represents one sample, and the relationship between two samples is represented by an edge. For example, if sample A and sample B have an association relationship, the node corresponding to sample A and the node corresponding to sample B are connected by an edge; if sample A and sample B have no association relationship, there is no edge between the two nodes. It should be noted that the relational network graph can be an undirected graph or a directed graph, depending on actual requirements.
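The construction described above can be sketched as follows. This is a minimal illustration for the undirected case, assuming that association relationships arrive as pairs of sample identifiers; the function name `build_relation_graph` and the pair-based input format are illustrative assumptions, not part of the specification.

```python
def build_relation_graph(samples, associations):
    """Return an undirected adjacency map {sample: set of associated samples}."""
    graph = {sample: set() for sample in samples}  # every sample is a node
    for left, right in associations:
        if left == right:
            continue  # ignore self-associations
        graph[left].add(right)
        graph[right].add(left)  # undirected: store the edge in both directions
    return graph


# Samples A and B are associated; sample C is associated with no other sample.
graph = build_relation_graph(["A", "B", "C"], [("A", "B")])
```

Here the nodes for A and B are connected, while the node for C has no edges, mirroring the example in the text.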
S302: Perform community division on the relational network graph to obtain the H groups.
Referring to Fig. 4, an embodiment of this specification provides a fast community division method, including steps S401 to S405.
S401: Calculate the degree of each node of the relational network graph.
The degree of a node is the number of edges associated with that node. If the relational network graph is a directed graph, the degree of each node is the sum of its in-degree and out-degree, where the in-degree of a node is the number of edges pointing to the node and the out-degree of a node is the number of edges pointing away from the node. After the degree of each node of the relational network graph is obtained, the nodes are visited one by one in descending order of degree, where visiting each node of the relational network graph includes steps S402 to S405. It should be noted that two or more nodes with the same degree can be visited in any order.
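A minimal sketch of the degree calculation and visiting order is shown below. It assumes a directed graph given as a list of (source, target) edges; the function names are illustrative. Because the text allows equal-degree nodes to be visited in any order, the sketch breaks ties by node name purely to make the result reproducible.

```python
def directed_degrees(edges):
    """Degree of each node in a directed graph = in-degree + out-degree."""
    in_degree, out_degree = {}, {}
    for source, target in edges:
        out_degree[source] = out_degree.get(source, 0) + 1  # edge leaves source
        in_degree[target] = in_degree.get(target, 0) + 1    # edge enters target
    nodes = set(in_degree) | set(out_degree)
    return {n: in_degree.get(n, 0) + out_degree.get(n, 0) for n in nodes}


def visiting_order(degrees):
    """Nodes in descending order of degree; ties broken by name (arbitrary choice)."""
    return sorted(degrees, key=lambda node: (-degrees[node], node))


degrees = directed_degrees([("x", "y"), ("y", "z"), ("z", "y")])
order = visiting_order(degrees)
```

Nodes that appear in no edge are absent from the result of this sketch; a caller that needs them would add them with degree 0.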
S402: Determine whether the current node has already joined any group.
If the current node has not joined any group, step S403 is executed to generate a new group centered on the current node; otherwise, the next node is visited. Generating a new group centered on the current node means creating a group and adding the current node to the new group.
S404: Determine one or more expanding nodes according to the current node, where an expanding node is a node associated with the current node through N edges, and N is a positive integer.
According to the classical theory of complex networks, any two nodes need only N steps to establish a connection. The value of N can be set according to actual needs: the smaller the value of N, the more likely it is that important connections are missed; the larger the value of N, the greater the amount of computation. In an optional implementation, the value of N is 3. Further, if the relational network graph is a directed graph, the N edges point successively from the expanding node to the current node, or successively from the current node to the expanding node.
S405: Add the one or more expanding nodes to the new group.
The community division method provided by this embodiment of the specification relies on the classical theory of complex networks to perform community segmentation quickly and with low time complexity. The community division method shown in Fig. 4 is described in detail below, taking as an example the case in which the relational network graph is the undirected graph shown in Fig. 5 and the value of N is 1:
The relational network graph shown in Fig. 5 has seven nodes, a, b, c, d, e, f and g, whose calculated degrees are 3, 3, 4, 3, 3, 2 and 2 respectively. In descending order of degree, node c is visited first. Since node c has not joined any group, a new group centered on node c is generated. Since nodes a, b, d and e are the nodes associated with node c through 1 edge, nodes a, b, d and e are determined to be expanding nodes and are added to the new group centered on node c. The remaining nodes are then visited one by one in the same way. Since nodes a, b, d and e have already joined the group centered on node c, the next node to be processed is node f or node g: either nodes e and g are added to a new group centered on node f, or nodes e and f are added to a new group centered on node g. Two groups are finally obtained: a group made up of nodes a, b, c, d and e, and a group made up of nodes e, f and g.
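The walkthrough above can be reproduced with the following sketch. Two points are assumptions rather than statements from the specification: the edge list is inferred from the degrees and neighbor relationships given in the text (Fig. 5 itself is not reproduced here), and "associated through N edges" is read as "reachable over at most N edges". The function names are illustrative.

```python
from collections import deque

# Edge list inferred from the stated degrees and neighbors; an assumption.
FIG5_EDGES = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
    ("c", "d"), ("c", "e"), ("e", "f"), ("e", "g"), ("f", "g"),
]


def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def expanding_nodes(adjacency, center, n):
    """Nodes reachable from `center` over at most `n` edges (breadth-first)."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == n:
            continue  # do not expand beyond n edges
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance) - {center}


def divide_communities(adjacency, n=1):
    """Steps S401 to S405: visit nodes by descending degree and grow groups."""
    order = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    joined = set()   # nodes that already belong to at least one group
    groups = {}      # center node -> members of the group
    for node in order:
        if node in joined:
            continue  # S402: node already joined a group, visit the next one
        members = {node} | expanding_nodes(adjacency, node, n)  # S403 to S405
        groups[node] = members
        joined |= members
    return groups


groups = divide_communities(build_adjacency(FIG5_EDGES), n=1)
```

With ties broken by name, node f is visited before node g, so the second group is centered on f; node e ends up in both groups, as in the text.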
With the community division method shown in Fig. 4, some nodes in core positions may take part in the computation of many groups, which slows down the subsequent steps. Referring to Fig. 6, an embodiment of this specification provides another fast community division method, including steps S601 to S606.
S601: Calculate the degree of each node of the relational network graph.
The degree of a node is the number of edges associated with that node. If the relational network graph is a directed graph, the degree of each node is the sum of its in-degree and out-degree, where the in-degree of a node is the number of edges pointing to the node and the out-degree of a node is the number of edges pointing away from the node. After the degree of each node of the relational network graph is obtained, the nodes are visited one by one in descending order of degree, where visiting each node of the relational network graph includes steps S602 to S606. It should be noted that two or more nodes with the same degree can be visited in any order.
S602: Determine whether the current node has already joined any group.
If the current node has not joined any group, step S603 is executed to generate a new group centered on the current node; otherwise, the next node is visited. Generating a new group centered on the current node means creating a group and adding the current node to the new group.
S604: Determine one or more expanding nodes according to the current node, where an expanding node is a node associated with the current node through N edges, and N is a positive integer.
According to the classical theory of complex networks, any two nodes need only N steps to establish a connection. The value of N can be set according to actual needs: the smaller the value of N, the more likely it is that important connections are missed; the larger the value of N, the greater the amount of computation. In an optional implementation, the value of N is 3. Further, if the relational network graph is a directed graph, the N edges point successively from the expanding node to the current node, or successively from the current node to the expanding node. After the one or more expanding nodes are obtained, group-joining processing is performed on each expanding node, where the group-joining processing includes steps S605 and S606. It should be noted that the group-joining processing can be performed on all expanding nodes simultaneously or on each expanding node in turn, and the embodiments of this specification do not limit this.
S605: Determine whether the number of groups the expanding node has already joined is less than the first preset threshold.
The value of the first preset threshold can be set according to the practical application: the smaller the value of the first preset threshold, the more likely it is that some groups are missed; the larger the value of the first preset threshold, the greater the amount of computation.
If the number of groups the expanding node has already joined is less than the first preset threshold, step S606 is executed to add the expanding node to the new group.
The community division method shown in Fig. 6 is described in detail below, again taking as an example the case in which the relational network graph is the undirected graph shown in Fig. 5 and the value of N is 1, with the value of the first preset threshold also being 1:
The relational network graph shown in Fig. 5 has seven nodes, a, b, c, d, e, f and g, whose calculated degrees are 3, 3, 4, 3, 3, 2 and 2 respectively. In descending order of degree, node c is visited first. Since node c has not joined any group, a new group centered on node c is generated. Since nodes a, b, d and e are the nodes associated with node c through 1 edge, nodes a, b, d and e are determined to be expanding nodes. Since none of nodes a, b, d and e has joined any group, the number of groups each of them has joined is 0, which is less than the first preset threshold, so nodes a, b, d and e are added to the new group centered on node c. The remaining nodes are then visited one by one in the same way. Since nodes a, b, d and e have already joined the group centered on node c, the next node to be processed is node f or node g. If node f is visited, the expanding nodes are nodes e and g; if node g is visited, the expanding nodes are nodes e and f. Since node e has already joined the group centered on node c, the number of groups node e has joined is 1, which is not less than the first preset threshold, so node e cannot be added to the new group centered on node f or node g. Only node g is added to the new group centered on node f, or only node f is added to the new group centered on node g. Two groups are finally obtained: a group made up of nodes a, b, c, d and e, and a group made up of nodes f and g.
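The variant with the first preset threshold can be sketched in the same style. As before, the edge list is inferred from the degrees and neighbors stated in the text rather than taken from Fig. 5, "through N edges" is read as "at most N edges", and the names (`divide_with_threshold`, `first_threshold`) are illustrative assumptions.

```python
from collections import deque

# Edge list inferred from the stated degrees and neighbors; an assumption.
FIG5_EDGES = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
    ("c", "d"), ("c", "e"), ("e", "f"), ("e", "g"), ("f", "g"),
]


def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def expanding_nodes(adjacency, center, n):
    """Nodes reachable from `center` over at most `n` edges (breadth-first)."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == n:
            continue
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance) - {center}


def divide_with_threshold(adjacency, n=1, first_threshold=1):
    """Steps S601 to S606: like the basic method, but cap group memberships."""
    order = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    joined_count = {node: 0 for node in adjacency}  # groups joined per node
    groups = {}
    for node in order:
        if joined_count[node] > 0:
            continue  # S602: already in a group, visit the next node
        members = {node}  # S603: new group centered on the current node
        joined_count[node] += 1
        for candidate in sorted(expanding_nodes(adjacency, node, n)):  # S604
            if joined_count[candidate] < first_threshold:  # S605
                members.add(candidate)                     # S606
                joined_count[candidate] += 1
        groups[node] = members
    return groups


groups = divide_with_threshold(build_adjacency(FIG5_EDGES), n=1, first_threshold=1)
```

With a threshold of 1, node e is refused entry to the second group, so no node belongs to more than one group; raising `first_threshold` to 2 would let e join both.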
In practical applications, because the number of samples in the sample set is large, dividing the sample set according to the association relationships between its samples usually yields two or more groups, that is, H is a positive integer not less than 2; in extreme cases the sample set may be divided into a single group. Continuing to refer to Fig. 1, after the sample set is divided into H groups, L rounds of iterative processing are performed on the H groups until the convergence condition is met, where L is a positive integer. Each round of iterative processing includes steps S203 to S206.
S203: Determine the group characteristics of each group according to the current label information of each sample.
The label information of each sample indicates whether that sample has the default label. In the first round of iterative processing, the current label information of each sample is the label information of that sample in the sample set; in the second and subsequent rounds, the current label information of each sample is its label information after the previous round. The group characteristics serve as the basis for determining the target groups. Their specific content can depend on the actual situation, as long as the target groups can be determined from them, where a target group is a group in which samples having the default label are concentrated. In one optional implementation, the group characteristics include the group label concentration; in another optional implementation, the group characteristics include the group size in addition to the group label concentration. The group size is the number of all samples in the group to which the group characteristics correspond, and the group label concentration is the ratio of the number of samples having the default label in that group to the number of all samples in that group.
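The two group characteristics defined above can be computed as in the sketch below. It assumes a group is represented as a set of sample identifiers and the current label information as the set of identifiers that have the default label; both representations and the function name are illustrative assumptions.

```python
def group_characteristics(members, labeled):
    """Return (group size, group label concentration) for one group.

    members: set of sample ids in the group
    labeled: set of sample ids that currently have the default label
    """
    size = len(members)
    labeled_in_group = len(members & labeled)
    concentration = labeled_in_group / size if size else 0.0
    return size, concentration


# A hypothetical group of 10 samples, 7 of which have the default label.
members = set(range(10))
labeled = set(range(7))
size, concentration = group_characteristics(members, labeled)
```

Only counts per group are needed, which is why each round of iterative processing avoids any per-sample update computation.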
S204: Determine target groups and non-target groups according to the group characteristics, where a target group is a group in which samples having the default label are concentrated, and the non-target groups are the groups among the one or more groups other than the target groups.
Specifically, it is determined whether each set of group characteristics meets a preset condition. If the group characteristics meet the preset condition, the group to which they correspond is determined to be a target group; otherwise, that group is determined to be a non-target group. The group characteristics can be evaluated one after another or all at the same time, and the embodiments of this specification do not limit this. The preset condition is determined by the group characteristics and by the specific problem that machine learning is used to solve. Taking as an example the case in which the specific problem is identifying group risk behavior and the group characteristics include the group size and the group label concentration, the preset condition can be that the group size is greater than a preset number and the group label concentration is greater than a preset percentage.
S205: Add the default label to the samples in the target groups that do not have the default label.
Taking a target group that includes 10 samples as an example, if 7 of the samples have the default label, the default label is added to the remaining 3 samples. Adding the default label to the samples in the target groups that do not have it achieves label diffusion.
S206: Delete the default label from the samples in the non-target groups that have the default label.
Taking a non-target group that includes 10 samples as an example, if 3 of the samples have the default label, the default label is deleted from those 3 samples. Deleting the default label from the samples in the non-target groups that have it achieves label purification. It should be noted that the embodiments of this specification do not limit the execution order of steps S205 and S206: step S205 can be executed first and then step S206, or step S206 can be executed first and then step S205.
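One round of diffusion and purification can be sketched as follows, using the two 10-sample groups from the examples above. The sketch assumes the groups are disjoint and represented as sets of sample identifiers; the thresholds (size greater than 6, concentration greater than 0.5) are assumed values chosen for illustration, and the function name is hypothetical.

```python
def diffuse_and_purify(groups, labeled, min_size=6, min_concentration=0.5):
    """One round of S203 to S206. Returns (new label set, added, deleted).

    groups:  list of disjoint, non-empty sets of sample ids (assumption)
    labeled: set of sample ids that currently have the default label
    """
    new_labeled = set(labeled)
    added = deleted = 0
    for members in groups:
        count = len(members & labeled)            # S203: characteristics
        concentration = count / len(members)
        is_target = len(members) > min_size and concentration > min_concentration
        if is_target:                             # S205: label diffusion
            added += len(members - labeled)
            new_labeled |= members
        else:                                     # S206: label purification
            deleted += count
            new_labeled -= members
    return new_labeled, added, deleted


target_like = set(range(0, 10))       # 7 of 10 samples labeled
non_target_like = set(range(10, 20))  # 3 of 10 samples labeled
labeled = set(range(0, 7)) | {10, 11, 12}
new_labeled, added, deleted = diffuse_and_purify(
    [target_like, non_target_like], labeled
)
```

Every decision is based on the label set from before the round, so the order in which diffusion and purification are applied does not change the result, consistent with the note on steps S205 and S206.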
After each round of iterative processing is completed, it is determined whether the convergence condition is met. If the convergence condition is met, step S207 is executed, in which the label information of each sample after the last round of iterative processing is taken as the processing result; otherwise, the next round of iterative processing is performed. The label information of each sample indicates whether that sample has the default label. The convergence condition can be that the number of iterations L reaches a preset number, which can be set based on practical experience. The convergence condition can also be that an inequality in a, b, M and ε is satisfied (the formula is not reproduced in this text), where a is the number of default labels added in the current round of iterative processing, b is the number of default labels deleted in the current round of iterative processing, M is the total number of samples having the default label across all groups before the current round of iterative processing, and ε is a second preset threshold. The convergence condition can of course also be another condition, and the embodiments of this specification do not limit this.
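The overall loop can be sketched as below. Because the inequality relating a, b, M and ε is not available in this text, the sketch does not guess it: the stopping rule is passed in as a caller-supplied function `stop_rule(a, b, m)`, and a maximum number of rounds stands in for the preset number of iterations. All names, the set-based representation and the example stopping rule (stop when a round changes nothing) are assumptions for illustration only.

```python
def run_label_iterations(groups, labeled, is_target, stop_rule, max_rounds=5):
    """Repeat diffusion and purification until stop_rule holds or max_rounds.

    groups:    list of disjoint, non-empty sets of sample ids (assumption)
    labeled:   set of sample ids that have the default label
    is_target: function (size, concentration) -> bool, the preset condition
    stop_rule: function (a, b, m) -> bool, a caller-chosen convergence test
    """
    labeled = set(labeled)
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        # m: labeled samples across all groups before this round
        m = sum(len(g & labeled) for g in groups)
        flags = [is_target(len(g), len(g & labeled) / len(g)) for g in groups]
        a = sum(len(g - labeled) for g, t in zip(groups, flags) if t)      # added
        b = sum(len(g & labeled) for g, t in zip(groups, flags) if not t)  # deleted
        for g, t in zip(groups, flags):
            labeled = labeled | g if t else labeled - g
        if stop_rule(a, b, m):
            break
    return labeled, rounds


groups = [set(range(0, 10)), set(range(10, 20))]
start = set(range(0, 7)) | {10, 11, 12}
result, rounds = run_label_iterations(
    groups,
    start,
    is_target=lambda size, conc: size > 6 and conc > 0.5,  # assumed thresholds
    stop_rule=lambda a, b, m: a + b == 0,                  # hypothetical rule
)
```

In this example the first round adds 3 labels and deletes 3, and the second round changes nothing, so the hypothetical rule stops the loop after two rounds.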
The iterative processing is described in detail below for the following case: the group characteristics include the group size and the group label concentration; the preset condition is that the group size is greater than 6 and the group label concentration is greater than 0.5; the convergence condition is that the inequality in a, b, M and ε is satisfied (the formula is not reproduced in this text), with the second preset threshold ε being 0.2; and the one or more groups are group R, group S and group T shown in Fig. 7, where black dots represent samples having the default label and white dots represent samples not having the default label:
For group R shown in Fig. 7, the calculated group size is 11 and the group label concentration is 8/11; for group S, the group size is 6 and the group label concentration is 2/6; for group T, the group size is 7 and the group label concentration is 5/7. Therefore, group R and group T are determined to be target groups, and group S is determined to be a non-target group. The default label is added to the samples in group R and group T that do not have it, and the default label is deleted from the samples in group S that have it; the groups output by the current round of iterative processing are shown in Fig. 8. The number of default labels added in the current round of iterative processing is 2, the number of default labels deleted in the current round of iterative processing is 1, and the total number of samples having the default label in the groups output by the previous round of iterative processing is 11 (the evaluated expression is not reproduced in this text). The convergence condition is therefore met after the current round of iterative processing, the iterative processing stops, and the label information of each sample in group R, group S and group T shown in Fig. 8 is taken as the processing result.
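The target and non-target determination in this example can be checked with a short sketch. It uses only the group sizes and label concentrations stated in the text together with the stated preset condition; the dictionary layout and function name are illustrative assumptions.

```python
# (group size, number of samples having the default label), as stated for Fig. 7
GROUPS = {"R": (11, 8), "S": (6, 2), "T": (7, 5)}


def meets_preset_condition(size, labeled_count):
    """Preset condition of the example: size > 6 and concentration > 0.5."""
    return size > 6 and labeled_count / size > 0.5


target_groups = [name for name, (size, count) in GROUPS.items()
                 if meets_preset_condition(size, count)]
non_target_groups = [name for name in GROUPS if name not in target_groups]
```

Group S fails both parts of the condition (its size is not greater than 6 and its concentration is 2/6), which is why it is the only non-target group.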
In the embodiments of this specification, the characterization of individuals is adjusted by first characterizing groups directly, and all computation takes place within groups, which reduces computational complexity. Each round of iterative processing only needs the group characteristics of each group and does not need to compute an update for each sample, so the computational cost is very low. The convergence condition is easy to reach; generally the processing can exit after four to five rounds. Compared with the existing LPA (Label Propagation Algorithm), the accuracy and recall of the resulting samples are higher; starting from a single label or a few labels is supported, and diffusion and purification for single-label, few-label and multi-label cases can all proceed synchronously.
In a second aspect, based on the same inventive concept, an embodiment of this specification provides a community division method, comprising:
generating, according to the association relationships among the samples in a sample set, a relational network graph in which each sample is a node;
calculating the degree of each node of the relational network graph;
accessing the nodes of the relational network graph one by one in descending order of degree;
wherein accessing each node of the relational network graph comprises:
judging whether the current node has already joined any group;
if the current node has not joined any group, generating a new group centered on the current node;
determining one or more expanding nodes according to the current node, an expanding node being a node associated with the current node through N edges, N being a positive integer;
adding the one or more expanding nodes to the new group.
The community division method provided in the second aspect of the embodiments of this specification is based on classical complex network theory and performs community division quickly with low time complexity. For details, refer to the description of steps S401 to S405, which is not repeated here.
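The degree-ordered procedure of the second aspect can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of steps S401 to S405: the graph is assumed to be undirected and given as an adjacency dictionary, "associated through N edges" is read here as "reachable within at most N edges", ties in degree are broken by node identifier, and the names n_hop_neighbors and divide_communities are hypothetical.

```python
from collections import deque


def n_hop_neighbors(adj, start, n):
    """Return the nodes reachable from start within 1..n edges (breadth-first)."""
    seen = {start}
    found = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == n:
            continue  # do not expand beyond n edges
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                found.add(nb)
                frontier.append((nb, dist + 1))
    return found


def divide_communities(adj, n=1):
    """Visit nodes in descending order of degree and grow a group around
    every node that has not yet joined any group."""
    groups = []     # list of (center, members)
    joined = set()  # nodes that already belong to at least one group
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    for node in order:
        if node in joined:
            continue
        members = {node} | n_hop_neighbors(adj, node, n)
        groups.append((node, members))
        joined |= members
    return groups


# Hypothetical undirected graph.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3, 5}, 5: {4}}
result = divide_communities(adj, n=1)
```

In this example node 0 has the highest degree and becomes the center of the first group; node 4 is the next unvisited node that has not joined a group and becomes the center of the second. Node 3 ends up in both groups, which shows that an expanding node may belong to more than one group under this scheme.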
In a third aspect, based on the same inventive concept, an embodiment of this specification provides another community division method, comprising:
generating, according to the association relationships among the samples in a sample set, a relational network graph in which each sample is a node;
calculating the degree of each node of the relational network graph;
accessing the nodes of the relational network graph one by one in descending order of degree;
wherein accessing each node of the relational network graph comprises:
judging whether the current node has already joined any group;
if the current node has not joined any group, generating a new group centered on the current node;
determining one or more expanding nodes according to the current node, an expanding node being a node associated with the current node through N edges, N being a positive integer;
performing group-joining processing on each expanding node;
wherein the group-joining processing comprises:
judging whether the number of groups the expanding node has already joined is less than a first preset threshold;
if the number of groups the expanding node has already joined is less than the first preset threshold, adding the expanding node to the new group.
The community division method provided in the third aspect of the embodiments of this specification not only performs community division quickly with low time complexity based on classical complex network theory, but also limits the number of groups that nodes in core positions can repeatedly join, so that such nodes do not frequently participate in the computation of every group, which speeds up the subsequent steps. For details, refer to the description of steps S601 to S606, which is not repeated here.
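The effect of the first preset threshold can be sketched as follows. This is a simplified illustration rather than the implementation of steps S601 to S606: it assumes an undirected graph, takes N = 1 (direct neighbors only) for brevity, and the names divide_with_cap and max_groups are hypothetical, with max_groups standing in for the first preset threshold.

```python
from collections import defaultdict


def divide_with_cap(adj, max_groups=2):
    """Degree-ordered group growth in which a node may be added as an
    expanding node only while it belongs to fewer than max_groups groups."""
    groups = {}                    # center -> set of members
    membership = defaultdict(int)  # node -> number of groups already joined
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    for node in order:
        if membership[node] > 0:
            continue               # already in some group: not a new center
        groups[node] = {node}
        membership[node] += 1
        for nb in sorted(adj[node]):
            if membership[nb] < max_groups:
                groups[node].add(nb)
                membership[nb] += 1
    return groups


# Hypothetical graph: node 0 sits between three hubs (1, 2, 3),
# and each hub has three private leaves.
adj = {
    0: {1, 2, 3},
    1: {0, 10, 11, 12},
    2: {0, 20, 21, 22},
    3: {0, 30, 31, 32},
    10: {1}, 11: {1}, 12: {1},
    20: {2}, 21: {2}, 22: {2},
    30: {3}, 31: {3}, 32: {3},
}
capped = divide_with_cap(adj, max_groups=2)
```

Here node 0 is adjacent to all three hubs and would otherwise join all three groups; with the cap set to 2 it joins the groups centered on nodes 1 and 2 and is skipped by the group centered on node 3.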
In a fourth aspect, based on the same inventive concept, an embodiment of this specification provides a sample label processing device. Fig. 9 is a schematic structural diagram of the sample label processing device, which includes:
a sample set obtaining module 901, configured to obtain a sample set, some of the samples in the sample set having a default label;
a sample set division module 902, configured to divide the sample set into H groups according to the association relationships among the samples in the sample set, H being a positive integer;
an iterative processing module 903, configured to perform L rounds of iterative processing on the H groups until the condition of convergence is met, and to take the label information of each sample after the last round of iterative processing as the processing result, the label information of each sample indicating whether that sample has the default label, L being a positive integer;
wherein the iterative processing module 903 includes:
a characteristic determination module 9031, configured to determine the group characteristics of each group according to the current label information of each sample;
a group determination module 9032, configured to determine target groups and non-targeted groups according to the group characteristics, a target group being a group in which samples with the default label are aggregated, and a non-targeted group being any of the one or more groups other than the target groups;
a label adding module 9033, configured to add the default label to the samples in the target groups that do not have the default label;
a label removing module 9034, configured to delete the default label from the samples in the non-targeted groups that have the default label.
In an optional implementation, the sample set division module 902 includes:
a network generation module, configured to generate, according to the association relationships among the samples in the sample set, a relational network graph in which each sample is a node;
a community division module, configured to perform community division on the relational network graph to obtain the H groups.
In an optional implementation, the community division module includes:
a node degree computing module, configured to calculate the degree of each node of the relational network graph;
an access module, configured to access the nodes of the relational network graph one by one in descending order of degree;
wherein the access module includes:
a first judgment module, configured to judge whether the current node has already joined any group;
a new group generating module, configured to generate a new group centered on the current node when the current node has not joined any group;
an expanding node determining module, configured to determine one or more expanding nodes according to the current node, an expanding node being a node associated with the current node through N edges, N being a positive integer;
a first adding module, configured to add the one or more expanding nodes to the new group.
In another optional implementation, the community division module includes:
a node degree computing module, configured to calculate the degree of each node of the relational network graph;
an access module, configured to access the nodes of the relational network graph one by one in descending order of degree;
wherein the access module includes:
a first judgment module, configured to judge whether the current node has already joined any group;
a new group generating module, configured to generate a new group centered on the current node when the current node has not joined any group;
an expanding node determining module, configured to determine one or more expanding nodes according to the current node, an expanding node being a node associated with the current node through N edges, N being a positive integer;
a group-joining processing module, configured to perform group-joining processing on each expanding node;
wherein the group-joining processing module includes:
a second judgment module, configured to judge whether the number of groups the expanding node has already joined is less than a first preset threshold;
a second adding module, configured to add the expanding node to the new group when the number of groups the expanding node has already joined is less than the first preset threshold.
In an optional implementation, the relational network graph is a directed graph, the degree of each node of the relational network graph is the sum of its in-degree and out-degree, and the N edges point successively from the expanding node to the current node or successively from the current node to the expanding node.
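The directed-graph variant can be sketched as follows. This is an illustrative reading, not the implementation of the embodiment: the graph is assumed to be given as a dictionary of outgoing edges, "through N edges" is read as "within at most N edges", and the names total_degree and directed_expansion are hypothetical. The point shown is that a chain of edges counts only when every edge in it points the same way.

```python
def total_degree(out_adj):
    """Degree of each node, taken as in-degree plus out-degree."""
    nodes = set(out_adj) | {v for targets in out_adj.values() for v in targets}
    deg = {v: 0 for v in nodes}
    for u, targets in out_adj.items():
        for v in targets:
            deg[u] += 1  # edge u -> v adds to the out-degree of u
            deg[v] += 1  # and to the in-degree of v
    return deg


def directed_expansion(out_adj, current, n):
    """Nodes linked to current by a chain of at most n edges that all point
    away from current, or all point toward current."""
    in_adj = {}
    for u, targets in out_adj.items():
        for v in targets:
            in_adj.setdefault(v, set()).add(u)

    def walk(adj):
        reached, frontier = set(), {current}
        for _ in range(n):
            frontier = {w for u in frontier for w in adj.get(u, ())}
            frontier -= reached | {current}
            reached |= frontier
        return reached

    return walk(out_adj) | walk(in_adj)


# Hypothetical directed graph with edges d -> a, a -> b, b -> c.
out_adj = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"a"}}
degrees = total_degree(out_adj)
expansion = directed_expansion(out_adj, "a", 2)
```

For node a with N = 2, nodes b and c are reached along outgoing edges and node d along an incoming edge. In a graph with edges a -> b and c -> b, node c would not be an expanding node of a, because the two edges on the path between them point in opposite directions.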
In an optional implementation, the group characteristics include:
the group size and the cluster label concentration; or
the cluster label concentration;
wherein the group size is the number of all samples in the group to which the group characteristics correspond, and the cluster label concentration is the ratio of the number of samples with the default label in that group to the number of all samples in that group.
In an optional implementation, the group determination module 9032 includes:
a third judgment module, configured to judge whether the group characteristics of each group meet a preset condition;
a target group determining module, configured to determine the group to which the group characteristics correspond as a target group when the group characteristics meet the preset condition;
a non-targeted group determination module, configured to determine the group to which the group characteristics correspond as a non-targeted group when the group characteristics do not meet the preset condition.
In an optional implementation, the condition of convergence includes:
L reaches a preset number of times; or
a threshold condition on a, b and M with respect to ε is met, where a is the number of default labels added in the current iteration, b is the number of default labels deleted in the current iteration, M is the total number of samples with the default label in all groups before the current iteration, and ε is the second preset threshold.
In an optional implementation, H is a positive integer not less than 2.
In a fifth aspect, based on the same inventive concept, an embodiment of this specification provides a community division device, comprising:
a network generation module, configured to generate, according to the association relationships among the samples in a sample set, a relational network graph in which each sample is a node;
a node degree computing module, configured to calculate the degree of each node of the relational network graph;
an access module, configured to access the nodes of the relational network graph one by one in descending order of degree;
wherein the access module includes:
a first judgment module, configured to judge whether the current node has already joined any group;
a new group generating module, configured to generate a new group centered on the current node when the current node has not joined any group;
an expanding node determining module, configured to determine one or more expanding nodes according to the current node, an expanding node being a node associated with the current node through N edges, N being a positive integer;
a first adding module, configured to add the one or more expanding nodes to the new group.
In a sixth aspect, based on the same inventive concept, an embodiment of this specification provides another community division device, comprising:
a network generation module, configured to generate, according to the association relationships among the samples in a sample set, a relational network graph in which each sample is a node;
a node degree computing module, configured to calculate the degree of each node of the relational network graph;
an access module, configured to access the nodes of the relational network graph one by one in descending order of degree;
wherein the access module includes:
a first judgment module, configured to judge whether the current node has already joined any group;
a new group generating module, configured to generate a new group centered on the current node when the current node has not joined any group;
an expanding node determining module, configured to determine one or more expanding nodes according to the current node, an expanding node being a node associated with the current node through N edges, N being a positive integer;
a group-joining processing module, configured to perform group-joining processing on each expanding node;
wherein the group-joining processing module includes:
a second judgment module, configured to judge whether the number of groups the expanding node has already joined is less than a first preset threshold;
a second adding module, configured to add the expanding node to the new group when the number of groups the expanding node has already joined is less than the first preset threshold.
In a seventh aspect, based on the same inventive concept as the sample label processing method and the community division methods in the foregoing embodiments, the present invention further provides a server. Referring to Fig. 10, the server includes a memory 1004, a processor 1002, and a computer program stored in the memory 1004 and executable on the processor 1002. When executing the computer program, the processor 1002 implements the steps of any one of the sample label processing method and the community division methods described above.
In Fig. 10, the bus architecture is represented by bus 1000. The bus 1000 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors represented by the processor 1002 and memory represented by the memory 1004. The bus 1000 may also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits, all of which are well known in the art and are therefore not described further here. A bus interface 1005 provides an interface between the bus 1000 and a receiver 1001 and a transmitter 1003. The receiver 1001 and the transmitter 1003 may be the same element, that is, a transceiver, which provides a unit for communicating with various other devices over a transmission medium. The processor 1002 is responsible for managing the bus 1000 and for general processing, and the memory 1004 may be used to store data used by the processor 1002 when performing operations.
In an eighth aspect, based on the same inventive concept as the sample label processing method and the community division methods in the foregoing embodiments, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the sample label processing method and the community division methods described above.
This specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this specification have been described, those skilled in the art can make additional changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of this specification.
Obviously, those skilled in the art can make various modifications and variations to this specification without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of this specification and their technical equivalents, this specification is also intended to include them.