CN102202012A - Group dividing method and system of communication network - Google Patents

Group dividing method and system of communication network

Info

Publication number
CN102202012A
Authority
CN
China
Prior art keywords
node
corporations
communication
limit
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201110141970XA
Other languages
Chinese (zh)
Other versions
CN102202012B (en)
Inventor
郭世泽
陈哲
王小娟
陆哲明
段榕
赵建鹏
杨云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
No54 Inst Headquarters Of General Staff P L A
Original Assignee
No54 Inst Headquarters Of General Staff P L A
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by No54 Inst Headquarters Of General Staff P L A filed Critical No54 Inst Headquarters Of General Staff P L A
Priority to CN201110141970.XA priority Critical patent/CN102202012B/en
Publication of CN102202012A publication Critical patent/CN102202012A/en
Application granted granted Critical
Publication of CN102202012B publication Critical patent/CN102202012B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a community (group) division method for a communication network, comprising the following steps: preprocessing the communication data; creating a communication relationship network from the preprocessing result, obtaining nodes that represent the communication senders and receivers in the communication network and edges that represent the communication relationships between them; constructing a requirement text vector and a communication text vector from the query words provided by the user; calculating the node centrality of each node in the communication relationship network; calculating the communication relationship strength between nodes that have a communication relationship, the similarity between edges, and the user's satisfaction with the edges; performing an edge clustering operation on the edges of the communication relationship network to generate multiple communities; finding the core members of each community according to node centrality and communication topic; expanding the members of each community; and dividing the expanded members to generate new communities.

Description

Community division method and system for a communication network
Technical field
The present invention relates to the field of data mining, and in particular to a community division method and system for a communication network.
Background art
Communication tools such as Fetion, e-mail, MSN and QQ have gradually become an important means of exchanging information, and the convenience they offer has made their use increasingly widespread. A communication network is the embodiment of a social network on the Internet, and communication data provides research samples for discovering social regularities. Analysing communication data to find, according to the user's requirements, the communities the user is interested in and their core members is known as community division. The result of community division maps onto groups in the real world and therefore has practical significance.
In the prior art, community division methods for communication networks fall into two main types.
The first type divides the communication network on the basis of complex network theory, for example spectral methods, hierarchical methods and modularity-based methods. Community division of a complex network focuses on network topology, and the result reflects that topology well. A communication network, however, contains a large amount of irrelevant data. This data limits the speed of community division, and it also means that, although the members of a resulting community belong together topologically, the content they communicate about may not be what the user cares about. To obtain a division that meets the user's requirements, the communication data must be screened according to those requirements.
The second type divides the communication network on the basis of communication content, for example with the k-means algorithm or Bayesian methods, placing communications with similar content in the same community. Communities obtained this way have similar content and, after screening, can meet the user's requirements, but a group with the same communication content may correspond to several different "groups" in real society.
Dividing communities with both communication content and user requirements in mind must take into account, on the one hand, the demand that daily updates of communication data place on algorithm speed and, on the other hand, the influence of the communication text on node and edge attributes, so that the analysis result meets the user's requirements.
Summary of the invention
The object of the invention is to overcome the defects of existing community division methods, which are biased toward one aspect during division, cannot meet the user's requirements and do not reflect node characteristics well.
To achieve this object, the invention provides a community division method for a communication network, comprising:
Step 1), preprocess the communication data to obtain information about it, including communication data ID, sender information, recipient information, communication time and communication content;
Step 2), create, from the preprocessing result of step 1), a communication relationship network that reflects the structure of the communication network, obtaining from it the nodes that represent the communication senders and receivers in the communication network and the edges that represent the correspondence between senders and receivers;
Step 3), construct the requirement text vector and the communication text vector from the query words provided by the user;
Step 4), calculate the node centrality of each node in the communication relationship network; the node centrality comprises node betweenness, node closeness and node degree;
Step 5), calculate the communication relationship strength between nodes that have a communication relationship in the communication relationship network, the similarity between edges, and the user's satisfaction with the edges;
Step 6), perform an edge clustering operation on the communication relationship network based on the communication content, generating a plurality of communities;
Step 7), find the core members of each community according to the node centrality and the communication topic;
Step 8), expand the members of each community on the basis of its core members;
Step 9), divide the expanded members of the communities, generating new communities.
In the above technical solution, step 6) comprises:
Step 6-1), determine the number of communities that edge clustering is to generate;
Step 6-2), generate an initial core for each community;
Step 6-3), for every edge in the communication network, calculate in turn its similarity to the initial core of each community;
Step 6-4), according to the result of step 6-3), add each edge in the communication network to the community whose initial core is most similar to it;
Step 6-5), adjust the cluster center of each community;
Step 6-6), repeat steps 6-3) to 6-5) until the stop condition is satisfied.
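The loop of steps 6-1) to 6-6) resembles a k-means style iteration over edges. The following is a minimal sketch only, under several assumptions that the source does not state: each edge is represented by a numeric vector, similarity is cosine similarity, the initial cores are supplied by the caller, each center is adjusted to the mean of its members, and the stop condition is "assignments no longer change or an iteration limit is reached". The names `cosine` and `cluster_edges` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_edges(edge_vectors, initial_cores, max_iter=100):
    """edge_vectors: {edge_id: vector}; initial_cores: list of vectors (steps 6-1, 6-2)."""
    centers = [list(c) for c in initial_cores]
    assignment = {}
    for _ in range(max_iter):
        # Steps 6-3 and 6-4: assign each edge to the most similar center.
        new_assignment = {
            e: max(range(len(centers)), key=lambda c: cosine(vec, centers[c]))
            for e, vec in edge_vectors.items()
        }
        if new_assignment == assignment:  # assumed stop condition (step 6-6)
            break
        assignment = new_assignment
        # Step 6-5: move each center to the mean of its member vectors (assumption).
        for c in range(len(centers)):
            members = [edge_vectors[e] for e, a in assignment.items() if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

The returned dictionary maps each edge identifier to the index of the community it was placed in.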
In the above technical solution, step 6-2) comprises:
Step 6-2-1), according to the similarity between edges, if s_ij = 0, store the pair formed by edge i and edge j in a set A;
Step 6-2-2), for every pair in set A, calculate the class degree value of edge i
[formula image BSA00000506493900031]
and the class degree value of edge j
[formula image BSA00000506493900032]
and judge whether both class degree values exceed a preassigned threshold. Only when both values are below the threshold is the pair formed by edge i and edge j regarded as a pair of isolated edges; pairs of isolated edges are deleted from set A;
Step 6-2-3), perform a bitwise AND operation on edge i and edge j in set A
[formula image BSA00000506493900033]
and store the edges i and j that give the minimum value in the cluster center center = (i, j);
Step 6-2-4), search for the edge k that has the smallest similarity to all edges in the cluster center center and take it as a new cluster center. If no such k exists, return the cluster centers found so far; these are the initial cluster centers. If there are several such k, store them in the set center and then re-execute step 6-2-3).
In the above technical solution, step 7) comprises:
Step 7-1), calculate the node centrality of each member of the community;
Step 7-2), calculate the node weight of each member of the community based on the communication topic;
Step 7-3), sort the nodes by node centrality and node weight, and obtain the core members from the ranking.
In the above technical solution, step 8) comprises:
Step 8-1), take the m nodes whose shortest distance to node i is greater than 2 to form a node set {v_1, v_2, ..., v_m}; a variable fnum records the number of times a node belongs to the same community as node i;
Step 8-2), select an unprocessed subset from the node set produced in the previous step, and judge whether the nodes in this subset belong to the same community as node i;
Step 8-3), repeat step 8-2), and calculate the frequency p of each node from its fnum; if p is greater than a further threshold, the node is considered to belong to the same community as node i, otherwise it is not.
In the above technical solution, step 9) comprises:
Step 9-1), divide the communication network into n communities, each node being an independent community. The initial modularity value is Q = 0, and the initial auxiliary vector element a_i and intermediate variable b_ij satisfy:
$$a_i = \frac{\sum_j w_{ij} e_{ij}}{2 \sum_{i,j} w_{ij}}$$
$$b_{ij} = \frac{w_{ij} e_{ij}}{2 \sum_{i,j} w_{ij}}$$
where e_ij = 1 when an edge connects node i and node j, and e_ij = 0 when no edge connects them; w_ij is the weight of edge e_ij. Initially, the elements of the modularity increment matrix satisfy:
$$\Delta Q_{ij} = b_{ij} + b_{ji} - 2 a_i a_j = \frac{w_{ij} e_{ij}}{\sum_{i,j} w_{ij}} - \frac{\left(\sum_k w_{ik} e_{ik}\right)\left(\sum_k w_{jk} e_{jk}\right)}{2 \left(\sum_{i,j} w_{ij}\right)^2}$$
Step 9-2), select the largest ΔQ_ij from the heap H, merge the corresponding communities i and j, and label the merged community j; then update the modularity increment ΔQ_ij, the heap H and the auxiliary vector a_i. This step comprises:
Step 9-2-1), update ΔQ_ij: delete the elements of row i and column i and update the elements of row j and column j, obtaining
[formula image BSA00000506493900044]
Step 9-2-2), update the heap H: each time ΔQ_ij is updated, update the largest element of the corresponding row and column in the heap;
Step 9-2-3), update the auxiliary vector:
$$a'_j = a_i + a_j$$
$$a'_i = 0$$
At the same time, record the modularity value after merging, Q + ΔQ_ij.
Step 9-3), repeat step 9-2) until the merge termination condition is satisfied.
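The initialisation of step 9-1) and the selection and auxiliary-vector update of step 9-2) can be illustrated with a small sketch. It is not the patented implementation: it assumes a symmetric weight matrix in which a positive entry means an edge exists, it takes the double sum over all ordered pairs (i, j), and it omits the row and column update of ΔQ because that formula is only available as an image. The function names are hypothetical.

```python
import heapq

def initial_increments(w):
    """w: symmetric weight matrix (nested lists); w[i][j] > 0 only where an edge exists."""
    n = len(w)
    e = [[1 if w[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    total = sum(w[i][j] for i in range(n) for j in range(n))  # sum over all i, j
    a = [sum(w[i][j] * e[i][j] for j in range(n)) / (2 * total) for i in range(n)]
    b = [[w[i][j] * e[i][j] / (2 * total) for j in range(n)] for i in range(n)]
    # Delta Q is only stored for connected pairs i < j.
    dq = {(i, j): b[i][j] + b[j][i] - 2 * a[i] * a[j]
          for i in range(n) for j in range(i + 1, n) if e[i][j]}
    return a, dq

def pick_merge(dq):
    """Return the pair with the largest increment, using a heap of negated values."""
    heap = [(-value, pair) for pair, value in dq.items()]
    heapq.heapify(heap)
    neg_value, pair = heap[0]
    return pair, -neg_value

def merge_aux(a, i, j):
    """Auxiliary vector update of step 9-2-3: a'_j = a_i + a_j, a'_i = 0."""
    a = list(a)
    a[j] = a[i] + a[j]
    a[i] = 0.0
    return a
```

For a three-node chain with weights w_01 = 1 and w_12 = 2, the pair (1, 2) has the larger increment and would be merged first.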
The invention also provides a community division system for a communication network, comprising: a data preprocessing module, a communication relationship network construction module, a text vector construction module, a node centrality calculation module, an edge attribute calculation module, an edge clustering module, a core member search module, a member expansion module and a member division module; wherein,
the data preprocessing module preprocesses the communication data to obtain information about it, including communication data ID, sender information, recipient information, communication time and communication content;
the communication relationship network construction module creates, from the preprocessing result, a communication relationship network that reflects the structure of the communication network, obtaining from it the nodes that represent the communication senders and receivers in the communication network and the edges that represent the correspondence between senders and receivers;
the text vector construction module constructs the requirement text vector and the communication text vector from the query words provided by the user;
the node centrality calculation module calculates the node centrality of each node in the communication relationship network; the node centrality comprises node betweenness, node closeness and node degree;
the edge attribute calculation module calculates the communication relationship strength between nodes that have a communication relationship in the communication relationship network, the similarity between edges, and the user's satisfaction with the edges;
the edge clustering module performs an edge clustering operation on the communication relationship network based on the communication content, generating a plurality of communities;
the core member search module finds the core members of each community according to the node centrality and the communication topic;
the member expansion module expands the members of each community on the basis of its core members;
the member division module divides the expanded members of the communities, generating new communities.
The advantages of the invention are as follows:
1. The method and system of the invention extract rich information from the communication network, including the nodes that represent communication senders and receivers, the edges that represent the correspondence between them, the node centrality, the communication relationship strength between nodes, the similarity between edges and the user's satisfaction with the edges, providing technical support for the subsequent mining and analysis of communication data.
2. The method and system of the invention realise community division through two clusterings and one diffusion (that is, the clustering done in edge clustering, the clustering done in community division, and the diffusion done when community members are expanded), so the division result is accurate and reliable.
Brief description of the drawings
Fig. 1 is a flow chart of the community division method of the invention in one embodiment;
Fig. 2 is a schematic diagram of the tables used in one embodiment to store the preprocessing results;
Fig. 3 is a flow chart of expanding the members of a community in the community division method of the invention;
Fig. 4 is a schematic diagram of the community division system of the invention in one embodiment.
Detailed description of the embodiments
The invention is described below with reference to the drawings and specific embodiments.
Before the embodiments are described in detail, the concepts involved in the invention are first explained.
1. Node set N
The node set N is the set of all communication nodes in the communication network.
2. Edge set E
The edge set E records the correspondence between the communication node acting as sender and the communication node acting as receiver in a communication. It is usually expressed as a 0/1 matrix, in which e_ij = 1 indicates that an edge connects node i and node j, and e_ij = 0 indicates that no edge connects them.
3. User requirement Q
Because a communication network is very large, the user needs to provide a requirement text that narrows down the target range and so improves accuracy. For example, a user who wants to target information about "securities" provides keywords such as "securities" and "stocks" as the requirement text, and everyone who has discussed these words is targeted. The user requirement usually takes the form of words. Note that even when the user requirement is clear, inconsistent wording can cause ambiguity: the same abbreviation may refer either to the People's University or to the People's Congress. The requirement text therefore also has to be expanded in order to build the user query vector Q.
4. Node attribute set L_N
The attribute set L_N of node i comprises the following three items:
1) Communication account:
Records the mapping between nodes and communication accounts.
2) Neighbor node information table:
If an edge connects node i and node j, node i is called a neighbor of node j. Each node has its own neighbor node information table, in which the information of its neighbor nodes is stored.
3) Node centrality C:
Because of differences in topology, nodes have different status in the communication network. The node centrality C is an index of the importance of a communication node that combines the node's closeness, betweenness and degree; it is usually represented as a matrix.
5. Edge attribute set L_E
The attribute set L_E of edge e_ij comprises the following three items:
1) Communication relationship strength matrix W
In a communication network, the strength of the communication relationship between nodes (communication strength for short) needs to be assessed. If two nodes communicate directly, the communication strength reflects how strongly they are connected in reality; if they do not, it reflects the likelihood that they exchange information in reality. The matrix W can be built by combining information such as communication time, communication frequency and topology.
2) Similarity matrix S
Each edge is represented as a vector carrying semantics, and the similarity between edges is calculated from these vectors. The similarity matrix S supports the cluster analysis.
3) User satisfaction CE
Each edge can be assigned a user satisfaction CE according to the user requirement text; it is used to judge whether the edge lies within the user's area of interest.
The above explains the concepts involved in the invention. The following embodiment takes a mail network as an example and describes how to mine the information in the mail network and thereby realise community division. In other embodiments, information mining and community division can be carried out in the same way for other communication networks, such as landline telephone and mobile terminal networks.
Analysing a mail network necessarily requires mail communication data. This data can be obtained from a communication network such as the Internet using the prior art, which is not described again here. With reference to Fig. 1, the process of mining information from the communication network according to the mail communication data, and thereby realising community division, is described below.
Step 10, preprocess the mail communication data.
Preprocessing the mail communication data mainly obtains the following kinds of information:
1) Communication data ID
The communication data is numbered; the ID uniquely identifies a piece of communication data. In this embodiment, one ID is normally assigned to each e-mail. In other embodiments, for example in instant messaging such as MSN and QQ, one ID is assigned to each conversation.
2) Sender information
The information about the sender in the communication data. In this embodiment the sender information can be the sender's e-mail address; in other embodiments it can also be the sender's account, IP address and so on, as long as it uniquely identifies the sender.
3) Recipient information
The information about the recipient in the communication data. In this embodiment the recipient information can be the recipient's e-mail address; in other embodiments it can also be the recipient's account, IP address and so on, as long as it uniquely identifies the recipient.
4) Communication time
The time at which the communication took place. In this embodiment the communication time can be the time at which the sender sent the mail or the time at which the recipient received it. In other embodiments, such as instant messaging, other ways of identifying the communication time are possible, for example taking the start time of a chat session as the communication time.
5) Communication content
The communication content is the text content of the communication data, such as the subject and body of an e-mail. In this embodiment the information in e-mail attachments is not treated as communication content. In other embodiments the text in attachments can also be read with suitable software and treated as communication content. Because Chinese has no obvious boundary between words, as a preferred implementation the text content of the communication data is word-segmented, yielding communication content composed of multiple words.
One communication in the communication network yields the five kinds of information above. Putting together the information of all or some of the communications in the whole network over a period of time forms the basic data that describes the mail communication network. As a preferred implementation, the basic data can be classified and the classification results stored in several tables.
In this embodiment, with reference to Fig. 2, the classified data is stored in the following tables:
A. Mapping table: by querying this table, the node name information corresponding to a communication account can be found;
B. Mail information table message: this is the communication content table. The field "mail number" mid is its primary key, and every communication has a unique mid as its identifier. For mail, the table mainly records the subject and body; for other communication formats it records the chat log;
C. Related information table recipient info: this is the communication content receiving table. Through the field "mail number" mid, the basic information in the message table can be looked up from this table;
D. Related information table: this is the contact table, which records the sending and receiving between communication accounts;
E. Weight table: this is the weight information table for contacts between communication accounts;
F. Interaction information table: this is the table of interactions between communication accounts, comprising the text information vector and the user satisfaction.
Step 20, create the communication relationship network from the preprocessing result of the previous step.
The previous step obtained the data from the actual mail communication, but this data by itself does not show the overall state of the mail network intuitively, so in this step a communication relationship network is built from the mail data.
To build the network, a communication node is created for each communication account, and the tables produced by preprocessing then determine whether an edge needs to be created between two communication nodes. If a correspondence exists between two communication accounts, an edge exists between the corresponding communication nodes; otherwise there is no edge.
Building the communication relationship network yields the node set N and the edge set E from the mail communication data. Their composition and data structure were explained above and are not repeated here.
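The construction of the node set N and the 0/1 edge matrix E described in step 20 might be sketched as follows. This is an illustration only: the input format (a list of sender and receiver account pairs) and the choice to treat edges as undirected are assumptions, and `build_network` is a hypothetical name.

```python
def build_network(messages):
    """messages: list of (sender_account, receiver_account) pairs from preprocessing."""
    accounts = sorted({acct for pair in messages for acct in pair})
    index = {acct: i for i, acct in enumerate(accounts)}  # account -> node number
    n = len(accounts)
    e = [[0] * n for _ in range(n)]
    for sender, receiver in messages:
        i, j = index[sender], index[receiver]
        if i != j:
            e[i][j] = e[j][i] = 1  # assumption: edges are undirected
    return accounts, e
```

Repeated messages between the same two accounts leave the matrix unchanged, since E only records whether a correspondence exists.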
Step 30, construct the communication text vector and the requirement text vector.
As mentioned for step 10, preprocessing yields the text information (that is, the communication content) of each communication and segments it into words. The text information is then processed by the following operations.
Step 31, build the inverted index
On the basis of the word segmentation result, an inverted index is built using an index dictionary and a stop word list. The index dictionary, the stop word list and the process of building an inverted index from them are common knowledge in the art and are not described again here.
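As a reminder of what step 31 produces, here is a minimal sketch of an inverted index built from segmented texts. It only uses a stop word list; the index dictionary mentioned in the text is left out, and the data layout and the name `build_inverted_index` are assumptions made for illustration.

```python
from collections import defaultdict

def build_inverted_index(segmented_texts, stop_words):
    """segmented_texts: {text_id: [word, ...]} after word segmentation."""
    index = defaultdict(dict)  # word -> {text_id: occurrence count}
    for text_id, words in segmented_texts.items():
        for word in words:
            if word in stop_words:
                continue  # stop words are not indexed
            index[word][text_id] = index[word].get(text_id, 0) + 1
    return dict(index)
```

Looking up a word in the returned dictionary gives the texts that contain it together with the occurrence counts, which is the information the later weighting steps need.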
Step 32, create the requirement text vector and the communication text vector
The communication text covers many kinds of content, including the user requirement, which is supplied by the user and usually expressed as query words. Text related to the user requirement is called the requirement text, and the vector created from it is called the requirement text vector. The requirement text vector Q has the following form:
$$\{(t_1, tw_1), (t_2, tw_2), \ldots, (t_m, tw_m)\}$$
where t_1, t_2, ..., t_m are the query terms, arranged in ascending order, and tw_1, tw_2, ..., tw_m are weights describing how important each query term is in the user's eyes.
From the query terms of the requirement text, a communication text vector {(t_1, tw_1), (t_2, tw_2), ..., (t_m, tw_m)} can be built. The weight tw_ji of query term t_i in mail j is calculated by the following formula:
$$tw_{ji} = f_{ij} \times \log \frac{N}{f_i}$$
where f_ij is the number of occurrences of term t_i in mail j of the communication text collection and N is the number of texts in the communication text collection. The source does not define f_i; under the usual TF-IDF reading it would be the number of texts in the collection that contain t_i.
After the weights tw_ji have been calculated with the above formula, the weights tw_1, tw_2, ..., tw_m of the query terms t_1, t_2, ..., t_m over the whole communication text collection can be obtained by weighted calculation. Note that, although the query term weights are written tw in both the requirement text vector and the communication text vector, in the requirement text vector the weight reflects how important the query term is to the user, whereas in the communication text vector it is related to how often the query term occurs in the communication text.
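The per-mail weight formula can be tried out with a short sketch. It assumes that f_i means the number of texts containing term t_i (the source does not define it), that the logarithm is the natural logarithm, and that a term absent from every text gets weight 0; `term_weights` is a hypothetical name.

```python
import math

def term_weights(texts, query_terms):
    """texts: {text_id: [word, ...]}; returns {text_id: {term: tw}}."""
    n_texts = len(texts)
    # f_i: number of texts containing term t_i (assumed meaning of f_i)
    doc_freq = {t: sum(1 for words in texts.values() if t in words)
                for t in query_terms}
    weights = {}
    for text_id, words in texts.items():
        weights[text_id] = {
            t: words.count(t) * math.log(n_texts / doc_freq[t]) if doc_freq[t] else 0.0
            for t in query_terms
        }
    return weights
```

A term that appears in every text gets weight 0 because log(N / f_i) is 0, which is the usual behaviour of this kind of weighting.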
Step 33, expand the requirement text
Users differ in the query words they use; in a query about computers, for example, different users may use different synonyms for "computer". To make the query result more accurate and complete, the requirement text needs to be expanded.
When the requirement text is expanded, related terms are added according to a certain strategy so that the expanded text fully describes the implied concept or topic.
Expanding the requirement text may comprise the following steps:
Step 33-1, first calculate the co-occurrence frequency of a term t and a query term q in text j:
$$\mathrm{cof}(t, q \mid j) = \log(\mathrm{tf}(t, j) + 1.0) \times \log(\mathrm{tf}(q, j) + 1.0)$$
where tf(t, j) and tf(q, j) denote the number of occurrences of term t and term q in text j.
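The co-occurrence frequency of step 33-1 is simple enough to check by hand. The sketch below assumes a natural logarithm (the source does not give the base) and a text represented as a list of segmented words.

```python
import math

def cof(t, q, words):
    """Co-occurrence frequency of terms t and q in one segmented text."""
    return math.log(words.count(t) + 1.0) * math.log(words.count(q) + 1.0)
```

Because log(0 + 1.0) is 0, the value is 0 whenever either term is missing from the text, so only texts that contain both terms contribute.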
Step 33-2, after the co-occurrence frequency of a term and a query term has been obtained, the degree of association between the term and the query terms can be calculated.
Assuming that the words in the initial requirement text Q are mutually independent, the degree of association between term t and Q can be measured by the product of the co-occurrence degrees of t with each word of Q in the local text set S. The degree of association of t and Q in S is defined as:
$$\mathrm{cohd}(t, Q \mid S) = \prod_{q \in Q} \left(\mathrm{cood}(t, q \mid S) + 1.0\right)^{\mathrm{idf}(q \mid C)\,\mathrm{idf}(t \mid C)}$$
where idf(·|C) is defined as:
$$\mathrm{idf}(\cdot \mid C) = \frac{\log(N)}{\log(\mathrm{df}(\cdot \mid C) + \mu)}$$
df(·|C) denotes the number of texts in corpus C in which the term occurs, and μ is an adjustable parameter greater than 0 with a default value of 100.
Step 33-3, calculate an evaluation function from the degree of association, and use its result to decide whether term t is to be added to the requirement text.
Taking the logarithm of both sides of the degree-of-association formula gives the evaluation function score(t):
$$\mathrm{score}(t) = \sum_{q \in Q} \mathrm{idf}(q \mid C)\,\mathrm{idf}(t \mid C)\,\log\left(\mathrm{cood}(t, q \mid S) + 1.0\right)$$
Define lodd_{Q,C}(t, q|S) as the local dependence degree of term t and query word q in the local document set S, given the global text set C and the user requirement text vector Q:
$$\mathrm{lodd}_{Q,C}(t, q \mid S) = \mathrm{idf}(q \mid C)\,\mathrm{idf}(t \mid C)\,\log\left(\mathrm{cood}(t, q \mid S) + 1.0\right)$$
The evaluation function then simplifies to:
$$\mathrm{score}(t) = \sum_{q \in Q} \mathrm{lodd}_{Q,C}(t, q \mid S)$$
Once the scores have been obtained, the terms with higher scores are selected to expand the requirement text. On the one hand, terms that co-occur frequently with the words of the query vector Q in the local text set S receive higher scores; on the other hand, terms that occur with high frequency in the global mail set are penalised to some degree (the parameter μ in the idf formula controls the degree of punishment). The highest-scoring terms finally chosen are therefore highly relevant to the topic of the user requirement text.
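To make the evaluation function concrete, the sketch below computes score(t) for one candidate term. It rests on two assumptions: the idf formula is read as log(N) divided by log(df + μ), and the co-occurrence degree cood(t, q|S), which the text does not define, is supplied by the caller as a function. The names `idf`, `score`, `df` and `cood` are illustrative.

```python
import math

def idf(df, n_texts, mu=100.0):
    """Assumed reading: idf(.|C) = log(N) / log(df(.|C) + mu), with mu > 0."""
    return math.log(n_texts) / math.log(df + mu)

def score(t, query, df, n_texts, cood, mu=100.0):
    """Sum over q in Q of idf(q|C) * idf(t|C) * log(cood(t, q) + 1.0).

    df: {term: number of texts in corpus C containing the term}
    cood: caller-supplied function (t, q) -> co-occurrence degree in S
    """
    idf_t = idf(df[t], n_texts, mu)
    return sum(idf(df[q], n_texts, mu) * idf_t * math.log(cood(t, q) + 1.0)
               for q in query)
```

Candidate terms can then be ranked by this score and the top few added to the requirement text; a larger μ flattens the idf values and so weakens the penalty on globally frequent terms.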
Step 40, compute the node centrality.
As mentioned in the definitions above, node centrality comprises three indices: node betweenness (intermediary degree), node closeness (tightness), and node degree (contact degree). The calculation of each index is described below.
Step 41, compute the node betweenness
The mean number of shortest paths passing through node k is called the betweenness coefficient of node k, denoted $C_A(k)$:
$$C_A(k) = \frac{\sum_{i}^{n} \sum_{j}^{n} g_{ij}(k)}{(n-1)^2}$$
where $g_{ij}(k)$ is a binary variable indicating whether the shortest path between nodes i and j passes through node k: it is 1 if so and 0 otherwise.
Step 42, compute the node degree
The mean number of nodes directly connected to node k is called the degree coefficient of node k, denoted $C_B(k)$:
$$C_B(k) = \frac{\sum_{i=1}^{n} a(i, k)}{n-1}$$
where n is the number of nodes in the network and a(i, k) is a binary variable: 1 means nodes i and k are directly connected, and 0 means they are not.
Step 43, compute the node closeness
The mean of the sum of the shortest path lengths between node k and all nodes in the network is called the closeness coefficient of k, denoted $C_C(k)$:
$$C_C(k) = \frac{\sum_{i}^{k} l(i, k)}{(n-1)^2}$$
where l(i, k) is the shortest path length between nodes i and k.
Once the node betweenness, node degree, and node closeness have been obtained, the centrality vector of node k can be computed as $C(k) = (C_A(k), C_B(k), C_C(k))$.
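The three indices can be illustrated with a small sketch on an unweighted, undirected graph. Several choices here are assumptions rather than statements of the patent: `adj` is a hypothetical adjacency dictionary, the indicator for node k is taken to be 1 when k lies on at least one shortest path between i and j (counted over ordered pairs), and the closeness sum runs over all other reachable nodes.

```python
from collections import deque

def bfs_dist(adj, src):
    # Hop distances from src to every reachable node.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def centrality(adj, k):
    nodes = list(adj)
    n = len(nodes)
    dist = {u: bfs_dist(adj, u) for u in nodes}
    others = [u for u in nodes if u != k]
    # k is on a shortest i-j path when d(i,k) + d(k,j) == d(i,j).
    g = sum(1 for i in others for j in others
            if i != j and j in dist[i] and k in dist[i]
            and dist[i][k] + dist[k][j] == dist[i][j])
    c_a = g / (n - 1) ** 2                      # betweenness coefficient
    c_b = len(adj[k]) / (n - 1)                 # degree coefficient
    c_c = sum(dist[k][i] for i in others if i in dist[k]) / (n - 1) ** 2
    return c_a, c_b, c_c
```

On the path a–b–c, the middle node b scores highest on betweenness and degree, which matches the intuition that it brokers all communication between the two ends.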
Step 50, compute the communication relationship strength matrix W
The assessment of the communication relationship strength between nodes i and j uses four indices: number of communications, communication time span, shortest path length, and number of shared neighbors. The calculation of each index is described below.
Step 51, compute the number of communications
The more communications there are between two nodes, the more frequent their contact and the closer their relationship. The number of communications between nodes i and j is:
$$\mathrm{comm\_num}_{ij} = \mathrm{send}_{ij} + \mathrm{receive}_{ij}$$
where $\mathrm{send}_{ij}$ is the number of communications that node i initiates to node j, and $\mathrm{receive}_{ij}$ is the number of communications that node i receives from node j.
Step 52, compute the communication time span
The longer the communication time span between two nodes, the longer their contact history and the closer their relationship. The communication time span between nodes i and j is:
$$\mathrm{dur\_day}_{ij} = \mathrm{latest\_day}_{ij} - \mathrm{earliest\_day}_{ij}$$
where $\mathrm{latest\_day}_{ij}$ is the most recently observed communication time between nodes i and j, and $\mathrm{earliest\_day}_{ij}$ is the time of the first communication between nodes i and j.
Step 53, compute the shortest path length
The shorter the shortest path between two nodes, the more direct their contact and the closer their relationship. The shortest path length between nodes i and j is denoted $\mathrm{shortest\_len}_{ij}$; it is the number of edges on the path with the fewest edges among all paths from node i to node j.
Step 54, compute the number of shared neighbors
The more neighbor nodes two nodes share, the more likely they are to belong to the same circle of relations and the closer their relationship. Scanning the neighbor sets of nodes i and j gives the number of shared neighbors:
$$\mathrm{sharenode\_num}_{ij} = |\mathrm{neighbor}_i \cap \mathrm{neighbor}_j|$$
Step 55, after the number of communications, communication time span, shortest path length, and number of shared neighbors have been calculated, the function closeness(i, j) used to assess the communication relationship strength of two nodes can be computed. The values of closeness(i, j) over these dimensions form the communication relationship strength matrix W. The function closeness(i, j) is computed as:
$$\mathrm{closeness}(i, j) = k_1 \times \frac{\mathrm{comm\_num}_{ij}}{\mathrm{Max\_num}} + k_2 \times \frac{\mathrm{dur\_day}_{ij}}{\mathrm{Max\_day}} + k_3 \times \frac{\mathrm{sharenode\_num}_{ij}}{\mathrm{Max\_node}} + k_4 \times \left(1 - \frac{\mathrm{shortest\_len}_{ij}}{\mathrm{Max\_len}}\right)$$
where Max_num is the maximum number of communications between any two nodes; Max_day is the maximum time span between any two nodes; Max_node is the maximum number of shared neighbors between any two nodes; Max_len is the longest shortest path between any two nodes; and $k_i$ are weight coefficients.
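The strength function can be written directly from the formula. In this sketch the equal weights (0.25 each) are an assumption made only for illustration; the description leaves the weight coefficients open.

```python
def closeness(comm_num, dur_day, sharenode_num, shortest_len,
              max_num, max_day, max_node, max_len,
              k=(0.25, 0.25, 0.25, 0.25)):
    # Each index is normalized by its network-wide maximum; a shorter
    # shortest path contributes more, hence the (1 - ratio) term.
    k1, k2, k3, k4 = k
    return (k1 * comm_num / max_num
            + k2 * dur_day / max_day
            + k3 * sharenode_num / max_node
            + k4 * (1 - shortest_len / max_len))
```

With non-negative weights that sum to 1, the result lies in [0, 1], so entries of W are comparable across node pairs.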
Step 60, compute the similarity matrix S
Step 61, use the vector space model to give the edges between nodes a uniform representation, each edge being a vector. The edge vector between node i and node j is defined as the mean of all communication text vectors between node i and node j, that is:
$$e_i = (a_1^i, a_2^i, \ldots, a_n^i)$$
where
$$a_j^i = \frac{\sum_{k=1}^{r} \mathrm{E}_w\text{-}\mathrm{ID}_w(m_k, t_j)}{r}, \quad 1 \le j \le n$$
and $\mathrm{E}_w\text{-}\mathrm{ID}_w(m_k, t_j)$ denotes the weight of feature word $t_j$ in communication text $m_k$.
Step 62, compute the similarity between any two edges
The cosine formula is used to compute the similarity between the vectors $e_i$ and $e_j$ of any two edges:
$$s_{ij} = \cos(e_i, e_j) = \frac{e_i \cdot e_j}{\sqrt{(e_i)^2} \times \sqrt{(e_j)^2}} = \frac{\sum_{k=1}^{n} (a_k^i \times a_k^j)}{\sqrt{\sum_{k=1}^{n} (a_k^i)^2} \times \sqrt{\sum_{k=1}^{n} (a_k^j)^2}}$$
The larger the value of $s_{ij}$, the smaller the angle and the higher the similarity. If $s_{ij} \ge \partial$, then $e_i$ and $e_j$ are considered similar; otherwise they are dissimilar. Here $\partial$ is the similarity threshold.
Step 63, construct the similarity matrix S
The similarity matrix S is obtained from the pairwise edge similarities computed in the preceding steps.
Given a threshold $\partial$, two edges are similar if $s_{ij} \ge \partial$ and dissimilar otherwise. The filtered matrix S is obtained accordingly, where:
$$s_{ij} = \begin{cases} 1, & s_{ij} \ge \partial \\ 0, & s_{ij} < \partial \end{cases}$$
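The construction of the thresholded similarity matrix can be sketched as below. The function and parameter names are illustrative, and returning 0 for a zero-length vector is a convention chosen here, not something the description specifies.

```python
import math

def cosine(u, v):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(edge_vectors, threshold):
    # S[i][j] is 1 when the cosine of edges i and j reaches the threshold, else 0.
    n = len(edge_vectors)
    return [[1 if cosine(edge_vectors[i], edge_vectors[j]) >= threshold else 0
             for j in range(n)] for i in range(n)]
```

For example, with edge vectors `[[1, 0], [1, 0.1], [0, 1]]` and a threshold of 0.9, the first two edges are marked similar to each other and the third is similar only to itself.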
Step 70, compute the user satisfaction CE
Expanding the user demand text allows the communication content to be brought in. The detailed process is as follows:
Step 71, compute the weights of the demand text
To obtain the user's satisfaction, the weight that each query term carries in the user's view must first be determined. Before the weights of the demand text are computed, the following definitions are made:
R denotes the set of texts that satisfy the user demand;
C denotes the set of all texts;
N_C denotes the number of all texts in the set;
N_sim denotes the number of texts in the set that satisfy the user demand.
The weights of the demand text may be computed with related prior-art techniques. In this embodiment, following Rocchio's relevance feedback experiments, the demand text is taken as the query vector, and the value on each dimension of the ideal query vector $\vec{Q}_{opt}$, which separates the texts that satisfy the demand from those that do not, is used as the weight of the demand text. The ideal query vector is computed as:
$$\vec{Q}_{opt} = \frac{1}{N\_sim} \sum_{d_j \in R} \frac{\vec{d}_j}{|\vec{d}_j|} - \frac{1}{N\_C - N\_sim} \sum_{d_j \in C - R} \frac{\vec{d}_j}{|\vec{d}_j|}$$
where $d_j$ denotes the j-th dimension of the corresponding vector and $\vec{d}_j$ denotes the value of that dimension;
In practice, the number of texts that satisfy the demand cannot be known in advance. In actual computation, an initial query vector is therefore constructed first, in which the user assigns each term a value in [0, 1] to express its importance. This vector is then revised step by step according to the texts the user designates as satisfying the demand, until an ideal result is reached. The classic algorithm proposed by Rocchio is as follows:
$$\vec{Q}_{opt} = \alpha \times \vec{q}_{initial} + \beta \times \sum_{d_j \in R} \frac{\vec{d}_j}{|\vec{d}_j|} - \gamma \times \sum_{d_j \in C - R} \frac{\vec{d}_j}{|\vec{d}_j|}$$
where α, β, γ are three adjustable constants and $\vec{q}_{initial}$ denotes the initial query vector.
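One revision step of the Rocchio formula above can be sketched as follows. The default values of `alpha`, `beta`, and `gamma` are common textbook choices and are assumptions here; the description only says they are adjustable constants.

```python
import math

def unit(v):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def rocchio(q_initial, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    # Move the query toward texts that satisfy the demand and away from
    # those that do not, using length-normalized text vectors.
    q = [alpha * x for x in q_initial]
    for d in relevant:
        for j, x in enumerate(unit(d)):
            q[j] += beta * x
    for d in non_relevant:
        for j, x in enumerate(unit(d)):
            q[j] -= gamma * x
    return q
```

Calling `rocchio` repeatedly, each time with the texts the user has newly marked, mirrors the step-by-step revision described above.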
Step 72, compute the user satisfaction of text m
The satisfaction $s_m$ of text m is expressed as the similarity between the vector $T_m$ of text m and the user demand text vector $T_Q$:
$$s_m = \cos(T_m, T_Q) = \frac{T_m \cdot T_Q}{\sqrt{(T_m)^2} \times \sqrt{(T_Q)^2}} = \frac{\sum_{k=1}^{n} (t_k^m \times t_k^Q)}{\sqrt{\sum_{k=1}^{n} (t_k^m)^2} \times \sqrt{\sum_{k=1}^{n} (t_k^Q)^2}}$$
Step 73, compute the edge user satisfaction
The mean satisfaction of all texts communicated between node i and node j is called the edge user satisfaction CE:
$$CE = \frac{1}{N_k} \sum_{i=1}^{N_k} s_i$$
where $N_k$ is the number of texts communicated between node i and node j.
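The edge user satisfaction is simply the average cosine similarity between the demand vector and each text exchanged on the edge. A minimal sketch, with illustrative names and assuming every text on the edge has already been turned into a vector of the same length as the demand vector:

```python
import math

def cosine(u, v):
    # Cosine similarity; defined as 0 when either vector has zero length.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def edge_satisfaction(text_vectors, demand_vector):
    # CE: mean of the per-text satisfactions s_m over the N_k texts on the edge.
    scores = [cosine(t, demand_vector) for t in text_vectors]
    return sum(scores) / len(scores)
```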
The preceding steps describe how the relevant information in the mail communication network is mined. This information can now be used to carry out the community division.
Step 80, cluster the edges based on communication content.
Edge clustering divides all the edges of the communication network into several communities. With respect to communication content, the differences between edges of different communities are fairly pronounced, while edges within the same community should be fairly close. The purpose of clustering the edges of the communication network is to lock onto the scope of the user demand quickly. Edge clustering can be implemented in several ways, such as hierarchical methods, partitioning methods, and grid-based methods; the k-means method may be used in this embodiment. The specific steps of edge clustering with the k-means method in this embodiment are described below.
Step 81, determine the number of communities to be generated by edge clustering, and denote this number n;
Step 82, generate an initial core for each community;
Step 83, for every edge in the communication network, compute in turn its similarity to the initial core of each community;
Step 84, according to the result of Step 83, add each edge of the communication network to the community whose initial core has the greatest similarity to it;
Step 85, adjust the cluster centers; this adjustment may use a method common in the prior art, such as computing the mean of the members of each class and taking that mean as the new cluster center;
Step 86, repeat Steps 83 to 85 until the stop condition is satisfied; the communities obtained at that point are the result of edge clustering. Several stop conditions are possible in this step, for example that, while the cluster centers are being adjusted, the difference between the cores of a class in two successive iterations is less than a preassigned threshold.
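Steps 83 to 86 can be sketched as a k-means style loop over edge vectors. This is an illustration under assumptions: cosine similarity is used for the assignment, the stop test compares the Euclidean shift of the centers against a hypothetical tolerance `tol`, and the initial cores are passed in by the caller.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_edges(vectors, cores, max_iter=100, tol=1e-6):
    cores = [list(c) for c in cores]
    for _ in range(max_iter):
        # Steps 83-84: assign each edge to the most similar core.
        labels = [max(range(len(cores)), key=lambda c: cosine(v, cores[c]))
                  for v in vectors]
        # Step 85: each new center is the mean of the members of its class.
        new_cores = []
        for c in range(len(cores)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                new_cores.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_cores.append(cores[c])  # keep the core of an empty class
        # Step 86: stop when no center moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(cores, new_cores))
        cores = new_cores
        if shift < tol:
            break
    return labels, cores
```

`math.dist` requires Python 3.8 or later.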
Step 82 above involves generating the initial cores; the creation of the initial cores is explained below.
Index 1: the similarity between initial cores should be as small as possible, so that the similarity between the communities in which the initial cores lie is more likely to be small.
Index 2: to ensure that an initial core vector is not an isolated edge, the number of edges similar to it is counted and required to be greater than a given threshold.
Index 3: the less the edges related to two selected cluster centers overlap, the better.
The initial cores are selected as follows:
Step 82-1, in the similarity matrix S, if $s_{ij} = 0$, store the pair formed by edge i and edge j in set A;
Step 82-2, for every pair in set A, compute the class degree value of edge i and the class degree value of edge j, and judge whether both values are greater than a preassigned threshold (for example, 2). Only when both class degree values are less than the threshold is the pair formed by edge i and edge j regarded as a pair of isolated edges; such pairs are deleted from set A.
Step 82-3, perform a bitwise AND operation on edge i and edge j of each pair in set A, and store the edges i and j that attain the minimum value as the cluster center, center = (i, j);
Step 82-4, search for the edge k that has the minimum similarity to all edges in the cluster center center, and take it as a new cluster center. If no such k exists, return the cluster centers found; these are the initial cluster centers. If there are several such k, store them in the set center and then re-execute Step 82-3.
Step 90, find the core members in each community.
The process of finding the core members is as follows.
Step 91, judge on the basis of node centrality whether a member of the community is a core member.
The composition and calculation of node centrality were explained in detail in Step 40 above and are not repeated here. Of its components, the node degree reflects how active a node is in the network; a very high degree suggests that the node is probably a server. Betweenness measures the extent to which a particular node lies between other nodes. Closeness measures how far a node is from the other nodes and reflects how quickly it can reach all of them. Node centrality combines these three indices to describe how central node k is in the network.
Step 92, judge on the basis of the communication topic whether a community member is a core member.
This judgment can be implemented in several ways; the HITS algorithm is used in this embodiment. HITS is a web link analysis algorithm proposed by Kleinberg of IBM. Its basic principle is, for a given general query topic, to determine the authority pages of that topic through link analysis. Combined with the characteristics of communication behavior itself, the process of using this algorithm to find core members is as follows:
Step 92-1, determine the node set on which the HITS algorithm operates:
Step a), from the query result set obtained based on the user demand text, take the t highest-ranked entries and put them in the result set $R_\sigma$ (called the Root Set).
Step b), expand the result set $R_\sigma$. The expansion has two aspects: first, all nodes with which the nodes in $R_\sigma$ actively communicate are added to $R_\sigma$; second, from the nodes that point to each node in $R_\sigma$ (passive communication), any d nodes are added to the original result set $R_\sigma$, thereby forming $S_\sigma$ (called the Base Set).
The expanded set $S_\sigma$ satisfies three properties well: $S_\sigma$ is relatively small; $S_\sigma$ is rich in relevant nodes; and $S_\sigma$ contains most of the most valuable authority nodes.
Step 92-2, compute the hub weight and authority weight of each node.
The node set $S_\sigma$ with its communication relationships is represented as a directed graph, in which a directed edge (p, q) indicates that node p communicates with node q. A good hub node points to many good authority nodes, and a good authority node is in turn pointed to by many good hub nodes. For any node p, A(p) denotes the authority weight of p and H(p) denotes the hub weight of p; they satisfy the normalization conditions:
$$\sum_{p \in S_\sigma} A^2(p) = 1 \quad \text{and} \quad \sum_{p \in S_\sigma} H^2(p) = 1$$
Kleinberg divides the propagation of node weights into two modes, the I operation and the O operation:
The I operation propagates weight from hub nodes to authority nodes:
$$A(p) \leftarrow \sum_{q \in Q} H(q)$$
where $Q = \{q \mid (q, p) \in E\}$;
The O operation propagates weight from authority nodes to hub nodes:
$$H(p) \leftarrow \sum_{q \in Q} A(q)$$
where $Q = \{q \mid (p, q) \in E\}$. The final weights of all nodes are obtained by iterating these operations.
Step 93, once every node has a centrality and a node weight, the two values are considered together and the nodes are sorted in descending order; the nodes ranked highest overall are the core members.
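The I and O operations can be sketched as a short iteration. This follows Kleinberg's standard formulation, in which the authority weight of p sums the hub weights of nodes with an edge into p, and the hub weight of p sums the authority weights of nodes p points to. The fixed iteration count is an assumption; a convergence test could be used instead.

```python
import math

def hits(nodes, edges, iterations=50):
    # edges: list of directed pairs (p, q), meaning p communicates with q.
    auth = {p: 1.0 for p in nodes}
    hub = {p: 1.0 for p in nodes}
    for _ in range(iterations):
        # I operation: authority of p collects hub weights of its in-neighbors.
        new_auth = {p: 0.0 for p in nodes}
        for q, p in edges:
            new_auth[p] += hub[q]
        # O operation: hub of p collects authority weights of its out-neighbors.
        new_hub = {p: 0.0 for p in nodes}
        for p, q in edges:
            new_hub[p] += new_auth[q]
        # Normalize so that the squared weights sum to 1.
        for w in (new_auth, new_hub):
            norm = math.sqrt(sum(x * x for x in w.values())) or 1.0
            for p in w:
                w[p] /= norm
        auth, hub = new_auth, new_hub
    return auth, hub
```

In a graph where a and b both send to c, node c ends up with all of the authority weight while a and b share the hub weight equally.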
Step 100, expand the members of each community.
After the core members of a community have been found in the previous step, the members of the community are expanded on the basis of these core members. Member expansion can be realized by judging whether a given member belongs to the same community as a core member. Nodes belonging to the same community are more tightly connected to one another than to nodes outside the community. Likewise, of the information held by node i, nodes within the same community receive more than nodes outside it. Therefore, the amount of information each node receives during information propagation can be used to judge whether two nodes belong to the same community. With reference to Fig. 3, the detailed process is as follows:
Step 101, take m nodes whose shortest distance to node i is greater than 2 to form the node set $\{v_1, v_2, \ldots, v_m\}$; use the variable fnum to record the number of times a node is judged to belong to the same community as node i. The initial value of this variable is 0.
Step 102, choose an unprocessed subset from the node set produced in Step 101 and judge whether the nodes in this subset belong to the same community as node i. This step comprises:
Step 102-1, choose a node j from the node subset as the terminal node and take the aforementioned node i as the source node; initialize $M_i = 1$, $M_j = 0$, and the set $\{M_k = 0\}$, where $M_i$, $M_j$, $M_k$ denote the useful information amounts of nodes i, j, k respectively; $k \in \{1, 2, \ldots, n\_node\}$ and $k \ne i, j$; n_node is the number of nodes in the network;
Step 102-2, update the values $M_k$ successively in ascending order; the information value of node k is:
$$M_k = \sum_i \frac{M_i w_{ik} e_{ik}}{\sum_j w_{ij} e_{ki}}$$
where $e_{ij} = 1$ when node i and node j are connected by an edge and $e_{ij} = 0$ when they are not; $w_{ij} = 1$ is the weight corresponding to edge $e_{ij}$.
Step 102-3, repeat Step 102-2 until the information values of the nodes no longer change appreciably.
Step 102-4, select the information threshold for the division according to the redundancy.
Every node has an information value. Cutting the network at a position that is near the middle and where the information difference is largest divides the whole network into two communities. The phrase "near the middle" is where the redundancy comes in. For example, if the network has n nodes and the redundancy is 20%, the community size is bounded accordingly, so it suffices to search, among the nodes located in (0.3n, 0.7n), for the two adjacent nodes with the largest difference.
Step 102-5, after the threshold is selected, if the information value of node k is greater than the threshold (70% in this embodiment), node k belongs to the same community as the source node and fnum is incremented by 1; if it is less than the threshold, node k does not belong to the same community.
Step 103, repeat Step 102 and compute the frequency p of each node from its fnum; if p is greater than another threshold (0.6 in this embodiment), the node is considered to belong to the same community as node i; otherwise it is not.
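The cut selection of Step 102-4 can be sketched as follows. This is one possible reading, stated as an assumption: nodes are sorted by information value, and the cut is placed at the largest gap between adjacent values whose position falls within (0.5 − redundancy)·n and (0.5 + redundancy)·n. The function name and the dictionary input are illustrative.

```python
def split_by_information(values, redundancy=0.2):
    # values: dict mapping node -> information value (needs at least 2 nodes).
    order = sorted(values, key=values.get, reverse=True)
    n = len(order)
    lo = max(1, round((0.5 - redundancy) * n))
    hi = min(n - 1, round((0.5 + redundancy) * n))
    # Choose the cut position with the largest drop between adjacent nodes.
    cut = max(range(lo, hi + 1),
              key=lambda c: values[order[c - 1]] - values[order[c]])
    return set(order[:cut]), set(order[cut:])
```

The first returned set holds the nodes with high information values, which are the candidates for sharing a community with the source node.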
Step 110, divide the communities on the basis of complex network theory.
Community division can use either a splitting method or a merging (agglomerative) method. This embodiment takes the merging method as an example to describe the community division process.
This step involves the concept of modularity, which is explained first:
Modularity: suppose the network is divided into k communities, and define a k × k symmetric matrix $E = (e_{ij})$, where element $e_{ij}$ is the fraction of all edges in the network that connect nodes of two different communities, the two nodes lying in community i and community j respectively. Modularity is denoted Q and is computed as:
$$Q = \sum_i (e_{ii} - a_i^2) = \mathrm{Tr}\, e - \| e^2 \|$$
where $\| x \|$ denotes the sum of all elements of a matrix x.
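The modularity formula can be checked numerically with a few lines. The sketch assumes the standard definition of the row sums, with a_i equal to the sum of row i of the matrix e; that definition is not spelled out in this passage.

```python
def modularity(e):
    # e: k x k symmetric matrix of edge fractions between communities.
    k = len(e)
    a = [sum(e[i][j] for j in range(k)) for i in range(k)]  # assumed: a_i = row sum
    return sum(e[i][i] - a[i] ** 2 for i in range(k))
```

For `e = [[0.4, 0.1], [0.1, 0.4]]`, each a_i is 0.5, so Q = 2 × (0.4 − 0.25) = 0.3, which agrees with Tr e − ‖e²‖ = 0.8 − 0.5.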
This step also involves the following three data structures:
(1) The modularity increment matrix $\Delta Q_{ij}$. Each of its rows is stored both as a balanced binary tree and as a max-heap.
(2) The max-heap H. This heap contains the largest element of each row of the modularity increment matrix $\Delta Q_{ij}$, together with the labels i and j of the two communities that element corresponds to.
(3) The auxiliary vector $a_i$.
With these concepts and data structures explained, the specific steps of community division by the merging method are as follows:
Step 111, divide the communication network into n communities, each node being an independent community. At this point the initial modularity value is Q = 0. The initial $a_i$ and the intermediate variable $b_{ij}$ satisfy:
$$a_i = \frac{\sum_j w_{ij} e_{ij}}{2 \sum_{i,j} w_{ij}}$$
$$b_{ij} = \frac{w_{ij} e_{ij}}{2 \sum_{i,j} w_{ij}}$$
where $e_{ij} = 1$ when node i and node j are connected by an edge and $e_{ij} = 0$ when they are not; $w_{ij}$ is the weight corresponding to edge $e_{ij}$. The elements of the modularity increment matrix initially satisfy:
$$\Delta Q_{ij} = b_{ij} + b_{ji} - 2 a_i a_j = \frac{w_{ij} e_{ij}}{\sum_{i,j} w_{ij}} - \frac{\left(\sum_k w_{ik} e_{ik}\right)\left(\sum_k w_{jk} e_{jk}\right)}{2 \left(\sum_{i,j} w_{ij}\right)^2}$$
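The initial quantities of Step 111 can be sketched as below. The sketch assumes that the sum over i, j in the denominators counts each undirected edge once, so that the values a_i sum to 1; the input format (one dictionary entry per undirected edge) is likewise an assumption for illustration.

```python
def initial_increments(weights):
    # weights: dict {(i, j): w} with one entry per undirected edge.
    total = sum(weights.values())
    nodes = sorted({u for edge in weights for u in edge})
    b = {(i, j): 0.0 for i in nodes for j in nodes}
    for (i, j), w in weights.items():
        b[(i, j)] = b[(j, i)] = w / (2 * total)
    a = {i: sum(b[(i, j)] for j in nodes) for i in nodes}
    # Only pairs joined by an edge get an entry in the increment matrix.
    dq = {(i, j): b[(i, j)] + b[(j, i)] - 2 * a[i] * a[j]
          for i in nodes for j in nodes if i < j and b[(i, j)] > 0}
    return a, b, dq
```

A dense dictionary keeps the example short; the balanced trees and max-heap described above matter only for efficiency on large networks.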
Step 112, select the largest $\Delta Q_{ij}$ from the max-heap H, merge the corresponding communities i and j, and label the merged community j; then update the modularity increment matrix $\Delta Q_{ij}$, the max-heap H, and the auxiliary vector $a_i$:
Step 112-1, update $\Delta Q_{ij}$: delete the elements of row i and column i, and update the elements of row j and column j, thereby obtaining the updated $\Delta Q_{ij}$.
Step 112-2, update the max-heap H: each time $\Delta Q_{ij}$ is updated, update the largest elements of the corresponding rows and columns in the heap.
Step 112-3, update the auxiliary vector:
$$a'_j = a_i + a_j$$
$$a'_i = 0$$
At the same time, record the modularity value after the merge, $Q + \Delta Q_{ij}$.
Step 113, repeat Step 112 until the merge termination condition is satisfied. Several termination conditions are possible. In one embodiment, merging ends when all nodes belong to a single community. In another embodiment, since the modularity Q has only one peak, merging may instead stop once the largest element of the modularity increment matrix changes from positive to negative.
The present invention also provides a community division system for a communication network. With reference to Fig. 4, the system comprises: a data preprocessing module, a communication relationship network construction module, a text vector construction module, a node centrality computing module, an edge attribute computing module, an edge clustering module, a core member search module, a member expansion module, and a member division module; wherein,
the data preprocessing module preprocesses the communication data to obtain information about the communication data, comprising the communication data ID, sender information, recipient information, communication time, and communication content;
the communication relationship network construction module creates, from the preprocessing result, a communication relationship network that reflects the structure of the communication network; this network yields the nodes representing the communication senders and communication receivers in the communication network, and the edges representing the communication relationships between the senders and receivers;
the text vector construction module constructs the demand text vector and the communication text vectors according to the query words provided by the user;
the node centrality computing module computes the node centrality of each node in the communication relationship network; the node centrality comprises node betweenness, node closeness, and node degree;
the edge attribute computing module computes the communication relationship strength between the nodes that have a communication relationship in the network, the similarity between the edges, and the user's satisfaction with the edges;
the edge clustering module clusters the edges of the communication relationship network based on the communication content, generating a plurality of communities;
the core member search module finds the core members of each community according to the node centrality and the communication topic;
the member expansion module expands the members of each community on the basis of the core members;
the member division module divides the expanded members of the communities to generate new communities.
With the above method and system, the different communities in a communication network can be divided, so that members are classified according to their attributes or characteristics.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical solution that do not depart from its spirit and scope shall all fall within the scope of the claims of the present invention.

Claims (7)

1. A community division method for a communication network, comprising:
Step 1), preprocessing the communication data to obtain information about the communication data, comprising the communication data ID, sender information, recipient information, communication time, and communication content;
Step 2), creating, from the preprocessing result of Step 1), a communication relationship network that reflects the structure of the communication network, the communication relationship network yielding the nodes representing the communication senders and communication receivers in the communication network, and the edges representing the communication relationships between the senders and receivers;
Step 3), constructing the demand text vector and the communication text vectors according to the query words provided by the user;
Step 4), computing the node centrality of each node in the communication relationship network, the node centrality comprising node betweenness, node closeness, and node degree;
Step 5), computing the communication relationship strength between the nodes that have a communication relationship in the communication relationship network, the similarity between the edges, and the user's satisfaction with the edges;
Step 6), clustering the edges of the communication relationship network based on the communication content to generate a plurality of communities;
Step 7), finding the core members of each community according to the node centrality and the communication topic;
Step 8), expanding the members of each community on the basis of the core members;
Step 9), dividing the expanded members of the communities to generate new communities.
2. The community division method for a communication network according to claim 1, characterized in that Step 6) comprises:
Step 6-1), determining the number of communities to be generated by edge clustering;
Step 6-2), generating an initial core for each community;
Step 6-3), for every edge in the communication network, computing in turn its similarity to the initial core of each community;
Step 6-4), according to the result of Step 6-3), adding each edge of the communication network to the community whose initial core has the greatest similarity to it;
Step 6-5), adjusting the cluster center of each community;
Step 6-6), repeating Steps 6-3) to 6-5) until the stop condition is satisfied.
3. The community division method for a communication network according to claim 2, characterized in that Step 6-2) comprises:
Step 6-2-1), according to the similarity between the edges, if the similarity $s_{ij} = 0$, storing the pair formed by edge i and edge j in set A;
Step 6-2-2), for every pair in set A, computing the class degree value of edge i and the class degree value of edge j, and judging whether both values are greater than a preassigned threshold; only when both class degree values are less than the threshold is the pair formed by edge i and edge j regarded as a pair of isolated edges, and such pairs are deleted from set A;
Step 6-2-3), performing a bitwise AND operation on edge i and edge j of each pair in set A, and storing the edges i and j that attain the minimum value as the cluster center, center = (i, j);
Step 6-2-4), searching for the edge k that has the minimum similarity to all edges in the cluster center center and taking it as a new cluster center; if no such k exists, returning the cluster centers found, which are the initial cluster centers; if there are several such k, storing them in the set center and then re-executing Step 6-2-3).
4. The community division method for a communication network according to claim 1, characterized in that Step 7) comprises:
Step 7-1), computing the node centrality of each member of the community;
Step 7-2), computing the node weight of each member of the community based on the communication topic;
Step 7-3), sorting the nodes by the node centrality and the node weight, and obtaining the core members from the ranking.
5. The community division method for a communication network according to claim 1, characterized in that Step 8) comprises:
Step 8-1), taking m nodes whose shortest distance to node i is greater than 2 to form the node set $\{v_1, v_2, \ldots, v_m\}$, and using the variable fnum to record the number of times a node is judged to belong to the same community as node i;
Step 8-2), choosing an unprocessed subset from the node set produced in the previous step, and judging whether the nodes in this subset belong to the same community as node i;
Step 8-3), repeating Step 8-2) and computing the frequency p of each node from its fnum; if p is greater than another threshold, the node is considered to belong to the same community as node i; otherwise it is not.
6. The group dividing method of a communication network according to claim 1, characterized in that step 9) comprises:
Step 9-1): divide the communication network into n groups, each node being an independent group; the initial value of Q, which denotes the modularity, is Q = 0, and the initial auxiliary vector $a_i$ and intermediate variable $b_{ij}$ satisfy:
$$a_i = \frac{\sum_j w_{ij} e_{ij}}{2 \sum_{i,j} w_{ij}}$$
$$b_{ij} = \frac{w_{ij} e_{ij}}{2 \sum_{i,j} w_{ij}}$$
where $e_{ij} = 1$ when an edge connects node i and node j, and $e_{ij} = 0$ when no edge connects node i and node j; $w_{ij}$ is the weight of edge $e_{ij}$. Initially, the elements $\Delta Q_{ij}$ of the modularity increment matrix satisfy:
$$\Delta Q_{ij} = b_{ij} + b_{ji} - 2 a_i a_j = \frac{w_{ij} e_{ij}}{\sum_{i,j} w_{ij}} - \frac{\left(\sum_k w_{ik} e_{ik}\right)\left(\sum_k w_{jk} e_{jk}\right)}{2 \left(\sum_{i,j} w_{ij}\right)^2}$$
Step 9-2): select the maximum $\Delta Q_{ij}$ from heap H, merge the corresponding groups i and j, and label the merged group j; then update $\Delta Q_{ij}$, heap H and the auxiliary vector $a_i$. This step comprises:
Step 9-2-1): update $\Delta Q_{ij}$: delete the elements of row i and column i, and update the elements of row j and column j, which yields the updated $\Delta Q_{ij}$;
Step 9-2-2): update heap H: after each update of $\Delta Q_{ij}$, update the maximum elements of the corresponding rows and columns in the heap;
Step 9-2-3): update the auxiliary vector:
$$a'_j = a_i + a_j$$
$$a'_i = 0$$
and at the same time record the modularity value after merging, $Q + \Delta Q_{ij}$;
Step 9-3): repeat step 9-2) until the merge termination condition is satisfied.
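The merging procedure of steps 9-1) to 9-3) can be sketched as follows. This is a simplified illustration, not the claimed implementation: the heap H is replaced by a linear scan for the largest increment, each increment is recomputed from the stated relation ΔQ_ij = b_ij + b_ji - 2 a_i a_j instead of being updated in place, the sum over w_ij is taken once per undirected edge, and merging is assumed to stop when no positive increment remains. The name `greedy_merge` and the input layout are hypothetical.

```python
def greedy_merge(weights):
    """Greedy modularity merging on a weighted undirected graph.

    weights: {(i, j): w}, one entry per undirected edge, no self-loops.
    Returns (label, q): label maps each node to its final group, and q is
    the accumulated sum of increments, starting from Q = 0.
    """
    total = sum(weights.values())
    nodes = sorted({n for edge in weights for n in edge})
    b = {i: {} for i in nodes}      # b[i][j]: the intermediate variable b_ij
    a = {i: 0.0 for i in nodes}     # a[i]: the auxiliary vector a_i
    for (i, j), w in weights.items():
        half = w / (2 * total)
        b[i][j] = b[i].get(j, 0.0) + half
        b[j][i] = b[j].get(i, 0.0) + half
        a[i] += half
        a[j] += half
    label = {i: i for i in nodes}
    q = 0.0
    while True:
        best, pair = 0.0, None
        for i in b:                 # linear scan in place of heap H
            for j, bij in b[i].items():
                dq = bij + b[j][i] - 2 * a[i] * a[j]
                if dq > best:
                    best, pair = dq, (i, j)
        if pair is None:            # assumed termination: no positive gain
            break
        i, j = pair
        for k, bik in b.pop(i).items():   # drop row i, fold it into row j
            del b[k][i]                   # drop column i
            if k != j:
                b[j][k] = b[j].get(k, 0.0) + bik
                b[k][j] = b[k].get(j, 0.0) + bik
        a[j] += a[i]                # a'_j = a_i + a_j
        a[i] = 0.0                  # a'_i = 0
        for node, group in label.items():
            if group == i:
                label[node] = j     # merged group keeps label j
        q += best                   # record Q + delta Q_ij
    return label, q


# Two triangles joined by a single edge, all weights 1.
edges = {(1, 2): 1, (1, 3): 1, (2, 3): 1, (4, 5): 1, (4, 6): 1, (5, 6): 1, (3, 4): 1}
label, q = greedy_merge(edges)
print(label, round(q, 4))
```

On this toy graph the two triangles end up as separate groups. Because Q starts at 0 as in step 9-1), the returned q is the total increase relative to the initial one-node-per-group partition rather than an absolute modularity value.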
7. A group dividing system of a communication network, characterized by comprising: a data preprocessing module, a communication relationship network construction module, a text vector construction module, a node centrality calculation module, an edge attribute calculation module, an edge clustering module, a core member search module, a member expansion module and a member division module; wherein,
The data preprocessing module preprocesses the communication data to obtain information about the communication data, including the communication data ID, sender information, receiver information, communication time and communication content;
The communication relationship network construction module creates, according to the obtained preprocessing result, a communication relationship network that reflects the structure of the communication network; from the communication relationship network it obtains the nodes representing the communication senders and communication receivers in the communication network, and the edges representing the communication relationships between the communication senders and the communication receivers;
The text vector construction module constructs a demand text vector and a communication text vector according to the query words provided by the user;
The node centrality calculation module calculates the node centrality of each node in the communication relationship network; the node centrality comprises node betweenness, node closeness and node contact degree;
The edge attribute calculation module calculates the communication relationship strength between the nodes that have a communication relationship in the communication relationship network, the similarity between the edges among the nodes, and the user's satisfaction with the edges among the nodes;
The edge clustering module performs an edge clustering operation on the edges in the communication relationship network based on the communication content, generating a plurality of groups;
The core member search module finds the respective core members in each group according to the node centrality and the communication theme;
The member expansion module expands the members of each group on the basis of the core members;
The member division module divides the expanded members of the groups to generate new groups.
CN201110141970.XA 2011-05-30 2011-05-30 Group dividing method and system of communication network Expired - Fee Related CN102202012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110141970.XA CN102202012B (en) 2011-05-30 2011-05-30 Group dividing method and system of communication network

Publications (2)

Publication Number Publication Date
CN102202012A true CN102202012A (en) 2011-09-28
CN102202012B CN102202012B (en) 2015-01-14

Family

ID=44662413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110141970.XA Expired - Fee Related CN102202012B (en) 2011-05-30 2011-05-30 Group dividing method and system of communication network

Country Status (1)

Country Link
CN (1) CN102202012B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008120072A1 (en) * 2007-04-03 2008-10-09 Fernando Luege Mateos Method and system of classifying, ranking and relating information based on networks
CN101321190A (en) * 2008-07-04 2008-12-10 清华大学 Recommend method and recommend system of heterogeneous network
CN101408901A (en) * 2008-11-26 2009-04-15 东北大学 Probability clustering method of cross-categorical data based on key word
CN101430708A (en) * 2008-11-21 2009-05-13 哈尔滨工业大学深圳研究生院 Blog hierarchy classification tree construction method based on label clustering

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509248A (en) * 2011-11-22 2012-06-20 昆明理工大学 Multi-objective categorization method for travel grouping and system adopting same
CN102509248B (en) * 2011-11-22 2016-03-30 昆明理工大学 A kind ofly be applied to the travel Multi-Target Classification Method of forming a team and system
CN103226577A (en) * 2013-04-01 2013-07-31 儒豹(苏州)科技有限责任公司 News clustering method
CN103338460A (en) * 2013-06-17 2013-10-02 北京邮电大学 Method for calculating centrality of nodes of dynamic network environment
CN103338460B (en) * 2013-06-17 2016-03-30 北京邮电大学 For the computational methods of the node center degree of dynamic network environment
CN104394202A (en) * 2014-11-13 2015-03-04 西安交通大学 A node vitality quantifying method in a mobile social network
CN104394202B (en) * 2014-11-13 2018-01-05 西安交通大学 A kind of node liveness quantization method in mobile community network
CN105844577A (en) * 2015-01-12 2016-08-10 阿里巴巴集团控股有限公司 Relation network recognition method and device
CN106301868A (en) * 2015-06-12 2017-01-04 华为技术有限公司 The method and apparatus determining the importance of network node
CN106301868B (en) * 2015-06-12 2019-08-20 华为技术有限公司 The method and apparatus for determining the importance of network node
CN105740907A (en) * 2016-02-01 2016-07-06 石家庄铁道大学 Local community mining method
CN105760503B (en) * 2016-02-23 2019-02-05 清华大学 A kind of method of quick calculating node of graph similarity
CN105760503A (en) * 2016-02-23 2016-07-13 清华大学 Method for quickly calculating graph node similarity
CN105812280A (en) * 2016-05-05 2016-07-27 四川九洲电器集团有限责任公司 Classification method and electronic equipment
CN105812280B (en) * 2016-05-05 2019-06-04 四川九洲电器集团有限责任公司 A kind of classification method and electronic equipment
CN106022936B (en) * 2016-05-25 2020-03-20 南京大学 Community structure-based influence maximization algorithm applicable to thesis cooperative network
CN106022936A (en) * 2016-05-25 2016-10-12 南京大学 Influence maximization algorithm based on community structure and applicable to paper cooperation network
CN106789338A (en) * 2017-01-18 2017-05-31 北京航空航天大学 A kind of method that key person is found in the extensive social networks of dynamic
CN106789338B (en) * 2017-01-18 2020-10-30 北京航空航天大学 Method for discovering key people in dynamic large-scale social network
CN107545509A (en) * 2017-07-17 2018-01-05 西安电子科技大学 A kind of group dividing method of more relation social networks
CN110020341A (en) * 2017-08-30 2019-07-16 腾讯科技(深圳)有限公司 Member role determines method, apparatus and storage medium
CN110020341B (en) * 2017-08-30 2022-09-16 腾讯科技(深圳)有限公司 Member role determination method, device and storage medium
WO2019042060A1 (en) * 2017-08-30 2019-03-07 腾讯科技(深圳)有限公司 Method and apparatus for determining member role, and storage medium
CN107623594A (en) * 2017-09-01 2018-01-23 电子科技大学 A kind of three-dimensional level network topology method for visualizing of geographical location information constraint
CN108989581B (en) * 2018-09-21 2022-03-22 中国银行股份有限公司 User risk identification method, device and system
CN108989581A (en) * 2018-09-21 2018-12-11 中国银行股份有限公司 A kind of consumer's risk recognition methods, apparatus and system
WO2020062450A1 (en) * 2018-09-28 2020-04-02 苏州达家迎信息技术有限公司 Method and apparatus for determining central vertex in social network, and device and storage medium
US11487818B2 (en) 2018-09-28 2022-11-01 Suzhou Dajiaying Information Technology Co., Ltd Method, apparatus, device and storage medium for determining a central vertex in a social network
CN111104722A (en) * 2018-10-10 2020-05-05 华北电力大学(保定) Electric power communication network modeling method considering overlapping communities
CN109543108A (en) * 2018-11-26 2019-03-29 中国人民解放军陆军工程大学 The user role digging system of network-oriented multiple domain information
CN109978053A (en) * 2019-03-25 2019-07-05 北京航空航天大学 A kind of unmanned plane cooperative control method based on community division
CN110083780B (en) * 2019-04-25 2023-07-21 上海理工大学 Community based on complex network model partitioned personalized recommendation method
CN110083780A (en) * 2019-04-25 2019-08-02 上海理工大学 Personalized recommendation method based on community division in complex network model
CN110213164A (en) * 2019-05-21 2019-09-06 南瑞集团有限公司 A kind of method and device of the identification network key disseminator based on topology information fusion
CN110213164B (en) * 2019-05-21 2021-06-08 南瑞集团有限公司 Method and device for identifying network key propagator based on topology information fusion
CN110825935A (en) * 2019-09-26 2020-02-21 福建新大陆软件工程有限公司 Community core character mining method, system, electronic equipment and readable storage medium
CN112699108A (en) * 2020-12-25 2021-04-23 中科恒运股份有限公司 Data reconstruction method and device for marital registration system and terminal equipment
CN114118094A (en) * 2021-11-12 2022-03-01 国网天津市电力公司 Semantic community discovery method based on non-negative matrix factorization
CN114118094B (en) * 2021-11-12 2024-05-24 国网天津市电力公司 Semantic community discovery method based on nonnegative matrix factorization

Also Published As

Publication number Publication date
CN102202012B (en) 2015-01-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150114

Termination date: 20160530