CN104615718B - Hierarchical analysis method for burst events in social networks - Google Patents

Hierarchical analysis method for burst events in social networks

Info

Publication number
CN104615718B
Authority
CN
China
Prior art keywords
node
occurrence
burst
hot word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510061738.3A
Other languages
Chinese (zh)
Other versions
CN104615718A (en)
Inventor
怀进鹏
于伟仁
李建欣
卢忠宇
张日崇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201510061738.3A
Publication of CN104615718A
Application granted
Publication of CN104615718B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a hierarchical analysis method for burst events in social networks, comprising: obtaining a burst hot-word co-occurrence graph; determining the bipartite graph corresponding to the burst hot-word co-occurrence graph, and applying k-clique percolation to the bipartite graph to obtain the k-clique communities and the maximal cliques of each k-clique community; sorting, by a preset measure, the burst hot-word nodes in each maximal clique of each k-clique community in descending order; building a burst event feature tree according to the order of the burst hot-word nodes in the sorted maximal cliques; performing a breadth-first traversal of the burst event feature tree to a set depth k to determine the k-depth branches and the sub-branches of each k-depth branch; and determining that the burst hot-word nodes contained in the maximal cliques corresponding to each k-depth branch and its sub-branches form one sub-event of the burst event, thereby achieving fine-grained detection of burst events and their sub-events.

Description

Hierarchical analysis method for burst events in social networks
Technical field
The invention belongs to the technical field of big data processing, and more particularly relates to a hierarchical analysis method for burst events in social networks.
Background
Social networks play an increasingly important role in people's lives. Take microblogs as an example: registrations on the two largest domestic microblog platforms, Sina and Tencent, have already passed 500 million. According to the 33rd Statistical Report on Internet Development in China published by CNNIC, as of December 2013 China had 281 million microblog users, and 45.5% of Internet users used microblogs.
For burst events, also called hot events, the reach and propagation speed of microblogs have surpassed those of ordinary blogs and traditional news media. On May 12, 2008, when the Wenchuan earthquake struck Sichuan Province, China, the first message about it appeared on Twitter at about 14:35:33. The Linwu melon farmer incident, the severely overloaded school bus accident, and the "child urinating in Hong Kong" incident of April 2014, which had a huge impact nationwide, also spread rapidly through microblogs and then triggered broad discussion. Microblogs have become a public opinion platform that cannot be ignored.
Microblogs reflect public opinion promptly, so obtaining real-time information from microblogs, identifying burst events, and finding the related posts is of real value. Many approaches currently exist for detecting burst events from large volumes of microblog posts, for example clustering-based methods and topic-model methods. However, these methods generally run detection on a graph built from the keywords contained in each microblog text. Many words in that graph are redundant for event detection and lack expressive power, so the detection results for burst events in microblogs are poor. Moreover, existing detection results present all the words of an event as a flat keyword set, which cannot reveal the sub-event hierarchy inside each event, so finer-grained event analysis is not possible.
Summary of the invention
To address the above problems, the invention provides a hierarchical analysis method for burst events in social networks, used to accurately detect the burst events in a social network and the different sub-events of each burst event, thereby achieving fine-grained analysis of burst events.
The invention provides a hierarchical analysis method for burst events in social networks, comprising:
obtaining a burst hot-word co-occurrence graph, wherein the node set of the burst hot-word co-occurrence graph comprises the burst hot-word nodes and the co-occurrence word nodes that have a co-occurrence relationship with each burst hot-word node, and its edge set comprises the edges between each burst hot-word node and its corresponding co-occurrence word nodes; wherein the burst hot-word co-occurrence graph is obtained by performing burst hot-word detection on a keyword co-occurrence graph, the keyword co-occurrence graph is obtained from the co-occurring keywords contained in the data texts to be processed in the social network, and co-occurring keywords are keywords that appear together in the same data text;
determining the bipartite graph corresponding to the burst hot-word co-occurrence graph, wherein the node set of the bipartite graph consists of the burst hot-word nodes of the burst hot-word co-occurrence graph, the edges in its edge set are determined from the edges between burst hot-word nodes in the burst hot-word co-occurrence graph, and those edges are unweighted;
applying k-clique percolation to the bipartite graph to obtain the k-clique communities and the maximal cliques of each k-clique community, wherein the burst hot-word nodes contained in each k-clique community form one burst event, each maximal clique of a k-clique community forms one aspect of that burst event, and k is an integer greater than or equal to 3;
taking each k-clique community in turn as the k-clique community to be processed, and sorting, according to a preset node importance measure, the burst hot-word nodes contained in each maximal clique of that community in descending order, to obtain the sorted maximal cliques;
building a burst event feature tree according to the order of the burst hot-word nodes in the sorted maximal cliques, wherein the parent-child relationships between the nodes of the tree are determined by the order of the burst hot-word nodes in each maximal clique;
performing a breadth-first traversal of the burst event feature tree to a set depth k, and determining the k-depth branches, that is, the branches of the tree whose depth does not exceed k;
determining the sub-branches of each k-depth branch, wherein the sub-branches of a k-depth branch are the sub-branches attached below the leaf node of that k-depth branch;
determining that the burst hot-word nodes contained in the maximal cliques corresponding to each k-depth branch and its sub-branches form one sub-event of the burst event corresponding to the k-clique community being processed.
In the hierarchical analysis method provided by the invention, a burst hot-word co-occurrence graph is first obtained; it contains the burst hot words found in the data texts to be processed and the co-occurrence words connected to each burst hot word. k-clique percolation is then applied to the bipartite graph derived from that graph, yielding the k-clique communities, which are the burst events, and the maximal cliques of each k-clique community, which are the different aspects of each burst event. To further obtain the sub-events of each burst event, the burst hot-word nodes in each maximal clique of a k-clique community are sorted by importance, and a burst event feature tree is built from the burst hot-word nodes of the maximal cliques, so that the branch corresponding to each sub-event can be read from the tree. With this scheme, not only can the burst events in a social network be detected accurately, but the sub-events of each burst event can also be detected, achieving fine-grained burst event detection and analysis.
Brief description of the drawings
Fig. 1 is a flowchart of Embodiment 1 of the hierarchical analysis method for burst events in social networks according to the invention;
Fig. 2 is a detailed flowchart of step 101 in Embodiment 1 shown in Fig. 1;
Fig. 3 is a schematic diagram of a keyword co-occurrence graph;
Fig. 4 is a schematic diagram of a burst hot-word co-occurrence graph;
Fig. 5 is a flowchart of Embodiment 2 of the hierarchical analysis method for burst events in social networks according to the invention.
Detailed description
Fig. 1 is a flowchart of Embodiment 1 of the hierarchical analysis method for burst events in social networks according to the invention. As shown in Fig. 1, the method comprises:
Step 101: obtain a burst hot-word co-occurrence graph.
The node set of the burst hot-word co-occurrence graph G_k(t) comprises the burst hot-word nodes and the co-occurrence word nodes that have a co-occurrence relationship with each burst hot-word node, and its edge set comprises the edges between each burst hot-word node and its corresponding co-occurrence word nodes. The burst hot-word co-occurrence graph is obtained by performing burst hot-word detection on a keyword co-occurrence graph; the keyword co-occurrence graph is obtained from the co-occurring keywords contained in the data texts to be processed in the social network, and co-occurring keywords are keywords that appear together in the same data text.
The social network in this embodiment may be, for example, a microblog or a forum, and the data text to be processed may correspondingly be microblog text. Note that this embodiment mainly processes text-type data, referred to as data text. Microblog data is characterized by low data quality, short texts, informal wording, and a large amount of non-event noise. To accurately detect the burst events contained in a large number of microblog texts, that is, hot events that are widely discussed and spread within a very short period, the co-occurring keywords must first be determined from each data text. Co-occurring keywords are keywords that appear together in the same data text; in other words, the keywords within one data text have a co-occurrence relationship.
In this embodiment, an event in a social network such as a microblog is represented as a set of closely related keywords. Although the data texts describing one event vary widely, their core keywords tend to be consistent. For a burst event, the core keywords show bursty usage. This embodiment uses the co-occurrence of keywords to model the association between them.
Specifically, a keyword co-occurrence graph is built first. Put simply, the keyword co-occurrence graph is denoted G(t); its node set consists of keyword nodes, one for each keyword in the data texts, and the co-occurrence relationships between keyword nodes are the edges of its edge set. The construction is described in detail in a later embodiment and is not elaborated here.
To detect the burst events present in the social network at the current detection time, this embodiment runs burst hot-word detection on the keyword co-occurrence graph G(t) to find the burst hot words in it. This yields the burst hot-word co-occurrence graph G_k(t), whose nodes are the burst hot words and the co-occurrence words connected to each burst hot word. G_k(t) contains the nodes, and the edges between nodes, that matter most for burst event detection. The burst hot-word detection process is described in detail in a later embodiment.
Step 102: determine the bipartite graph corresponding to the burst hot-word co-occurrence graph. The node set of the bipartite graph consists of the burst hot-word nodes of the burst hot-word co-occurrence graph, the edges in its edge set are determined from the edges between burst hot-word nodes in the burst hot-word co-occurrence graph, and those edges are unweighted.
To simplify the later detection of burst events and sub-events, this embodiment processes the burst hot-word co-occurrence graph as follows to obtain the corresponding bipartite graph (binary graph). All co-occurrence word nodes are removed from G_k(t), leaving only the burst hot-word nodes. The edges incident to the removed co-occurrence word nodes are removed with them, so that only the edges between burst hot-word nodes remain. Those remaining edges are unweighted; that is, only the connection structure between burst hot-word nodes is kept. The bipartite graph is denoted G_t(t) = (V_t(t), E_t(t)) below.
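The reduction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `burst_only_graph`, the dictionary representation of weighted edges, and the sample words are all assumptions made for the example.

```python
def burst_only_graph(weighted_edges, burst_words):
    """Drop co-occurrence word nodes and edge weights.

    weighted_edges: dict mapping (u, v) pairs to a weight (assumed layout).
    burst_words: iterable of words detected as burst hot words.
    Returns (nodes, edges) where edges is a set of unordered pairs.
    """
    nodes = set(burst_words)
    edges = set()
    for (u, v) in weighted_edges:
        # Keep an edge only if both endpoints are burst hot words.
        if u in nodes and v in nodes and u != v:
            edges.add(frozenset((u, v)))
    return nodes, edges


# Hypothetical sample: "today" is an ordinary co-occurrence word.
weighted_edges = {
    ("quake", "rescue"): 3.2,
    ("quake", "today"): 1.0,
    ("rescue", "today"): 0.5,
}
nodes, edges = burst_only_graph(weighted_edges, {"quake", "rescue"})
```

Storing each edge as a `frozenset` makes the pair unordered, which matches an undirected, unweighted graph.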
Step 103: apply k-clique percolation to the bipartite graph to obtain the k-clique communities and the maximal cliques of each k-clique community. The burst hot-word nodes contained in each k-clique community form one burst event, each maximal clique of a k-clique community forms one aspect of that burst event, and k is an integer greater than or equal to 3.
In the prior art, burst events are mostly detected with graph-clustering methods, which generally have the following defects: incomplete semantics, that is, the same keyword cannot appear in different burst events; accidental co-occurrence, that is, keywords from different contexts may be assigned to the same event; and a lack of hierarchy, since in its natural state an event can have several aspects and therefore several sub-events, but existing approaches cannot detect the hierarchical structure of an event. This embodiment therefore provides the following processing to detect burst events and sub-events.
According to the definition of the burst hot-word co-occurrence graph, two nodes that never appear in the same data text have no edge between them. In other words, a keyword node together with its 1-hop neighbors forms the finest-grained event, and a maximal clique in the graph directly defines one aspect of an event. To form events and obtain a complete picture of them, the k-clique percolation method (k-clique-percolation) is run on the bipartite graph G_t(t) to find overlapping event clusters. Put simply, k-clique percolation first finds all maximal cliques in the graph and then extracts the connected components of the clique graph, which yields the k-clique communities (k-clique-community). A k-clique community is a community made up of adjacent k-cliques, where two k-cliques are adjacent when they share k-1 nodes. With this definition, events have the following properties: a community is defined locally and is unaffected by changes in other parts of the graph; communities may overlap, which accommodates polysemous words and different contexts; adjusting k reduces accidental co-occurrence; and the hierarchy of the different aspects of an event is contained in the maximal cliques of its community.
In this embodiment k is 3, so any event containing fewer than 3 keyword nodes is discarded. Increasing k discards more events or aspects of events but gives stronger semantic consistency, so in practice k should be chosen as a trade-off. With k=3, running the method above yields the burst events contained in the graph. Each burst event consists of a group of burst hot-word nodes and can be described as a 3-clique-community on G_t(t), and the aspects of the burst event are represented by its maximal cliques.
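The percolation step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses the common formulation in which maximal cliques of size at least k are linked when they share at least k-1 nodes, and the helper names (`to_adjacency`, `maximal_cliques`, `k_clique_communities`) and the sample graph are hypothetical.

```python
from itertools import combinations


def to_adjacency(edges):
    """Build a node -> set-of-neighbours map from undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def k_clique_communities(adj, k=3):
    """Group maximal cliques (size >= k) that overlap in >= k-1 nodes.

    Returns a list of communities; each community is the list of its
    maximal cliques, i.e. one event and its aspects.
    """
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), []).append(clique)
    return list(groups.values())


# Two triangles sharing edge B-C, a separate triangle, and a bridge D-E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("E", "F"), ("F", "G"), ("E", "G"), ("D", "E")]
adj = to_adjacency(edges)
communities = k_clique_communities(adj, k=3)
```

In this sample, the triangles {A, B, C} and {B, C, D} share two nodes and merge into one event with two aspects, while {E, F, G} is a separate event and the two-node clique {D, E} is discarded because it is smaller than k.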
Step 104: take each k-clique community in turn as the k-clique community to be processed, and sort, according to a preset node importance measure, the burst hot-word nodes contained in each maximal clique of that community in descending order, to obtain the sorted maximal cliques.
Step 105: build a burst event feature tree according to the order of the burst hot-word nodes in the sorted maximal cliques, wherein the parent-child relationships between the nodes of the tree are determined by the order of the burst hot-word nodes in each maximal clique.
Step 106: perform a breadth-first traversal of the burst event feature tree to a set depth k, and determine the k-depth branches, that is, the branches of the tree whose depth does not exceed k.
Step 107: determine the sub-branches of each k-depth branch, wherein the sub-branches of a k-depth branch are the sub-branches attached below the leaf node of that k-depth branch.
Step 108: determine that the burst hot-word nodes contained in the maximal cliques corresponding to each k-depth branch and its sub-branches form one sub-event of the burst event corresponding to the k-clique community being processed.
After the burst events, that is, the k-clique communities, have been obtained as above, the maximal cliques within the same burst event (the same community) may in fact be non-adjacent to one another. Such non-adjacency often implies different sub-events, or possibly different events. To further determine, from the different aspects of each burst event, the sub-events it contains, this embodiment builds a burst event feature tree and performs a breadth-first traversal of that tree to depth k, which reveals the hierarchical structure of each burst event.
Specifically, for each maximal clique of a given k-clique community (k-clique-community), the burst hot-word nodes it contains are sorted in descending order by a node importance measure such as node activity frequency. Because k-clique percolation operates on the bipartite graph G_t(t), whose edges are unweighted, the activity frequency of a node can be taken simply as the number of edges incident to that burst hot-word node. Once the nodes of every maximal clique have been sorted, the burst event feature tree of the burst event is built. According to the sorted order of each maximal clique, the first-ranked burst hot-word node becomes the root, and child nodes are added downward layer by layer following the order. For example, if two maximal cliques have the same first two burst hot-word nodes but different third nodes, the two cliques share the first and second layers of the tree and fork at the third layer: two branches split off under the second burst hot-word node, one for the third burst hot-word node of each clique. Continuing in this way yields the burst event feature tree of the burst event.
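The sorting and tree-building steps can be sketched as a prefix tree over the sorted cliques. This is a minimal illustration rather than the patent's implementation: node importance is taken to be the node degree, as the text suggests, while the alphabetical tie-break, the nested-dictionary tree, and the names `sort_clique` and `build_feature_tree` are assumptions. The outer dictionary acts as a virtual root, so cliques whose first-ranked words differ simply start separate top-level branches.

```python
def sort_clique(clique, degree):
    """Sort words by descending degree; ties broken alphabetically (assumption)."""
    return sorted(clique, key=lambda word: (-degree[word], word))


def build_feature_tree(cliques, degree):
    """Insert each sorted clique into a prefix tree of nested dicts."""
    tree = {}
    for clique in cliques:
        level = tree
        for word in sort_clique(clique, degree):
            # Shared prefixes reuse existing nodes; differences create forks.
            level = level.setdefault(word, {})
    return tree


# Sample community: two triangles sharing the edge B-C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

cliques = [{"A", "B", "C"}, {"B", "C", "D"}]
tree = build_feature_tree(cliques, degree)
# tree == {"B": {"C": {"A": {}, "D": {}}}}
```

Here both cliques sort to a sequence starting with B and C (degree 3 each), so they share the first two layers and fork at the third, matching the forking example in the text.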
After the burst event feature tree is obtained, a breadth-first traversal to a set depth k is performed on it to determine the k-depth branches, that is, the branches whose depth does not exceed k. Put simply, the tree is inspected level by level down to depth k only, and each branch the tree has within that maximum depth k is one k-depth branch. Next, the sub-branches of each k-depth branch are determined; these are the sub-branches attached below the leaf node of the k-depth branch. The whole tree may be much deeper than k, so once the k-depth branches are found, each of them may still carry several sub-branches below it. In this embodiment, each k-depth branch together with the sub-branches attached below it is the multi-branch structure corresponding to one sub-event. The maximal cliques corresponding to each such multi-branch structure are then determined (one structure may correspond to several maximal cliques), and the burst hot-word nodes contained in those maximal cliques form the corresponding sub-event.
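The depth-limited traversal can be sketched as follows. This is a minimal illustration under assumptions: the feature tree is a nested dictionary with a virtual root, a sub-event is approximated as the set of words on a k-depth branch plus every word in the sub-branches below it, and the names `collect_words` and `sub_events` are hypothetical.

```python
from collections import deque


def collect_words(subtree):
    """Return every word that appears anywhere in a nested-dict subtree."""
    words = set()
    stack = [subtree]
    while stack:
        node = stack.pop()
        for word, child in node.items():
            words.add(word)
            stack.append(child)
    return words


def sub_events(tree, k):
    """Breadth-first traversal limited to depth k.

    Each branch that reaches depth k (or ends earlier at a leaf) is one
    k-depth branch; its words plus all words below it form one sub-event.
    """
    results = []
    queue = deque([([], tree)])
    while queue:
        path, node = queue.popleft()
        if path and (len(path) == k or not node):
            results.append(set(path) | collect_words(node))
            continue
        for word, child in node.items():
            queue.append((path + [word], child))
    return results


# Hypothetical feature tree: the branch B-C-A carries a deeper sub-branch E.
tree = {"B": {"C": {"A": {"E": {}}, "D": {}}}}
events = sub_events(tree, k=3)
# events == [{"A", "B", "C", "E"}, {"B", "C", "D"}]
```

The word E sits at depth 4, below the leaf of the k-depth branch B-C-A, so it is folded into that branch's sub-event rather than forming a separate one.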
In this embodiment, a burst hot-word co-occurrence graph is first obtained; it contains the burst hot words found in the data texts to be processed and the co-occurrence words connected to each burst hot word. k-clique percolation is then applied to the bipartite graph derived from that graph, yielding the k-clique communities, which are the burst events, and the maximal cliques of each k-clique community, which are the different aspects of each burst event. To further obtain the sub-events of each burst event, the burst hot-word nodes in each maximal clique of a k-clique community are sorted by importance, and a burst event feature tree is built from the burst hot-word nodes of the maximal cliques, so that the multi-branch structure corresponding to each sub-event can be read from the tree. With this scheme, not only can the burst events in a social network be detected accurately, but the sub-events of each burst event can also be detected, achieving fine-grained burst event detection and analysis.
Fig. 2 is a detailed flowchart of step 101 in Embodiment 1 shown in Fig. 1. As shown in Fig. 2, step 101 in Fig. 1 can be implemented through the following steps 1011 to 1016.
Step 1011: obtain the data to be processed, which comprises at least one data text.
Step 1012: perform word segmentation on each of the data texts to obtain a keyword co-occurrence graph whose nodes are the keywords contained in the data texts and whose edges are the co-occurrence relationships between the keywords within each data text.
Here, a co-occurrence relationship holds between keywords that appear together in the same data text, and an edge exists between every pair of keywords that have a co-occurrence relationship.
Specifically, word segmentation is performed on each data text in the data to be processed, for example with the existing NLPIR Chinese word segmentation system, so that each data text is divided into words and the words it contains are obtained. The words in a data text include not only words with substantive meaning, such as nouns and verbs, but also words without concrete meaning, such as pronouns and conjunctions. This embodiment therefore selects the words with substantive meaning, such as nouns and verbs, from the segmentation result as keywords.
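The keyword selection can be sketched as a simple part-of-speech filter. This is a minimal illustration only: it assumes the segmenter returns (word, tag) pairs, and the tag names `n` and `v`, the constant `CONTENT_TAGS`, and the function `select_keywords` are hypothetical rather than taken from NLPIR's documented output.

```python
# Hypothetical tag set: "n" marks nouns, "v" marks verbs; all other tags are dropped.
CONTENT_TAGS = {"n", "v"}


def select_keywords(tagged_words, content_tags=CONTENT_TAGS):
    """Keep content words only, without duplicates, in first-seen order."""
    seen = set()
    keywords = []
    for word, tag in tagged_words:
        if tag in content_tags and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords


# Hypothetical segmentation output for one data text.
tagged = [("earthquake", "n"), ("and", "c"), ("they", "r"),
          ("rescue", "v"), ("earthquake", "n")]
keywords = select_keywords(tagged)
```

Removing duplicates within one text keeps the later co-occurrence step from creating self-loops or repeated pairs for the same text.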
Specifically, when the keyword co-occurrence graph is built, the keywords in the data texts are the nodes of the graph and the co-occurrence relationships between them are the edges. Each time a data text arrives, an unweighted undirected graph is extracted from it, together with the sequence of its edges. The continuously produced edge sequences can be treated as streaming graph data that changes over time. As the data texts are processed one after another, the result is an undirected temporal graph whose nodes are the keywords in the data texts and whose edges are the co-occurrence relationships between the keywords within each data text. Fig. 3 is a schematic diagram of a keyword co-occurrence graph. The left part of Fig. 3 shows the word segmentation results for three data texts: the first row gives the keywords of data text 1, which are A, C, D; the second row gives the keywords of data text 2, which are A, B, D; and the third row gives the keywords of data text 3, which are A, B, C. The right part of Fig. 3 is the keyword co-occurrence graph corresponding to the left part. For data text 1, for example, the keywords A, C, D appear together, so A, C, D are connected as a triangle; the same holds for data text 2 and data text 3. Note that the edge between A and D occurs in both data text 1 and data text 2, but the right part of Fig. 3 shows only a single unweighted edge between those two nodes, so Fig. 3 is only a simplified illustration of the edge structure.
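The construction illustrated by Fig. 3 can be sketched as follows, using the three data texts from the figure. This is a minimal illustration: the function name `build_cooccurrence_edges` is hypothetical, and using the index of each text as its arrival time is an assumption made only for the example.

```python
from itertools import combinations


def build_cooccurrence_edges(texts):
    """Turn keyword lists into a node set and an edge sequence.

    texts: list of keyword lists, one per data text, in arrival order.
    Returns (nodes, edge_sequence); repeated edges are kept because the
    same pair may arrive again in a later text.
    """
    nodes = set()
    edge_sequence = []
    for arrival_time, keywords in enumerate(texts):
        unique = sorted(set(keywords))
        nodes.update(unique)
        # Every pair of keywords in the same text co-occurs.
        for u, v in combinations(unique, 2):
            edge_sequence.append((u, v, arrival_time))
    return nodes, edge_sequence


# Data texts 1 to 3 from Fig. 3.
texts = [["A", "C", "D"], ["A", "B", "D"], ["A", "B", "C"]]
nodes, edge_sequence = build_cooccurrence_edges(texts)
distinct_edges = {(u, v) for u, v, _ in edge_sequence}
```

The edge sequence holds nine entries, three per text, while `distinct_edges` holds the six edges drawn in the right part of Fig. 3; the pair (A, D) appears twice in the sequence because it occurs in data texts 1 and 2.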
Step 1013: determine the edge frequency of each edge at the current detection time from the number of occurrences of each edge of the keyword co-occurrence graph at each of its arrival times up to the current detection time, and from the decay weight corresponding to each of those arrival times.
In this embodiment, the keyword co-occurrence graph is defined as G(t) = (N(t), A(t)). It is an undirected temporal graph; that is, its nodes, edges, and edge weights can change over time. N(t) is the node set of the streaming graph data and A(t) is the edge sequence. The edge sequence may contain repeats, because the same edge may be received again at a different time or at the same time. As time passes, the nodes and edges of G(t) are updated, and the edge between a connected pair of nodes may occur many times, either at different times or several times at the same time. For burst event detection, the arrival times of the edge between a pair of nodes do not all carry the same influence: arrival times closer to the detection time are more sensitive, or more important.
Therefore, to detect burst events, greater weight must be given to recently arrived edges; otherwise the graph is insensitive to recent burst changes. To describe this temporal characteristic, a decay factor λ controls how fast the edge weights decay. This embodiment uses an exponential decay model to determine the decay weight of each edge in G(t). Such smooth decay avoids splitting a single burst feature, which would cause its detection to fail.
First, this embodiment introduces the following definition of the decay weight of an edge:
At the current detection time t, the weight of an edge that arrived at time t_s is $2^{-\lambda (t - t_s)}$, where λ is the decay factor, the half-life of the decay is 1/λ, and 0 < λ < 1.
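As a small worked illustration, assume the weight has the form 2^{-λ(t − t_s)} that appears in the edge frequency formula of claim 4, and take λ = 0.5 (a value chosen only for this example, not taken from the patent). The half-life is then 1/λ = 2 time units, and the weight halves with every 2 units of elapsed time:

```latex
\[
w(t, t_s) = 2^{-\lambda (t - t_s)}, \qquad \lambda = 0.5:
\quad w\big|_{t - t_s = 0} = 1, \quad
w\big|_{t - t_s = 2} = 2^{-1} = \tfrac{1}{2}, \quad
w\big|_{t - t_s = 4} = 2^{-2} = \tfrac{1}{4}.
\]
```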
Next, based on the above definition of the decay weight of an edge, this embodiment introduces the following definition of edge frequency:
The edge frequency of edge (i, j) is defined as the weight of edge (i, j) at the current detection time t.
In G(t), the edge sequence A(t) contains multiple instances of edge (i, j), i.e. the edge occurs repeatedly at the same or different times, whereas the decay weight defined above covers only the instance of edge (i, j) that arrived at one particular time t_s, not all of its arrivals up to the current detection time t. Therefore, according to the definition of edge frequency, the edge frequency of any edge (i, j) in G(t) must be determined from the decay weight corresponding to each arrival time of edge (i, j) and the number of occurrences of edge (i, j) at each arrival time.
Specifically, the edge frequency F(i, j, t) of edge (i, j) at the current detection time t is determined as:
$$F(i, j, t) = \sum_{k=1}^{n_{ij}^{t}} N(i, j, k) \times 2^{-\lambda (t - T(i, j, k))}$$
where T(i, j, k) is the k-th arrival time of edge (i, j), N(i, j, k) is the number of times edge (i, j) occurs at the k-th arrival time, $n_{ij}^{t}$ is the number of distinct times up to t at which edge (i, j) has occurred, and edge (i, j) is any edge in the keyword co-occurrence graph.
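The edge frequency can be evaluated directly from the arrival history of an edge. The sketch below assumes the formula F(i, j, t) = Σ_k N(i, j, k) × 2^{-λ(t − T(i, j, k))} given in claim 4; the function name `edge_frequency` and the list-of-pairs representation of the arrivals are hypothetical choices for illustration.

```python
def edge_frequency(arrivals, t, lam):
    """Edge frequency F(i, j, t) of one edge.

    arrivals: list of (T_k, N_k) pairs, where T_k is the k-th arrival time
              of the edge and N_k is how many times it occurred at T_k.
    t:        current detection time.
    lam:      decay factor lambda, with 0 < lam < 1.
    """
    return sum(n_k * 2.0 ** (-lam * (t - t_k))
               for t_k, n_k in arrivals
               if t_k <= t)  # only arrivals up to the detection time count

# The edge occurred twice at time 0 and once at time 2; detection time is 4.
f = edge_frequency([(0, 2), (2, 1)], t=4, lam=0.5)
```

With λ = 0.5 the two occurrences at time 0 have each decayed to 1/4 and the occurrence at time 2 to 1/2, so the three occurrences together contribute a frequency of 1.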
In addition, in practical applications the data of G(t), i.e. its edges and nodes, are updated only when new edges arrive. For a node i and a node j, if no edge containing them arrives, their statistics need not be updated. The edge frequency of edge (i, j) between node i and node j can therefore be maintained simply on the basis of the following inference:
Assume that the last arrival time of edge (i, j) is t′. If edge (i, j) does not arrive during the period (t′, t), then F(i, j, t) satisfies:
$F(i, j, t) = F(i, j, t') \times 2^{-\lambda (t - t')}$.
During an update, the above inference is first used to bring all affected statistics up to the current time t. Viewed in terms of time decay, the update caused by a newly arriving edge is then a simple +1 operation on the corresponding value of F(·). The computational complexity of this operation is linear in the number of edges. For each node i, the update can be processed independently and in a distributed manner, provided that node i receives its own data. The update can run on any of a continuous stream processing platform, a discrete stream processing platform, or an offline data processing platform, for example the popular platforms Storm, Spark or Spark Streaming.
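The update rule can be sketched as a lazily decayed counter: the stored value is first decayed from the last arrival time to the current time and then incremented by 1. The class name `EdgeFrequencyTracker` and its dictionary layout are assumptions made for this single-process illustration; the patent itself targets stream platforms such as Storm or Spark Streaming.

```python
class EdgeFrequencyTracker:
    """Maintains F(i, j, t) incrementally using F(t) = F(t') * 2^(-lam * (t - t'))."""

    def __init__(self, lam):
        self.lam = lam
        self.freq = {}       # edge -> F(i, j, t') at its last arrival
        self.last_time = {}  # edge -> t', the last arrival time of the edge

    @staticmethod
    def _key(i, j):
        # The graph is undirected, so (i, j) and (j, i) are the same edge.
        return (i, j) if i <= j else (j, i)

    def current(self, i, j, t):
        """Edge frequency at time t, decayed from the last stored value."""
        key = self._key(i, j)
        if key not in self.freq:
            return 0.0
        elapsed = t - self.last_time[key]
        return self.freq[key] * 2.0 ** (-self.lam * elapsed)

    def arrive(self, i, j, t):
        """Process one arrival of edge (i, j) at time t: decay, then +1."""
        key = self._key(i, j)
        self.freq[key] = self.current(i, j, t) + 1.0
        self.last_time[key] = t
```

Only the edge that actually arrives is touched, which is why the cost of processing a stream grows linearly with the number of arriving edges.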
The above processing yields a temporal graph based on the co-occurrence relations of keywords. The edge frequency measures how closely two keywords are associated, and the node activity frequency measures how active, i.e. how hot, a keyword is. The context of a keyword is stored in its neighbor set S(i, t). For each node i, three pieces of statistical information are maintained: (i) the last time L(i) at which an edge containing node i occurred; (ii) the nodes in S(i, t); (iii) the sequence of edge frequency values F(i, j, L(i)) of the edges from node i to the nodes j in its neighbor set S(i, t). The space needed to maintain this information is proportional to the sum of the node degrees in the graph. A keyword co-occurrence graph is usually sparse, and |S(i, t)| is usually far smaller than the number of nodes, so this way of maintaining the information is compact and efficient, particularly in stream processing scenarios.
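One possible in-memory layout for the three per-node statistics is sketched below. The class `NodeStats` and its method names are hypothetical; the sketch assumes that all stored edge frequencies of a node refer to the same time L(i), so every value is decayed to the new time whenever an edge containing the node arrives.

```python
from dataclasses import dataclass, field

@dataclass
class NodeStats:
    last_time: float = 0.0                         # (i)   L(i)
    edge_freq: dict = field(default_factory=dict)  # (iii) neighbor j -> F(i, j, L(i))

    def neighbours(self):
        """(ii) The neighbor set S(i, t) is the key set of edge_freq."""
        return set(self.edge_freq)

    def touch(self, j, t, lam):
        """Record one arrival of the edge to neighbor j at time t."""
        factor = 2.0 ** (-lam * (t - self.last_time))
        for n in self.edge_freq:          # bring every stored value up to time t
            self.edge_freq[n] *= factor
        self.edge_freq[j] = self.edge_freq.get(j, 0.0) + 1.0
        self.last_time = t
```

The storage per node is one timestamp plus one number per neighbor, which matches the statement that the total space is proportional to the sum of the node degrees.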
Step 1014: determine the neighbor set of each node in the keyword co-occurrence graph, and determine the node activity frequency of each node at the current detection time according to the edge frequencies of the edges between the node and each neighbor in its neighbor set.
In this embodiment, the node activity frequency of a node is defined on the basis of edge frequency. For any node i whose neighbor set at the current detection time t is S(i, t), the node activity frequency of node i is defined as the sum of the edge frequencies of all edges connected to node i. Accordingly, for each node in the keyword co-occurrence graph, its neighbor set is first determined from the graph, and the node activity frequency at the current detection time is then determined from the edge frequencies of the edges to each neighbor in that set. Specifically, the node activity frequency α(i, t) of node i at the current detection time t is determined as:
$$\alpha(i, t) = \sum_{m = j_1^i(t)}^{j_{|S(i,t)|}^i(t)} F(i, m, t)$$
where S(i, t) is the neighbor set of node i and the nodes in S(i, t) are numbered $j_1^i(t), \ldots, j_{|S(i,t)|}^i(t)$.
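The node activity frequency is simply the sum of the edge frequencies over the edges incident to a node. The sketch below assumes the edge frequencies at the detection time are already available in a dictionary keyed by `frozenset({i, j})`; that representation, the function name `node_activity` and the sample values are illustrative assumptions.

```python
def node_activity(edge_freq, i):
    """Node activity frequency of node i: sum of F(i, m, t) over its neighbors m."""
    return sum(f for edge, f in edge_freq.items() if i in edge)

# Edge frequencies for the Fig. 3 graph, assuming all three data texts
# arrive at the detection time itself (no decay has taken place yet).
edge_freq = {
    frozenset("AB"): 2.0, frozenset("AC"): 2.0, frozenset("AD"): 2.0,
    frozenset("BC"): 1.0, frozenset("BD"): 1.0, frozenset("CD"): 1.0,
}
alpha_a = node_activity(edge_freq, "A")
alpha_b = node_activity(edge_freq, "B")
```

Under these assumed values node A, which appears in all three data texts, is the most active node.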
Step 1015: determine the degree of activity frequency change of each node according to the node activity frequencies of the node at different detection times.
Step 1016: determine that a node of the keyword co-occurrence graph whose degree of activity frequency change exceeds a preset degree threshold is a burst hot word node, and that a node having a co-occurrence relation with a burst hot word node is a co-occurrence word node of that burst hot word node; obtain the burst hot word co-occurrence graph composed of the burst hot word nodes, the co-occurrence word nodes corresponding to each burst hot word node, and the edges between each burst hot word node and its corresponding co-occurrence word nodes.
In this embodiment, in order to finally detect burst events, burst hot words must first be detected in the keyword co-occurrence graph G(t). Burst hot word detection means detecting nodes whose node activity frequency changes sharply. For a node whose activity frequency increases suddenly, its incident edges also exhibit bursts in edge frequency.
In this embodiment, the degree of activity frequency change of each node can be determined from the node activity frequencies of the node at different detection times; if the degree of change of a node exceeds the preset degree threshold, the node is a burst hot word node. Preferably, because the activity frequency of a node changes markedly over the span of one half-life, this embodiment defines the half-life activity frequency change of a node as follows:
The half-life activity frequency change of node i is: HA(i, t, λ) = α(i, t) − α(i, t − 1/λ).
Note that the half-life activity frequency change of node i given by the above formula forms a sequence of change values: the half-life activity frequency change sequence HA(i, t, λ) consists of the change values obtained as t takes successive time points up to the current detection time.
The degree of activity frequency change ZValue of node i is then determined from the half-life activity frequency change sequence HA(i, t, λ) of node i as:
$$ZValue = \frac{HA(i, t, \lambda) - \mu^{A}(i, t, \lambda)}{\sigma^{A}(i, t, \lambda)}$$
where μ^A(i, t, λ) is the mean of the half-life activity frequency change sequence HA(i, t, λ) and σ^A(i, t, λ) is its standard deviation.
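The burst score can be sketched as a z-score of the latest half-life change. The sketch makes several assumptions that the text does not fix: activity is sampled at unit time steps, the half-life 1/λ is a whole number of steps, the mean and the population standard deviation are taken over the whole change sequence including the latest value, and a flat sequence scores 0. The function names are hypothetical.

```python
from statistics import mean, pstdev

def half_life_changes(alpha, half_life):
    """HA sequence: alpha[t] - alpha[t - half_life] for every available step t."""
    return [alpha[t] - alpha[t - half_life]
            for t in range(half_life, len(alpha))]

def z_value(changes):
    """Z-score of the latest change against the mean and std of the sequence."""
    sigma = pstdev(changes)
    if sigma == 0:
        return 0.0  # flat sequence: treated as "no burst" in this sketch
    return (changes[-1] - mean(changes)) / sigma

alpha = [1.0] * 13 + [11.0]       # node activity jumps at the last step
changes = half_life_changes(alpha, half_life=2)
is_burst = z_value(changes) > 3   # 3 is the example threshold used in the text
```

Because the latest value is part of the sequence it is scored against, a sequence of n values can never score above sqrt(n − 1), so under these assumptions a threshold of 3 needs at least eleven change values before any node can qualify.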
Thus, if the degree of activity frequency change of node i exceeds the preset degree threshold, for example 3, node i is a burst hot word node, and the nodes having a co-occurrence relation with burst hot word node i are its co-occurrence word nodes. In other words, the above comparison is carried out for every keyword node in G(t) to obtain all burst hot word nodes in G(t); the keyword nodes associated with a burst hot word node in G(t), i.e. the keyword nodes connected to it by an edge, are the co-occurrence word nodes of that burst hot word node.
In this way the burst hot word co-occurrence graph G_k(t) is obtained from the keyword co-occurrence graph G(t): its node set consists of the burst hot word nodes and the co-occurrence word nodes corresponding to each of them, and its edge set consists of the edges between each burst hot word node and its corresponding co-occurrence word nodes. Fig. 4 is a schematic diagram of the burst hot word co-occurrence graph; the example in Fig. 4 is based on the keyword co-occurrence graph in Fig. 3. In Fig. 4, node A is a burst hot word node, and nodes B, C and D are co-occurrence word nodes of burst hot word node A.
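Extracting the burst hot word co-occurrence graph amounts to keeping every edge that touches at least one burst hot word. The sketch below reproduces the Fig. 4 example, in which A is the only burst hot word; the function name and the `frozenset` edge representation are assumptions for illustration.

```python
def burst_cooccurrence_graph(edges, burst_words):
    """Keep the edges with at least one burst hot word endpoint.

    edges:       iterable of frozenset({i, j}) edges of the keyword graph.
    burst_words: set of burst hot word nodes.
    Returns (nodes, kept_edges).
    """
    kept = {e for e in edges if e & burst_words}
    nodes = set().union(*kept) if kept else set()
    return nodes, kept

# Edges of the Fig. 3 keyword co-occurrence graph.
edges = {frozenset(p) for p in ("AB", "AC", "AD", "BC", "BD", "CD")}
nodes, kept = burst_cooccurrence_graph(edges, burst_words={"A"})
```

The edges between B, C and D are dropped because neither endpoint is a burst hot word, while B, C and D themselves remain as co-occurrence word nodes of A.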
The above processing removes from G(t) the keyword nodes that are redundant for burst event detection, and detects the burst hot words that are meaningful for burst event detection together with the co-occurrence words highly correlated with them, in preparation for the subsequent burst event detection.
Optionally, after the burst hot word co-occurrence graph is obtained in step 101, in order to further improve the accuracy of the final burst event and sub-event detection results, the following step may also be included, as shown in Fig. 5:
Step 201: perform filtering and denoising on the burst hot word co-occurrence graph to obtain the denoised burst hot word co-occurrence graph.
The filtering and denoising includes:
filtering out the edges of the burst hot word co-occurrence graph whose edge frequency is less than a preset edge frequency threshold;
filtering out the nodes of the burst hot word co-occurrence graph whose number of neighbors is not greater than a preset number threshold, where the nodes include both the burst hot word nodes and the co-occurrence word nodes of the burst hot word co-occurrence graph.
To detect burst events with stronger associations, noise filtering can also be applied to the burst hot word co-occurrence graph G_k(t) in each detection cycle before the burst event detection step. Specifically, each burst hot word node in G_k(t) maintains its burst degree, i.e. its ZValue, the current detection time t, its set of co-occurrence word nodes, and the edge frequency of the edge to each co-occurrence word node. Based on this information, on the one hand, the edges of G_k(t) whose edge frequency is less than the preset edge frequency threshold are filtered out: for each burst hot word node, the edge frequency of the edge to each co-occurrence word node in its co-occurrence word node set is compared with the preset edge frequency threshold, and edges below the threshold are removed. On the other hand, the nodes of G_k(t) whose number of neighbors is not greater than the preset number threshold, for example 1, are filtered out; these nodes include both burst hot word nodes and co-occurrence word nodes of G_k(t). A deleted burst hot word node does not co-occur with a third node, so it is considered to lack tight semantic association and therefore cannot represent an event; a deleted co-occurrence word node merely represents a use of its associated burst hot word node in some other context. Denoising typically removes at least half of the nodes of G_k(t). After denoising, G_k(t) becomes a directed weighted graph G_e(t) = (V_e(t), E_e(t)): the node set V_e(t) contains all nodes remaining after denoising, each edge in the edge set E_e(t) points from a burst hot word node to its corresponding co-occurrence word node, and the weight of each edge is the corresponding edge frequency value.
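The two filtering rules and the resulting directed weighted graph can be sketched as below. Several points are assumptions of the sketch rather than statements of the patent: neighbors are counted once, after the weak edges have been removed; an edge between two burst hot words is stored in both directions; and the names `denoise`, `min_freq` and `min_neighbours` as well as the sample frequencies are hypothetical.

```python
def denoise(edge_freq, burst_words, min_freq, min_neighbours):
    """Return (kept_nodes, directed_edges) of the denoised graph.

    edge_freq: dict mapping frozenset({i, j}) -> edge frequency.
    directed_edges maps (burst_word, co_occurrence_word) -> edge frequency.
    """
    # Rule 1: drop edges whose frequency is below the edge frequency threshold.
    strong = {e: f for e, f in edge_freq.items() if f >= min_freq}

    # Rule 2: drop nodes whose neighbor count is not greater than the threshold.
    neighbours = {}
    for e in strong:
        a, b = tuple(e)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    keep = {n for n, ns in neighbours.items() if len(ns) > min_neighbours}

    # Orient each surviving edge from a burst hot word to its co-occurrence word.
    directed = {}
    for e, f in strong.items():
        a, b = tuple(e)
        if a in keep and b in keep:
            for src, dst in ((a, b), (b, a)):
                if src in burst_words:
                    directed[(src, dst)] = f
    return keep, directed

edge_freq = {frozenset("AB"): 2.0, frozenset("AC"): 1.5, frozenset("BC"): 1.2,
             frozenset("AD"): 0.1, frozenset("BE"): 0.9}
kept_nodes, directed = denoise(edge_freq, {"A", "B"}, min_freq=0.5, min_neighbours=1)
```

In this made-up example the weak edge to D is removed first, which leaves D without neighbors and E with a single neighbor, so both are filtered out and only A, B and C remain.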
Accordingly, the determination in step 102 of the bipartite graph corresponding to the burst hot word co-occurrence graph becomes, for G_e(t): determine the bipartite graph corresponding to the denoised burst hot word co-occurrence graph, where the node set of this bipartite graph consists of the burst hot word nodes in the denoised burst hot word co-occurrence graph, and the edges in its edge set are determined according to the edges between burst hot word nodes in the denoised burst hot word co-occurrence graph.
In this embodiment, the above denoising of the burst hot word co-occurrence graph effectively filters out its redundant nodes and redundant edges, which helps improve the accuracy of burst event detection.
In this embodiment, word segmentation is performed on the multiple microblog data texts to be processed to obtain the co-occurrence keywords of each microblog data text, every pair of which is connected. Merging the co-occurrence keywords of all data texts yields the keyword co-occurrence graph, whose nodes are the keywords of the data texts and whose edges are the co-occurrence relations between keywords within each data text. The keyword co-occurrence graph is an undirected temporal graph and a streaming graph. The decay weight of each edge in the keyword co-occurrence graph is determined on the principle that more recent arrivals receive higher weights; because recently arrived edges are more sensitive to recent burst events, burst events can be detected more promptly and accurately. Furthermore, the burst hot word nodes of the keyword co-occurrence graph are determined from the degree of change of the node activity frequency of each keyword node, and the burst hot word co-occurrence graph composed of the burst hot word nodes and their corresponding co-occurrence word nodes is obtained. This removes keyword nodes that are redundant for burst event detection, so that the burst events corresponding to the clustering results obtained by graph clustering on the burst hot word co-occurrence graph are detected more accurately.
Further, after each sub-event of each burst event is obtained in step 107, the following step may also be included, as shown in Fig. 5:
Step 202: according to the denoised burst hot word co-occurrence graph, determine the extension co-occurrence word nodes corresponding to each sub-event in the k-clique community to be processed, and add the determined extension co-occurrence word nodes to the corresponding sub-event to obtain an extended sub-event, where the extension co-occurrence word nodes corresponding to a sub-event have a co-occurrence relation with every burst hot word node in that sub-event.
Specifically, the detection of burst events and of the sub-events they contain is based on the bipartite graph G_t(t), which discards the co-occurrence word nodes of the burst hot word co-occurrence graph G_k(t), or rather of the denoised graph G_e(t). However, co-occurrence word nodes can also provide very important information about an event and help users understand it. This embodiment therefore extends the sub-events by adding co-occurrence word nodes from {V_e(t) − V_t(t)} to them. Specifically, the extension co-occurrence word nodes corresponding to each sub-event are determined from the associations between each burst hot word node and its co-occurrence word nodes in the denoised burst hot word co-occurrence graph G_e(t): the extension co-occurrence word nodes of a sub-event are the co-occurrence word nodes in G_e(t) that have a co-occurrence relation with every burst hot word node in that sub-event. The determined extension co-occurrence word nodes are then added to the corresponding sub-event to obtain the extended sub-event.
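The extension step can be sketched as a set operation: a co-occurrence word joins a sub-event only if it co-occurs with every burst hot word of that sub-event. The function name `expand_subevent`, the pair-based edge representation and the sample data are illustrative assumptions.

```python
def expand_subevent(subevent, directed_edges, burst_words):
    """Add to a sub-event the co-occurrence words linked to all of its burst hot words.

    subevent:       set of burst hot word nodes forming the sub-event.
    directed_edges: iterable of (burst_word, co_occurrence_word) pairs of G_e(t).
    burst_words:    set of all burst hot word nodes.
    """
    sources = {}  # co-occurrence word -> burst hot words pointing to it
    for src, dst in directed_edges:
        if dst not in burst_words:  # only nodes of V_e(t) - V_t(t) can extend
            sources.setdefault(dst, set()).add(src)
    extension = {w for w, srcs in sources.items() if subevent <= srcs}
    return subevent | extension

directed_edges = [("A", "B"), ("B", "A"), ("A", "C"), ("B", "C"), ("A", "E")]
expanded = expand_subevent({"A", "B"}, directed_edges, burst_words={"A", "B"})
```

Here C co-occurs with both A and B and is added to the sub-event, whereas E co-occurs only with A and is left out.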
In this embodiment, the above sub-event extension yields a more complete and comprehensive description of each sub-event.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

  1. A hierarchical analysis method for burst events in social networks, characterized by comprising:
    obtaining a burst hot word co-occurrence graph; wherein the node set of the burst hot word co-occurrence graph includes burst hot word nodes and the co-occurrence word nodes having a co-occurrence relation with each burst hot word node, and the edge set of the burst hot word co-occurrence graph includes the edges between each burst hot word node and its corresponding co-occurrence word nodes; wherein the burst hot word co-occurrence graph is obtained by performing burst hot word detection on a keyword co-occurrence graph, the keyword co-occurrence graph is obtained from the co-occurrence keywords contained in data texts to be processed in a social network, and co-occurrence keywords are keywords that appear together in the same data text;
    determining a bipartite graph corresponding to the burst hot word co-occurrence graph, wherein the node set of the bipartite graph consists of the burst hot word nodes in the burst hot word co-occurrence graph, the edges in the edge set of the bipartite graph are determined according to the edges between burst hot word nodes in the burst hot word co-occurrence graph, and the edges in the edge set of the bipartite graph are unweighted;
    performing k-clique percolation on the bipartite graph to obtain k-clique communities and the maximal cliques corresponding to each k-clique community, wherein the burst hot word nodes contained in each k-clique community form one burst event, each maximal clique corresponding to a k-clique community forms one aspect of the burst event, and k is an integer greater than or equal to 3;
    taking each of the k-clique communities in turn as the k-clique community to be processed, and sorting the burst hot word nodes contained in each maximal clique corresponding to the k-clique community to be processed in descending order according to a preset node importance measure, to obtain the maximal cliques after descending sorting;
    building a burst event feature tree according to the order of the burst hot word nodes in each maximal clique after descending sorting, wherein the parent-child relations between the nodes of the burst event feature tree are determined according to the order of the burst hot word nodes in each maximal clique;
    performing a breadth-first traversal of the burst event feature tree with the depth set to k, to determine the k-depth branches of the burst event feature tree whose tree depth does not exceed k;
    determining the sub-branches corresponding to each k-depth branch, wherein the sub-branches corresponding to a k-depth branch include the sub-branches attached below the leaf node of that k-depth branch;
    determining that the burst hot word nodes contained in the maximal cliques corresponding to each k-depth branch and its corresponding sub-branches form one sub-event of the burst event corresponding to the k-clique community to be processed.
  2. The method according to claim 1, characterized in that obtaining the burst hot word co-occurrence graph comprises:
    obtaining data to be processed in sequence, the data to be processed including at least one data text;
    performing word segmentation on each data text of the at least one data text in turn, to obtain a keyword co-occurrence graph whose nodes are the keywords contained in each data text and whose edges are the co-occurrence relations between the keywords within each data text;
    determining the edge frequency of each edge at the current detection time according to the number of occurrences of the edge at each of its arrival times up to the current detection time and the decay weight corresponding to each of those arrival times, the edge frequency of each edge being the weight of the edge at the current detection time;
    determining the neighbor set of each node in the keyword co-occurrence graph, and determining the node activity frequency of each node at the current detection time according to the edge frequencies of the edges between the node and each neighbor in its neighbor set, the node activity frequency of each node being the sum of the edge frequencies of all edges connected to the node;
    determining the degree of activity frequency change of each node according to the node activity frequencies of the node at different detection times;
    determining that a node of the keyword co-occurrence graph whose degree of activity frequency change exceeds a preset degree threshold is a burst hot word node and that a node having a co-occurrence relation with a burst hot word node is a co-occurrence word node of that burst hot word node, and obtaining the burst hot word co-occurrence graph composed of the burst hot word nodes, the co-occurrence word nodes corresponding to each burst hot word node, and the edges between each burst hot word node and its corresponding co-occurrence word nodes.
  3. The method according to claim 2, characterized in that, before determining the edge frequency of each edge at the current detection time according to the number of occurrences of the edge at each of its arrival times up to the current detection time and the decay weight corresponding to each of those arrival times, the method further comprises:
    determining the decay weight corresponding to each arrival time of each edge at the current detection time t according to the following definition:
    at the current detection time t, the decay weight of an edge that arrived at time t_s is $2^{-\lambda (t - t_s)}$, where λ is the decay factor, the half-life of the decay is 1/λ, and 0 < λ < 1.
  4. The method according to claim 3, characterized in that determining the edge frequency of each edge at the current detection time according to the number of occurrences of the edge at each of its arrival times up to the current detection time and the decay weight corresponding to each of those arrival times comprises:
    determining the edge frequency F(i, j, t) of edge (i, j) at the current detection time t as:
    $$F(i, j, t) = \sum_{k=1}^{n_{ij}^{t}} N(i, j, k) \times 2^{-\lambda (t - T(i, j, k))}$$
    where T(i, j, k) is the k-th arrival time of edge (i, j), N(i, j, k) is the number of times edge (i, j) occurs at the k-th arrival time, edge (i, j) is any edge in the keyword co-occurrence graph, and $n_{ij}^{t}$ is the number of times, up to time t, at which edge (i, j) has occurred.
  5. The method according to claim 4, characterized in that determining the neighbor set of each node in the keyword co-occurrence graph, and determining the node activity frequency of each node at the current detection time according to the edge frequencies of the edges between the node and each neighbor in its neighbor set, comprises:
    determining the node activity frequency α(i, t) of node i at the current detection time t as:
    $$\alpha(i, t) = \sum_{m = j_1^i(t)}^{j_{|S(i,t)|}^i(t)} F(i, m, t)$$
    where S(i, t) is the neighbor set of node i and the nodes in S(i, t) are numbered $j_1^i(t), \ldots, j_{|S(i,t)|}^i(t)$.
  6. The method according to claim 5, characterized in that determining the degree of activity frequency change of each node according to the node activity frequencies of the node at different detection times comprises:
    determining the half-life activity frequency change sequence HA(i, t, λ) of node i according to the following formula:
    HA(i, t, λ) = α(i, t) − α(i, t − 1/λ);
    where the half-life activity frequency change sequence HA(i, t, λ) consists of the half-life activity frequency change values obtained at successive times up to the current detection time t;
    determining the degree of activity frequency change ZValue of node i from the half-life activity frequency change sequence HA(i, t, λ) of node i as:
    $$ZValue = \frac{HA(i, t, \lambda) - \mu^{A}(i, t, \lambda)}{\sigma^{A}(i, t, \lambda)}$$
    where μ^A(i, t, λ) is the mean of the half-life activity frequency change sequence HA(i, t, λ) and σ^A(i, t, λ) is the standard deviation of the half-life activity frequency change sequence HA(i, t, λ).
  7. The method according to any one of claims 2 to 6, characterized in that, after obtaining the burst hot word co-occurrence graph, the method further comprises:
    performing filtering and denoising on the burst hot word co-occurrence graph to obtain the denoised burst hot word co-occurrence graph, wherein the filtering and denoising includes:
    filtering out the edges of the burst hot word co-occurrence graph whose edge frequency is less than a preset edge frequency threshold;
    filtering out the nodes of the burst hot word co-occurrence graph whose number of neighbors is not greater than a preset number threshold, the nodes including the burst hot word nodes and the co-occurrence word nodes of the burst hot word co-occurrence graph;
    accordingly, determining the bipartite graph corresponding to the burst hot word co-occurrence graph comprises:
    determining the bipartite graph corresponding to the denoised burst hot word co-occurrence graph, wherein the node set of this bipartite graph consists of the burst hot word nodes in the denoised burst hot word co-occurrence graph, and the edges in its edge set are determined according to the edges between burst hot word nodes in the denoised burst hot word co-occurrence graph.
  8. The method according to claim 7, characterized in that, after determining that the burst hot word nodes contained in the maximal cliques corresponding to each k-depth branch and its corresponding sub-branches form one sub-event of the burst event corresponding to the k-clique community to be processed, the method further comprises:
    determining, according to the denoised burst hot word co-occurrence graph, the extension co-occurrence word nodes corresponding to each sub-event in the k-clique community to be processed, and adding the determined extension co-occurrence word nodes to the corresponding sub-event to obtain an extended sub-event, wherein the extension co-occurrence word nodes corresponding to a sub-event have a co-occurrence relation with every burst hot word node in that sub-event.
CN201510061738.3A 2015-02-05 2015-02-05 The Hierarchy Analysis Method of social networks accident Active CN104615718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510061738.3A CN104615718B (en) 2015-02-05 2015-02-05 The Hierarchy Analysis Method of social networks accident

Publications (2)

Publication Number Publication Date
CN104615718A CN104615718A (en) 2015-05-13
CN104615718B true CN104615718B (en) 2017-12-15

Family

ID=53150160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510061738.3A Active CN104615718B (en) 2015-02-05 2015-02-05 The Hierarchy Analysis Method of social networks accident

Country Status (1)

Country Link
CN (1) CN104615718B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106469203B (en) * 2016-08-31 2019-07-23 北京联创众升科技有限公司 A kind of screening technique and device of incident data
CN111737555A (en) * 2020-06-18 2020-10-02 苏州朗动网络科技有限公司 Method and device for selecting hot keywords and storage medium
CN112562849B (en) * 2020-12-08 2023-11-17 中国科学技术大学 Clinical automatic diagnosis method and system based on hierarchical structure and co-occurrence structure
CN113536077B (en) * 2021-05-31 2022-06-17 烟台中科网络技术研究所 Mobile APP specific event content detection method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819576A (en) * 2012-07-23 2012-12-12 无锡雅座在线科技发展有限公司 Data mining method and system based on microblog
CN104216954A (en) * 2014-08-20 2014-12-17 北京邮电大学 Prediction device and prediction method for state of emergency topic
CN104281608A (en) * 2013-07-08 2015-01-14 上海锐英软件技术有限公司 Emergency analyzing method based on microblogs

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
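Uncovering the overlapping community structure of complex networks in nature and society; Gergely Palla et al.; Nature; 2005-06-09; Vol. 435; pp. 814-818 *
A modularity-based community detection algorithm in social networks; Cui Hong (崔泓); Computer Engineering (计算机工程); 2014-07-15; Vol. 40, No. 7; pp. 62-68 *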

Also Published As

Publication number Publication date
CN104615718A (en) 2015-05-13

Similar Documents

Publication Publication Date Title
CN104598629B (en) Social networks incident detection method based on streaming graph model
CN104615718B (en) The Hierarchy Analysis Method of social networks accident
O’Callaghan et al. An analysis of interactions within and between extreme right communities in social media
CN104615717A (en) Multi-dimension assessment method for social network emergency
CN106021508A (en) Sudden event emergency information mining method based on social media
Lamba et al. A tempest in a teacup? Analyzing firestorms on Twitter
CN104216954A (en) Prediction device and prediction method for state of emergency topic
CN105740245A (en) Frequent item set mining method
Ma et al. Natural disaster topic extraction in sina microblogging based on graph analysis
CN101166159A (en) A method and system for identifying rubbish information
CN104484343A (en) Topic detection and tracking method for microblog
CN103294818A (en) Multi-information fusion microblog hot topic detection method
CN106055604A (en) Short text topic model mining method based on word network to extend characteristics
CN104166726B (en) A kind of burst keyword detection method towards microblogging text flow
CN102214241A (en) Method for detecting burst topic in user generation text stream based on graph clustering
CN103179198A (en) Topic influence individual digging method based on relational network
CN105138577A (en) Big data based event evolution analysis method
CN103885993A (en) Public opinion monitoring method and device for microblog
CN109753797A (en) For the intensive subgraph detection method and system of streaming figure
CN104598632A (en) Hot event detection method and device
CN106156117A (en) Hidden community core communication circle detection towards particular topic finds method and system
CN110012009A (en) Internet of Things intrusion detection method based on decision tree and self similarity models coupling
CN107705213A (en) A kind of overlapping Combo discovering method of static social networks
Sun et al. Topic shift detection in online discussions using structural context
Li et al. Exploiting statistically significant dependent rules for associative classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant