CN104615718B - Hierarchical analysis method for bursty events in social networks - Google Patents
- Publication number
- CN104615718B (application number CN201510061738.3A)
- Authority
- CN
- China
- Prior art keywords
- node
- occurrence
- burst
- hot word
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The present invention provides a hierarchical analysis method for bursty events in social networks, comprising: obtaining a bursty hot-word co-occurrence graph; determining the binary (unweighted) graph corresponding to the bursty hot-word co-occurrence graph and applying k-clique filtering to it, to obtain the k-clique communities and the maximal cliques of each k-clique community; sorting, in descending order of a preset measure, the bursty hot-word nodes contained in each maximal clique of each k-clique community; building an event feature tree according to the sorted order of the bursty hot-word nodes in each maximal clique; performing a breadth-first traversal of the event feature tree to depth k to determine the k-depth branches and the sub-branches of each k-depth branch; and determining that the bursty hot-word nodes contained in the maximal cliques corresponding to each k-depth branch and its sub-branches form one subevent of the bursty event. The method thereby achieves fine-grained detection of bursty events and their subevents.
Description
Technical field
The invention belongs to the field of big data processing technology, and more particularly relates to a hierarchical analysis method for bursty events in social networks.
Background
Social networks such as microblogs play an increasingly important role in people's lives; the two largest domestic microblog platforms, Sina and Tencent, each already have more than 500 million registered users. According to the 33rd Statistical Report on Internet Development in China published by CNNIC, as of December 2013 China had 281 million microblog users, and the microblog usage rate among netizens was 45.5%. For bursty events, also called hot events, the reach and propagation speed of microblogs have surpassed those of ordinary blogs and traditional news media. On May 12, 2008, when the Wenchuan earthquake struck Sichuan Province, China, the first message about it was posted on Twitter at about 14:35:33. Events that caused a huge nationwide impact, including the Linwu melon farmer incident, the severely overloaded school bus accident, and the "child urinating in Hong Kong" incident of April 2014, were likewise spread rapidly among social groups through microblogs and triggered extensive discussion. Microblogs have become a public opinion platform that cannot be ignored.
Microblogs reflect public opinion in a timely manner, so obtaining real-time information from microblogs, identifying bursty events, and finding the related microblog posts is highly significant. Many approaches currently detect bursty events from large volumes of microblog posts, for example clustering-based methods and topic-model-based methods. However, these methods generally operate on a graph structure formed by all the keywords contained in the microblog texts. Many words in such a graph are redundant for event detection and have little expressive power, so bursty events in microblogs are detected poorly. Moreover, existing methods output each event as a flat set of keywords, which cannot reveal the subevent hierarchy within each event, so finer-grained event analysis is not possible.
Summary of the invention
To address the above problems, the present invention provides a hierarchical analysis method for bursty events in social networks, which accurately detects the bursty events in a social network and the different subevents of each bursty event, thereby enabling fine-grained analysis of bursty events.
The invention provides a hierarchical analysis method for bursty events in social networks, comprising:
obtaining a bursty hot-word co-occurrence graph, wherein the node set of the bursty hot-word co-occurrence graph comprises the bursty hot-word nodes and the co-occurrence word nodes that each have a co-occurrence relation with a bursty hot-word node, and its edge set comprises the edges between each bursty hot-word node and its co-occurrence word nodes; the bursty hot-word co-occurrence graph is obtained by performing bursty hot-word detection on a keyword co-occurrence graph, the keyword co-occurrence graph is obtained from the co-occurring keywords contained in the pending data texts of the social network, and co-occurring keywords are keywords that appear in the same data text;
determining a binary graph corresponding to the bursty hot-word co-occurrence graph, wherein the node set of the binary graph consists of the bursty hot-word nodes of the bursty hot-word co-occurrence graph, the edges of the binary graph are determined by the edges between bursty hot-word nodes in the bursty hot-word co-occurrence graph, and the edges of the binary graph are unweighted;
applying k-clique filtering to the binary graph to obtain the k-clique communities and the maximal cliques of each k-clique community, wherein the bursty hot-word nodes in each k-clique community form one bursty event, each maximal clique of a k-clique community forms one aspect of that bursty event, and k is an integer greater than or equal to 3;
taking each k-clique community in turn as the pending k-clique community and, according to a preset node-importance measure, sorting in descending order the bursty hot-word nodes contained in each maximal clique of the pending k-clique community, to obtain sorted maximal cliques;
building an event feature tree according to the order of the bursty hot-word nodes in the sorted maximal cliques, wherein the parent-child relations between nodes of the event feature tree are determined by that order;
performing a breadth-first traversal of the event feature tree to depth k, to determine each k-depth branch, i.e., each branch of the tree whose depth does not exceed k;
determining the sub-branches of each k-depth branch, wherein the sub-branches of a k-depth branch comprise every sub-branch attached below the leaf node of that k-depth branch; and
determining that the bursty hot-word nodes contained in the maximal cliques corresponding to each k-depth branch and its sub-branches form one subevent of the bursty event corresponding to the pending k-clique community.
The hierarchical analysis method provided by the invention first obtains a hot-word co-occurrence graph containing the bursty hot words in the pending data texts and the co-occurrence words connected to each bursty hot word, and then applies k-clique filtering to the binary graph derived from this co-occurrence graph. The resulting k-clique communities are the bursty events, and the maximal cliques of each k-clique community are the different aspects of that event. To further obtain the subevents of each bursty event, the bursty hot-word nodes in each maximal clique of a k-clique community are sorted by importance, an event feature tree is built from the bursty hot-word nodes of these maximal cliques, and the branches corresponding to each subevent are extracted from the tree. With this scheme, the method not only detects the bursty events in a social network accurately but also detects the subevents of each bursty event, enabling fine-grained bursty event detection and analysis.
Brief description of the drawings
Fig. 1 is a flow chart of Embodiment 1 of the hierarchical analysis method for bursty events in social networks according to the invention;
Fig. 2 is a flow chart of a specific implementation of step 101 in Embodiment 1 shown in Fig. 1;
Fig. 3 is a schematic diagram of a keyword co-occurrence graph;
Fig. 4 is a schematic diagram of a bursty hot-word co-occurrence graph;
Fig. 5 is a flow chart of Embodiment 2 of the hierarchical analysis method for bursty events in social networks according to the invention.
Detailed description of embodiments
Fig. 1 is a flow chart of Embodiment 1 of the hierarchical analysis method for bursty events in social networks according to the invention. As shown in Fig. 1, the method comprises:
Step 101: obtain the bursty hot-word co-occurrence graph.
Here, the node set of the bursty hot-word co-occurrence graph G_k(t) comprises the bursty hot-word nodes and the co-occurrence word nodes that each have a co-occurrence relation with a bursty hot-word node, and its edge set comprises the edges between each bursty hot-word node and its co-occurrence word nodes. G_k(t) is obtained by performing bursty hot-word detection on the keyword co-occurrence graph, which in turn is obtained from the co-occurring keywords contained in the pending data texts of the social network; co-occurring keywords are keywords that appear in the same data text.
The social network in this embodiment may be, for example, a microblog or a forum, and the pending data texts may correspondingly be microblog posts. Note that this embodiment mainly processes textual data, referred to as data texts. Microblog data is characterized by low quality, short texts, informal language, and a large amount of non-event noise. To accurately detect the bursty events contained in a large number of microblog texts, i.e., hot events that are widely discussed and propagated within a very short period, the co-occurring keywords must first be determined from each data text. Co-occurring keywords are keywords that appear in the same data text; in other words, keywords in the same data text have a co-occurrence relation.
In this embodiment, an event in a social network such as a microblog is represented as a set of closely related keywords. Although the data texts describing one event vary widely, their core keywords tend to be consistent, and for a bursty event the usage of these core keywords is itself bursty. This embodiment therefore models the association between keywords through their co-occurrence relations.
Specifically, a keyword co-occurrence graph, denoted G(t), is built first. Its node set consists of the keywords in the data texts, and the co-occurrence relations between keyword nodes form its edge set. The construction is described in detail in a later embodiment and is not elaborated here.
To detect the bursty events present in the social network at the current detection time, this embodiment performs bursty hot-word detection on the keyword co-occurrence graph G(t) to identify its bursty hot words, and thereby obtains the bursty hot-word co-occurrence graph G_k(t), whose nodes are the bursty hot words and the co-occurrence words connected to each of them by co-occurrence edges. G_k(t) retains the nodes, and the edges between them, that are most significant for bursty event detection. The bursty hot-word detection process is described in detail in a later embodiment.
Step 102: determine the binary graph corresponding to the bursty hot-word co-occurrence graph, wherein the node set of the binary graph consists of the bursty hot-word nodes in the bursty hot-word co-occurrence graph, the edges of the binary graph are determined by the edges between bursty hot-word nodes in that graph, and the edges of the binary graph are unweighted.
To facilitate subsequent bursty event and subevent detection, this embodiment processes the bursty hot-word co-occurrence graph into the corresponding binary graph as follows: every co-occurrence word node is removed from G_k(t) so that only bursty hot-word nodes remain; whenever a co-occurrence word node is removed, its incident edges are removed as well, so that only the edges between bursty hot-word nodes remain; and these remaining edges are unweighted, so that only the connection structure among the bursty hot-word nodes is kept. Below, the binary graph is denoted G_t(t) = (V_t(t), E_t(t)).
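A minimal sketch of this reduction, assuming G_k(t) is given as an iterable of undirected edge tuples (for example the keys of a weight dict) and that the binary graph is returned as an adjacency map of sets; the name `binary_graph` is illustrative.

```python
from typing import Dict, Iterable, Set, Tuple


def binary_graph(
    gk_edges: Iterable[Tuple[str, str]], bursty: Set[str]
) -> Dict[str, Set[str]]:
    """Build the unweighted binary graph G_t(t) as an adjacency map.

    Co-occurrence word nodes (non-bursty) and every edge touching them are
    dropped; edges between two bursty words are kept without their weights.
    """
    adj: Dict[str, Set[str]] = {word: set() for word in bursty}
    for u, v in gk_edges:
        if u in bursty and v in bursty and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj
```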
Step 103: apply k-clique filtering to the binary graph to obtain the k-clique communities and the maximal cliques of each k-clique community, wherein the bursty hot-word nodes in each k-clique community form one bursty event, each maximal clique of a k-clique community forms one aspect of that bursty event, and k is an integer greater than or equal to 3.
In the prior art, bursty events are mostly detected with graph-clustering methods, which generally have the following defects. Incomplete semantics: the same keyword cannot appear in different bursty events. Accidental co-occurrence: keywords from different contexts may be assigned to the same event. Lack of hierarchy: in reality an event may include many aspects and thus several subevents, but existing methods cannot detect the hierarchical structure of events. This embodiment therefore provides the following approach to bursty event detection and subevent detection.
By the definition of the bursty hot-word co-occurrence graph, two nodes that never appear in the same data text are not connected by an edge. Hence a keyword node together with its 1-hop neighbors forms the finest-grained event, and a maximal clique in the graph directly defines one aspect of an event. To form events and obtain a complete picture of each event, the k-clique percolation method is run on the binary graph G_t(t) to find overlapping event clusters. In brief, k-clique percolation first finds all maximal cliques in the graph and then extracts the connected components of the resulting clique graph, thereby finding the k-clique communities. A k-clique community is composed of adjacent k-cliques, where two k-cliques are adjacent if they share k-1 nodes. Events defined in this way have the following properties: communities are defined locally and are not affected by changes in other parts of the graph; communities may overlap, so a word can carry multiple meanings and appear in different contexts; adjusting k reduces accidental co-occurrence; and the maximal cliques within a community capture the hierarchical relationship among the different aspects of an event.
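The k-clique percolation step can be sketched with only the standard library. This sketch follows the maximal-clique formulation described above: it enumerates maximal cliques with the Bron-Kerbosch algorithm with pivoting (our choice; the source does not name a clique-finding algorithm), keeps cliques of at least k nodes, and merges two cliques when they share at least k-1 nodes, which is equivalent to chaining adjacent k-cliques. The input is assumed to be a symmetric adjacency map of G_t(t); function names are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Adj = Dict[str, Set[str]]


def maximal_cliques(adj: Adj) -> List[FrozenSet[str]]:
    """Enumerate all maximal cliques via Bron-Kerbosch with pivoting."""
    cliques: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with most neighbours in P to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def k_clique_communities(
    adj: Adj, k: int = 3
) -> List[Tuple[FrozenSet[str], List[FrozenSet[str]]]]:
    """Return (community nodes, its maximal cliques) for each k-clique community."""
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))

    def find(i: int) -> int:  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two maximal cliques percolate into one community if they share k-1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups: Dict[int, List[FrozenSet[str]]] = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), []).append(clique)
    return [(frozenset().union(*cs), cs) for cs in groups.values()]
```

With k = 3, two triangles that share an edge percolate into one community (one bursty event with two aspects), while cliques that share only a single node remain in separate communities. The pairwise clique comparison is quadratic in the number of cliques, which is acceptable for a sketch.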
In this embodiment k = 3, so any event containing fewer than 3 keyword nodes is discarded. Increasing k discards more events or event aspects but yields stronger semantic consistency, so in practical applications k should be chosen as a trade-off between the two. With k = 3, running the above method yields all bursty events contained in the graph, each consisting of a set of bursty hot-word nodes. Such a bursty event can be described as a 3-clique community on G_t(t), and its aspects are represented by the community's maximal cliques.
Step 104: take each k-clique community in turn as the pending k-clique community and, according to a preset node-importance measure, sort in descending order the bursty hot-word nodes contained in each maximal clique of the pending k-clique community, to obtain sorted maximal cliques.
Step 105: build an event feature tree according to the order of the bursty hot-word nodes in the sorted maximal cliques, wherein the parent-child relations between nodes of the event feature tree are determined by that order.
Step 106: perform a breadth-first traversal of the event feature tree to depth k, to determine each k-depth branch, i.e., each branch of the tree whose depth does not exceed k.
Step 107: determine the sub-branches of each k-depth branch, wherein the sub-branches of a k-depth branch comprise every sub-branch attached below the leaf node of that k-depth branch.
Step 108: determine that the bursty hot-word nodes contained in the maximal cliques corresponding to each k-depth branch and its sub-branches form one subevent of the bursty event corresponding to the pending k-clique community.
Once the bursty events in the graph, i.e., the k-clique communities, have been obtained, the maximal cliques belonging to the same bursty event (the same community) may in fact be mutually non-adjacent, and such non-adjacency often indicates different subevents or even different events. To further determine the subevents of each bursty event from its different aspects, this embodiment builds an event feature tree and performs a depth-k breadth-first traversal of it, thereby revealing the hierarchical structure of each bursty event.
Specifically, for the maximal cliques of a given k-clique community, the bursty hot-word nodes in each maximal clique are sorted in descending order by a node-importance measure such as node activity frequency. Because k-clique filtering operates on the binary graph G_t(t), whose edges are unweighted, the activity frequency of a node can simply be taken as the number of edges incident to that bursty hot-word node. After the nodes in each maximal clique have been sorted, the event feature tree of the bursty event is built: following the sorted order of each maximal clique, the top-ranked bursty hot-word node becomes the root, and the subsequent nodes are attached level by level as children. For example, if two maximal cliques share their first two bursty hot-word nodes but differ in the third, their paths coincide at the first and second levels and fork at the third level: two branches split below the second bursty hot-word node, one for each clique's third bursty hot-word node. Proceeding in the same way yields the event feature tree of the bursty event.
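A minimal sketch of Steps 104 and 105, under these assumptions: node activity is the degree in G_t(t), as suggested above; ties are broken alphabetically (our addition, to make the order deterministic); and a virtual root sits above the top-ranked words so that cliques whose first-ranked words differ still fit in one tree (the source simply makes the top-ranked node the root). `TreeNode` and `build_feature_tree` are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Set


class TreeNode:
    def __init__(self, word: Optional[str] = None) -> None:
        self.word = word
        self.children: Dict[str, "TreeNode"] = {}
        self.clique_ids: List[int] = []  # cliques whose sorted path ends here


def sort_clique(clique: FrozenSet[str], adj: Dict[str, Set[str]]) -> List[str]:
    """Step 104: descending node activity (degree in G_t(t)), then alphabetical."""
    return sorted(clique, key=lambda w: (-len(adj[w]), w))


def build_feature_tree(
    cliques: Sequence[FrozenSet[str]], adj: Dict[str, Set[str]]
) -> TreeNode:
    """Step 105: insert each sorted clique as a root-to-leaf path (a trie)."""
    root = TreeNode()  # virtual root, an assumption of this sketch
    for idx, clique in enumerate(cliques):
        node = root
        for word in sort_clique(clique, adj):
            child = node.children.get(word)
            if child is None:
                child = node.children[word] = TreeNode(word)
            node = child
        node.clique_ids.append(idx)
    return root
```

For the two triangles {A, B, C} and {B, C, D} sharing edge B-C, both sorted paths start with B and C, so the tree forks only at the third level into A and D, mirroring the example in the text.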
After the event feature tree is obtained, a breadth-first traversal to depth k is performed on it to determine each k-depth branch, i.e., each branch whose depth does not exceed k. Put simply, the tree is inspected level by level only down to depth k, and each branch within that maximum depth k is one k-depth branch. Then the sub-branches of each k-depth branch are determined; they comprise every sub-branch attached below the leaf node of that k-depth branch. In other words, the whole tree may be much deeper than k, and a k-depth branch, whose depth is at most k, may have several sub-branches hanging below it. In this embodiment, each k-depth branch together with the sub-branches below it forms a multi-branch structure corresponding to one subevent. The maximal cliques corresponding to each such multi-branch structure are then determined (a multi-branch structure may correspond to several maximal cliques), and the bursty hot-word nodes contained in these maximal cliques form the corresponding subevent.
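Steps 106 to 108 can be sketched as follows, under our reading of the text: depth is counted from the first real word (a virtual root sits at depth 0), each k-depth branch ends at a node reached at depth k (or at a leaf reached earlier), and a subevent is the union of the maximal cliques whose paths pass through that end node, i.e., the k-depth branch plus all sub-branches below it. To stay self-contained, the sketch takes cliques already sorted as in Step 104 and rebuilds the tree as nested dicts; `detect_subevents` and the helper names are hypothetical.

```python
from collections import deque
from typing import Deque, FrozenSet, List, Sequence, Tuple


def _new_node() -> dict:
    return {"children": {}, "cliques": []}


def build_tree(sorted_cliques: Sequence[Sequence[str]]) -> dict:
    """Rebuild the feature tree (virtual root at depth 0) from sorted cliques."""
    root = _new_node()
    for idx, path in enumerate(sorted_cliques):
        node = root
        for word in path:
            node = node["children"].setdefault(word, _new_node())
        node["cliques"].append(idx)
    return root


def _cliques_below(node: dict) -> List[int]:
    """Collect clique ids stored in `node` and every sub-branch under it."""
    ids, stack = [], [node]
    while stack:
        current = stack.pop()
        ids.extend(current["cliques"])
        stack.extend(current["children"].values())
    return ids


def detect_subevents(
    sorted_cliques: Sequence[Sequence[str]], k: int = 3
) -> List[FrozenSet[str]]:
    root = build_tree(sorted_cliques)
    queue: Deque[Tuple[dict, int]] = deque([(root, 0)])
    branch_ends: List[dict] = []  # last node of each k-depth branch
    while queue:  # Step 106: breadth-first traversal limited to depth k
        node, depth = queue.popleft()
        if depth == k or (depth > 0 and not node["children"]):
            branch_ends.append(node)
            continue
        for child in node["children"].values():
            queue.append((child, depth + 1))
    subevents = []
    for end in branch_ends:  # Steps 107-108: branch plus its sub-branches
        ids = _cliques_below(end)
        subevents.append(frozenset(w for i in ids for w in sorted_cliques[i]))
    return subevents
```

With the sorted cliques B-C-A, B-C-D, B-E-F-G and B-E-F-H and k = 3, the branch B-E-F reaches depth 3, so the two cliques below F merge into one subevent {B, E, F, G, H}, while the cliques that fork at depth 3 under B-C remain two separate subevents.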
In this embodiment, after the hot-word co-occurrence graph containing the bursty hot words in the pending data texts and the co-occurrence words connected to each bursty hot word is obtained, k-clique filtering is applied to the binary graph derived from it. This yields the k-clique communities, i.e., the bursty events, and the maximal cliques of each k-clique community, i.e., the different aspects of each bursty event. To further obtain the subevents of each bursty event, the bursty hot-word nodes in each maximal clique of a k-clique community are sorted by importance, an event feature tree is built from the bursty hot-word nodes of these maximal cliques, and the multi-branch structure corresponding to each subevent is obtained from this tree. With this scheme, the method not only detects the bursty events in a social network accurately but also detects the subevents of each bursty event, enabling fine-grained event detection and analysis.
Fig. 2 is a flow chart of a specific implementation of step 101 in Embodiment 1 shown in Fig. 1. As shown in Fig. 2, step 101 in Fig. 1 can be implemented by the following steps 1011 to 1016.
Step 1011: obtain pending data, the pending data comprising at least one data text.
Step 1012: perform word segmentation on each data text in the at least one data text, to obtain a keyword co-occurrence graph whose nodes are the keywords contained in the data texts and whose edges are the co-occurrence relations between the keywords within each data text.
Here, a co-occurrence relation means that the keywords appear in the same data text; every pair of keywords having a co-occurrence relation is connected by an edge.
Specifically, word segmentation is performed on each data text in the pending data, for example with the existing NLPIR Chinese word segmentation system, so that each data text is divided into words and the words of each data text are obtained. These words include words with substantive meaning, such as nouns and verbs, as well as words without concrete meaning, such as pronouns and conjunctions. Therefore, in this embodiment, the words with substantive meaning, such as nouns and verbs, are selected from the segmentation result as keywords.
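The keyword-selection rule can be illustrated with a small filter over segmented text. NLPIR's own API is not shown here; the sketch assumes the segmenter has already produced (word, part-of-speech tag) pairs in an ICTCLAS/PKU-style tag set, where tags beginning with "n" denote nouns and tags beginning with "v" denote verbs. The tag prefixes and the function name `select_keywords` are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed ICTCLAS/PKU-style tags: "n..." = nouns (incl. names, places), "v..." = verbs.
KEYWORD_TAG_PREFIXES = ("n", "v")


def select_keywords(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep nouns and verbs from one segmented text, dropping duplicates."""
    seen, keywords = set(), []
    for word, tag in tagged:
        if tag.startswith(KEYWORD_TAG_PREFIXES) and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords
```

Duplicates are dropped because co-occurrence only depends on whether a keyword appears in a text, not how often.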
Specifically, to build the keyword co-occurrence graph, the keywords in the data texts are used as nodes and the co-occurrence relations between them as edges. When a data text arrives, an unweighted undirected graph is extracted from it and its sequence of edges is taken. The edges generated continuously in this way can be regarded as streaming graph data that changes over time. As the data texts are processed one after another, the result is an undirected temporal graph whose nodes are the keywords of the data texts and whose edges are the co-occurrence relations between keywords within each data text.

Fig. 3 is a schematic diagram of the keyword co-occurrence graph. The left side of Fig. 3 shows the word segmentation results of three data texts:
- the keywords of data text 1 are A, C and D (first row);
- the keywords of data text 2 are A, B and D (second row);
- the keywords of data text 3 are A, B and C (third row).

The right side of Fig. 3 shows the corresponding keyword co-occurrence graph. For example, keywords A, C and D all appear in data text 1, so they are connected as a triangle; data texts 2 and 3 are handled in the same way. Note that the edge between A and D occurs in both data text 1 and data text 2, yet the right side of Fig. 3 shows only a single unweighted edge between these two nodes. Fig. 3 is therefore only a simplified illustration of the edge structure.
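The edge stream for the Fig. 3 example can be sketched as follows. This is a minimal illustration that assumes each data text has already been reduced to its keyword list; the function names are hypothetical. Every pair of distinct keywords in a text becomes one undirected edge, and repeated edges are kept in the stream because the same edge may arrive several times.

```python
from itertools import combinations


def text_edges(keywords):
    """Edges of the unweighted undirected graph extracted from one data text.

    Keywords are de-duplicated and sorted first, so each undirected edge is a
    sorted tuple and (A, D) and (D, A) are represented identically.
    """
    return list(combinations(sorted(set(keywords)), 2))


def edge_stream(texts):
    """Concatenate per-text edge lists into the streaming edge sequence A(t).
    Duplicates are kept, because an edge may arrive more than once."""
    stream = []
    for keywords in texts:
        stream.extend(text_edges(keywords))
    return stream


# The three data texts of Fig. 3.
fig3_texts = [["A", "C", "D"], ["A", "B", "D"], ["A", "B", "C"]]
stream = edge_stream(fig3_texts)
print(stream)
```

The stream holds nine edge instances but only six distinct edges, and (A, D) appears twice. This is the repetition that Fig. 3 collapses into one unweighted edge.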
Step 1013: determine the edge frequency of each edge at the current detection time from two quantities: the number of occurrences of the edge at each of its arrival times in the keyword co-occurrence graph up to the current detection time, and the decay weight corresponding to each of those arrival times.
In this embodiment, the keyword co-occurrence graph is defined as G(t) = (N(t), A(t)). It is an undirected temporal graph: its nodes, edges and edge weights can all change over time. Here N(t) is the set of nodes in the streaming graph data and A(t) is the sequence of edges.

The edge sequence may contain duplicates, because the same edge may be received repeatedly at different or identical times. As time passes, the nodes and edges of G(t) are updated. Moreover, the edge between a pair of connected nodes may occur many times, either at different times or several times at the same time. For burst event detection, the different arrival times of the edge between a node pair do not carry equal influence: an arrival time closer to the detection time has a more sensitive, that is, more important, influence.
Therefore, to detect burst events, more recently arrived edges must be given larger weights; otherwise the detection would be insensitive to changes caused by the most recent bursts. To capture this temporal characteristic, a decay factor λ controls the rate at which edge weights decay. In this embodiment, the decay weight of each edge in G(t) is determined with an exponential decay model. This smooth decay avoids splitting a burst feature apart, which could otherwise cause its detection to fail.
First, this embodiment introduces the following definition of the decay weight of an edge:
At the current detection time t, the weight of an edge that arrived at time $t_s$ is $2^{-\lambda(t-t_s)}$, where λ is the decay factor, 0 < λ < 1, and the half-life of the decay is 1/λ.
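As a quick sanity check of the definition, the sketch below evaluates the decay weight and confirms that it halves after one half-life 1/λ. The function name and example values are illustrative only.

```python
def decay_weight(t, t_s, lam):
    """Decay weight at detection time t of an edge instance that arrived at time t_s:
    2 ** (-lam * (t - t_s)), with decay factor 0 < lam < 1 and half-life 1 / lam."""
    if not 0 < lam < 1:
        raise ValueError("decay factor must satisfy 0 < lam < 1")
    return 2.0 ** (-lam * (t - t_s))


# With lam = 0.25 the half-life is 4 time units.
print(decay_weight(10, 10, 0.25), decay_weight(10, 6, 0.25), decay_weight(10, 2, 0.25))
```

An edge that has just arrived has weight 1. Each further half-life halves its weight, so older arrivals fade smoothly instead of being cut off at a fixed window.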
Second, based on the above definition of the decay weight of an edge, this embodiment also introduces the following definition of edge frequency:
The edge frequency of edge (i, j) is the weight of edge (i, j) at the current detection time t.
The edge sequence A(t) of G(t) contains multiple instances of edge (i, j), because the edge occurs repeatedly at the same or different times. The decay weight defined above, however, applies only to the instance of edge (i, j) that arrived at one particular time $t_s$, not to all arrival times of (i, j) up to the current detection time t. Therefore, by the definition of edge frequency, the edge frequency of any edge (i, j) in G(t) must be determined from two quantities: the decay weight corresponding to each arrival time of (i, j), and the number of occurrences of (i, j) at each arrival time.
Specifically, the edge frequency F(i, j, t) of edge (i, j) at the current detection time t is:
$$F(i,j,t)=\sum_{k=1}^{n_{ij}^{t}} N(i,j,k)\times 2^{-\lambda\,(t-T(i,j,k))}$$
where T(i, j, k) is the k-th arrival time of edge (i, j), N(i, j, k) is the number of times edge (i, j) occurs at its k-th arrival time, $n_{ij}^{t}$ is the number of distinct times at which edge (i, j) has occurred up to time t, and (i, j) is any edge in the keyword co-occurrence graph.
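The formula can be evaluated directly, as in the sketch below. It assumes the arrival history of one edge is kept as a list of (T(i, j, k), N(i, j, k)) pairs; that storage layout and the function name are illustrative choices, not something the text specifies.

```python
def edge_frequency(arrivals, t, lam):
    """F(i, j, t) = sum_k N(i, j, k) * 2 ** (-lam * (t - T(i, j, k))).

    arrivals: list of (T_k, N_k) pairs for one edge (i, j), where T_k is the
    k-th distinct arrival time and N_k the number of occurrences at that time.
    Only arrivals with T_k <= t are counted, which corresponds to n_ij^t.
    """
    return sum(n_k * 2.0 ** (-lam * (t - t_k)) for t_k, n_k in arrivals if t_k <= t)


# One occurrence at time 2 and two occurrences at time 6, evaluated at t = 10.
print(edge_frequency([(2, 1), (6, 2)], 10, 0.25))
```

With λ = 0.25 the example gives 1 × 0.25 + 2 × 0.5 = 1.25. The older, single occurrence contributes less than the two recent ones.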
In addition, building on the above definitions, in practical applications the data in G(t), that is, its edges and nodes, only need to be updated when a new edge arrives. For nodes i and j, if no edge containing them arrives, their statistics need not be updated. The edge frequency of edge (i, j) between nodes i and j can therefore be maintained simply, based on the following inference:
Suppose edge (i, j) last arrived at time t'. If edge (i, j) does not arrive during the interval (t', t), then
$$F(i,j,t)=F(i,j,t')\times 2^{-\lambda(t-t')}.$$
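The inference follows directly from the definition $F(i,j,t)=\sum_{k=1}^{n_{ij}^{t}} N(i,j,k)\,2^{-\lambda(t-T(i,j,k))}$. If no instance of edge (i, j) arrives in (t', t), the sum at time t has exactly the same terms as at time t', that is, $n_{ij}^{t}=n_{ij}^{t'}$. Each exponent then splits as $t-T = (t-t') + (t'-T)$, so a common factor can be pulled out:

$$\begin{aligned}
F(i,j,t) &= \sum_{k=1}^{n_{ij}^{t'}} N(i,j,k)\,2^{-\lambda(t-t')}\,2^{-\lambda(t'-T(i,j,k))} \\
&= 2^{-\lambda(t-t')}\sum_{k=1}^{n_{ij}^{t'}} N(i,j,k)\,2^{-\lambda(t'-T(i,j,k))} = 2^{-\lambda(t-t')}\,F(i,j,t').
\end{aligned}$$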
During an update, the above inference is first used to bring all statistics forward to the current time t. The update caused by newly arrived edge data is then applied by simply adding 1 to the value of F(·), because a newly arrived edge has not yet decayed (its weight is $2^{0}=1$). The computational complexity of this operation is linear in the number of edges. For each node i, the update can be processed independently and in a distributed manner, as long as node i receives its own data. The update can run on a continuous streaming data processing platform, a discrete streaming data processing platform or an offline data processing platform, for example the popular platforms Storm, Spark or Spark Streaming.
The above processing yields a temporal graph of keyword co-occurrence relations. In this graph, edge frequency measures how closely two keywords are associated, and node activity frequency measures how active, that is, how "hot", a keyword is. The context of a keyword is well captured by its neighbor set S(i, t).

For each node i, three statistics need to be maintained:
(i) the last time L(i) at which an edge containing node i occurred;
(ii) the nodes in S(i, t);
(iii) the sequence F(i, j, L(i)) of edge-frequency values of the edges from node i to the nodes in its neighbor set S(i, t).

The space required to maintain these statistics is proportional to the node degrees of the graph. The keyword co-occurrence graph is usually sparse, and |S(i, t)| is usually far smaller than the number of nodes in the graph. This way of maintaining the information is therefore compact and efficient, particularly in stream-processing scenarios.
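The three per-node statistics and the "decay first, then add 1" update can be sketched as a small class. This is a minimal sketch, assuming that edges arrive in non-decreasing time order and that each arriving edge (i, j) is applied to the states of both endpoints. The class and method names are hypothetical.

```python
class NodeState:
    """Statistics kept for one node i: (i) L(i), the last time an edge containing
    i occurred; (ii) the neighbor set S(i, t), here the keys of `freq`; and
    (iii) F(i, j, L(i)) for every neighbor j. Names are illustrative only."""

    def __init__(self, lam):
        self.lam = lam
        self.last_time = None  # L(i)
        self.freq = {}         # neighbor j -> F(i, j, L(i))

    def _decay_to(self, t):
        # Bring every stored value forward from L(i) to t using
        # F(i, j, t) = F(i, j, L(i)) * 2 ** (-lam * (t - L(i))).
        if self.last_time is not None and t > self.last_time:
            factor = 2.0 ** (-self.lam * (t - self.last_time))
            for j in self.freq:
                self.freq[j] *= factor
        self.last_time = t

    def on_edge(self, j, t):
        """Apply the arrival of edge (i, j) at time t."""
        if self.last_time is not None and t < self.last_time:
            raise ValueError("edges must arrive in time order")
        self._decay_to(t)
        # A fresh arrival has decay weight 2 ** 0 = 1, hence the +1.
        self.freq[j] = self.freq.get(j, 0.0) + 1.0

    def edge_frequency(self, j, t):
        """F(i, j, t) for t >= L(i), computed without mutating the stored state."""
        if j not in self.freq:
            return 0.0
        return self.freq[j] * 2.0 ** (-self.lam * (t - self.last_time))


s = NodeState(0.25)
s.on_edge("B", 2)
s.on_edge("C", 6)
s.on_edge("B", 6)
print(s.edge_frequency("B", 10), s.edge_frequency("C", 10))
```

Storing values as of L(i) means every neighbor of a node shares one timestamp, so a single multiplication per neighbor refreshes them all. The result for neighbor B (0.75) matches the direct sum 2^(-2) + 2^(-1).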
Step 1014: determine the neighbor set of each node in the keyword co-occurrence graph. Then determine the node activity frequency of each node at the current detection time from the edge frequencies of the edges between the node and each neighbor in its neighbor set.
In this embodiment, the node activity frequency of a node is also defined in terms of edge frequency. For any node i whose neighbor set at the current detection time t is S(i, t), the node activity frequency of node i is the sum of the edge frequencies of all edges connected to node i. Therefore, for each node in the keyword co-occurrence graph, its neighbor set is first determined from the graph. Its node activity frequency at the current detection time is then computed from the edge frequencies of the edges between it and each neighbor in that set. Specifically, the node activity frequency α(i, t) of node i at the current detection time t is:
$$\alpha(i,t)=\sum_{m=j_1^i(t)}^{j_{|S(i,t)|}^i(t)} F(i,m,t)$$
where S(i, t) is the neighbor set of node i and $j_1^i(t), \ldots, j_{|S(i,t)|}^i(t)$ are the nodes of S(i, t) in sequence.
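A direct sketch of α(i, t) follows. It assumes the current edge frequencies are available in a dictionary keyed by sorted node pairs, which is an illustrative layout rather than one prescribed by the text.

```python
def activity_frequency(edge_freqs, i):
    """alpha(i, t): the sum of F(i, m, t) over all neighbors m in S(i, t).

    edge_freqs: dict mapping an undirected edge, stored as a tuple (a, b),
    to its edge frequency F at the current detection time t (assumed precomputed).
    """
    return sum(f for (a, b), f in edge_freqs.items() if i in (a, b))


ef = {("A", "B"): 2.0, ("A", "C"): 1.5, ("A", "D"): 0.5, ("B", "C"): 1.0}
print(activity_frequency(ef, "A"), activity_frequency(ef, "B"))
```

Scanning all edges is O(|E|) per node. In a streaming setting the per-node neighbor map described above would give O(|S(i, t)|) instead.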
Step 1015: determine the degree of change in the activity frequency of each node from the node's activity frequencies at different detection times.
Step 1016: identify burst hot-word nodes and build the burst hot-word co-occurrence graph. Specifically:
- nodes of the keyword co-occurrence graph whose degree of change in activity frequency exceeds a preset degree threshold are burst hot-word nodes;
- nodes that have a co-occurrence relation with a burst hot-word node are co-occurrence word nodes of that burst hot-word node;
- the burst hot-word co-occurrence graph is composed of the burst hot-word nodes, their respective co-occurrence word nodes, and the edges between each burst hot-word node and its co-occurrence word nodes.
In this embodiment, to ultimately detect burst events, burst hot words must first be detected in the keyword co-occurrence graph G(t). Burst hot-word detection identifies nodes whose node activity frequency changes abruptly. When a node's activity frequency increases suddenly, its associated edges also show bursts in edge frequency.
In this embodiment, the degree of change in the activity frequency of each node is determined from its node activity frequencies at different detection times. If a node's degree of change exceeds the preset degree threshold, the node is a burst hot-word node. Preferably, because a node's activity frequency changes markedly over a span of one half-life, this embodiment defines the following half-life activity frequency change of a node:
The half-life activity frequency change of node i is:
$$HA(i,t,\lambda)=\alpha(i,t)-\alpha(i,t-1/\lambda).$$
Note that the half-life activity frequency change of node i given by the above formula is a sequence of values. The half-life activity frequency change sequence HA(i, t, λ) consists of the half-life activity frequency change values obtained at successive detection times up to the current detection time t, that is, one value for each time point t takes in turn.
The degree of change in the activity frequency of node i, ZValue, is then determined from its half-life activity frequency change sequence HA(i, t, λ) as:
$$ZValue=\frac{HA(i,t,\lambda)-\mu^{A}(i,t,\lambda)}{\sigma^{A}(i,t,\lambda)}$$
where $\mu^{A}(i,t,\lambda)$ is the mean and $\sigma^{A}(i,t,\lambda)$ is the standard deviation of the half-life activity frequency change sequence HA(i, t, λ).
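The HA sequence and ZValue can be sketched as follows, under several assumptions the text leaves open:
- α(i, ·) is sampled at evenly spaced detection steps;
- the half-life 1/λ equals a whole number h of steps;
- σ is the population standard deviation;
- the latest change is included in the sequence when computing the mean and standard deviation.

The function names are hypothetical.

```python
from statistics import mean, pstdev


def half_life_changes(alpha, h):
    """HA sequence: alpha[s] - alpha[s - h] for every detection step s >= h.

    alpha: activity frequencies alpha(i, s) of one node at consecutive detection
    steps 0..t (assumed evenly spaced); h: the half-life 1 / lam as whole steps.
    """
    return [alpha[s] - alpha[s - h] for s in range(h, len(alpha))]


def z_value(alpha, h):
    """ZValue = (latest HA - mean of HA) / population std of HA; 0.0 if undefined."""
    ha = half_life_changes(alpha, h)
    if len(ha) < 2:
        return 0.0
    sigma = pstdev(ha)
    if sigma == 0:
        return 0.0
    return (ha[-1] - mean(ha)) / sigma


# A node that is flat for 20 steps and then jumps sharply.
print(z_value([1.0] * 20 + [10.0], 2))
```

Because the latest value is part of the sequence, ZValue is at most sqrt(n - 1) for a sequence of n values under these conventions. A threshold of 3 can therefore only be exceeded once at least 11 HA values have accumulated.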
Thus, if the degree of change in the activity frequency of node i exceeds the preset degree threshold, for example 3, node i is a burst hot-word node. The nodes that have a co-occurrence relation with burst hot-word node i are its co-occurrence word nodes. In other words, the above comparison is performed for every keyword node in G(t), which yields all burst hot-word nodes in the whole graph. For each burst hot-word node, the keyword nodes connected to it by an edge in G(t) are its co-occurrence word nodes.
The burst hot-word co-occurrence graph $G_k(t)$ is thus obtained from G(t):
- its node set consists of the burst hot-word nodes and their respective co-occurrence word nodes;
- its edge set consists of the edges between each burst hot-word node and its co-occurrence word nodes.

Fig. 4 is a schematic diagram of a burst hot-word co-occurrence graph, based on the keyword co-occurrence graph of Fig. 3. In Fig. 4, node A is a burst hot word, and nodes B, C and D are the co-occurrence word nodes of burst hot-word node A.
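The construction of $G_k(t)$ can be sketched as below, assuming the ZValues have already been computed for every node. The function name and return layout are illustrative. The example reproduces Fig. 4 from the Fig. 3 graph.

```python
def burst_cooccurrence_graph(edges, zvalues, threshold=3.0):
    """Build the burst hot-word co-occurrence graph G_k(t).

    edges: undirected edges (a, b) of the keyword co-occurrence graph G(t).
    zvalues: dict node -> ZValue (degree of change in activity frequency).
    Returns (burst_nodes, cooccur, gk_edges): cooccur maps each burst node to its
    co-occurrence word nodes (its neighbors in G(t)); gk_edges keeps only edges
    incident to at least one burst node.
    """
    burst = {n for n, z in zvalues.items() if z > threshold}
    cooccur = {b: set() for b in burst}
    gk_edges = set()
    for a, b in edges:
        if a in burst:
            cooccur[a].add(b)
        if b in burst:
            cooccur[b].add(a)
        if a in burst or b in burst:
            gk_edges.add(tuple(sorted((a, b))))
    return burst, cooccur, gk_edges


fig3_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
print(burst_cooccurrence_graph(fig3_edges, {"A": 4.0, "B": 0.1, "C": -0.5, "D": 1.0}))
```

Edges between two co-occurrence words, such as B-C here, are dropped. This matches the definition, which keeps only edges that touch a burst hot word.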
The above processing removes from G(t) the keyword nodes that are redundant for burst event detection. It keeps the burst hot words that are significant for burst event detection, together with the co-occurrence words highly correlated with them, for use in the subsequent burst event detection.
Optionally, after the burst hot-word co-occurrence graph is obtained in step 101, the following step may also be included, as shown in Fig. 5, to further improve the accuracy of the final burst event and sub-event detection results:
Step 201: filter and denoise the burst hot-word co-occurrence graph to obtain a denoised burst hot-word co-occurrence graph.
The filtering and denoising includes:
- filtering out edges of the burst hot-word co-occurrence graph whose edge frequency is below a preset edge-frequency threshold;
- filtering out nodes of the burst hot-word co-occurrence graph whose number of neighbors is not greater than a preset number threshold. These nodes may be burst hot-word nodes or co-occurrence word nodes of the burst hot-word co-occurrence graph.
To obtain burst event detection results with stronger associations, noise filtering may also be applied to the burst hot-word co-occurrence graph $G_k(t)$ in each detection cycle, before burst event detection is performed. Each burst hot-word node in $G_k(t)$ maintains the following information:
- its burst-severity information, that is, its ZValue;
- the current detection time t;
- its set of co-occurrence word nodes;
- the edge-frequency values of the edges to each of its co-occurrence word nodes.

Two filters are applied using this information.

First, edges in $G_k(t)$ whose edge frequency is below the preset edge-frequency threshold are removed. For each burst hot-word node, the edge frequency of each edge to a node in its co-occurrence word node set is compared with the threshold, and edges below it are deleted.

Second, nodes in $G_k(t)$ whose number of neighbors is not greater than a preset number threshold, for example 1, are removed. Both burst hot-word nodes and co-occurrence word nodes of $G_k(t)$ are subject to this filter:
- a deleted burst hot-word node does not co-occur with any third node, so it is considered to lack semantic cohesion and cannot represent an event;
- a deleted co-occurrence word node reflects a usage of its associated burst hot-word node in some other linguistic context.

In general, denoising removes at least half of the nodes in $G_k(t)$. After denoising, $G_k(t)$ becomes a directed weighted graph $G_e(t)=(V_e(t),E_e(t))$. The node set $V_e(t)$ contains all nodes remaining after denoising. Each edge in the edge set $E_e(t)$ points from a burst hot-word node to one of its co-occurrence word nodes, and its weight is the corresponding edge-frequency value.
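The two filters and the orientation into $G_e(t)$ can be sketched as follows. This is a minimal sketch with several assumptions:
- both thresholds are caller-chosen parameters;
- the degree filter is applied in a single pass rather than repeated until stable, since the text does not say whether to iterate;
- an edge joining two burst hot-word nodes is oriented in both directions, since each is a co-occurrence word of the other.

The names are hypothetical.

```python
def denoise(gk_edges, edge_freq, burst_nodes, min_freq, max_degree=1):
    """Filter G_k(t) and return the directed weighted graph G_e(t).

    gk_edges: undirected edges (a, b) of G_k(t); edge_freq: dict edge -> F at time t
    (keyed the same way as gk_edges); burst_nodes: set of burst hot-word nodes;
    min_freq: preset edge-frequency threshold; max_degree: nodes with at most this
    many neighbors are removed (the text's example threshold is 1).
    Returns dict (source, target) -> weight, with source a burst hot-word node.
    """
    # 1) Drop edges whose edge frequency is below the threshold.
    kept = {e for e in gk_edges if edge_freq[e] >= min_freq}
    # 2) Drop nodes whose neighbor count is not greater than max_degree (one pass).
    degree = {}
    for a, b in kept:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    alive = {n for n, d in degree.items() if d > max_degree}
    kept = {(a, b) for a, b in kept if a in alive and b in alive}
    # 3) Orient each surviving edge from a burst node to its co-occurrence word.
    directed = {}
    for a, b in kept:
        for src, dst in ((a, b), (b, a)):
            if src in burst_nodes:
                directed[(src, dst)] = edge_freq[(a, b)]
    return directed
```

For example, with burst nodes A and B, the weak edge (B, E) is removed by the frequency filter, and D is removed because A is its only neighbor. C survives because it co-occurs with both A and B.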
Accordingly, step 102 (determining the bipartite graph $G_t(t)$ corresponding to the burst hot-word co-occurrence graph) becomes: determine the bipartite graph corresponding to the denoised burst hot-word co-occurrence graph. Its node set consists of the burst hot-word nodes in the denoised graph, and its edges are determined from the edges between burst hot-word nodes in the denoised graph.
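Restricting the denoised graph to burst hot-word nodes, as described for the bipartite graph, can be sketched as below. It assumes $G_e(t)$ is given as a dictionary of directed weighted edges. Following the claims, the result keeps no weights and no directions. The function name is hypothetical.

```python
def burst_only_graph(directed_edges, burst_nodes):
    """Graph on burst hot-word nodes only, built from the denoised graph G_e(t).

    directed_edges: iterable of (source, target) pairs (e.g. the keys of a
    weight dict). Every burst node present in G_e(t) is kept as a node, and an
    unweighted, undirected edge {a, b} is kept whenever G_e(t) links two burst nodes.
    """
    nodes = {n for src, dst in directed_edges for n in (src, dst) if n in burst_nodes}
    edges = {tuple(sorted((src, dst))) for src, dst in directed_edges
             if src in burst_nodes and dst in burst_nodes}
    return nodes, edges


ge = {("A", "B"): 3.0, ("B", "A"): 3.0, ("A", "C"): 4.0, ("B", "C"): 2.0}
print(burst_only_graph(ge, {"A", "B"}))
```

The co-occurrence word C is dropped at this stage. It is brought back later by the sub-event extension step.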
In this embodiment, denoising the burst hot-word co-occurrence graph as described above effectively filters out redundant nodes and edges, which helps improve the accuracy of burst event detection results.
In this embodiment, word segmentation is first performed on the microblog data texts to be processed. This yields the co-occurring keywords of each text, and every pair of co-occurring keywords is connected. Merging the co-occurring keywords of all data texts produces a keyword co-occurrence graph. Its nodes are the keywords in the data texts and its edges are the co-occurrence relations between keywords within each text. This graph is an undirected temporal graph, that is, a streaming graph.

The decay weight of each edge is determined on the principle that the more recently an edge arrived, the higher its weight. Because recently arrived edges are more sensitive to recent bursts, burst events can be detected more promptly and accurately. In addition, burst hot-word nodes are identified from the degree of change in the node activity frequency of each keyword node. A hot-word co-occurrence graph composed of the burst hot-word nodes and their co-occurrence word nodes is then obtained. This removes keyword nodes that are redundant for burst event detection, so the burst event detection results produced by graph clustering on the hot-word co-occurrence graph are more accurate.
Further, after each sub-event of each burst event is obtained in step 107, the following step may also be included, as shown in Fig. 5:
Step 202: using the denoised burst hot-word co-occurrence graph, determine the extended co-occurrence word nodes of each sub-event of the k-clique community being processed, and add them to the corresponding sub-event to obtain an extended sub-event. The extended co-occurrence word nodes of a sub-event are those that have a co-occurrence relation with every burst hot-word node in that sub-event.
Specifically, burst events and the sub-events they contain are detected on the bipartite graph $G_t(t)$. This graph discards the co-occurrence word nodes of the hot-word co-occurrence graph $G_k(t)$, or more precisely of the denoised graph $G_e(t)$. However, co-occurrence word nodes can also provide very important information about an event and can help users understand it. Therefore, this embodiment performs an extension operation on sub-events: co-occurrence word nodes in $\{V_e(t)-V_t(t)\}$ are added to the sub-events.

The extended co-occurrence word nodes of each sub-event are determined from the associations between burst hot-word nodes and their co-occurrence word nodes in the denoised graph $G_e(t)$. For a given sub-event, they are the co-occurrence word nodes in $G_e(t)$ that have a co-occurrence relation with every burst hot-word node in that sub-event. The determined extended co-occurrence word nodes are then added to the corresponding sub-event to obtain the extended sub-event.
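The extension rule, "co-occurs with every burst hot-word node of the sub-event", is a set intersection. The sketch below assumes the co-occurrence word sets of $G_e(t)$ are available per burst node; the function name is hypothetical.

```python
def extend_subevent(subevent, cooccur, burst_nodes):
    """Extended sub-event: add every co-occurrence word node of G_e(t) that has a
    co-occurrence relation with *every* burst hot-word node in the sub-event.

    subevent: set of burst hot-word nodes; cooccur: dict burst node -> set of its
    co-occurrence word nodes in G_e(t); burst_nodes: V_t(t), so only nodes from
    V_e(t) - V_t(t) are added.
    """
    if not subevent:
        return set()
    common = set.intersection(*(cooccur.get(b, set()) for b in subevent))
    return set(subevent) | (common - burst_nodes)


co = {"A": {"B", "C", "X", "Y"}, "B": {"A", "C", "X"}, "C": {"A", "B", "X", "Z"}}
print(extend_subevent({"A", "B", "C"}, co, {"A", "B", "C"}))
```

Requiring co-occurrence with every member keeps the added words specific to the sub-event as a whole. Words such as Y or Z, linked to only one member, are not attached.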
In this embodiment, the above sub-event extension yields a more complete and comprehensive description of each sub-event.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention rather than limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (8)
- 1. A hierarchical analysis method for social network burst events, characterized by comprising: obtaining a burst hot-word co-occurrence graph, wherein the node set of the burst hot-word co-occurrence graph includes each burst hot-word node and each co-occurrence word node having a co-occurrence relation with a burst hot-word node, and the edge set of the burst hot-word co-occurrence graph includes the edges between each burst hot-word node and its respective co-occurrence word nodes; wherein the burst hot-word co-occurrence graph is obtained by performing burst hot-word detection on a keyword co-occurrence graph, the keyword co-occurrence graph is obtained from the co-occurring keywords contained in the data texts to be processed in a social network, and co-occurring keywords are keywords that appear in the same data text; determining a bipartite graph corresponding to the burst hot-word co-occurrence graph, wherein the node set of the bipartite graph consists of the burst hot-word nodes in the burst hot-word co-occurrence graph, the edges in the edge set of the bipartite graph are determined from the edges between burst hot-word nodes in the burst hot-word co-occurrence graph, and the edges in the edge set of the bipartite graph are unweighted; performing k-clique percolation on the bipartite graph to obtain each k-clique community and the maximal cliques corresponding to each k-clique community, wherein the burst hot-word nodes contained in each k-clique community form one burst event, each maximal clique corresponding to a k-clique community forms one aspect of the burst event, and k is an integer greater than or equal to 3; taking each k-clique community in turn as the k-clique community to be processed and, according to a preset node importance metric, sorting in descending order the burst hot-word nodes contained in each maximal clique corresponding to the k-clique community to be processed, to obtain maximal cliques sorted in descending order; constructing a burst event feature tree according to the order of the burst hot-word nodes in the sorted maximal cliques, wherein the parent-child relations between nodes in the burst event feature tree are determined by that order; performing a breadth-first traversal of the burst event feature tree to a tree depth of k, and determining each k-depth branch of the burst event feature tree whose tree depth does not exceed k; determining the sub-branches corresponding to each k-depth branch, wherein the sub-branches of a k-depth branch include each sub-branch below the leaf node of that k-depth branch; and determining that the burst hot-word nodes contained in the maximal cliques corresponding to each k-depth branch and its sub-branches form one sub-event of the burst event corresponding to the k-clique community to be processed.
- 2. The method according to claim 1, characterized in that obtaining the burst hot-word co-occurrence graph comprises: obtaining data to be processed in sequence, the data to be processed including at least one data text; performing word segmentation on each of the at least one data text in turn to obtain a keyword co-occurrence graph whose nodes are the keywords contained in each data text and whose edges are the co-occurrence relations between keywords within each data text; determining the edge frequency of each edge at the current detection time according to the number of occurrences of the edge at each of its arrival times in the keyword co-occurrence graph up to the current detection time and the decay weight corresponding to each arrival time, the edge frequency of an edge being its weight at the current detection time; determining the neighbor set of each node in the keyword co-occurrence graph and, according to the edge frequencies of the edges between the node and each neighbor in its neighbor set, determining the node activity frequency of each node at the current detection time, the node activity frequency of a node being the sum of the edge frequencies of all edges connected to it; determining the degree of change in the activity frequency of each node according to its node activity frequencies at different detection times; and determining the nodes of the keyword co-occurrence graph whose degree of change in activity frequency exceeds a preset degree threshold as burst hot-word nodes, and the nodes having a co-occurrence relation with a burst hot-word node as co-occurrence word nodes of that burst hot-word node, to obtain the burst hot-word co-occurrence graph composed of the burst hot-word nodes, their respective co-occurrence word nodes, and the edges between each burst hot-word node and its co-occurrence word nodes.
- 3. The method according to claim 2, characterized in that, before determining the edge frequency of each edge at the current detection time according to the number of occurrences of the edge at each arrival time in the keyword co-occurrence graph up to the current detection time and the decay weight corresponding to each arrival time, the method further comprises: determining the decay weight corresponding to each arrival time of each edge at the current detection time t according to the following definition: at the current detection time t, the decay weight of an edge that arrived at time $t_s$ is $2^{-\lambda(t-t_s)}$, where λ is the decay factor, 0 < λ < 1, and the half-life of the decay is 1/λ.
- 4. The method according to claim 3, characterized in that determining the edge frequency of each edge at the current detection time according to the number of occurrences of the edge at each arrival time in the keyword co-occurrence graph up to the current detection time and the decay weight corresponding to each arrival time comprises: determining the edge frequency F(i, j, t) of edge (i, j) at the current detection time t as $F(i,j,t)=\sum_{k=1}^{n_{ij}^{t}} N(i,j,k)\times 2^{-\lambda(t-T(i,j,k))}$, wherein T(i, j, k) is the k-th arrival time of edge (i, j), N(i, j, k) is the number of occurrences of edge (i, j) at its k-th arrival time, (i, j) is any edge in the keyword co-occurrence graph, and $n_{ij}^{t}$ is the number of times at which edge (i, j) has occurred up to time t.
- 5. The method according to claim 4, characterized in that determining the neighbor set of each node in the keyword co-occurrence graph and determining the node activity frequency of each node at the current detection time from the edge frequencies of the edges between the node and each neighbor in its neighbor set comprises: determining the node activity frequency α(i, t) of node i at the current detection time t as $\alpha(i,t)=\sum_{m=j_1^i(t)}^{j_{|S(i,t)|}^i(t)} F(i,m,t)$, wherein S(i, t) is the neighbor set of node i and $j_1^i(t), \ldots, j_{|S(i,t)|}^i(t)$ are the nodes of S(i, t) in sequence.
- 6. The method according to claim 5, characterized in that determining the degree of change in the activity frequency of each node according to its node activity frequencies at different detection times comprises: determining the half-life activity frequency change sequence HA(i, t, λ) of node i by the formula $HA(i,t,\lambda)=\alpha(i,t)-\alpha(i,t-1/\lambda)$, wherein HA(i, t, λ) consists of the half-life activity frequency change values corresponding to successive times up to the current detection time t; and determining the degree of change in the activity frequency of node i, ZValue, from its half-life activity frequency change sequence HA(i, t, λ) as $ZValue=\frac{HA(i,t,\lambda)-\mu^{A}(i,t,\lambda)}{\sigma^{A}(i,t,\lambda)}$, wherein $\mu^{A}(i,t,\lambda)$ is the mean and $\sigma^{A}(i,t,\lambda)$ is the standard deviation of the half-life activity frequency change sequence HA(i, t, λ).
- 7. The method according to any one of claims 2 to 6, characterized in that, after obtaining the burst hot-word co-occurrence graph, the method further comprises: filtering and denoising the burst hot-word co-occurrence graph to obtain a denoised burst hot-word co-occurrence graph, wherein the filtering and denoising comprises: filtering out edges of the burst hot-word co-occurrence graph whose edge frequency is below a preset edge-frequency threshold; and filtering out nodes of the burst hot-word co-occurrence graph whose number of neighbors is not greater than a preset number threshold, these nodes including the burst hot-word nodes and co-occurrence word nodes of the burst hot-word co-occurrence graph; correspondingly, determining the bipartite graph corresponding to the burst hot-word co-occurrence graph comprises: determining the bipartite graph corresponding to the denoised burst hot-word co-occurrence graph, whose node set consists of the burst hot-word nodes in the denoised burst hot-word co-occurrence graph and whose edges are determined from the edges between burst hot-word nodes in the denoised burst hot-word co-occurrence graph.
- 8. The method according to claim 7, characterized in that, after determining that the burst hot-word nodes contained in the maximal cliques corresponding to each k-depth branch and its sub-branches form one sub-event of the burst event corresponding to the k-clique community to be processed, the method further comprises: according to the denoised burst hot-word co-occurrence graph, determining the extended co-occurrence word nodes corresponding to each sub-event of the k-clique community to be processed, and adding the determined extended co-occurrence word nodes to the corresponding sub-event to obtain an extended sub-event, wherein the extended co-occurrence word nodes of each sub-event have a co-occurrence relation with every burst hot-word node in the corresponding sub-event.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510061738.3A CN104615718B (en) | 2015-02-05 | 2015-02-05 | The Hierarchy Analysis Method of social networks accident |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510061738.3A CN104615718B (en) | 2015-02-05 | 2015-02-05 | The Hierarchy Analysis Method of social networks accident |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104615718A CN104615718A (en) | 2015-05-13 |
CN104615718B true CN104615718B (en) | 2017-12-15 |
Family ID: 53150160
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510061738.3A Active CN104615718B (en) | 2015-02-05 | 2015-02-05 | The Hierarchy Analysis Method of social networks accident |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104615718B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106469203B (en) * | 2016-08-31 | 2019-07-23 | 北京联创众升科技有限公司 | A kind of screening technique and device of incident data |
CN111737555A (en) * | 2020-06-18 | 2020-10-02 | 苏州朗动网络科技有限公司 | Method and device for selecting hot keywords and storage medium |
CN112562849B (en) * | 2020-12-08 | 2023-11-17 | 中国科学技术大学 | Clinical automatic diagnosis method and system based on hierarchical structure and co-occurrence structure |
CN113536077B (en) * | 2021-05-31 | 2022-06-17 | 烟台中科网络技术研究所 | Mobile APP specific event content detection method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102819576A (en) * | 2012-07-23 | 2012-12-12 | 无锡雅座在线科技发展有限公司 | Data mining method and system based on microblog |
CN104216954A (en) * | 2014-08-20 | 2014-12-17 | 北京邮电大学 | Prediction device and prediction method for state of emergency topic |
CN104281608A (en) * | 2013-07-08 | 2015-01-14 | 上海锐英软件技术有限公司 | Emergency analyzing method based on microblogs |
2015
- 2015-02-05: Application CN201510061738.3A filed in CN (CN104615718B, status Active)
Non-Patent Citations (2)
Title |
---|
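Gergely Palla et al., "Uncovering the overlapping community structure of complex networks in nature and society", Nature, vol. 435, pp. 814-818, 2005-06-09 * |
Cui Hong (崔泓), "A modularity-based community detection algorithm in social networks" (社交网络中一种基于模块化的社区检测算法), Computer Engineering (计算机工程), vol. 40, no. 7, pp. 62-68, 2014-07-15 * |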
Also Published As
Publication number | Publication date |
---|---|
CN104615718A (en) | 2015-05-13 |
Similar Documents
Publication | Title |
---|---|
CN104598629B (en) | Social networks incident detection method based on streaming graph model |
CN104615718B (en) | The Hierarchy Analysis Method of social networks accident |
O’Callaghan et al. | An analysis of interactions within and between extreme right communities in social media |
CN104615717A (en) | Multi-dimension assessment method for social network emergency |
CN106021508A (en) | Sudden event emergency information mining method based on social media |
Lamba et al. | A tempest in a teacup? Analyzing firestorms on Twitter |
CN104216954A (en) | Prediction device and prediction method for state of emergency topic |
CN105740245A (en) | Frequent item set mining method |
Ma et al. | Natural disaster topic extraction in sina microblogging based on graph analysis |
CN101166159A (en) | A method and system for identifying rubbish information |
CN104484343A (en) | Topic detection and tracking method for microblog |
CN103294818A (en) | Multi-information fusion microblog hot topic detection method |
CN106055604A (en) | Short text topic model mining method based on word network to extend characteristics |
CN104166726B (en) | A kind of burst keyword detection method towards microblogging text flow |
CN102214241A (en) | Method for detecting burst topic in user generation text stream based on graph clustering |
CN103179198A (en) | Topic influence individual digging method based on relational network |
CN105138577A (en) | Big data based event evolution analysis method |
CN103885993A (en) | Public opinion monitoring method and device for microblog |
CN109753797A (en) | For the intensive subgraph detection method and system of streaming figure |
CN104598632A (en) | Hot event detection method and device |
CN106156117A (en) | Hidden community core communication circle detection towards particular topic finds method and system |
CN110012009A (en) | Internet of Things intrusion detection method based on decision tree and self similarity models coupling |
CN107705213A (en) | A kind of overlapping Combo discovering method of static social networks |
Sun et al. | Topic shift detection in online discussions using structural context |
Li et al. | Exploiting statistically significant dependent rules for associative classification |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |