CN117974340A - Social media event detection method combining deep learning classification and graph clustering - Google Patents
- Publication number
- CN117974340A (application number CN202410373064.XA)
- Authority
- CN
- China
- Prior art keywords
- message
- social media
- graph
- text
- pairs
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2323—Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a social media event detection method combining deep learning classification and graph clustering. A message heterogram is constructed from message texts and the feature information extracted from them; multiple shared-feature edges are built within message pairs to construct a multi-relation message graph; the similarity of each message pair is obtained with a deep learning classification model; if the similarity of a message pair reaches a preset threshold, an edge is built between the two messages of the pair, yielding a message isomorphic graph; and the message isomorphic graph is clustered as the input of a graph clustering algorithm to obtain the social media event detection result. The method avoids the process of representing social messages as vectors: it takes the message pairs and the heterogeneous associations of the message texts as input, uses the deep learning classification model to judge whether the two messages of a pair belong to the same event, constructs the message isomorphic graph from the pair-level predictions, and uses a graph clustering algorithm to discover closely associated clusters of social messages as social events. It can be used for massive social media event detection tasks.
Description
Technical Field
The invention relates to the technical field of natural language processing and text mining, in particular to a social media event detection method combining deep learning classification and graph clustering.
Background
With the development of the internet, social media platforms have changed people's lifestyles and become a main source of information. Compared with traditional media, social media spreads messages markedly faster and surfaces new events sooner. It is therefore important to analyze social media text in depth in order to discover social media events.
Deep learning based social media event detection is the current mainstream approach: a deep neural network learns vector representations of social messages, and events are then found with a distance-based or density-based clustering algorithm. However, because social messages are short and word co-occurrence is sparse, it is difficult for a deep learning model to map messages of the same event to vectors that are close together while keeping the vectors of different events far apart. Event clustering is an important step in social media event detection; it aims to organize huge volumes of social media data into sets of content related to the same event, helping users better understand and track how an event develops. The clustering algorithms adopted by most event detection tasks rely mainly on word features and introduce external features to address the high-dimensional sparsity of short-text clustering. Excessive attention to word features, however, amplifies the influence of noise and outliers, and cluster centers cannot be found in streaming data, which degrades clustering performance. For streaming social media data, commonly used clustering algorithms are also highly dependent on message order and computationally expensive, so they are inefficient on massive social media text, which significantly degrades model performance.
Disclosure of Invention
Therefore, the invention provides a social media event detection method combining deep learning classification and graph clustering, which aims to solve the following problems of existing deep learning based social media event detection methods: it is difficult to represent social messages of the same event as vectors that are close in distance while keeping the vectors of different events far apart; the clustering algorithm focuses excessively on word features; and efficiency is low when processing massive social media text.
In order to achieve the above object, the present invention provides the following technical solutions:
According to a first aspect of an embodiment of the present invention, a social media event detection method combining deep learning classification and graph clustering is provided, the method including:
extracting feature information from a message text in a social media data stream, taking the message text and the feature information as nodes, connecting the message text with the extracted feature information to form edges, and thereby constructing a message heterogram;
based on the message heterogram, randomly selecting two message text nodes to construct a message pair, and constructing a plurality of shared-feature edges within each message pair, thereby constructing a multi-relation message graph;
based on the multi-relation message graph, obtaining the similarity of the message pairs by using a deep learning classification model;
judging whether the similarity of a message pair reaches a preset threshold and, if so, constructing an edge between the two message text nodes forming the message pair, thereby constructing a message isomorphic graph;
and clustering the constructed message isomorphic graph as the input of a graph clustering algorithm, the clustering result being taken as the detected social media events.
Further, the method further comprises:
acquiring a social media text data set, dividing it into a plurality of data blocks according to release time, and training the model continually on successive blocks so as to simulate the streaming nature of social media messages.
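The division into time-ordered data blocks can be illustrated with a minimal sketch. The function name `split_into_blocks`, the `(timestamp, text)` tuple layout and the one-day default block length are assumptions made for illustration; the source does not fix a block length at this point.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_into_blocks(messages, block_days=1):
    """Group (timestamp, text) messages into consecutive time blocks.

    block_days is a hypothetical parameter: the width of one block in days.
    Blocks that would be empty are simply skipped.
    """
    if not messages:
        return []
    ordered = sorted(messages, key=lambda m: m[0])
    start = ordered[0][0]
    blocks = defaultdict(list)
    for ts, text in ordered:
        # timedelta // timedelta yields an integer block index.
        index = (ts - start) // timedelta(days=block_days)
        blocks[index].append((ts, text))
    return [blocks[i] for i in sorted(blocks)]


if __name__ == "__main__":
    demo = [
        (datetime(2012, 10, 10, 9, 0), "first message"),
        (datetime(2012, 10, 10, 14, 0), "second message"),
        (datetime(2012, 10, 12, 8, 0), "third message"),
    ]
    for number, block in enumerate(split_into_blocks(demo)):
        print(number, [text for _, text in block])
```

A model trained continually would then consume the returned blocks in order, one block at a time.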
Further, extracting feature information from a message text in the social media data stream, taking the message text and the feature information as nodes, connecting the message text with the extracted feature information to form edges, and constructing the message heterogram specifically comprises:
given a single message text $m_i$, extracting the feature information of the message from the text, the feature information comprising entity information, user information, topic tag information and time information;
defining the message heterogram $G = (V, E)$, where $V$ is the set of nodes representing the social media message texts themselves and the various types of event-related feature information in them, and $E$ is the set of edges between each message and its corresponding features.
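A minimal sketch of the message heterogram as a node set and an edge set follows. The dictionary layout of the input, the feature-type keys and the function name `build_heterogram` are assumptions for illustration only.

```python
FEATURE_TYPES = ("entity", "user", "hashtag", "time")  # assumed type names


def build_heterogram(messages):
    """Build (nodes, edges) from {message_id: {feature_type: [values]}}.

    Nodes are (type, value) tuples; every edge links one message node to
    one of its feature nodes, so messages are only connected indirectly.
    """
    nodes, edges = set(), set()
    for message_id, features in messages.items():
        message_node = ("message", message_id)
        nodes.add(message_node)
        for feature_type in FEATURE_TYPES:
            for value in features.get(feature_type, []):
                feature_node = (feature_type, value)
                nodes.add(feature_node)
                edges.add((message_node, feature_node))
    return nodes, edges


if __name__ == "__main__":
    demo = {
        "m1": {"entity": ["paris"], "user": ["alice"],
               "hashtag": ["flood"], "time": ["2012-10-10"]},
        "m2": {"entity": ["paris"], "user": ["bob"],
               "hashtag": [], "time": ["2012-10-10"]},
    }
    nodes, edges = build_heterogram(demo)
    print(len(nodes), "nodes,", len(edges), "edges")
```

Feature nodes are shared between messages (both demo messages point at the same `("entity", "paris")` node), which is what later makes shared-feature relations visible.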
Further, based on the message heterogram, randomly selecting two message text nodes to construct a message pair and constructing a plurality of shared-feature edges within each message pair, so as to construct the multi-relation message graph, specifically comprises:
defining the multi-relation message graph over the message text nodes;
where $V = \{m_1, \dots, m_N\}$ is the set of message text nodes and $N$ is the number of message text nodes; each node has its own feature representation, a $d$-dimensional feature vector $x_i \in \mathbb{R}^{d}$; the node types $m$, $u$, $e$, $h$ and $t$ denote message, user, entity, topic tag and time information, respectively; the set of all node features is denoted $X = \{x_i \mid 1 \le i \le N\}$; and for a message text node $m_i$, another message text node $m_j$ is randomly sampled to construct the message pair $(m_i, m_j)$;
an edge is established for each shared-feature relation when two message texts share feature information of the corresponding type, so that the edge $e_{ij}$ of a message pair $(m_i, m_j)$ can be associated with a plurality of shared-feature relations:
$$A_r = \min\{ W_r^{\top} W_r,\ 1 \}$$
where $W_r$ is a sub-matrix of the adjacency matrix of the message heterogram whose rows represent all feature information nodes and whose columns represent the message nodes belonging to relation $r$, $W_r^{\top}$ is the transpose of that matrix, and the minimum is taken element-wise.
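The shared-feature relations of randomly sampled message pairs can be sketched as follows. The helper names, the relation names and the use of a seeded `random.Random` for sampling are illustrative assumptions, not details given in the source.

```python
import random
from itertools import combinations

RELATIONS = ("entity", "user", "hashtag", "time")  # assumed relation names


def shared_relations(features_a, features_b):
    """Return the relation types under which two messages share a value."""
    return {
        relation
        for relation in RELATIONS
        if set(features_a.get(relation, ())) & set(features_b.get(relation, ()))
    }


def build_multi_relation_graph(features, num_pairs, seed=0):
    """Sample message pairs at random and label each with its shared relations.

    Returns {(id_a, id_b): set_of_relations}; an empty set means the pair
    shares no feature of any type.
    """
    rng = random.Random(seed)
    all_pairs = list(combinations(sorted(features), 2))
    sampled = rng.sample(all_pairs, min(num_pairs, len(all_pairs)))
    return {
        (a, b): shared_relations(features[a], features[b]) for a, b in sampled
    }


if __name__ == "__main__":
    demo = {
        "m1": {"entity": ["paris"], "hashtag": ["flood"], "time": ["2012-10-10"]},
        "m2": {"entity": ["paris"], "hashtag": ["flood"], "time": ["2012-10-11"]},
        "m3": {"entity": ["tokyo"], "hashtag": [], "time": ["2012-10-10"]},
    }
    for pair, relations in build_multi_relation_graph(demo, num_pairs=3).items():
        print(pair, sorted(relations))
```

Each sampled pair carries one edge per shared relation type, which corresponds to the plurality of shared-feature edges described above.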
Further, based on the multi-relation message graph, obtaining the similarity of the message pairs by using a deep learning classification model specifically comprises the following steps:
preprocessing the message text, including word segmentation and stop-word removal, and then performing word embedding on the message text with a BERT pre-trained language model, each word being represented by a $d$-dimensional word vector $x \in \mathbb{R}^{d}$;
embedding the message text vectors to obtain the embedded vector $h_i$ of message text $m_i$ and the embedded vector $h_j$ of message text $m_j$, concatenating the embedded vectors of the two message texts, and passing the result through the coding layer to obtain the coding vector of the message pair:
$$c_{ij} = \mathrm{Encoder}(h_i \oplus h_j)$$
where $\oplus$ denotes the concatenation of vectors;
passing the obtained coding vector through a linear transformation layer to obtain a low-dimensional vector $z_{ij}$, and passing it through an activation function to obtain the similarity $s_{ij}$ of message texts $m_i$ and $m_j$, wherein:
$$z_{ij} = W_1 c_{ij} + b_1, \qquad s_{ij} = \sigma(W_2 z_{ij} + b_2)$$
where $W_1$, $b_1$, $W_2$ and $b_2$ are parameters of the model and $\sigma$ denotes the activation function.
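The pair-scoring step (concatenate two message embeddings, apply a linear transformation, then an activation) can be sketched in plain Python. Several things here are assumptions: the activation is taken to be a sigmoid because the source does not name it legibly, the coding layer is reduced to the bare concatenation, the two-layer parameter layout `W1, b1, W2, b2` is hypothetical, and toy vectors stand in for BERT embeddings.

```python
import math


def linear(weights, bias, x):
    """Affine map: one output per weight row, computed as row . x + bias."""
    return [
        sum(w * value for w, value in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def sigmoid(value):
    """Logistic function; assumed activation, adequate for moderate inputs."""
    return 1.0 / (1.0 + math.exp(-value))


def pair_similarity(h_i, h_j, W1, b1, W2, b2):
    """Score a message pair from its two embedding vectors.

    h_i, h_j: embedding lists of equal length (stand-ins for BERT outputs).
    W1, b1:   project the concatenated vector to a low-dimensional vector.
    W2, b2:   map the low-dimensional vector to a single logit.
    """
    pair_vector = h_i + h_j                # list concatenation of embeddings
    low_dim = linear(W1, b1, pair_vector)  # linear transformation layer
    (logit,) = linear(W2, b2, low_dim)     # single output unit
    return sigmoid(logit)


if __name__ == "__main__":
    W1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    b1 = [0.0, 0.0]
    W2 = [[1.0, 1.0]]
    b2 = [0.0]
    print(pair_similarity([1.0, 2.0], [-1.0, 5.0], W1, b1, W2, b2))
```

In a trained model the parameters would be learned from labelled same-event and different-event pairs; the fixed values above only demonstrate the data flow.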
Further, judging whether the similarity of a message pair reaches a preset threshold and, if so, constructing an edge between the two message text nodes forming the message pair to obtain the message isomorphic graph specifically comprises:
the message isomorphic graph still retains all the common features of the message heterogram:
$$[A]_{ij} = \min\Big\{ \Big[ \sum_{k} W_{mk} W_{mk}^{\top} \Big]_{ij},\ 1 \Big\}$$
where $A \in \{0,1\}^{N \times N}$ is the adjacency matrix of the message isomorphic graph, $N$ is the total number of message nodes in the graph, $k$ denotes a node type, $W_{mk}$ is the sub-matrix of the adjacency matrix of the message heterogram comprising the rows of type $m$ and the columns of type $k$, and $W_{mk}^{\top}$ is its transpose; if message text node $m_i$ and message text node $m_j$ are both connected to some node of type $k$, $[W_{mk} W_{mk}^{\top}]_{ij}$ will be greater than or equal to 1, and $[A]_{ij}$ will then be equal to 1;
for the message pair formed by message text $m_i$ and message text $m_j$, judging whether the similarity $s_{ij}$ of the message pair reaches a preset threshold $\theta$; when the similarity reaches the preset threshold $\theta$, an edge $e_{ij}$ is constructed; after all message pairs have been judged, an initial message isomorphic graph $G' = (V', E')$ is formed, where $V'$ is the set of message text nodes and $E'$ is the set of edges.
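The thresholding step that turns pair similarities into a message-only graph can be sketched as below. The function name and the threshold value 0.5 are hypothetical; the source only states that the threshold is preset.

```python
def build_isomorphic_graph(message_ids, pair_scores, threshold=0.5):
    """Keep one undirected edge per message pair whose score reaches threshold.

    pair_scores: {(id_a, id_b): similarity in [0, 1]}.
    Returns (nodes, edges); every edge is a frozenset of two message ids, so
    (a, b) and (b, a) collapse into the same edge.
    """
    nodes = set(message_ids)
    edges = {
        frozenset(pair)
        for pair, score in pair_scores.items()
        if score >= threshold
    }
    return nodes, edges


if __name__ == "__main__":
    scores = {("m1", "m2"): 0.91, ("m1", "m3"): 0.12, ("m2", "m3"): 0.50}
    nodes, edges = build_isomorphic_graph(["m1", "m2", "m3"], scores)
    print(sorted(sorted(edge) for edge in edges))
```

Messages that pass no pairwise test remain in the node set as isolated nodes, so they can still be reported as singleton clusters later.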
Further, clustering the constructed message isomorphic graph as input of a graph clustering algorithm, wherein a clustering result is used as a detected social media event, and specifically comprises the following steps:
performing community expansion according to the transitivity of feature associations among messages, then playing a multi-round game among the message nodes so that each node selects a better community, until the communities finally reach a stable state, and taking the clustering result obtained from the community division as the detected social media events.
Further, performing community expansion according to the transitivity of feature associations among messages, then playing a multi-round game among the message nodes to select a better community, and finally reaching a stable state of the communities specifically comprises:
expanding the community scale: if three nodes are joined by two edges forming an open triangle, then by the triadic closure principle two nodes that each have a relationship with a common node are also associated with each other, so the three nodes are considered to have a hidden association, that is, to belong to the same community;
performing multiple iterations, in each of which every node makes the choice that is best for itself given the current community division, from three options: staying in the current community; leaving the current community without joining any other community; or leaving the current community and joining another community;
after multiple rounds of the game, the algorithm reaches a Nash equilibrium, that is, every node has joined the community it is most satisfied with; whether the algorithm has reached Nash equilibrium is judged by setting a threshold.
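The community-expansion step based on triadic closure can be sketched as a single pass that closes every open triangle. Representing the graph as a symmetric dictionary of neighbour sets, and performing exactly one round of closure, are assumptions for illustration.

```python
from itertools import combinations


def close_open_triads(adjacency):
    """Return a copy of the graph with every open triad closed once.

    adjacency: {node: set(neighbours)}, undirected and symmetric, with every
    node present as a key. For each node, any two of its neighbours that are
    not yet linked receive an edge. Only triads present in the input graph
    are closed; edges added during the pass do not trigger further closures.
    """
    closed = {node: set(neighbours) for node, neighbours in adjacency.items()}
    for neighbours in adjacency.values():
        for a, b in combinations(sorted(neighbours), 2):
            closed[a].add(b)
            closed[b].add(a)
    return closed


if __name__ == "__main__":
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
    print(close_open_triads(path))
```

Applying the function repeatedly would keep merging along longer paths, so how many rounds to run is a design choice that the source leaves open.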
According to a second aspect of an embodiment of the present invention, there is provided a social media event detection system combining deep learning classification and graph clustering, the system comprising:
a message heterogram construction module, configured to extract feature information from a message text in the social media data stream, take the message text and the feature information as nodes, connect the message text with the extracted feature information to form edges, and construct a message heterogram;
a multi-relation message graph construction module, configured to randomly select two message text nodes to construct a message pair based on the message heterogram, construct a plurality of shared-feature edges within each message pair, and construct a multi-relation message graph;
a deep learning classification module, configured to obtain the similarity of the message pairs by using a deep learning classification model based on the multi-relation message graph;
a message isomorphic graph construction module, configured to judge whether the similarity of a message pair reaches a preset threshold and, if so, construct an edge between the two message text nodes forming the message pair, so as to construct a message isomorphic graph;
and a graph clustering module, configured to cluster the constructed message isomorphic graph as the input of a graph clustering algorithm, the clustering result being taken as the detected social media events.
According to a third aspect of an embodiment of the present invention, there is provided an electronic device, the device including: a processor and a memory;
The memory is used for storing one or more program instructions;
the processor is configured to execute one or more program instructions to perform the steps of a social media event detection method that combines deep learning classification with graph clustering as described in any of the above.
According to a fourth aspect of an embodiment of the present invention, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of a social media event detection method combining deep learning classification and graph clustering as described in any one of the above.
The invention provides a social media event detection method combining deep learning classification and graph clustering, which is characterized in that characteristic information extraction is carried out on message texts in a social media data stream, the message texts and the characteristic information are used as nodes, and the message texts and the extracted characteristic information are connected to form edges to construct message heterograms; based on the message heterogram, two message text nodes are randomly selected to construct message pairs, and a plurality of shared characteristic edges are constructed in each pair of message pairs, so that a multi-relation message diagram is constructed; based on the multi-relation message graph, obtaining the similarity of the message pairs by using a deep learning classification model; judging whether the similarity of the message pairs reaches a preset threshold value, if so, constructing an edge between two message text nodes forming the message pairs, and constructing to obtain a message isomorphic diagram; and clustering the constructed message isomorphic graphs as input of a graph clustering algorithm, wherein a clustering result is used as a detected social media event. 
The method combines a deep learning classification model with a graph clustering algorithm and avoids the process of representing social messages as vectors: it takes the message pairs and the heterogeneous associations of the message texts as input, uses the deep learning classification model to judge whether the two messages of a pair belong to the same event, constructs a message isomorphic graph from the pair-level predictions, and uses the graph clustering algorithm to discover closely associated clusters of social messages as social events. Verification shows that the performance indices of the invention are superior to those of the baseline models. The method keeps computational difficulty and complexity under control and can be used for massive social media event detection tasks.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.
The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the ambit of the technical disclosure.
FIG. 1 is a flowchart of a social media event detection method combining deep learning classification and graph clustering provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a social media event detection method combining deep learning classification and graph clustering according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a social media event detection system combining deep learning classification and graph clustering according to an embodiment of the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description of specific embodiments. The described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
The first embodiment of the present invention provides a social media event detection method combining deep learning classification and graph clustering, and is described below with reference to fig. 1 and 2.
As shown in fig. 1, in step S101, feature information is extracted from the message texts in the social media data stream, the message texts and the feature information are taken as nodes, and each message text is connected with its extracted feature information to form edges, so as to construct the message heterogram.
In this embodiment, the social media messages are divided into data blocks according to release time, training is performed continually to simulate the streaming nature of social media messages, and the entity information, user information, tag information and time information in each message are extracted as features to construct the message heterogram.
Specifically, the following four types of features are extracted from each message to make maximal use of the social media data, and the extracted features are then processed in a unified manner to construct the message heterogram. Given a single message text $m_i$, the entity information, user information, tag information and time information of the message are extracted from the text. These features and the social media message text itself are used as nodes, and an edge is formed between $m_i$ and each piece of extracted feature information, yielding the message heterogram.
The message heterogram is defined as $G = (V, E)$, where $V$ is the set of nodes representing the social media message texts themselves and the various types of event-related feature information in them, and $E$ is the set of edges between each message and its corresponding features.
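Extraction of the four feature types from a raw message can be sketched with the standard library. This is only an illustration under stated assumptions: hashtags and mentioned users are found with simple regular expressions, the author and mentioned users are both treated as user information, the time feature is bucketed by calendar day, and entity recognition is delegated to a caller-supplied function `entity_fn` (hypothetical), because the source does not say which extractor is used.

```python
import re
from datetime import datetime

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def extract_features(text, author, posted_at, entity_fn=None):
    """Return the entity, user, hashtag and time features of one message.

    entity_fn: optional callable text -> list of entity strings; when it is
    omitted, no entities are extracted.
    """
    entities = entity_fn(text) if entity_fn else []
    users = {author.lower()} | {name.lower() for name in MENTION.findall(text)}
    return {
        "entity": sorted({entity.lower() for entity in entities}),
        "user": sorted(users),
        "hashtag": sorted({tag.lower() for tag in HASHTAG.findall(text)}),
        "time": [posted_at.strftime("%Y-%m-%d")],
    }


if __name__ == "__main__":
    features = extract_features(
        "Flooding near #Paris tonight #flood @Bob",
        author="Alice",
        posted_at=datetime(2012, 10, 10, 21, 5),
    )
    print(features)
```

Lower-casing the values makes `#Paris` and `#paris` map to the same feature node, which matters once messages are linked through shared features.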
As shown in fig. 1, in step S102, based on the message heterogram, two message text nodes are randomly selected to construct a message pair, and multiple shared-feature edges are constructed within each message pair, so as to construct the multi-relation message graph.
In this embodiment, the features in the message heterogram are mapped onto the message pairs to prevent the loss of heterogeneous feature information between different types of event elements, and multiple feature edges are constructed within each message pair to form a multi-relation message graph that stores the rich features of social media messages.
Specifically, the multi-relation message graph is defined over the message nodes of a data block, where $V = \{m_1, \dots, m_N\}$ is the set of nodes in the data block and $N$ is the number of nodes. Each node has its own feature representation, a $d$-dimensional feature vector $x_i \in \mathbb{R}^{d}$. The node types $m$, $u$, $e$, $h$ and $t$ denote message, user, entity, tag and time information, respectively, and the set of all node features is denoted $X = \{x_i \mid 1 \le i \le N\}$. For a message $m_i$, another message $m_j$ is randomly sampled to construct the message pair $(m_i, m_j)$.
An edge is established for each feature relation when two messages share feature elements of the corresponding type, so the edge $e_{ij}$ of a message pair $(m_i, m_j)$ can be associated with several relations; there are four different types of relations in total:
$$A_r = \min\{ W_r^{\top} W_r,\ 1 \}$$
where $W_r$ is a sub-matrix of the adjacency matrix of the message heterogram whose rows represent all feature information nodes and whose columns represent the message nodes belonging to relation $r$, $W_r^{\top}$ is the transpose of that matrix, and $\min$ takes the smaller of the two values element-wise.
As shown in fig. 1, in step S103, the similarity of the message pairs is obtained using a deep learning classification model based on the multi-relationship message graph.
In this embodiment, the message text and its features are processed with the BERT pre-trained language model to obtain the embedded vector of each message text; the embedded vectors of randomly paired message texts are concatenated to form a message pair, which is fed to the coding layer to obtain the coding vector of the message pair. Taking the coding vector of the message pair as input, a linear transformation layer produces a low-dimensional vector, and an activation function then computes the similarity of the message pair.
Specifically, before the text is input it is preprocessed by word segmentation and stop-word removal. Word embedding is then performed on the message text with the BERT pre-trained language model, each word being represented by a $d$-dimensional word vector $x \in \mathbb{R}^{d}$. The message text vectors are embedded to obtain the embedded vector $h_i$ of message text $m_i$ and the embedded vector $h_j$ of message text $m_j$. The embedded vectors of the two texts are concatenated and passed through the coding layer to obtain the coding vector of the message pair:
$$c_{ij} = \mathrm{Encoder}(h_i \oplus h_j)$$
where $\oplus$ denotes the concatenation of vectors.
The obtained coding vector is passed through a linear transformation layer to obtain a low-dimensional vector $z_{ij}$, which is passed through an activation function to obtain the similarity $s_{ij}$ of message texts $m_i$ and $m_j$:
$$z_{ij} = W_1 c_{ij} + b_1, \qquad s_{ij} = \sigma(W_2 z_{ij} + b_2)$$
where $W_1$, $b_1$, $W_2$ and $b_2$ are parameters of the model and $\sigma$ denotes the activation function.
As shown in fig. 1, in step S104, it is determined whether the similarity of the message pair reaches a preset threshold, and if the similarity of the message pair reaches the preset threshold, an edge is constructed between two message text nodes that form the message pair, so as to construct a message isomorphic diagram.
In this embodiment, in order to enhance the association degree between the messages, multiple rich information features are combined into unique features, and the message isograms are mapped into message isomorphic graphs, so as to implement a graph clustering algorithm. The similarity of the message pairs is used as the only basis for constructing the message isomorphic diagram, the message isomorphic diagram only comprises message nodes, and only one side sharing all common elements exists between the messages.
In particular, the message isomorphic diagram still retains all the common features of the message isomorphic diagram,
Wherein the method comprises the steps ofIs an adjacency matrix of message isomorphism graph, whereinIs the total number of message nodes in the graph,The node type is indicated as such,Is a sub-matrix of an adjacency matrix in a message heterogram, comprisingLine sum of typesA column of the type in which,Is a transpose of the matrix, taking the smaller of the two. If a messageSum messageTo some type of connectionAt the time of the node point,Will be greater than or equal to 1, thenWill be equal to 1.
For each message pair formed by message $m_i$ and message $m_j$, the similarity $s_{ij}$ is compared with the threshold $\theta$: when the similarity reaches the threshold, an edge $e_{ij}$ is constructed. After all message pairs have been judged, an initial graph $G = (V, E)$ is formed, where $V$ is the set of nodes and $E$ is the set of edges.
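A minimal sketch of the thresholding step is given below. It assumes a similarity function is already available and, for simplicity, scores every pair rather than a random sample of pairs; the threshold value 0.7, the scores, and the name `build_message_graph` are hypothetical.

```python
from itertools import combinations


def build_message_graph(messages, similarity, threshold):
    """Return (nodes, edges), keeping an edge when similarity >= threshold."""
    nodes = set(messages)
    edges = set()
    for m_i, m_j in combinations(messages, 2):
        if similarity(m_i, m_j) >= threshold:
            edges.add((m_i, m_j))
    return nodes, edges


# Hypothetical similarity scores standing in for the classifier output.
scores = {("m1", "m2"): 0.91, ("m1", "m3"): 0.12, ("m2", "m3"): 0.75}
nodes, edges = build_message_graph(
    ["m1", "m2", "m3"], lambda a, b: scores[(a, b)], threshold=0.7
)
print(sorted(edges))  # [('m1', 'm2'), ('m2', 'm3')]
```

The resulting graph contains only message nodes, with at most one edge per message pair, which is the form expected by the graph clustering step.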
As shown in fig. 1, in step S105, the constructed message isomorphism map is clustered as input of a map clustering algorithm, and the clustered result is used as a detected social media event.
In the embodiment, the constructed message isomorphic graphs are used as the input of a graph clustering algorithm to be clustered, community expansion is carried out according to the transitivity of characteristic association among the messages, then multiple rounds of game selection is carried out among message nodes to select a better community, the stable state of the community is finally realized, and the community division result is used as a clustering result to realize social media event detection.
The specific process is as follows. The community scale is first expanded: if two edges among three nodes form a half triangle (an open triad), then according to the triadic closure principle the three nodes are considered to have a hidden association, that is, they belong to the same community;
multiple iterations are then carried out, and in each iteration every node makes the choice that is best for itself given the current community division, choosing among three options: remaining in the current community; leaving the current community without joining any other community; or leaving the current community and joining another community;
after multiple rounds of the game, the algorithm reaches a Nash equilibrium, that is, every node has joined the community that satisfies it most; a threshold is set to judge whether the algorithm has reached this state.
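The expansion and game steps can be sketched as follows. The patent text here does not specify the payoff that nodes optimize, so this sketch assumes a simple one: a node gains 1 for each neighbor in its community and loses `penalty` for each non-neighbor. The function names, the payoff, and the toy graph are all hypothetical; only the three choices per node and the stopping threshold of 0.1% come from the text.

```python
def triadic_closure(adj):
    """Add edge a-b whenever a-v and v-b are both edges (a single pass)."""
    closed = {v: set(nbrs) for v, nbrs in adj.items()}
    for nbrs in adj.values():
        for a in nbrs:
            for b in nbrs:
                if a != b:
                    closed[a].add(b)
    return closed


def utility(node, members, adj, penalty):
    """Assumed payoff: +1 per neighbor in the community, -penalty per non-neighbor."""
    others = members - {node}
    linked = len(others & adj[node])
    return linked - penalty * (len(others) - linked)


def detect_communities(adj, penalty=1.0, stop_ratio=0.001, max_rounds=100):
    if not adj:
        return []
    adj = triadic_closure(adj)
    label = {v: k for k, v in enumerate(sorted(adj))}  # every node starts alone
    fresh = len(label)                                  # next unused community id
    for _ in range(max_rounds):
        moved = 0
        for v in sorted(adj):
            groups = {}
            for u, c in label.items():
                groups.setdefault(c, set()).add(u)
            # Choice 1: stay in the current community.
            best, best_gain = label[v], utility(v, groups[label[v]], adj, penalty)
            # Choice 2: leave and stay alone. Choice 3: join a neighbor's community.
            options = {fresh: set()}
            for u in sorted(adj[v]):
                if label[u] != label[v]:
                    options[label[u]] = groups[label[u]]
            for c, members in options.items():
                gain = utility(v, members | {v}, adj, penalty)
                if gain > best_gain:                    # move only on strict improvement
                    best, best_gain = c, gain
            if best != label[v]:
                if best == fresh:
                    fresh += 1
                label[v] = best
                moved += 1
        if moved / len(adj) <= stop_ratio:              # treated as equilibrium
            break
    communities = {}
    for v, c in label.items():
        communities.setdefault(c, set()).add(v)
    return list(communities.values())


# Two open triads: a-b-c and x-y-z.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"},
         "x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
communities = detect_communities(graph)
print(sorted(sorted(c) for c in communities))  # [['a', 'b', 'c'], ['x', 'y', 'z']]
```

Because the assumed payoff is symmetric between node pairs, one-node-at-a-time updates with strict improvement cannot cycle, and `max_rounds` is only a safety cap.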
To illustrate the effectiveness of the present invention, it was compared with existing methods on the large-scale public dataset Event2012. The messages in the dataset span approximately 29 days: the messages of the first week form the initial block $M_0$, and the remaining messages are divided into 21 message blocks by release time. The evaluation metrics are the same as those of the compared methods: Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), and Adjusted Rand Index (ARI) are used to evaluate the clustering results, and the threshold for terminating the clustering algorithm is 0.1%. The experimental results are shown in Tables 1 to 3, respectively:
Table 1 Incremental evaluation of NMI scores
Blocks | Word2vec | LDA | WMD | BERT | BiLSTM | PP-GCN | EventX | KPGNN | FinEventd | ours |
---|---|---|---|---|---|---|---|---|---|---|
M1 | .19±.00 | .11±.00 | .32±.00 | .36±.00 | .24±.00 | .23±.00 | .36±.00 | .39±.00 | .84±.01 | 0.47±.00 ↓.36 |
M2 | .50±.00 | .27±.01 | .71±.00 | .78±.00 | .50±.00 | .57±.02 | .68±.00 | .79±.01 | .84±.00 | 0.92±.01 ↑.08 |
M3 | .39±.00 | .28±.00 | .67±.00 | .75±.00 | .39±.00 | .55±.01 | .63±.00 | .76±.00 | .89±.00 | 0.87±.01 ↓.01 |
M4 | .34±.00 | .25±.00 | .50±.00 | .60±.00 | .40±.00 | .46±.01 | .63±.00 | .67±.00 | .71±.01 | 0.84±.00 ↑.13 |
M5 | .41±.00 | .26±.00 | .61±.00 | .72±.00 | .41±.00 | .48±.01 | .59±.00 | .73±.01 | .83±.00 | 0.84±.00 ↑.01 |
M6 | .53±.00 | .32±.00 | .61±.00 | .78±.00 | .50±.00 | .57±.01 | .70±.00 | .82±.01 | .83±.00 | 0.94±.00 ↑.11 |
M7 | .25±.00 | .18±.01 | .46±.00 | .54±.00 | .33±.00 | .37±.00 | .51±.00 | .55±.01 | .73±.01 | 0.64±.02 ↓.08 |
M8 | .46±.00 | .37±.01 | .67±.00 | .79±.00 | .49±.00 | .55±.02 | .71±.00 | .80±.00 | .87±.02 | 0.90±.00 ↑.03 |
M9 | .35±.00 | .34±.00 | .55±.00 | .70±.00 | .43±.00 | .51±.02 | .67±.00 | .74±.02 | .79±.01 | 0.88±.01 ↑.09 |
M10 | .51±.00 | .44±.01 | .61±.00 | .74±.00 | .50±.00 | .55±.02 | .68±.00 | .80±.01 | .82±.01 | 0.95±.00 ↑.13 |
M11 | .37±.00 | .33±.01 | .50±.00 | .68±.00 | .49±.00 | .50±.01 | .65±.00 | .74±.01 | .75±.00 | 0.90±.00 ↑.15 |
M12 | .30±.00 | .22±.01 | .60±.00 | .59±.00 | .39±.00 | .45±.01 | .61±.00 | .68±.01 | .67±.01 | 0.81±.00 ↑.14 |
M13 | .37±.00 | .27±.00 | .54±.00 | .63±.00 | .46±.00 | .47±.01 | .58±.00 | .69±.01 | .79±.00 | 0.84±.01 ↑.05 |
M14 | .36±.00 | .21±.00 | .66±.00 | .64±.00 | .44±.00 | .44±.01 | .57±.00 | .69±.00 | .82±.00 | 0.79±.01 ↓.03 |
M15 | .27±.00 | .21±.00 | .51±.00 | .54±.00 | .40±.00 | .39±.01 | .49±.00 | .58±.00 | .69±.01 | 0.85±.01 ↑.16 |
M16 | .49±.00 | .35±.01 | .60±.00 | .75±.00 | .53±.00 | .55±.01 | .62±.00 | .79±.01 | .90±.01 | 0.91±.01 ↑.01 |
M17 | .33±.00 | .19±.00 | .55±.00 | .63±.00 | .45±.00 | .48±.00 | .58±.00 | .70±.01 | .83±.00 | 0.84±.00 ↑.01 |
M18 | .29±.00 | .18±.00 | .63±.00 | .57±.00 | .44±.00 | .47±.01 | .59±.00 | .68±.02 | .74±.01 | 0.85±.01 ↑.11 |
M19 | .37±.00 | .29±.01 | .54±.00 | .66±.00 | .44±.00 | .51±.02 | .60±.00 | .73±.01 | .66±.01 | 0.87±.00 ↑.21 |
M20 | .38±.00 | .35±.00 | .58±.00 | .68±.00 | .48±.00 | .51±.01 | .67±.00 | .72±.02 | .80±.00 | 0.85±.02 ↑.05 |
M21 | .31±.00 | .19±.00 | .58±.00 | .59±.00 | .41±.00 | .41±.02 | .53±.00 | .60±.00 | .74±.01 | 0.75±.01 ↑.01 |
AVG | 0.3700 | 0.2671 | 0.5742 | 0.6533 | 0.4345 | 0.4771 | 0.6024 | 0.6876 | 0.7876 | 0.8338 |
Table 2 Incremental evaluation of AMI scores
Blocks | Word2vec | LDA | WMD | BERT | BiLSTM | PP-GCN | EventX | KPGNN | FinEventd | ours |
---|---|---|---|---|---|---|---|---|---|---|
M1 | .08±.00 | .08±.00 | .30±.00 | .34±.00 | .12±.00 | .21±.00 | .06±.00 | .37±.00 | .84±.01 | 0.40±.01 ↓.43 |
M2 | .41±.00 | .20±.01 | .69±.00 | .76±.00 | .41±.00 | .55±.02 | .29±.02 | .78±.01 | .84±.01 | 0.89±.00 ↑.05 |
M3 | .31±.00 | .22±.01 | .63±.00 | .73±.00 | .31±.00 | .52±.01 | .18±.01 | .74±.00 | .89±.01 | 0.82±.00 ↓.06 |
M4 | .24±.00 | .17±.00 | .45±.00 | .55±.00 | .30±.00 | .42±.01 | .19±.01 | .64±.01 | .69±.00 | 0.77±.00 ↑.08 |
M5 | .33±.00 | .21±.00 | .57±.00 | .71±.00 | .33±.00 | .46±.01 | .14±.00 | .71±.01 | .82±.00 | 0.79±.02 ↓.03 |
M6 | .40±.00 | .20±.00 | .57±.00 | .74±.00 | .36±.00 | .52±.02 | .27±.00 | .79±.01 | .82±.02 | 0.90±.00 ↑.08 |
M7 | .13±.00 | .12±.01 | .46±.00 | .50±.00 | .20±.00 | .34±.00 | .13±.00 | .51±.01 | .72±.00 | 0.55±.03 ↓.14 |
M8 | .33±.00 | .24±.01 | .63±.00 | .75±.00 | .35±.00 | .49±.02 | .21±.00 | .76±.01 | .87±.01 | 0.83±.00 ↓.03 |
M9 | .24±.00 | .24±.00 | .46±.00 | .66±.00 | .32±.00 | .46±.02 | .19±.00 | .71±.02 | .78±.01 | 0.81±.01 ↑.03 |
M10 | .39±.00 | .36±.01 | .57±.00 | .70±.00 | .39±.00 | .51±.02 | .24±.00 | .78±.01 | .81±.00 | 0.92±.00 ↑.11 |
M11 | .26±.00 | .25±.01 | .42±.00 | .65±.00 | .37±.00 | .46±.01 | .24±.00 | .71±.01 | .74±.00 | 0.86±.00 ↑.12 |
M12 | .23±.00 | .16±.01 | .58±.00 | .56±.00 | .32±.00 | .42±.01 | .16±.00 | .66±.01 | .67±.02 | 0.72±.01 ↑.05 |
M13 | .23±.00 | .19±.00 | .50±.00 | .59±.00 | .31±.00 | .43±.01 | .16±.00 | .67±.01 | .79±.00 | 0.80±.00 ↑.01 |
M14 | .26±.00 | .15±.00 | .64±.00 | .61±.00 | .34±.00 | .41±.01 | .14±.00 | .65±.00 | .82±.01 | 0.72±.01 ↓.09 |
M15 | .15±.00 | .13±.00 | .47±.00 | .50±.00 | .26±.00 | .35±.01 | .07±.00 | .54±.00 | .67±.01 | 0.80±.00 ↑.13 |
M16 | .36±.00 | .27±.01 | .59±.00 | .72±.00 | .41±.00 | .52±.01 | .19±.00 | .77±.01 | .90±.01 | 0.88±.02 ↓.01 |
M17 | .24±.00 | .13±.00 | .57±.00 | .60±.00 | .35±.00 | .45±.00 | .18±.00 | .68±.01 | .82±.01 | 0.79±.00 ↓.02 |
M18 | .21±.00 | .12±.00 | .60±.00 | .53±.00 | .35±.00 | .45±.01 | .16±.00 | .66±.02 | .74±.00 | 0.81±.00 ↑.07 |
M19 | .28±.00 | .22±.01 | .49±.00 | .63±.00 | .35±.00 | .48±.02 | .16±.00 | .71±.01 | .66±.01 | 0.83±.01 ↑.17 |
M20 | .24±.00 | .23±.00 | .55±.00 | .62±.00 | .34±.00 | .45±.02 | .18±.00 | .68±.02 | .78±.00 | 0.76±.00 ↓.02 |
M21 | .21±.00 | .13±.00 | .52±.00 | .57±.00 | .31±.00 | .38±.02 | .10±.00 | .57±.00 | .64±.01 | 0.69±.01 ↑.05 |
AVG | 0.2633 | 0.1914 | 0.5362 | 0.6200 | 0.3238 | 0.4419 | 0.1733 | 0.6710 | 0.7767 | 0.7781 |
Table 3 Incremental evaluation of ARI scores
Blocks | Word2vec | LDA | WMD | BERT | BiLSTM | PP-GCN | EventX | KPGNN | FinEventd | ours |
---|---|---|---|---|---|---|---|---|---|---|
M1 | .01±.00 | .01±.00 | .04±.00 | .03±.00 | .03±.00 | .05±.00 | .01±.00 | .07±.01 | .90±.00 | 0.11±.01 ↓.78 |
M2 | .49±.00 | .08±.00 | .48±.00 | .64±.00 | .49±.00 | .67±.03 | .45±.02 | .76±.02 | .90±.01 | 0.91±.00 ↑.01 |
M3 | .16±.00 | .02±.01 | .28±.00 | .43±.00 | .17±.00 | .47±.01 | .09±.01 | .58±.00 | .89±.01 | 0.78±.01 ↓.10 |
M4 | .07±.00 | .07±.00 | .11±.00 | .19±.00 | .11±.00 | .24±.01 | .07±.01 | .29±.01 | .27±.01 | 0.56±.01 ↑.29 |
M5 | .17±.00 | .06±.00 | .26±.00 | .44±.00 | .19±.00 | .34±.00 | .04±.00 | .47±.03 | .63±.02 | 0.76±.01 ↑.13 |
M6 | .25±.00 | .07±.01 | .16±.00 | .44±.00 | .18±.00 | .55±.03 | .14±.00 | .72±.03 | .74±.00 | 0.90±.00 ↑.16 |
M7 | .02±.00 | .01±.00 | .08±.00 | .07±.00 | .08±.00 | .11±.02 | .02±.00 | .12±.00 | .45±.01 | 0.15±.02 ↓.28 |
M8 | .17±.00 | .03±.00 | .22±.00 | .50±.00 | .08±.00 | .43±.04 | .09±.00 | .60±.01 | .72±.01 | 0.79±.00 ↑.07 |
M9 | .08±.00 | .03±.01 | .12±.00 | .33±.00 | .27±.00 | .31±.02 | .07±.00 | .46±.02 | .68±.00 | 0.63±.02 ↓.03 |
M10 | .23±.00 | .09±.02 | .20±.00 | .44±.00 | .22±.00 | .50±.07 | .13±.00 | .70±.06 | .74±.01 | 0.88±.00 ↑.14 |
M11 | .09±.00 | .03±.01 | .12±.00 | .27±.00 | .17±.00 | .38±.02 | .16±.00 | .49±.03 | .60±.01 | 0.93±.00 ↑.33 |
M12 | .09±.00 | .02±.01 | .27±.00 | .31±.00 | .13±.00 | .34±.03 | .07±.00 | .48±.01 | .26±.00 | 0.60±.01 ↑.14 |
M13 | .06±.00 | .01±.00 | .13±.00 | .14±.00 | .13±.00 | .19±.01 | .04±.00 | .29±.03 | .75±.02 | 0.73±.01 ↓.01 |
M14 | .10±.00 | .02±.00 | .33±.00 | .30±.00 | .16±.00 | .29±.01 | .10±.00 | .42±.02 | .81±.01 | 0.58±.00 ↓.22 |
M15 | .09±.00 | .01±.00 | .16±.00 | .10±.00 | .14±.00 | .15±.00 | .01±.00 | .17±.00 | .46±.00 | 0.86±.00 ↑.40 |
M16 | .10±.00 | .11±.01 | .32±.00 | .41±.00 | .10±.00 | .51±.03 | .08±.00 | .66±.05 | .88±.01 | 0.85±.01 ↓.02 |
M17 | .06±.00 | .02±.00 | .26±.00 | .24±.00 | .17±.00 | .35±.03 | .12±.00 | .43±.05 | .81±.01 | 0.70±.03 ↓.08 |
M18 | .21±.00 | .02±.00 | .35±.00 | .24±.00 | .19±.00 | .39±.03 | .08±.00 | .47±.04 | .52±.01 | 0.76±.01 ↑.24 |
M19 | .28±.00 | .03±.00 | .12±.00 | .32±.00 | .16±.00 | .41±.02 | .07±.00 | .51±.03 | .35±.01 | 0.75±.02 ↑.40 |
M20 | .24±.00 | .02±.01 | .19±.00 | .33±.00 | .20±.00 | .41±.01 | .11±.00 | .51±.04 | .71±.01 | 0.67±.01 ↓.03 |
M21 | .21±.00 | .01±.01 | .19±.00 | .18±.00 | .16±.00 | .20±.03 | .01±.00 | .20±.01 | .48±.00 | 0.51±.00 ↑.03 |
AVG | 0.1514 | 0.0367 | 0.2090 | 0.3024 | 0.1681 | 0.3471 | 0.0933 | 0.4476 | 0.6452 | 0.6861 |
As shown by the experimental results, compared with the baseline model FinEvent, the proposed model improves the average NMI by 4.62 percentage points, the average AMI by 0.14 percentage points, and the average ARI by 4.09 percentage points. While keeping computational difficulty and complexity under control, the method focuses on the graph structure and uses a graph clustering algorithm to mine the connections within social media messages, thereby improving social media event detection performance.
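The reported gains can be checked directly against the AVG rows of Tables 1 to 3. The short sketch below takes the baseline values from the FinEventd column and the proposed method's values from the last column; the dictionary layout is only an illustration.

```python
# Average scores copied from the AVG rows of Tables 1 to 3.
avg = {
    "NMI": {"FinEvent": 0.7876, "ours": 0.8338},
    "AMI": {"FinEvent": 0.7767, "ours": 0.7781},
    "ARI": {"FinEvent": 0.6452, "ours": 0.6861},
}

# Absolute gain, expressed in percentage points of the score.
gains = {
    metric: round((v["ours"] - v["FinEvent"]) * 100, 2) for metric, v in avg.items()
}
print(gains)  # {'NMI': 4.62, 'AMI': 0.14, 'ARI': 4.09}
```

These are absolute differences between average scores, not relative improvements over the baseline.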
Corresponding to the social media event detection method combining deep learning classification and graph clustering disclosed above, the embodiment of the invention also discloses a social media event detection system combining deep learning classification and graph clustering, as shown in fig. 3, which specifically comprises:
The message heterogram construction module is used for extracting characteristic information of a message text in the social media data stream, taking the message text and the characteristic information as nodes, connecting the message text with the extracted characteristic information to form edges, and constructing to obtain a message heterogram;
the multi-relation message diagram construction module is used for randomly selecting two message text nodes to construct message pairs based on the message heterogram, constructing a plurality of shared feature edges for each message pair, and thereby constructing a multi-relation message diagram;
The deep learning classification module is used for obtaining the similarity of the message pairs by using a deep learning classification model based on the multi-relation message graph;
The message isomorphic diagram construction module is used for judging whether the similarity of the message pairs reaches a preset threshold value, if so, constructing an edge between two message text nodes forming the message pairs, and constructing to obtain a message isomorphic diagram;
and the graph clustering module is used for clustering the constructed message isomorphic graphs as input of a graph clustering algorithm, and the clustering result is used as the detected social media event.
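The first two modules can be pictured with a small sketch. It assumes each message arrives as a dictionary of already extracted features; the feature types, the toy messages, and the function names `build_heterogram`, `shared_feature_relations`, and `sample_pairs` are hypothetical and only illustrate the data flow.

```python
import random


def build_heterogram(messages):
    """Return (nodes, edges) with one node per message and per feature value."""
    nodes, edges = set(), set()
    for mid, feats in messages.items():
        m_node = ("message", mid)
        nodes.add(m_node)
        for ftype, values in feats.items():
            for value in values:
                f_node = (ftype, value)
                nodes.add(f_node)
                edges.add((m_node, f_node))  # message connected to its feature
    return nodes, edges


def shared_feature_relations(messages, m_i, m_j):
    """Feature types for which the two messages share at least one value."""
    return {
        ftype
        for ftype, values in messages[m_i].items()
        if set(values) & set(messages[m_j].get(ftype, ()))
    }


def sample_pairs(messages, k, seed=0):
    """Randomly pick k pairs of distinct message ids."""
    rng = random.Random(seed)
    ids = sorted(messages)
    return [tuple(rng.sample(ids, 2)) for _ in range(k)]


messages = {
    "m1": {"user": ["u1"], "entity": ["Paris"], "hashtag": ["#news"]},
    "m2": {"user": ["u2"], "entity": ["Paris"], "hashtag": ["#news"]},
    "m3": {"user": ["u1"], "entity": ["Tokyo"], "hashtag": []},
}
nodes, edges = build_heterogram(messages)
print(len(nodes), len(edges))
print(shared_feature_relations(messages, "m1", "m2"))
```

Each shared feature type corresponds to one relation edge between the two messages in the multi-relation message diagram; the later modules reduce these to a single similarity score per pair.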
It should be noted that, for the detailed description of the social media event detection system combining the deep learning classification and the graph clustering provided by the embodiment of the present application, reference may be made to the related description of the social media event detection method combining the deep learning classification and the graph clustering provided by the embodiment of the present application, which is not repeated herein.
In addition, the embodiment of the invention also provides electronic equipment, which comprises: a processor and a memory; the memory is used for storing one or more program instructions; the processor is configured to execute one or more program instructions to perform the steps of a social media event detection method that combines deep learning classification with graph clustering as described above.
It should be noted that, for the detailed description of an electronic device provided by the embodiment of the present application, reference may be made to the related description of a social media event detection method combining deep learning classification and graph clustering provided by the embodiment of the present application, which is not repeated here.
In addition, the embodiment of the invention also provides a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and the computer program realizes the steps of the social media event detection method combining deep learning classification and graph clustering when being executed by a processor.
It should be noted that, for the detailed description of a computer readable storage medium provided by the embodiment of the present application, reference may be made to a related description of a social media event detection method combining deep learning classification and graph clustering provided by the embodiment of the present application, which is not repeated here.
In the embodiment of the invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The processor reads the information in the storage medium and, in combination with its hardware, performs the steps of the above method.
The storage medium may be memory, for example, may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The storage media described in embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functions described in the present invention may be implemented in a combination of hardware and software. When implemented in software, the functions may be stored on, or transmitted as one or more instructions or code on, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.
Claims (10)
1. A social media event detection method combining deep learning classification and graph clustering, the method comprising:
extracting characteristic information of a message text in a social media data stream, taking the message text and the characteristic information as nodes, connecting the message text with the extracted characteristic information to form edges, and constructing to obtain a message heterogram;
Based on the message heterogram, two message text nodes are randomly selected to construct message pairs, and a plurality of shared characteristic edges are constructed in each pair of message pairs, so that a multi-relation message diagram is constructed;
based on the multi-relation message graph, obtaining the similarity of the message pairs by using a deep learning classification model;
Judging whether the similarity of the message pairs reaches a preset threshold value, if so, constructing an edge between two message text nodes forming the message pairs, and constructing to obtain a message isomorphic diagram;
And clustering the constructed message isomorphic graphs as input of a graph clustering algorithm, wherein a clustering result is used as a detected social media event.
2. The method for social media event detection combining deep learning classification and graph clustering of claim 1, further comprising:
and acquiring a social media text data set, dividing the social media text data set into a plurality of data blocks according to the release time, and continuously training the model by simulating the streaming attribute of the social media message.
3. The method for detecting social media events by combining deep learning classification and graph clustering according to claim 1, wherein extracting feature information from the message texts in the social media data stream, taking the message texts and the feature information as nodes, connecting each message text with its extracted feature information to form edges, and constructing the message heterogram specifically comprises:
given a single message text $m_i$, extracting the feature information of the message from the text, the feature information comprising entity information, user information, topic tag information, and time information, which together are defined as the feature set of the message;
defining the message heterogram by a node set and an edge set, wherein the node set represents the social media message texts themselves and the various types of event-related feature information in them, and the edge set represents the edges between messages and their corresponding features.
4. The method for detecting social media events by combining deep learning classification and graph clustering according to claim 3, wherein randomly selecting two message text nodes to construct message pairs based on the message heterogram, constructing a plurality of shared feature edges for each message pair, and thereby constructing the multi-relation message graph specifically comprises:
defining the multi-relation message graph by the set of message text nodes and the sets of edges of each shared feature relation;
wherein the message text node set is $\{m_1, \dots, m_N\}$, $N$ is the number of message text nodes, each node has its own feature representation expressed as a $d$-dimensional feature vector, the node types message, user, entity, topic tag, and time are each denoted by a type symbol, and the features of all nodes form the node feature set; for a message text node $m_i$, another message text node $m_j$ is randomly sampled to build the message pair $(m_i, m_j)$;
when message texts share different types of feature information, edges belonging to different shared feature relations are established respectively, so that the edge of a message pair can be associated with a plurality of shared feature relations; the edges of each relation are obtained from a sub-matrix of the adjacency matrix of the message heterogram and the transpose of that sub-matrix, wherein the rows of the sub-matrix represent all message nodes and the columns represent the feature information nodes belonging to the relation.
5. The method for detecting social media events by combining deep learning classification and graph clustering according to claim 1, wherein the similarity of message pairs is obtained by using a deep learning classification model based on the multi-relation message graph, and the method specifically comprises the following steps:
preprocessing the message text, including word segmentation and stop-word removal, and then performing word embedding on the message text by the BERT pre-trained language model, each word being represented by a $d$-dimensional word vector, $m = \{w_1, w_2, \dots, w_n\}$, $w_k \in \mathbb{R}^d$;
embedding the message text vectors to obtain the embedding vector $e_i$ of message text $m_i$ and the embedding vector $e_j$ of message text $m_j$, concatenating the embedding vectors of the two message texts, and passing them through the encoding layer to obtain the encoding vector of the message pair:
$$h_{ij} = \mathrm{Encoder}(e_i \oplus e_j)$$
wherein $\oplus$ denotes the vector concatenation operation;
passing the encoding vector through a linear transformation layer to obtain a low-dimensional vector $z_{ij}$, and obtaining, through an activation function, the similarity $s_{ij}$ of message texts $m_i$ and $m_j$, wherein $s_{ij} \in [0, 1]$:
$$z_{ij} = W_1 h_{ij} + b_1, \qquad s_{ij} = \sigma(W_2 z_{ij} + b_2)$$
wherein $W_1$, $b_1$, $W_2$, and $b_2$ represent parameters of the model, and $\sigma$ represents the activation function.
6. The method for detecting social media events by combining deep learning classification and graph clustering according to claim 3, wherein the method is characterized by judging whether the similarity of the message pairs reaches a preset threshold, if the similarity of the message pairs reaches the preset threshold, constructing an edge between two message text nodes forming the message pairs, and constructing a message isomorphic graph, and specifically comprises the following steps:
the message isomorphic graph still retains all the common features of the message heterogram:
$$A_{ij} = \min\Big\{\sum_{t} \big[W_{mt} W_{mt}^{\top}\big]_{ij},\ 1\Big\}$$
wherein $A \in \{0,1\}^{N \times N}$ is the adjacency matrix of the message isomorphic graph, $N$ is the total number of message nodes in the graph, $t$ represents a node type, $W_{mt}$ is the sub-matrix of the adjacency matrix of the message heterogram that contains the rows of type $m$ and the columns of type $t$, and $\top$ denotes the matrix transpose; if message text node $m_i$ and message text node $m_j$ both connect to some node of type $t$, then $[W_{mt} W_{mt}^{\top}]_{ij}$ is greater than or equal to 1, and $A_{ij}$ equals 1;
for the message pair formed by message text $m_i$ and message text $m_j$, judging whether the similarity $s_{ij}$ of the message pair reaches a preset threshold $\theta$, and constructing an edge $e_{ij}$ when the similarity reaches the preset threshold; after all message pairs have been judged, an initial message isomorphic graph $G = (V, E)$ is formed, wherein $V$ is the set of message text nodes and $E$ is the set of edges.
7. The method for detecting social media events by combining deep learning classification and graph clustering according to claim 1, wherein the constructed message isomorphism graph is used as an input of a graph clustering algorithm for clustering, and a clustering result is used as a detected social media event, and specifically comprises the following steps:
And performing community expansion according to transitivity of feature association among the messages, performing multi-round game among message nodes to select a better community, finally realizing a stable state of the community, and taking a clustering result obtained by community division as a detected social media event.
8. The method for detecting social media events by combining deep learning classification and graph clustering according to claim 7, wherein community expansion is performed according to transitivity of feature association between messages, then multiple rounds of gaming are performed between message nodes to select a better community, and finally a stable state of the community is realized, and the method specifically comprises the following steps:
the community scale is enlarged, if two edges are arranged among three points to form a half triangle, according to the ternary closure principle, two nodes are also associated under the condition of having a relationship with a common node, and then the three points are considered to have a hidden association relationship, namely, the three points belong to a community;
performing multiple iterations, wherein in each iteration, all nodes make more optimal selection according to the current community division, and the three options are as follows: the current community is not changed; leaving the current community without joining any other communities; leaving the current community and joining another community;
After a plurality of games, the algorithm achieves Nash equilibrium, namely all nodes join the community which is most satisfied, and whether the algorithm achieves Nash equilibrium is judged by setting a threshold value.
9. A social media event detection system that combines deep learning classification with graph clustering, the system comprising:
The message heterogram construction module is used for extracting characteristic information of a message text in the social media data stream, taking the message text and the characteristic information as nodes, connecting the message text with the extracted characteristic information to form edges, and constructing to obtain a message heterogram;
the multi-relation message diagram construction module is used for randomly selecting two message text nodes to construct message pairs based on the message heterogram, constructing a plurality of shared feature edges for each message pair, and thereby constructing a multi-relation message diagram;
The deep learning classification module is used for obtaining the similarity of the message pairs by using a deep learning classification model based on the multi-relation message graph;
The message isomorphic diagram construction module is used for judging whether the similarity of the message pairs reaches a preset threshold value, if so, constructing an edge between two message text nodes forming the message pairs, and constructing to obtain a message isomorphic diagram;
and the graph clustering module is used for clustering the constructed message isomorphic graphs as input of a graph clustering algorithm, and the clustering result is used as the detected social media event.
10. An electronic device, the device comprising: a processor and a memory;
The memory is used for storing one or more program instructions;
the processor configured to execute one or more program instructions to perform the steps of a social media event detection method of combining deep learning classification with graph clustering as claimed in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410373064.XA CN117974340B (en) | 2024-03-29 | 2024-03-29 | Social media event detection method combining deep learning classification and graph clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410373064.XA CN117974340B (en) | 2024-03-29 | 2024-03-29 | Social media event detection method combining deep learning classification and graph clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117974340A true CN117974340A (en) | 2024-05-03 |
CN117974340B CN117974340B (en) | 2024-06-18 |
Family
ID=90859854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410373064.XA Active CN117974340B (en) | 2024-03-29 | 2024-03-29 | Social media event detection method combining deep learning classification and graph clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117974340B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105260356A (en) * | 2015-10-10 | 2016-01-20 | 西安交通大学 | Chinese interactive text emotion and topic identification method based on multitask learning |
CN105956197A (en) * | 2016-06-15 | 2016-09-21 | 杭州量知数据科技有限公司 | Social media graph representation model-based social risk event extraction method |
US20160343027A1 (en) * | 2015-05-22 | 2016-11-24 | Facebook, Inc. | Clustering users of a social networking system based on user interactions with content items associated with a topic |
CN106383877A (en) * | 2016-09-12 | 2017-02-08 | 电子科技大学 | On-line short text clustering and topic detection method of social media |
CN108959323A (en) * | 2017-05-25 | 2018-12-07 | 腾讯科技(深圳)有限公司 | Video classification methods and device |
CN110457711A (en) * | 2019-08-20 | 2019-11-15 | 电子科技大学 | A kind of social media event topic recognition methods based on descriptor |
CN113158194A (en) * | 2021-03-30 | 2021-07-23 | 西北大学 | Vulnerability model construction method and detection method based on multi-relation graph network |
CN114093422A (en) * | 2021-11-23 | 2022-02-25 | 湖南大学 | MiRNA (micro ribonucleic acid) and gene interaction prediction method and system based on multi-relation graph convolution network |
CN115859793A (en) * | 2022-11-21 | 2023-03-28 | 河北工业大学 | Attention-based method and system for detecting abnormal behaviors of heterogeneous information network users |
CN115936104A (en) * | 2022-09-16 | 2023-04-07 | 中国银联股份有限公司 | Method and apparatus for training machine learning models |
CN115952362A (en) * | 2023-01-03 | 2023-04-11 | 西北工业大学 | Self-evolution false message detection method for social media |
CN116468586A (en) * | 2023-04-20 | 2023-07-21 | 福建商学院 | Intelligent wholesale method and system for appeal events in social media |
CN116681176A (en) * | 2023-06-12 | 2023-09-01 | 济南大学 | Traffic flow prediction method based on clustering and heterogeneous graph neural network |
CN117670571A (en) * | 2024-01-30 | 2024-03-08 | 昆明理工大学 | Incremental social media event detection method based on heterogeneous message graph relation embedding |
CN117670571A (en) * | 2024-01-30 | 2024-03-08 | 昆明理工大学 | Incremental social media event detection method based on heterogeneous message graph relation embedding |
Also Published As
Publication number | Publication date |
---|---|
CN117974340B (en) | 2024-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111701247B (en) | | Method and equipment for determining unified account |
Zhang et al. | | Reconstructing heterogeneous networks via compressive sensing and clustering |
Yang et al. | | True and fake information spreading over the Facebook |
CN117272195A (en) | | Block chain abnormal node detection method and system based on graph convolution attention network |
CN104933143A (en) | | Method and device for acquiring recommended object |
Wu et al. | | A maximal ordered ego-clique based approach for prevalent co-location pattern mining |
CN113268667A (en) | | Chinese comment emotion guidance-based sequence recommendation method and system |
CN115599541A (en) | | Sorting device and method |
Jadbabaie et al. | | Inference in opinion dynamics under social pressure |
CN115905630A (en) | | Graph database query method, device, equipment and storage medium |
Bandaru et al. | | A dimensionally-aware genetic programming architecture for automated innovization |
CN117974340B (en) | | Social media event detection method combining deep learning classification and graph clustering |
Moghaddam et al. | | A general framework for sorting large data sets using independent subarrays of approximately equal length |
CN117390480A (en) | | Information extraction method, device, equipment and storage medium |
Schweitzer | | Problems of unknown complexity: graph isomorphism and Ramsey theoretic numbers |
CN111079843A (en) | | Training method based on RBF neural network |
CN116646002A (en) | | Multi-non-coding RNA and disease association prediction method, device, equipment and medium |
CN114528810A (en) | | Data code generation method and device, electronic equipment and storage medium |
Kumar et al. | | A new Initial Centroid finding Method based on Dissimilarity Tree for K-means Algorithm |
CN111310066B (en) | | Friend recommendation method and system based on topic model and association rule algorithm |
Wu et al. | | Link prediction based on random forest in signed social networks |
Cho et al. | | Finite difference schemes for an axisymmetric nonlinear heat equation with blow-up |
CN106790620B (en) | | Distributed big data processing method |
CN111488924A (en) | | Multivariate time sequence data clustering method |
CN116842073B (en) | | Graph data mining method and device and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |