CN112241492B - Early identification method for multi-source heterogeneous online network topics - Google Patents

Publication number
CN112241492B
Authority
CN
China
Prior art keywords
network
community
short text
complex network
complex
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011141881.0A
Other languages
Chinese (zh)
Other versions
CN112241492A (en)
Inventor
徐小艳
周帅鹏
张贝贝
吕伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Shiyou University
Original Assignee
Xian Shiyou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-10-22
Filing date
2020-10-22
Publication date
2023-04-07
Application filed by Xian Shiyou University
Priority to CN202011141881.0A
Publication of CN112241492A
Application granted
Publication of CN112241492B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Computing Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an early identification method for multi-source heterogeneous online network topics, comprising the following steps: 1) obtaining a short text keyword set $D_0$; 2) constructing a complex network based on keyword overlap; 3) dividing the complex network constructed in step 2) into communities using a dynamic community division method: the time interval $[t_0, t_{end}]$ is divided at intervals of the time increment $\Delta t$, the complex network at time $t_0 + \Delta t$ is constructed from the newly added short text information crawled from the different source online social networks within the time increment $\Delta t$, and the complex network at time $t_0 + \Delta t$ is then divided into communities using the dynamic community division method, thereby realizing the community division of the complex network; 4) compiling statistics on the community division result of the complex network and constructing the finally discovered topic keyword set from it. The method can perform early topic discovery and extraction on short text information data crawled from multiple online social network platforms.

Description

Early identification method for multi-source heterogeneous online network topics
Technical Field
The invention belongs to the field of early identification of online network topics, and relates to an early identification method for multi-source heterogeneous online network topics.
Background
On the one hand, with the rapid and far-reaching development of the Internet, and of the mobile Internet in particular, the Internet has broken the spatial and temporal limits of traditional information exchange, overturned the traditional mode of information propagation, and changed the role of the Internet user in information propagation and diffusion from information consumer to information diffuser or even information producer. Information has gradually begun to spread between different online social network systems. The production, propagation, and interaction of information across multi-source heterogeneous online networks are increasingly complex, which makes the early discovery of topics more difficult. At present, most topic discovery methods study the discovery and propagation patterns of hot topics, leaving considerable room for research on early topic discovery methods.
On the other hand, network information sources and propagation channels are increasing rapidly, and the scale and influence of online public opinion keep growing. Identifying early topics in heterogeneous online networks helps governments and regulatory departments carry out timely and effective supervision and prevention.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an early identification method for multi-source heterogeneous online network topics, which can discover and extract topics at an early stage from short text information data crawled from multiple online social network platforms.
To achieve this purpose, the early identification method for multi-source heterogeneous online network topics comprises the following steps:
1) Analyzing the structural characteristics of different online social networks, designing a distributed parallel crawler engine for those characteristics, crawling the original short text information published on the online social networks with the distributed parallel crawler engine, and preprocessing that original short text information by Chinese word segmentation and text feature value extraction to obtain a short text keyword set $D_0$;
2) At the initial time $t_0$, using the short text keyword set $D_0$ to construct a complex network based on keyword overlap, according to the behavioral relationships between network users represented by the online social network text information;
3) Dividing the complex network constructed in step 2) into communities using a dynamic community division method: the time interval $[t_0, t_{end}]$ is divided at intervals of the time increment $\Delta t$; the complex network at time $t_0 + \Delta t$ is constructed from the newly added short text information crawled from the different source online social networks within the time increment $\Delta t$; and the complex network at time $t_0 + \Delta t$ is then divided into communities using the dynamic community division method, thereby realizing the community division of the complex network;
4) For each community in the community division result of the complex network, counting the total number of participating users of the short texts represented by all nodes of that community, and then sorting the communities by that total to obtain the top N communities;
5) Collecting the keyword sets corresponding to the short texts in the top N communities obtained in step 4), sorting the collected keywords by TF-IDF, and constructing the finally discovered topic keyword set from the top N keywords in the sorted result.
In step 1), the crawled original short text information published on the online social networks comprises news headlines from news websites and microblog posts from microblog platforms, and the short text keyword set is constructed from the crawled original short text information by Chinese word segmentation and TF-IDF text feature value extraction.
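The keyword extraction in step 1) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the short texts have already been segmented into tokens by some Chinese word segmentation tool, uses one common TF-IDF variant (relative term frequency times the logarithm of inverse document frequency), and the function name `tfidf_keywords` and the parameter `top_k` are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=5):
    """Return one keyword set per short text.

    docs  : list of token lists (each short text already segmented).
    top_k : hypothetical cut-off for the number of keywords kept per text.
    """
    n_docs = len(docs)
    # Document frequency: number of short texts that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    keyword_sets = []
    for doc in docs:
        tf = Counter(doc)
        # TF-IDF score of each token in this short text.
        scores = {
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keyword_sets.append(set(ranked[:top_k]))
    return keyword_sets


if __name__ == "__main__":
    sample = [
        ["basketball", "asia", "final"],
        ["basketball", "iran", "match"],
        ["weather", "rain", "final"],
    ]
    print(tfidf_keywords(sample, top_k=2))
```

The resulting list of keyword sets plays the role of the short text keyword set D 0 used in the later steps.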
Each short text serves as a node of the complex network, and the edges between nodes represent the association relations between short texts.
In the complex network, $i$ and $j$ denote microblog posts and news headlines crawled before time $t_0$; $C_i$ denotes the keyword set of short text $i$; $N_{ij}$ denotes the number of keywords shared by the short text keyword sets $C_i$ and $C_j$, that is, $N_{ij} = |C_i \cap C_j|$; $V_i$ is the network node representing short text $i$; and $E_{ij}$ is the association between short text $i$ and short text $j$. $N_{ij} = 0$ means there is no edge between short texts $i$ and $j$; $N_{ij} > 0$ means there is an edge between short texts $i$ and $j$, and the weight of edge $E_{ij}$ is $N_{ij}$.
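The network definition above can be illustrated with a short sketch. It assumes each short text is identified by a sortable id and comes with its keyword set C i; the function name `build_overlap_network` and the edge-dictionary representation are choices made for this example only.

```python
from itertools import combinations


def build_overlap_network(keyword_sets):
    """Build the keyword-overlap network.

    keyword_sets : dict mapping short-text id -> set of keywords (C_i).
    Returns (nodes, edges), where edges maps (i, j) with i < j to the
    weight N_ij = |C_i & C_j|; pairs with N_ij == 0 get no edge.
    """
    nodes = sorted(keyword_sets)
    edges = {}
    for i, j in combinations(nodes, 2):
        n_ij = len(keyword_sets[i] & keyword_sets[j])
        if n_ij > 0:
            edges[(i, j)] = n_ij
    return nodes, edges


if __name__ == "__main__":
    texts = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}, 4: {"a", "b", "c"}}
    print(build_overlap_network(texts))
```

Comparing every pair of short texts is quadratic in the number of texts; for large crawls an inverted index from keyword to text ids would avoid comparing pairs that share nothing, but that optimization is outside what the text describes.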
In step 3), a static community discovery method is used to divide the complex network into communities.
The newly added short texts and connection information within the time increment $\Delta t$ are added to the existing complex network to form a new complex network.
The newly added short texts and connection information within the time increment $\Delta t$ are divided into two categories according to their relationship with the communities in the existing complex network: the first category is the set of newly added text nodes that are closely related to the communities in the existing complex network, and the second category is the set of newly added text nodes that are only loosely related to those communities. The community membership of the first set of newly added text nodes within the complex network is determined according to the modularity gain index $\Delta Q$; the second set of newly added text nodes is divided into communities with a static community division method to determine the newly added communities. Together these realize the dynamic community division of the complex network.
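For reference, one common form of modularity for a weighted network, and the gain obtained when a newly added node $i$ (treated as its own one-node community) joins an existing community $c$, is written out below. The text does not spell out its formula for the modularity gain, so this standard Newman-style definition is an assumption, and the symbols $m$, $k_i$, $L_c$, $D_c$ and $k_{i,c}$ are notation introduced here for illustration.

$$
Q = \sum_{c} \left[ \frac{L_c}{m} - \left( \frac{D_c}{2m} \right)^{2} \right],
\qquad m = \sum_{i<j} N_{ij},
\qquad k_i = \sum_{j} N_{ij},
$$

where $L_c$ is the total weight of the edges inside community $c$ and $D_c = \sum_{i \in c} k_i$ is the sum of the weighted degrees of its members. Merging the one-node community $\{i\}$ into $c$ changes $L_c$ by $k_{i,c} = \sum_{j \in c} N_{ij}$ and $D_c$ by $k_i$, so that

$$
\Delta Q_{i \to c}
= \frac{k_{i,c}}{m} - \frac{(D_c + k_i)^{2} - D_c^{2} - k_i^{2}}{4m^{2}}
= \frac{k_{i,c}}{m} - \frac{D_c \, k_i}{2m^{2}} .
$$

Under this definition a new node gains most from joining a community to which it has heavy edges relative to that community's total degree, which matches the distinction between closely and loosely related new nodes.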
The invention has the following beneficial effects:
In operation, the early identification method for multi-source heterogeneous online network topics crawls the original short text information published on online social networks with the distributed parallel crawler engine, constructs the short text keyword set $D_0$ from that information, and uses $D_0$ to construct the complex network based on keyword overlap. The complex network is then divided into communities with the dynamic community division method: the complex network at time $t_0 + \Delta t$ is constructed from the newly added short text information crawled from the different source online social networks within the time increment $\Delta t$, and is divided into communities in turn, which realizes community division based on a time-varying dynamic network. Finally, the topic keyword set is extracted from the final community division result of the complex network, realizing effective and objective discovery of multi-source online network topics.
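The incremental processing over the interval from t 0 to t end can be sketched as a simple windowing step that groups newly crawled short texts by time increment. This is only an illustrative sketch: it assumes numeric timestamps, and the function names `time_windows` and `bucket_by_window` are hypothetical.

```python
def time_windows(t0, t_end, dt):
    """Yield (start, end) pairs covering [t0, t_end] in steps of dt."""
    start = t0
    while start < t_end:
        end = min(start + dt, t_end)
        yield start, end
        start = end


def bucket_by_window(posts, t0, t_end, dt):
    """Group short-text ids by the time window they were crawled in.

    posts : iterable of (timestamp, text_id) pairs with numeric timestamps.
    Windows are half-open [start, end), except that the last window also
    includes t_end. Posts outside [t0, t_end] are ignored.
    """
    windows = list(time_windows(t0, t_end, dt))
    buckets = [[] for _ in windows]
    for ts, text_id in posts:
        if windows and t0 <= ts <= t_end:
            idx = min(int((ts - t0) // dt), len(windows) - 1)
            buckets[idx].append(text_id)
    return list(zip(windows, buckets))


if __name__ == "__main__":
    crawled = [(1, "a"), (4, "b"), (9, "c"), (10, "d"), (11, "e")]
    for window, ids in bucket_by_window(crawled, 0, 10, 4):
        print(window, ids)
```

Each bucket then supplies the newly added short texts for one update of the network and its community division.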
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flowchart of a first embodiment.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
Referring to FIG. 1, the early identification method for multi-source heterogeneous online network topics provided by the invention comprises the following steps:
1) Analyzing the structural characteristics of different online social networks, designing a distributed parallel crawler engine for those characteristics, crawling the original short text information published on the online social networks with the distributed parallel crawler engine, and preprocessing that original short text information by Chinese word segmentation and text feature value extraction to obtain a short text keyword set $D_0$;
The crawled original short text information published on the online social networks comprises news headlines from news websites and microblog posts from microblog platforms, and the short text keyword set is constructed from the crawled original short text information by Chinese word segmentation and TF-IDF text feature value extraction.
2) At the initial time $t_0$, using the short text keyword set $D_0$ to construct a complex network based on keyword overlap, according to the behavioral relationships between network users represented by the online social network text information;
Each short text serves as a node of the complex network, and the edges between nodes represent the association relations between short texts. In the complex network, $i$ and $j$ denote microblog posts and news headlines crawled before time $t_0$; $C_i$ denotes the keyword set of short text $i$; $N_{ij}$ denotes the number of keywords shared by the short text keyword sets $C_i$ and $C_j$, that is, $N_{ij} = |C_i \cap C_j|$; $V_i$ is the network node representing short text $i$; and $E_{ij}$ is the association between short text $i$ and short text $j$. $N_{ij} = 0$ means there is no edge between short texts $i$ and $j$; $N_{ij} > 0$ means there is an edge between short texts $i$ and $j$, and the weight of edge $E_{ij}$ is $N_{ij}$.
3) Dividing the complex network constructed in step 2) into communities using a dynamic community division method: the time interval $[t_0, t_{end}]$ is divided at intervals of the time increment $\Delta t$; the complex network at time $t_0 + \Delta t$ is constructed from the newly added short text information crawled from the different source online social networks within the time increment $\Delta t$; and the complex network at time $t_0 + \Delta t$ is then divided into communities using the dynamic community division method, thereby realizing the community division of the complex network;
A static community discovery method is used to divide the complex network into communities.
The newly added short texts and connection information within the time increment $\Delta t$ are added to the existing complex network to form a new complex network. The newly added short texts and connection information within the time increment $\Delta t$ are divided into two categories according to their relationship with the communities in the existing complex network: the first category is the set of newly added text nodes that are closely related to the communities in the existing complex network, and the second category is the set of newly added text nodes that are only loosely related to those communities. The community membership of the first set of newly added text nodes within the complex network is determined according to the modularity gain index $\Delta Q$; the second set of newly added text nodes is divided into communities with a static community division method to determine the newly added communities. Together these realize the dynamic community division of the complex network.
The modularity gain index $\Delta Q$ is calculated as follows:
Each node $i$ in the newly added text node set is assigned in turn to the community of each of its adjacent nodes $j$, and the modularity gain $\Delta Q$ of the complex network under that assignment is calculated. After traversing all nodes $i$ and $j$, the maximum modularity gain $\max \Delta Q$ is extracted, the corresponding $i_{max}$ and $j_{max}$ are output, and the community structure of the complex network is finally determined.
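The traversal described above can be sketched as a greedy loop that repeatedly picks the (new node, neighboring community) pair with the largest modularity gain. Several points are assumptions rather than statements from the text: the gain uses the standard weighted-modularity expression k_ic / m - D_c * k_i / (2 * m^2) for merging a one-node community into community c, the loop stops when no positive gain remains, and nodes left over are returned as the loosely related set. The function name `assign_new_nodes` is hypothetical.

```python
from collections import defaultdict


def assign_new_nodes(edges, community_of, new_nodes):
    """Greedily attach new nodes to existing communities by modularity gain.

    edges        : dict {(u, v): weight} of the updated undirected network.
    community_of : dict node -> community label for already-placed nodes.
    new_nodes    : iterable of newly added node ids.
    Returns (updated community_of, sorted list of nodes left unassigned).
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    m = sum(edges.values())  # total edge weight of the network
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    community_of = dict(community_of)
    # D_c: sum of weighted degrees of the members of each community.
    comm_degree = defaultdict(float)
    for node, c in community_of.items():
        comm_degree[c] += degree.get(node, 0)
    pending = set(new_nodes)
    while pending:
        best = None  # (gain, node i_max, community of j_max)
        for i in sorted(pending):
            # k_ic: edge weight from i to each neighboring community.
            w_to = defaultdict(float)
            for j, w in adj[i].items():
                if j in community_of:
                    w_to[community_of[j]] += w
            for c, k_ic in sorted(w_to.items()):
                gain = k_ic / m - comm_degree[c] * degree[i] / (2 * m * m)
                if best is None or gain > best[0]:
                    best = (gain, i, c)
        if best is None or best[0] <= 0:
            break  # assumed stopping rule: no positive gain left
        _, i, c = best
        community_of[i] = c
        comm_degree[c] += degree[i]
        pending.remove(i)
    return community_of, sorted(pending)


if __name__ == "__main__":
    edges = {(1, 2): 3, (3, 4): 3, (2, 3): 1, (1, 5): 2, (2, 5): 1}
    placed = {1: "A", 2: "A", 3: "B", 4: "B"}
    print(assign_new_nodes(edges, placed, [5, 6]))
```

In the demo, node 5 is attached to community A, while node 6 has no edges to existing communities and stays unassigned; such leftover nodes would be passed to a static community division method to form new communities.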
4) For each community in the community division result of the complex network, counting the total number of participating users of the short texts represented by all nodes of that community, and then sorting the communities by that total to obtain the top N communities;
5) Collecting the keyword sets corresponding to the short texts in the top N communities obtained in step 4), sorting the collected keywords by TF-IDF, and constructing the finally discovered topic keyword set from the top N keywords in the sorted result.
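Steps 4) and 5) can be sketched as follows. The text does not say exactly how TF-IDF is aggregated at the community level, so this sketch assumes term frequency is counted over all keywords of the community's short texts and inverse document frequency over all short texts; the function name `top_topics` and its parameters are hypothetical, and the sample data is made up for illustration.

```python
import math
from collections import Counter


def top_topics(communities, participants, keywords, n_communities=1, n_keywords=5):
    """Rank communities by participating users and extract topic keywords.

    communities  : dict community label -> list of short-text ids.
    participants : dict short-text id -> number of participating users.
    keywords     : dict short-text id -> list of keywords.
    Returns (ranked community labels, dict label -> top keywords).
    """
    # Step 4: total participating users per community, largest first.
    totals = {c: sum(participants[t] for t in ids) for c, ids in communities.items()}
    ranked = sorted(totals, key=lambda c: (-totals[c], c))[:n_communities]

    # Step 5: TF-IDF over the keywords of each top community (assumed variant).
    n_texts = len(keywords)
    df = Counter(k for kws in keywords.values() for k in set(kws))
    topics = {}
    for c in ranked:
        tf = Counter(k for t in communities[c] for k in keywords[t])
        total = sum(tf.values())
        score = {k: (n / total) * math.log(n_texts / df[k]) for k, n in tf.items()}
        topics[c] = sorted(score, key=lambda k: (-score[k], k))[:n_keywords]
    return ranked, topics


if __name__ == "__main__":
    communities = {"C1": [1, 2], "C2": [3]}
    participants = {1: 100, 2: 50, 3: 120}
    keywords = {1: ["basketball", "asia"], 2: ["basketball", "iran"], 3: ["rain", "weather"]}
    print(top_topics(communities, participants, keywords, n_communities=1, n_keywords=2))
```

Note that with this variant a keyword appearing in every short text scores zero, so very small corpora can rank a community's most frequent keyword last.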
Example one
Referring to fig. 2, the specific operation process of this embodiment is:
1) Analyzing the structural characteristics of different online social networks, designing a distributed parallel crawler engine for those characteristics, crawling the original short text information published on the online social networks with the distributed parallel crawler engine, and preprocessing that original short text information by Chinese word segmentation and text feature value extraction to obtain a short text keyword set $D_0$;
The original short text information published on the online social networks comprises news headlines from news websites and microblog posts from microblog platforms, and the short text keyword set is constructed from that original short text information by Chinese word segmentation and TF-IDF text feature value extraction.
2) At the initial time $t_0$, using the short text keyword set $D_0$ to construct a complex network based on keyword overlap, according to the behavioral relationships between network users represented by the online social network text information;
Each short text serves as a node of the complex network, and the edges between nodes represent the association relations between short texts. In the complex network, $i$ and $j$ denote microblog posts and news headlines crawled before time $t_0$; $C_i$ denotes the keyword set of short text $i$; $N_{ij}$ denotes the number of keywords shared by the short text keyword sets $C_i$ and $C_j$, that is, $N_{ij} = |C_i \cap C_j|$; $V_i$ is the network node representing short text $i$; and $E_{ij}$ is the association between short text $i$ and short text $j$. $N_{ij} = 0$ means there is no edge between short texts $i$ and $j$; $N_{ij} > 0$ means there is an edge between short texts $i$ and $j$, and the weight of edge $E_{ij}$ is $N_{ij}$.
3) Dividing the complex network constructed in step 2) into communities using a dynamic community division method: the time interval $[t_0, t_{end}]$ is divided at intervals of the time increment $\Delta t$; the complex network at time $t_0 + \Delta t$ is constructed from the newly added short text information crawled from the different source online social networks within the time increment $\Delta t$; and the complex network at time $t_0 + \Delta t$ is then divided into communities using the dynamic community division method, thereby realizing the community division of the complex network;
A static community discovery method is used to divide the complex network into communities.
The newly added short texts and connection information within the time increment $\Delta t$ are added to the existing complex network to form a new complex network. The newly added short texts and connection information within the time increment $\Delta t$ are divided into two categories according to their relationship with the communities in the existing complex network: the first category is the set of newly added text nodes that are closely related to the communities in the existing complex network, and the second category is the set of newly added text nodes that are only loosely related to those communities. The community membership of the first set of newly added text nodes within the complex network is determined according to the modularity gain index $\Delta Q$; the second set of newly added text nodes is divided into communities with a static community division method to determine the newly added communities. Together these realize the dynamic community division of the complex network.
The modularity gain index $\Delta Q$ is calculated as follows:
Each node $i$ in the newly added text node set is assigned in turn to the community of each of its adjacent nodes $j$, and the modularity gain $\Delta Q$ of the complex network under that assignment is calculated. After traversing all nodes $i$ and $j$, the maximum modularity gain $\max \Delta Q$ is extracted, the corresponding $i_{max}$ and $j_{max}$ are output, and the community structure of the complex network is finally determined.
4) For the complex network in step 3), counting the total number of participating users of the short texts represented by all nodes of each community in the final community division result, and outputting the top 1 community ranked by that total: C1: 391238;
5) Collecting the keyword sets corresponding to the short texts in the top 1 community, and taking the 5 keywords with the highest TF-IDF in the corresponding keyword set;
the top 5 keyword set in the C1 community is { men's basketball, suo mosaic, Iran, Chinese team, Asia };
6) Taking the top n keywords corresponding to each community as the keyword set of the finally discovered topic;
the top 5 keyword set in the C1 community is { men's basketball, Sunday, Iran, Chinese team, Asia }, and the resulting topic is "Chinese men's basketball Sunday".

Claims (6)

1. An early identification method for multi-source heterogeneous online network topics, characterized by comprising the following steps:
1) Analyzing the structural characteristics of different online social networks, designing a distributed parallel crawler engine for those characteristics, crawling the original short text information published on the online social networks with the distributed parallel crawler engine, and preprocessing that original short text information by Chinese word segmentation and text feature value extraction to obtain a short text keyword set $D_0$;
2) At the initial time $t_0$, using the short text keyword set $D_0$ to construct a complex network based on keyword overlap, according to the behavioral relationships between network users represented by the online social network text information;
3) Dividing the complex network constructed in step 2) into communities using a dynamic community division method: the time interval $[t_0, t_{end}]$ is divided at intervals of the time increment $\Delta t$; the complex network at time $t_0 + \Delta t$ is constructed from the newly added short text information crawled from the different source online social networks within the time increment $\Delta t$; and the complex network at time $t_0 + \Delta t$ is then divided into communities using the dynamic community division method, thereby realizing the community division of the complex network;
4) For each community in the community division result of the complex network, counting the total number of participating users of the short texts represented by all nodes of that community, and then sorting the communities by that total to obtain the top N communities;
5) Collecting the keyword sets corresponding to the short texts in the top N communities obtained in step 4), sorting the collected keywords by TF-IDF, and constructing the finally discovered topic keyword set from the top N keywords in the sorted result.
2. The early identification method for multi-source heterogeneous online network topics according to claim 1, wherein in step 1), the crawled original short text information published on the online social networks comprises news headlines from news websites and microblog posts from microblog platforms, and the short text keyword set is constructed from the crawled original short text information by Chinese word segmentation and TF-IDF text feature value extraction.
3. The early identification method for multi-source heterogeneous online network topics according to claim 1, wherein each short text serves as a node of the complex network, and the edges between nodes represent the association relations between short texts;
in the complex network, $i$ and $j$ denote microblog posts and news headlines crawled before time $t_0$; $C_i$ denotes the keyword set of short text $i$; $N_{ij}$ denotes the number of keywords shared by the short text keyword sets $C_i$ and $C_j$, that is, $N_{ij} = |C_i \cap C_j|$; $V_i$ is the network node representing short text $i$; and $E_{ij}$ is the association between short text $i$ and short text $j$; $N_{ij} = 0$ means there is no edge between short texts $i$ and $j$; $N_{ij} > 0$ means there is an edge between short texts $i$ and $j$, and the weight of edge $E_{ij}$ is $N_{ij}$.
4. The early identification method for multi-source heterogeneous online network topics according to claim 1, wherein in step 3), a static community discovery method is used to divide the complex network into communities.
5. The method for early identifying the topic in the multi-source heterogeneous online network according to claim 1, wherein the newly added short texts and connection information in the time increment delta t are added to the complex network
Figure FDA0002738520120000025
To form a new complex network
Figure FDA0002738520120000026
6. The method for early identification of multi-source heterogeneous online network topics according to claim 1, wherein the short texts and connection information newly added within the time increment Δt are divided into two categories according to their relationship with the communities in the complex network (Figure FDA0002738520120000027): the first category is the set of newly added text nodes (Figure FDA0002738520120000029) that are closely related to the communities in the complex network (Figure FDA0002738520120000028); the second category is the set of newly added text nodes (Figure FDA00027385201200000211) that are loosely related to the communities in the complex network (Figure FDA00027385201200000210); the membership of the newly added text node set (Figure FDA00027385201200000212) in the communities of the complex network (Figure FDA00027385201200000213) is determined based on the modularity gain index ΔQ; and the newly added text node set (Figure FDA00027385201200000214) is divided into communities using the static community division method to determine the newly added communities, thereby realizing dynamic community division of the complex network (Figure FDA0002738520120000031).
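The network construction of claim 3 and the ΔQ-based membership test of claim 6 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`build_network`, `modularity`, `assign_new_node`) and the toy keyword sets are hypothetical. The formula images are not reproduced in this text, so the sketch assumes that N_ij is the size of the intersection of C_i and C_j, and that ΔQ is the change in weighted Newman modularity when a new node moves from its own singleton community into an existing one. Treating "positive gain" as the criterion for a closely related node is also an assumption.

```python
from itertools import combinations


def build_network(keyword_sets):
    """Return {(i, j): N_ij} for each pair of short texts sharing keywords."""
    edges = {}
    for i, j in combinations(sorted(keyword_sets), 2):
        n_ij = len(keyword_sets[i] & keyword_sets[j])  # shared keywords
        if n_ij > 0:  # N_ij = 0 means no edge between i and j
            edges[(i, j)] = n_ij
    return edges


def modularity(edges, community_of):
    """Weighted Newman modularity Q of a partition given as node -> label."""
    m = sum(edges.values())  # total edge weight
    if m == 0:
        return 0.0
    internal, degree = {}, {}
    for (i, j), w in edges.items():
        ci, cj = community_of[i], community_of[j]
        degree[ci] = degree.get(ci, 0) + w
        degree[cj] = degree.get(cj, 0) + w
        if ci == cj:
            internal[ci] = internal.get(ci, 0) + w
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def assign_new_node(edges, community_of, node):
    """Return (label, gain) of the best existing community, or (None, 0.0).

    Assumption: a node joins a community only if the modularity gain is
    positive; otherwise it is left for a separate static division step.
    """
    baseline = dict(community_of)
    baseline[node] = ("singleton", node)  # node alone in its own community
    q0 = modularity(edges, baseline)
    best, best_gain = None, 0.0
    for label in set(community_of.values()):
        trial = dict(community_of)
        trial[node] = label
        gain = modularity(edges, trial) - q0  # delta Q for this community
        if gain > best_gain:
            best, best_gain = label, gain
    return best, best_gain


# Toy data: texts a-d exist at time t_0, text e arrives within delta t.
keyword_sets = {
    "a": {"oil", "price", "rise"},
    "b": {"oil", "price", "opec"},
    "c": {"football", "final", "goal"},
    "d": {"football", "goal", "coach"},
    "e": {"oil", "opec", "cut"},
}
edges = build_network(keyword_sets)
existing = {"a": 0, "b": 0, "c": 1, "d": 1}
label, gain = assign_new_node(edges, existing, "e")
```

Computing ΔQ as a difference of two full modularity values is slower than a closed-form gain, but it keeps the sketch easy to check by hand. Nodes for which no community yields a positive gain correspond, under the stated assumption, to the loosely related set that the claim hands to the static community division method.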
CN202011141881.0A 2020-10-22 2020-10-22 Early identification method for multi-source heterogeneous online network topics Active CN112241492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011141881.0A CN112241492B (en) 2020-10-22 2020-10-22 Early identification method for multi-source heterogeneous online network topics

Publications (2)

Publication Number Publication Date
CN112241492A CN112241492A (en) 2021-01-19
CN112241492B CN112241492B (en) 2023-04-07

Family

ID=74169687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011141881.0A Active CN112241492B (en) 2020-10-22 2020-10-22 Early identification method for multi-source heterogeneous online network topics

Country Status (1)

Country Link
CN (1) CN112241492B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268230A (en) * 2014-09-28 2015-01-07 福州大学 Method for detecting objective points of Chinese micro-blogs based on heterogeneous graph random walk
CN106055604A (en) * 2016-05-25 2016-10-26 南京大学 Short text topic model mining method based on word network to extend characteristics
CN106372125A (en) * 2016-08-24 2017-02-01 安阳师范学院 Method for building case study model of educational technology microblog group under SNA perspective
CN108804432A (en) * 2017-04-26 2018-11-13 慧科讯业有限公司 Methods, systems and devices for discovering and tracking hot topics based on network media data streams
CN110532390A (en) * 2019-08-26 2019-12-03 南京邮电大学 News keyword extraction method based on NER and complex network features

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130098772A (en) * 2012-02-28 2013-09-05 삼성전자주식회사 Topic-based community index generation apparatus, topic-based community searching apparatus, topic-based community index generation method and topic-based community searching method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
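"Detecting popular topics in micro-blogging based on a user interest-based model"; Shuangyong Song et al.; International Joint Conference on Neural Networks; 20120730; full text *
"Microblog topic detection method based on word co-occurrence relations and rough sets" (基于词共现关系和粗糙集的微博话题检测方法); Lan Tian et al.; Computer Systems & Applications (《计算机系统应用》); 20160615; full text *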

Also Published As

Publication number Publication date
CN112241492A (en) 2021-01-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant