CN105302866A - OSN community discovery method based on LDA Theme model


Info

Publication number
CN105302866A
Authority
CN
China
Prior art keywords
document
topic
lda
model
probability distribution
Prior art date
Legal status
Pending
Application number
CN201510611455.1A
Other languages
Chinese (zh)
Inventor
曹玖新
马卓
陈巧云
刘波
周涛
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201510611455.1A priority Critical patent/CN105302866A/en
Publication of CN105302866A publication Critical patent/CN105302866A/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01: Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an online social network (OSN) community discovery method based on the Latent Dirichlet Allocation (LDA) topic model. The method first preprocesses the data; then builds LDA topic models (an LDA-F model and an LDA-T model) from the relationships between each user in the online social network and the user's friends, together with the text the user posts, and derives the models' probability distributions; next estimates the parameters with a Gibbs sampling algorithm; and finally discovers OSN communities from the estimated parameters. With this method, a probability model can be obtained by mining the semantic information of users' microblog posts, without relying on network topology connection information; introducing the semantic similarity of microblog content effectively describes the probability distribution of the users' interests; and introducing the closeness of topological connections inside a community allows communities with tightly connected internal topology to be discovered.

Description

OSN community discovery method based on LDA topic model
Technical Field
The invention relates to a community discovery mechanism for online social networks (OSNs) that uses an LDA topic model, and belongs to the field of social computing, in particular to community discovery.
Background
With the rapid development of the Internet, the network has gradually shifted from being data-centered to being human-centered, which has driven the rapid growth of online social networks. Unlike a traditional interpersonal network, an online social network not only has a large number of users and the friend relationships among them, but also a large amount of text spontaneously posted by the users, which brings both new vitality and new challenges to community discovery.
Traditional community discovery methods are mainly link-based, i.e., based on the topological structure of a graph: communities are divided by analyzing the explicit links between individuals, so that links between nodes inside a discovered community are relatively dense while links between different communities are relatively sparse. However, such methods ignore the topic characteristics of the users. On a microblog platform, a user's posts usually imply the user's interests, hobbies, behavior patterns, and similar information, and the topic models used in natural language processing can take these factors into account.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the community discovery method based on the LDA topic model provided by the invention obtains a probability model by mining the semantic information of a user's microblog posts, without relying on network topology connection information; by introducing the semantic similarity of microblog content, it effectively describes the probability distribution of the user's interests; and by introducing the closeness of topological connections inside a community, it mines communities whose internal topology is very tightly connected.
The technical scheme is as follows: to solve the above problems, the invention provides an OSN community discovery method based on an LDA topic model. It performs OSN community discovery using the relationships between users and their friends in an online social network together with the text spontaneously posted by the users, and comprises the following steps:
1) preprocessing the data set: perform word segmentation, stop-word removal, noise removal, and similar preprocessing on the original users' microblog documents. Specifically, extract the [uid, text] fields of each record from the weibo data set and group all microblogs by uid, so that each record has the format [uid, text1; text2; ...]; perform word segmentation with the ICTCLAS 2013 Chinese lexical analysis system of the Chinese Academy of Sciences, and during segmentation remove stop words, tokens with no practical significance to the model (such as URLs, punctuation marks, and modal particles), and microblog emoticons; for the followers data set in the document recording user relationships, make the user relationships bidirectional and eliminate users without friends, so that each record has the format [user, friend1; friend2; ...];
2) building the LDA topic models from the established community elements: a topic model LDA-T built on the semantic similarity of microblog content within a community, and a topic model LDA-F built on the closeness of topological connections within a community. In LDA-T, the term set is the set of terms in all users' posts, the document set is the set of all users' posts, and a topic is a community; in LDA-F, the term set is the set of all friends of the users, the document set is the set of all users, and a topic is a community;
3) for the models LDA-T and LDA-F obtained in step 2, applying the Dirichlet distribution to the topic probability distribution of each document and the term probability distribution of each topic, and generating the joint probability distribution p(w_m, z_m, θ_m, Φ | α, β) based on the hyper-parameters, where α and β are hyper-parameters of the Dirichlet distribution, w_m is the set of all terms in the m-th document, z_m is the set of topics corresponding to all terms in the m-th document, θ_m is the topic probability distribution of the m-th document, and Φ is the set of term probability distributions under all topics;
4) from the joint probability distribution obtained in step 3, estimating the probability distribution θ_m of topics given a document and the probability distribution φ_k of terms given a topic using a Gibbs sampling algorithm;
5) acquiring the communities from the parameters obtained in step 4.
The generation process and parameters of a document in the LDA model of step 2 are defined as follows:
1) for each topic k ∈ [1, K], sample the term probability distribution of topic k: φ_k ~ Dir(β);
2) for each document m ∈ [1, M], sample the topic probability distribution of document m: θ_m ~ Dir(α);
3) for each document m ∈ [1, M], sample the length of document m: N_m ~ Poiss(ξ);
4) for each term n ∈ [1, N_m] in document m, select a latent topic z_{m,n} ~ Mult(θ_m) and generate the term w_{m,n} ~ Mult(φ_{z_{m,n}});
where N_m is the number of terms contained in the m-th document, K is the number of topics, M is the number of documents, and α, β, and ξ are parameters of the probability distributions.
The joint probability distribution generated in step 3 is:

p(w_m, z_m, θ_m, Φ | α, β) = p(θ_m | α) · p(Φ | β) · Π_{n=1}^{N_m} p(z_{m,n} | θ_m) · p(w_{m,n} | φ_{z_{m,n}})   (1)

where w_m is the set of all terms in the m-th document, z_m is the set of topics corresponding to all terms in the m-th document, θ_m is the topic probability distribution of the m-th document, Φ is the set of term probability distributions under all topics, α and β are hyper-parameters of the Dirichlet distribution, w_{m,n} is the n-th term of the m-th document, z_{m,n} is the topic corresponding to the n-th term in the m-th document, and N_m is the number of terms contained in the m-th document.
In step 4, to apply the Gibbs sampling algorithm to the LDA model, the term set, the prior Dirichlet distribution parameters α and β, and the topic number K must be known; the algorithm finally yields the probability distribution θ of topics given a document and the probability distribution φ of terms given a topic, computed as:

θ_{m,k} = (n_m^{(k)} + α_k) / Σ_{k=1}^{K} (n_m^{(k)} + α_k)   (2)

φ_{k,t} = (n_k^{(t)} + β_t) / Σ_{t=1}^{V} (n_k^{(t)} + β_t)   (3)

where θ_{m,k} is the probability that the topic is k given document m; n_m^{(k)} is the number of times topic k appears in document m; α = (α_1, α_2, ..., α_K) is the hyper-parameter of the K-dimensional Dirichlet distribution, each α_k a positive real number reflecting the prior belief about parameter θ_m; K is the number of topics; φ_{k,t} is the probability that the term is t given topic k; n_k^{(t)} is the number of times term t appears in topic k; β = (β_1, β_2, ..., β_V) is the hyper-parameter of the V-dimensional Dirichlet distribution, each β_t a positive real number reflecting the prior belief about parameter φ_k; and V is the number of terms in the vocabulary.
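Equations (2) and (3) amount to a row-wise normalization of the Gibbs count matrices. The sketch below assumes symmetric scalar hyper-parameters and hypothetical variable names (`n_mk` for n_m^(k), `n_kt` for n_k^(t)); it illustrates the formulas and is not the patented implementation.

```python
import numpy as np

def estimate_params(n_mk, n_kt, alpha, beta):
    """Point estimates of theta and phi from Gibbs count matrices.

    n_mk[m, k]: times topic k was assigned in document m (n_m^(k));
    n_kt[k, t]: times term t was assigned to topic k (n_k^(t)).
    Implements equations (2) and (3) with symmetric scalar alpha, beta.
    """
    theta = (n_mk + alpha) / (n_mk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kt + beta) / (n_kt + beta).sum(axis=1, keepdims=True)
    return theta, phi

# Toy counts: 2 documents, 2 topics, vocabulary of 3 terms.
n_mk = np.array([[3.0, 1.0], [0.0, 4.0]])
n_kt = np.array([[2.0, 2.0, 0.0], [1.0, 0.0, 3.0]])
theta, phi = estimate_params(n_mk, n_kt, alpha=0.5, beta=0.5)
```

Each row of `theta` and `phi` sums to 1, as a probability distribution over topics and terms respectively.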
Advantageous effects: with the above technical scheme, the invention has the following advantages:
1) the semantic similarity of microblog content is introduced, effectively describing the probability distribution of the users' interests;
2) the closeness of topological connections inside a community is introduced, so communities whose internal topology is very tightly connected can be mined;
3) the topic model improves on traditional community discovery methods: a probability model is obtained by mining the semantic information of users' microblogs without relying on network topology connection information;
4) the Gibbs sampling algorithm is used for parameter estimation; compared with the other two parameter estimation algorithms, variational inference and the EM algorithm, it handles complex distributions more simply and quickly;
5) a data set preprocessing mechanism is introduced, which ensures the accuracy of the community discovery results.
Drawings
FIG. 1 is a diagram of an LDA topic model of the present invention;
FIG. 2 is a flow chart of the Gibbs sampling algorithm of the present invention;
FIG. 3 is a flow chart of community discovery according to the present invention.
Detailed Description
The present invention is further illustrated by the following examples. It should be understood that these examples are purely illustrative and do not limit the scope of the invention; modifications of its various equivalent forms by those skilled in the art, after reading the invention, fall within the scope defined by the appended claims.
An OSN community discovery method based on an LDA topic model first preprocesses the data set; then builds the LDA topic models (an LDA-F model and an LDA-T model) from the relationships between users and their friends in the online social network and the text spontaneously posted by the users, and derives the models' probability distributions; then performs parameter estimation with a Gibbs sampling algorithm; and finally performs OSN community discovery from the estimated parameters. The specific steps are as follows:
1) Data set preprocessing:
1.1) Data set preprocessing for the LDA-F model
Because a friend relationship defined by the LDA-F model must be a bidirectional edge, the user relationships in the followers data set are made bidirectional and users without friends are eliminated; each record has the format [user, friend1; friend2; ...];
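As a rough illustration of this bidirectionalization step, the sketch below keeps only mutual follow edges and drops friendless users. The function name and the toy record layout (a dict from user id to the set of followed ids) are assumptions for illustration, not the patent's actual data format.

```python
def symmetrize_friends(follows):
    """Keep only mutual (bidirectional) follow edges and drop friendless users.

    `follows` maps each user id to the set of ids that user follows.
    Returns a dict mapping user -> sorted list of mutual friends,
    omitting any user left with no friends.
    """
    mutual = {}
    for user, followees in follows.items():
        friends = {f for f in followees if user in follows.get(f, set())}
        if friends:
            mutual[user] = sorted(friends)
    return mutual

# Toy example: u1<->u2 is mutual; u3 follows u1 unreciprocated, so u3 is dropped.
example = {"u1": {"u2"}, "u2": {"u1"}, "u3": {"u1"}}
records = symmetrize_friends(example)
```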
1.2) Data set preprocessing for the LDA-T model
Extract the [uid, text] fields of each record from the weibo data set and group all microblogs by uid, so that each record has the format [uid, text1; text2; ...]; perform word segmentation on the LDA-T corpus with the ICTCLAS 2013 Chinese lexical analysis system of the Chinese Academy of Sciences, and during segmentation remove stop words, tokens with no practical significance to the model (such as URLs and punctuation), and microblog emoticons.
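The cleaning pipeline for the LDA-T corpus can be sketched as below. ICTCLAS 2013 is not reproduced here: a plain whitespace `tokenize` stands in for the real Chinese segmenter, and the stop-word list and regular expressions are illustrative assumptions only.

```python
import re

STOPWORDS = {"的", "了", "是", "the", "a"}  # illustrative stop-word list only

URL_RE = re.compile(r"https?://\S+")
EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")      # weibo emoticons like [哈哈]
PUNCT_RE = re.compile(r"[^\w\u4e00-\u9fff ]")      # keep word chars, CJK, spaces

def preprocess(text, tokenize=str.split):
    """Strip URLs, emoticons and punctuation, tokenize, drop stop words.

    `tokenize` stands in for a real Chinese segmenter (the patent uses
    ICTCLAS 2013); whitespace splitting is used here for illustration.
    """
    text = URL_RE.sub(" ", text)
    text = EMOTICON_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return [t for t in tokenize(text) if t and t not in STOPWORDS]

tokens = preprocess("check this http://t.cn/abc [哈哈] great topic model !")
```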
2) Solving the probability distributions of the models:
Both the topic model LDA-T, built on the semantic similarity of microblog content within a community, and the topic model LDA-F, built on the closeness of topological connections within a community, are LDA models.
In the topic model LDA-T, the term set is the set of terms in all users' posts, the document set is the set of all users' posts, and a topic is a community; in the topic model LDA-F, the term set is the set of all friends of the users, the document set is the set of all users, and a topic is a community.
For an LDA model with M documents and K topics, the generation process and parameters of a document are defined as follows:
2.1) for each topic k ∈ [1, K], sample the term probability distribution of topic k: φ_k ~ Dir(β);
2.2) for each document m ∈ [1, M], sample the topic probability distribution of document m: θ_m ~ Dir(α);
2.3) for each document m ∈ [1, M], sample the length of document m: N_m ~ Poiss(ξ);
2.4) for each term n ∈ [1, N_m] in document m, select a latent topic z_{m,n} ~ Mult(θ_m) and generate the term w_{m,n} ~ Mult(φ_{z_{m,n}});
where N_m is the number of terms contained in the m-th document, and α, β, and ξ are parameters of the probability distributions.
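A minimal sketch of the generative process in steps 2.1-2.4, using symmetric Dirichlet priors; the function name and the choice of numpy are assumptions made for illustration.

```python
import numpy as np

def generate_corpus(K, M, V, alpha, beta, xi, seed=0):
    """Sample a toy corpus from the LDA generative process (steps 2.1-2.4).

    K topics, M documents, vocabulary of V terms; alpha and beta are the
    Dirichlet hyper-parameters, xi is the Poisson mean document length.
    """
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.full(V, beta), size=K)     # 2.1: phi_k ~ Dir(beta)
    theta = rng.dirichlet(np.full(K, alpha), size=M)  # 2.2: theta_m ~ Dir(alpha)
    docs = []
    for m in range(M):
        n_m = max(1, rng.poisson(xi))                 # 2.3: N_m ~ Poiss(xi)
        z = rng.choice(K, size=n_m, p=theta[m])       # 2.4: z_mn ~ Mult(theta_m)
        w = [int(rng.choice(V, p=phi[k])) for k in z] #      w_mn ~ Mult(phi_z)
        docs.append(w)
    return docs, theta, phi

docs, theta, phi = generate_corpus(K=3, M=5, V=20, alpha=0.5, beta=0.1, xi=8)
```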
Following this generation process, the Dirichlet distribution is applied to the topic probability distribution of each document and the term probability distribution of each topic, generating the joint probability distribution based on the hyper-parameters:

p(w_m, z_m, θ_m, Φ | α, β) = p(θ_m | α) · p(Φ | β) · Π_{n=1}^{N_m} p(z_{m,n} | θ_m) · p(w_{m,n} | φ_{z_{m,n}})   (1)

where w_m is the set of all terms in the m-th document, z_m is the set of topics corresponding to all terms in the m-th document, θ_m is the topic probability distribution of the m-th document, Φ is the set of term probability distributions under all topics, α and β are hyper-parameters of the Dirichlet distribution, w_{m,n} is the n-th term of the m-th document, z_{m,n} is the topic corresponding to the n-th term in the m-th document, and N_m is the number of terms contained in the m-th document.
3) Parameter estimation with Gibbs sampling:
The Gibbs sampling algorithm estimates the parameters θ and φ from the topic variable z. To apply the algorithm to an LDA model, the term set, the prior Dirichlet distribution parameters α and β, and the topic number K must be known; the algorithm finally yields the parameters θ and φ to be estimated, where θ is the probability distribution of topics given a document, computed by equation (2), and φ is the probability distribution of terms given a topic, computed by equation (3):

θ_{m,k} = (n_m^{(k)} + α_k) / Σ_{k=1}^{K} (n_m^{(k)} + α_k)   (2)

φ_{k,t} = (n_k^{(t)} + β_t) / Σ_{t=1}^{V} (n_k^{(t)} + β_t)   (3)

where θ_{m,k} is the probability that the topic is k given document m; n_m^{(k)} is the number of times topic k appears in document m; α = (α_1, α_2, ..., α_K) is the hyper-parameter of the K-dimensional Dirichlet distribution, each α_k a positive real number reflecting the prior belief about parameter θ_m; K is the number of topics; φ_{k,t} is the probability that the term is t given topic k; n_k^{(t)} is the number of times term t appears in topic k; β = (β_1, β_2, ..., β_V) is the hyper-parameter of the V-dimensional Dirichlet distribution, each β_t a positive real number reflecting the prior belief about parameter φ_k; and V is the number of terms in the vocabulary. The specific Gibbs sampling algorithm is as follows:
3.1) initialize the global variables n_k^{(t)}, n_m^{(k)}, n_k, and n_m, where n_k^{(t)} is the number of times term t appears in topic k, n_m^{(k)} is the number of times topic k appears in document m, n_k is the sum of n_k^{(t)} over t, and n_m is the sum of n_m^{(k)} over k;
3.2) for each term n ∈ [1, N_m] of each document m ∈ [1, M], sample a topic z_{m,n} = k ~ Mult(1/K) and increment the global variables n_k^{(t)}, n_m^{(k)}, n_k, and n_m accordingly;
3.3) jump back to step 3.2 until all documents have been traversed; after the traversal is finished, jump to step 3.4 to start iterating;
3.4) for each term n ∈ [1, N_m] of each document m ∈ [1, M], decrement the global variables n_k^{(t)}, n_m^{(k)}, n_k, and n_m for the current topic assignment, then sample a new topic from the Gibbs sampling formula below, and then increment n_k^{(t)}, n_m^{(k)}, n_k, and n_m for the new assignment;
3.5) jump back to step 3.4 until the number of iterations I is reached.
Here the formula mentioned in step 3.4,

p(z_i = k | z_{¬i}, w) ∝ (n_{k,¬i}^{(t)} + β_t) / (Σ_{t=1}^{V} n_{k,¬i}^{(t)} + β_t) · (n_{m,¬i}^{(k)} + α_k),

is the Gibbs sampling formula of the LDA model, where the subscript ¬i denotes counts that exclude the current assignment of term i.
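Steps 3.1-3.5 can be sketched as a minimal collapsed Gibbs sampler. The code below is an illustrative reading of the algorithm with assumed names and symmetric scalar hyper-parameters; it uses the standard LDA full conditional for step 3.4 and is not the patented implementation itself.

```python
import numpy as np

def gibbs_lda(docs, K, V, alpha, beta, iters=20, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (steps 3.1-3.5).

    docs: list of term-id lists. Returns the count matrices n_mk, n_kt,
    from which theta and phi follow via equations (2) and (3).
    """
    rng = np.random.default_rng(seed)
    M = len(docs)
    n_mk = np.zeros((M, K))  # n_m^(k): topic counts per document
    n_kt = np.zeros((K, V))  # n_k^(t): term counts per topic
    n_k = np.zeros(K)        # row sums of n_kt
    z = []
    for m, doc in enumerate(docs):          # 3.1-3.3: random initialization
        zm = rng.integers(K, size=len(doc))
        for t, k in zip(doc, zm):
            n_mk[m, k] += 1; n_kt[k, t] += 1; n_k[k] += 1
        z.append(zm)
    for _ in range(iters):                  # 3.4-3.5: resample each assignment
        for m, doc in enumerate(docs):
            for n, t in enumerate(doc):
                k = z[m][n]                 # remove the current assignment
                n_mk[m, k] -= 1; n_kt[k, t] -= 1; n_k[k] -= 1
                # full conditional: (n_kt + beta)/(n_k + V*beta) * (n_mk + alpha)
                p = (n_kt[:, t] + beta) / (n_k + V * beta) * (n_mk[m] + alpha)
                k = rng.choice(K, p=p / p.sum())
                z[m][n] = k                 # add the new assignment back
                n_mk[m, k] += 1; n_kt[k, t] += 1; n_k[k] += 1
    return n_mk, n_kt

docs = [[0, 0, 1], [2, 3, 3], [0, 1, 1]]
n_mk, n_kt = gibbs_lda(docs, K=2, V=4, alpha=0.5, beta=0.5)
```

The total counts are conserved across iterations: every term occurrence is always assigned to exactly one topic.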
4) From the obtained parameters, the probability distribution θ_m of topics given a document has the same practical meaning in the LDA-T model and the LDA-F model: θ_m gives the probability distribution over communities for a given user, from which the communities represented in the form of probability distributions are obtained.
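One simple way to turn θ_m into concrete community memberships is a hard assignment of each user to its highest-probability community. This argmax choice is an assumption for illustration, since the patent leaves the communities in probability-distribution form.

```python
def assign_communities(theta, user_ids):
    """Hard community assignment: each user goes to the community (topic)
    with the highest probability in its row of theta."""
    communities = {}
    for uid, row in zip(user_ids, theta):
        k = max(range(len(row)), key=lambda j: row[j])
        communities.setdefault(k, []).append(uid)
    return communities

# Toy theta: rows are per-user community distributions.
theta = [[0.7, 0.3], [0.1, 0.9], [0.6, 0.4]]
groups = assign_communities(theta, ["u1", "u2", "u3"])
```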

Claims (6)

1. An OSN community discovery method based on an LDA topic model, characterized in that OSN community discovery is performed using the relationships between users and their friends in an online social network together with the text spontaneously posted by the users, comprising the following steps:
1) preprocessing the data set: performing word segmentation, stop-word removal, noise removal, and similar preprocessing on the original users' microblog documents, making the user relationships bidirectional in the followers data set of the document recording user relationships, and eliminating users without friends;
2) building the LDA topic models from the established community elements: a topic model LDA-T built on the semantic similarity of microblog content within a community, and a topic model LDA-F built on the closeness of topological connections within a community, wherein in LDA-T the term set is the set of terms in all users' posts, the document set is the set of all users' posts, and a topic is a community, and in LDA-F the term set is the set of all friends of the users, the document set is the set of all users, and a topic is a community;
3) for the models LDA-T and LDA-F obtained in step 2, applying the Dirichlet distribution to the topic probability distribution of each document and the term probability distribution of each topic, and generating the joint probability distribution p(w_m, z_m, θ_m, Φ | α, β) based on the hyper-parameters, where α and β are hyper-parameters of the Dirichlet distribution, w_m is the set of all terms in the m-th document, z_m is the set of topics corresponding to all terms in the m-th document, θ_m is the topic probability distribution of the m-th document, and Φ is the set of term probability distributions under all topics;
4) from the joint probability distribution obtained in step 3, estimating the probability distribution θ_m of topics given a document and the probability distribution φ_k of terms given a topic using a Gibbs sampling algorithm;
5) acquiring the communities from the parameters obtained in step 4.
2. The OSN community discovery method based on the LDA topic model according to claim 1, characterized in that the noise removed in step 1 comprises URLs, punctuation marks, modal particles, and emoticons.
3. The OSN community discovery method based on the LDA topic model according to claim 1, characterized in that the generation process and parameters of a document in the LDA model of step 2 are defined as follows:
1) for each topic k ∈ [1, K], sample the term probability distribution of topic k: φ_k ~ Dir(β);
2) for each document m ∈ [1, M], sample the topic probability distribution of document m: θ_m ~ Dir(α);
3) for each document m ∈ [1, M], sample the length of document m: N_m ~ Poiss(ξ);
4) for each term n ∈ [1, N_m] in document m, select a latent topic z_{m,n} ~ Mult(θ_m) and generate the term w_{m,n} ~ Mult(φ_{z_{m,n}});
where N_m is the number of terms contained in the m-th document, K is the number of topics, M is the number of documents, and α, β, and ξ are parameters of the probability distributions.
4. The OSN community discovery method based on the LDA topic model according to claim 3, characterized in that the joint probability distribution generated in step 3 is:

p(w_m, z_m, θ_m, Φ | α, β) = p(θ_m | α) · p(Φ | β) · Π_{n=1}^{N_m} p(z_{m,n} | θ_m) · p(w_{m,n} | φ_{z_{m,n}})   (1)

where w_m is the set of all terms in the m-th document, z_m is the set of topics corresponding to all terms in the m-th document, θ_m is the topic probability distribution of the m-th document, Φ is the set of term probability distributions under all topics, α and β are hyper-parameters of the Dirichlet distribution, w_{m,n} is the n-th term of the m-th document, z_{m,n} is the topic corresponding to the n-th term in the m-th document, and N_m is the number of terms contained in the m-th document.
5. The OSN community discovery method based on the LDA topic model according to claim 4, characterized in that the probability distribution of topics given a document in step 4 is computed as:

θ_{m,k} = (n_m^{(k)} + α_k) / Σ_{k=1}^{K} (n_m^{(k)} + α_k)   (2)

where θ_{m,k} is the probability that the topic is k given document m, n_m^{(k)} is the number of times topic k appears in document m, α = <α_1, α_2, ..., α_K> is the hyper-parameter of the K-dimensional Dirichlet distribution, each α_k a positive real number reflecting the prior belief about parameter θ_m, and K is the number of topics.
6. The OSN community discovery method based on the LDA topic model according to claim 4, characterized in that the probability distribution of terms given a topic in step 4 is computed as:

φ_{k,t} = (n_k^{(t)} + β_t) / Σ_{t=1}^{V} (n_k^{(t)} + β_t)   (3)

where φ_{k,t} is the probability that the term is t given topic k, n_k^{(t)} is the number of times term t appears in topic k, β = <β_1, β_2, ..., β_V> is the hyper-parameter of the V-dimensional Dirichlet distribution, each β_t a positive real number reflecting the prior belief about parameter φ_k, and V is the number of terms in the vocabulary.
CN201510611455.1A 2015-09-23 2015-09-23 OSN community discovery method based on LDA Theme model Pending CN105302866A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510611455.1A CN105302866A (en) 2015-09-23 2015-09-23 OSN community discovery method based on LDA Theme model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510611455.1A CN105302866A (en) 2015-09-23 2015-09-23 OSN community discovery method based on LDA Theme model

Publications (1)

Publication Number Publication Date
CN105302866A 2016-02-03

Family

ID=55200136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510611455.1A Pending CN105302866A (en) 2015-09-23 2015-09-23 OSN community discovery method based on LDA Theme model

Country Status (1)

Country Link
CN (1) CN105302866A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106095976A (en) * 2016-06-20 2016-11-09 杭州电子科技大学 A kind of interest Dimensional level extracting method based on microblog data supporting OLAP to apply
CN107122455A (en) * 2017-04-26 2017-09-01 中国人民解放军国防科学技术大学 A kind of network user's enhancing method for expressing based on microblogging
CN107704460A (en) * 2016-06-22 2018-02-16 北大方正集团有限公司 Customer relationship abstracting method and customer relationship extraction system
CN112487110A (en) * 2020-12-07 2021-03-12 中国船舶重工集团公司第七一六研究所 Overlapped community evolution analysis method and system based on network structure and node content
CN112632215A (en) * 2020-12-01 2021-04-09 重庆邮电大学 Community discovery method and system based on word-pair semantic topic model
CN114461879A (en) * 2022-01-21 2022-05-10 哈尔滨理工大学 Semantic social network multi-view community discovery method based on text feature integration

Citations (2)

Publication number Priority date Publication date Assignee Title
CN103488637A (en) * 2012-06-11 2014-01-01 北京大学 Method for carrying out expert search based on dynamic community mining
CN104268271A (en) * 2014-10-13 2015-01-07 北京建筑大学 Interest and network structure double-cohesion social network community discovering method

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN103488637A (en) * 2012-06-11 2014-01-01 北京大学 Method for carrying out expert search based on dynamic community mining
CN104268271A (en) * 2014-10-13 2015-01-07 北京建筑大学 Interest and network structure double-cohesion social network community discovering method

Non-Patent Citations (2)

Title
HAIZHENG ZHANG et al.: "An LDA-based Community Structure Discovery Approach for Large-Scale Social Networks", Intelligence & Security Informatics, 2007 IEEE *
吴小兰 et al.: "Research on community discovery methods combining content and link relations" (结合内容和链接关系的社区发现方法研究), 《情报理论与实践》 (Information Studies: Theory & Application) *

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN106095976A (en) * 2016-06-20 2016-11-09 杭州电子科技大学 A kind of interest Dimensional level extracting method based on microblog data supporting OLAP to apply
CN106095976B (en) * 2016-06-20 2019-09-24 杭州电子科技大学 A kind of interest Dimensional level extracting method based on microblog data for supporting OLAP to apply
CN107704460A (en) * 2016-06-22 2018-02-16 北大方正集团有限公司 Customer relationship abstracting method and customer relationship extraction system
CN107122455A (en) * 2017-04-26 2017-09-01 中国人民解放军国防科学技术大学 A kind of network user's enhancing method for expressing based on microblogging
CN107122455B (en) * 2017-04-26 2019-12-31 中国人民解放军国防科学技术大学 Network user enhanced representation method based on microblog
CN112632215A (en) * 2020-12-01 2021-04-09 重庆邮电大学 Community discovery method and system based on word-pair semantic topic model
CN112487110A (en) * 2020-12-07 2021-03-12 中国船舶重工集团公司第七一六研究所 Overlapped community evolution analysis method and system based on network structure and node content
CN114461879A (en) * 2022-01-21 2022-05-10 哈尔滨理工大学 Semantic social network multi-view community discovery method based on text feature integration

Similar Documents

Publication Publication Date Title
CN107729392B (en) Text structuring method, device and system and non-volatile storage medium
CN105302866A (en) OSN community discovery method based on LDA Theme model
CN107122455B (en) Network user enhanced representation method based on microblog
CN103793501B (en) Based on the theme Combo discovering method of social networks
CN108681557B (en) Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint
CN106886580B (en) Image emotion polarity analysis method based on deep learning
CN107798043B (en) Text clustering method for long text auxiliary short text based on Dirichlet multinomial mixed model
CN109033320B (en) Bilingual news aggregation method and system
CN113051932B (en) Category detection method for network media event of semantic and knowledge expansion theme model
CN102270212A (en) User interest feature extraction method based on hidden semi-Markov model
CN108733647B (en) Word vector generation method based on Gaussian distribution
Nhlabano et al. Impact of text pre-processing on the performance of sentiment analysis models for social media data
CN112487110A (en) Overlapped community evolution analysis method and system based on network structure and node content
Khan et al. Sentiment Analysis using Support Vector Machine and Random Forest
CN114065749A (en) Text-oriented Guangdong language recognition model and training and recognition method of system
WO2016090625A1 (en) Scalable web data extraction
Chen et al. Learning user embedding representation for gender prediction
Shi et al. SRTM: A Sparse RNN-Topic Model for Discovering Bursty Topics in Big Data of Social Networks.
Shi et al. A sparse topic model for bursty topic discovery in social networks.
CN111339289B (en) Topic model inference method based on commodity comments
Yan et al. Multilayer network representation learning method based on random walk of multiple information
Mashayekhi et al. Microblog topic detection using evolutionary clustering and social network information
Jayakumar et al. Analyzing the development of complex social systems of characters in a work of literary fiction
Han et al. An effective heterogeneous information network representation learning framework
Fan et al. Topic modeling methods for short texts: A survey

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20160203