CN105302866A - OSN community discovery method based on LDA Theme model - Google Patents
- Publication number
- CN105302866A (application CN201510611455.1A)
- Authority
- CN
- China
- Prior art keywords
- document
- topic
- lda
- model
- probability distribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 29
- 238000009826 distribution Methods 0.000 claims abstract description 67
- 238000005070 sampling Methods 0.000 claims abstract description 19
- 238000007781 pre-processing Methods 0.000 claims abstract description 10
- 230000011218 segmentation Effects 0.000 claims description 6
- 230000002457 bidirectional effect Effects 0.000 claims description 4
- 238000004364 calculation method Methods 0.000 claims description 3
- 238000005065 mining Methods 0.000 description 3
- 230000018109 developmental process Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Business, Economics & Management (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an online social network (OSN) community discovery method based on the Latent Dirichlet Allocation (LDA) topic model. The method comprises the following steps: first, preprocessing the data; then building LDA topic models (an LDA-F model and an LDA-T model) from the relationships between users and their friends in the online social network and from the text the users post, and deriving the model probability distributions; then estimating the parameters with a Gibbs sampling algorithm; and finally discovering OSN communities from the estimated parameters. With this method, a corresponding probability model is obtained by mining the semantic information of users' microblog posts without relying on network topology connection information; introducing the semantic similarity of microblog content effectively describes the probability distribution of user interests; and introducing the closeness of topological connections inside a community allows communities with tightly connected internal topology to be discovered.
Description
Technical Field
The invention relates to an Online Social Network (OSN) community discovery mechanism utilizing an LDA topic model, belonging to the field of social computing, in particular to the field of community discovery.
Background
With the rapid development of the Internet, the web has gradually shifted from being data-centric to being human-centric, which has driven the rapid growth of online social networks. An online social network differs from a traditional interpersonal network: it contains not only large numbers of users and their friend relationships but also a large amount of text spontaneously posted by the users, which brings both new vitality and new challenges to community discovery.
Traditional community discovery methods are mainly link-based, i.e., they rely on the topological structure of a graph: communities are obtained by analyzing the explicit connections among individuals, so that connections among nodes inside a discovered community are relatively dense while connections between different communities are relatively sparse. However, such methods do not take the users' topic characteristics into account. On a microblog platform, a user's posts usually carry implicit information about the user's interests, hobbies and behavior patterns, and the topic models used in natural language processing can take these factors into consideration.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides a community discovery method based on the LDA topic model. A corresponding probability model is obtained by mining the semantic information of users' microblog posts without relying on network topology connection information; the semantic similarity of microblog content is introduced to effectively describe the probability distribution of user interests; and the closeness of topological connections inside a community is introduced to mine communities whose internal topology is tightly connected.
The technical scheme is as follows: in order to solve the above problems, the invention provides an OSN community discovery method based on an LDA topic model, which performs OSN community discovery using the relationships between users and their friends in an online social network and the text spontaneously posted by the users, and comprises the following steps:
1) preprocessing the data set: the original users' microblog documents are preprocessed by word segmentation, stop-word removal and noise removal. Specifically, the [uid, text] fields of each record are extracted from the weibo data set and all microblog posts are grouped by uid, so that each record has the format [uid, text 1; text 2; ...]. Word segmentation is performed with the ICTCLAS 2013 Chinese lexical analysis system of the Chinese Academy of Sciences, and during segmentation, stop words, tokens that carry no practical meaning for the model (such as URLs, punctuation marks and modal particles) and microblog emoticons are removed. The followers data set, which records user relationships, is processed so that user relationships are bidirectional, and users without friends are removed, so that each record has the format [user, friend 1; friend 2; ...];
2) building the LDA topic models according to the established community elements: a topic model LDA-T is built on the semantic similarity of microblog content inside a community, and a topic model LDA-F is built on the closeness of topological connections inside a community. In LDA-T, the term set is the set of terms appearing in all users' posts, the document set is the set of each user's posts, and the topics are the communities; in LDA-F, the term set is the set of all friends of the users, the document set is the set of all users, and the topics are the communities;
3) according to the models LDA-T and LDA-F obtained in step 2, Dirichlet priors are placed on the per-document topic distribution and the per-topic term distribution, and the joint probability distribution p(w_m, z_m, θ_m, Φ | α, β) over the hyper-parameters is generated, where α and β are the hyper-parameters of the Dirichlet distributions, w_m denotes the set of all terms in the m-th document, z_m denotes the set of topics assigned to the terms of the m-th document, θ_m denotes the topic probability distribution of the m-th document, and Φ denotes the set of term probability distributions of all topics;
4) estimating, from the joint probability distribution obtained in step 3 and using a Gibbs sampling algorithm, the topic probability distribution θ_m given a document and the term probability distribution φ_k given a topic;
5) acquiring the communities from the parameters obtained in step 4.
The generative process and parameters of a document in the LDA model of step 2 are defined as follows:
1) for each topic k ∈ [1, K], sample the term probability distribution of topic k: φ_k ~ Dir(β);
2) for each document m ∈ [1, M], sample the topic probability distribution of document m: θ_m ~ Dir(α);
3) for each document m ∈ [1, M], sample the length of document m: N_m ~ Poiss(ξ);
4) for each term n ∈ [1, N_m] in document m, select a latent topic z_{m,n} ~ Mult(θ_m) and generate the term w_{m,n} ~ Mult(φ_{z_{m,n}});
where N_m denotes the number of terms contained in the m-th document, K is the number of topics, M is the number of documents, and α, β and ξ are the parameters of the corresponding distributions.
The joint probability distribution generated in step 3 is:

$$p(\mathbf{w}_m,\mathbf{z}_m,\theta_m,\Phi \mid \alpha,\beta)=\prod_{n=1}^{N_m} p(w_{m,n}\mid\varphi_{z_{m,n}})\,p(z_{m,n}\mid\theta_m)\cdot p(\theta_m\mid\alpha)\cdot p(\Phi\mid\beta)$$

where w_m denotes the set of all terms in the m-th document, z_m denotes the set of topics assigned to the terms of the m-th document, θ_m denotes the topic probability distribution of the m-th document, Φ denotes the set of term probability distributions of all topics, α and β are the hyper-parameters of the Dirichlet distributions, w_{m,n} denotes the n-th term of the m-th document, z_{m,n} denotes the topic assigned to the n-th term of the m-th document, and N_m denotes the number of terms contained in the m-th document.
In step 4, applying the Gibbs sampling algorithm to the LDA model requires the known term set w, the prior Dirichlet distribution parameters α and β, and the number of topics K; it finally yields the topic probability distribution θ given a document and the term probability distribution φ given a topic, calculated as follows:

$$\theta_{m,k}=\frac{n_m^{(k)}+\alpha_k}{\sum_{k=1}^{K}\left(n_m^{(k)}+\alpha_k\right)}$$

$$\varphi_{k,t}=\frac{n_k^{(t)}+\beta_t}{\sum_{t=1}^{V}\left(n_k^{(t)}+\beta_t\right)}$$

where θ_{m,k} denotes the probability of topic k given document m, n_m^{(k)} denotes the number of times topic k appears in document m, α = (α_1, α_2, …, α_K) is the hyper-parameter of the K-dimensional Dirichlet prior, α_k is a positive real number reflecting the prior on the parameter θ_m, and K is the number of topics; φ_{k,t} denotes the probability of term t given topic k, n_k^{(t)} denotes the number of times term t appears in topic k, β = (β_1, β_2, …, β_V) is the hyper-parameter of the V-dimensional Dirichlet prior, β_t is a positive real number reflecting the prior on the parameter φ_k, and V is the size of the term vocabulary.
Beneficial effects: by adopting the above technical scheme, the invention has the following advantages:
1) the semantic similarity of microblog content is introduced, effectively describing the probability distribution of user interests;
2) the closeness of topological connections inside a community is introduced, allowing communities with tightly connected internal topology to be mined;
3) the traditional community discovery approach is improved with a topic model: a corresponding probability model is obtained by mining the semantic information of users' microblog posts without relying on network topology connection information;
4) the Gibbs sampling algorithm is used for parameter estimation; compared with the other two common parameter estimation approaches, variational inference and the EM algorithm, it handles complex distributions more simply and quickly;
5) a data set preprocessing mechanism is introduced, ensuring the accuracy of the community discovery results.
Drawings
FIG. 1 is a diagram of an LDA topic model of the present invention;
FIG. 2 is a flow chart of the Gibbs sampling algorithm of the present invention;
FIG. 3 is a flow chart of community discovery according to the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are purely exemplary and are not intended to limit the scope of the invention; the scope of the invention is defined by the appended claims as understood by those skilled in the art.
An OSN community discovery method based on an LDA topic model comprises: first, preprocessing the data set; building LDA topic models (an LDA-F model and an LDA-T model) from the relationships between users and their friends in the online social network and the text spontaneously posted by the users, and deriving the model probability distributions; then performing parameter estimation with a Gibbs sampling algorithm; and finally performing OSN community discovery from the estimated parameters. The method specifically comprises the following steps:
1) carrying out data set preprocessing:
1.1) data set preprocessing for the LDA-F model
Because the friend relationship defined by the LDA-F model must be a bidirectional edge, the followers data set is processed so that user relationships are bidirectional, and users without friends are removed; each record then has the format [user, friend 1; friend 2; ...];
1.2) data set preprocessing for the LDA-T model
The [uid, text] fields of each record are extracted from the weibo data set and all microblog posts are grouped by uid, so that each record has the format [uid, text 1; text 2; ...]. The corpus of the LDA-T model is segmented with the ICTCLAS 2013 Chinese lexical analysis system of the Chinese Academy of Sciences; during segmentation, stop words, tokens that carry no practical meaning for the model (such as URLs, punctuation marks and modal particles) and microblog emoticons are removed.
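For concreteness, the following is a minimal preprocessing sketch in Python. It assumes tab-separated input files, uses jieba as a stand-in for the ICTCLAS 2013 segmenter named above, uses a tiny illustrative stop-word list, and reads the "bidirectional processing" of the followers data as keeping only mutual follow relationships; none of these choices are prescribed by the invention.

```python
# Minimal preprocessing sketch for building the LDA-T and LDA-F inputs.
# Assumptions not specified above: records are tab-separated, jieba stands in
# for the ICTCLAS 2013 segmenter, the stop-word list is a tiny illustrative
# one, and "bidirectional processing" is read as keeping only mutual follows.
import re
from collections import defaultdict

import jieba  # stand-in for ICTCLAS 2013

STOP_WORDS = {"的", "了", "吗", "啊", "呢", "是", "在"}   # illustrative only
URL_RE = re.compile(r"https?://\S+")
EMOTICON_RE = re.compile(r"\[[^\]]{1,8}\]")               # [doge]-style emoticons


def clean_and_segment(text):
    """Strip URLs, emoticons and punctuation, segment, drop stop words."""
    text = URL_RE.sub(" ", text)
    text = EMOTICON_RE.sub(" ", text)
    tokens = jieba.lcut(text)
    return [t for t in tokens if t.strip() and t.isalnum() and t not in STOP_WORDS]


def load_weibo(path):
    """LDA-T corpus: uid -> list of segmented terms from all of the user's posts."""
    docs = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            uid, _, text = line.rstrip("\n").partition("\t")
            docs[uid].extend(clean_and_segment(text))
    return docs


def load_followers(path):
    """LDA-F corpus: user -> friends, keeping bidirectional edges and dropping friendless users."""
    follows = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            user, _, followee = line.rstrip("\n").partition("\t")
            follows[user].add(followee)
    mutual = {u: {v for v in vs if u in follows.get(v, set())} for u, vs in follows.items()}
    return {u: sorted(vs) for u, vs in mutual.items() if vs}
```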
2) Solving the probability distribution of the model:
Both the topic model LDA-T, constructed from the semantic similarity of microblog content within a community, and the topic model LDA-F, constructed from the closeness of topological connections within a community, are instances of the LDA model.
In the topic model LDA-T, constructed from the semantic similarity of microblog content within a community, the term set is the set of terms appearing in all users' posts, the document set is the set of each user's posts, and the topics are the communities; in the topic model LDA-F, constructed from the closeness of topological connections within a community, the term set is the set of all friends of the users, the document set is the set of all users, and the topics are the communities.
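The mapping from preprocessed data to the two corpora can be sketched as follows; this continues the hypothetical loaders above, and the vocabulary-indexing scheme is an implementation choice rather than part of the invention.

```python
# Convert the preprocessed corpora into the index form an LDA implementation
# consumes: each "document" becomes a list of term ids - word ids for LDA-T,
# friend ids for LDA-F.
def build_corpus(docs):
    """docs: dict mapping user id -> list of terms (words or friend ids)."""
    vocab = {}
    users, corpus = [], []
    for uid, terms in docs.items():
        ids = [vocab.setdefault(t, len(vocab)) for t in terms]
        if ids:                          # skip users with empty documents
            users.append(uid)
            corpus.append(ids)
    return users, corpus, vocab

# Hypothetical usage with the loaders sketched above:
#   users_t, corpus_t, vocab_t = build_corpus(load_weibo("weibo.txt"))          # LDA-T
#   users_f, corpus_f, vocab_f = build_corpus(load_followers("followers.txt"))  # LDA-F
```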
For an LDA model with M documents and K topics, the generative process and parameters of a document are defined as follows:
2.1) for each topic k ∈ [1, K], sample the term probability distribution of topic k: φ_k ~ Dir(β);
2.2) for each document m ∈ [1, M], sample the topic probability distribution of document m: θ_m ~ Dir(α);
2.3) for each document m ∈ [1, M], sample the length of document m: N_m ~ Poiss(ξ);
2.4) for each term n ∈ [1, N_m] in document m, select a latent topic z_{m,n} ~ Mult(θ_m) and generate the term w_{m,n} ~ Mult(φ_{z_{m,n}});
where N_m denotes the number of terms contained in the m-th document, and α, β and ξ are the parameters of the corresponding distributions.
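A direct simulation of this generative process, useful for checking one's understanding of the model, might look as follows; the symmetric default hyper-parameter values are illustrative assumptions only.

```python
# Direct simulation of the generative process in steps 2.1-2.4.
import numpy as np

def generate_lda_corpus(M, K, V, alpha=0.5, beta=0.1, xi=50.0, seed=0):
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.full(V, beta), size=K)     # step 2.1: phi_k ~ Dir(beta)
    theta = rng.dirichlet(np.full(K, alpha), size=M)  # step 2.2: theta_m ~ Dir(alpha)
    docs, topics = [], []
    for m in range(M):
        n_m = max(1, rng.poisson(xi))                 # step 2.3: N_m ~ Poiss(xi)
        z = rng.choice(K, size=n_m, p=theta[m])       # step 2.4: z_{m,n} ~ Mult(theta_m)
        w = np.array([rng.choice(V, p=phi[k]) for k in z])  # w_{m,n} ~ Mult(phi_{z_{m,n}})
        docs.append(w)
        topics.append(z)
    return docs, topics, theta, phi

# Example: 20 documents, 3 topics, a vocabulary of 100 terms.
docs, z_true, theta_true, phi_true = generate_lda_corpus(M=20, K=3, V=100)
```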
According to the generated LDA model documents, Dirichlet priors are placed on the per-document topic distribution and the per-topic term distribution, and the joint probability distribution p(w_m, z_m, θ_m, Φ | α, β) over the hyper-parameters is generated:

$$p(\mathbf{w}_m,\mathbf{z}_m,\theta_m,\Phi \mid \alpha,\beta)=\prod_{n=1}^{N_m} p(w_{m,n}\mid\varphi_{z_{m,n}})\,p(z_{m,n}\mid\theta_m)\cdot p(\theta_m\mid\alpha)\cdot p(\Phi\mid\beta)$$

where w_m denotes the set of all terms in the m-th document, z_m denotes the set of topics assigned to the terms of the m-th document, θ_m denotes the topic probability distribution of the m-th document, Φ denotes the set of term probability distributions of all topics, α and β are the hyper-parameters of the Dirichlet distributions, w_{m,n} denotes the n-th term of the m-th document, z_{m,n} denotes the topic assigned to the n-th term of the m-th document, and N_m denotes the number of terms contained in the m-th document.
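Although not written out above, the Gibbs sampler of step 3 below operates on the collapsed distribution obtained by integrating θ and Φ out of the joint distribution; following the standard collapsed-Gibbs derivation for LDA, this takes the form

$$p(\mathbf{z},\mathbf{w}\mid\alpha,\beta)=\prod_{k=1}^{K}\frac{\Delta(\mathbf{n}_k+\beta)}{\Delta(\beta)}\cdot\prod_{m=1}^{M}\frac{\Delta(\mathbf{n}_m+\alpha)}{\Delta(\alpha)},\qquad \Delta(\mathbf{x})=\frac{\prod_i\Gamma(x_i)}{\Gamma\!\left(\sum_i x_i\right)}$$

where n_k is the vector of term counts assigned to topic k and n_m is the vector of topic counts in document m; the full conditional used in step 3.4 follows by dividing out the factors that do not involve the current assignment.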
3) Parameter estimation using Gibbs sampling:
The Gibbs sampling algorithm is used to estimate the parameters θ and Φ from the topic variables z. Applying the Gibbs sampling algorithm to the LDA model requires the known term set w, the prior Dirichlet distribution parameters α and β, and the number of topics K; it finally yields the parameters to be estimated, θ and Φ, where θ is the topic probability distribution given a document, calculated as in Equation (2), and Φ is the term probability distribution given a topic, calculated as in Equation (3):

$$\theta_{m,k}=\frac{n_m^{(k)}+\alpha_k}{\sum_{k=1}^{K}\left(n_m^{(k)}+\alpha_k\right)} \qquad (2)$$

$$\varphi_{k,t}=\frac{n_k^{(t)}+\beta_t}{\sum_{t=1}^{V}\left(n_k^{(t)}+\beta_t\right)} \qquad (3)$$

where θ_{m,k} denotes the probability of topic k given document m, n_m^{(k)} denotes the number of times topic k appears in document m, α = (α_1, α_2, …, α_K) is the hyper-parameter of the K-dimensional Dirichlet prior, α_k is a positive real number reflecting the prior on the parameter θ_m, and K is the number of topics; φ_{k,t} denotes the probability of term t given topic k, n_k^{(t)} denotes the number of times term t appears in topic k, β = (β_1, β_2, …, β_V) is the hyper-parameter of the V-dimensional Dirichlet prior, β_t is a positive real number reflecting the prior on the parameter φ_k, and V is the size of the term vocabulary. The specific Gibbs sampling algorithm is as follows:
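Equations (2) and (3) amount to normalizing smoothed count matrices; a sketch follows, assuming the counts n_m^(k) and n_k^(t) are held in NumPy arrays and, for brevity, that the hyper-parameters are symmetric scalars.

```python
# Point estimates of theta and phi from the Gibbs count matrices, following
# Equations (2) and (3). n_mk[m, k] counts assignments of topic k in document
# m; n_kt[k, t] counts assignments of term t to topic k.
import numpy as np

def estimate_theta_phi(n_mk, n_kt, alpha=0.5, beta=0.1):
    theta = (n_mk + alpha) / (n_mk + alpha).sum(axis=1, keepdims=True)  # Eq. (2)
    phi = (n_kt + beta) / (n_kt + beta).sum(axis=1, keepdims=True)      # Eq. (3)
    return theta, phi
```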
3.1) initialize the global variables n_k^{(t)}, n_m^{(k)}, n_k and n_m, where n_k^{(t)} denotes the number of times term t appears in topic k, n_m^{(k)} denotes the number of times topic k appears in document m, n_k is the sum of n_k^{(t)} over all terms t, and n_m is the sum of n_m^{(k)} over all topics k;
3.2) for each term n ∈ [1, N_m] of each document m ∈ [1, M], sample a topic z_{m,n} = k ~ Mult(1/K) and increment the corresponding global variables n_m^{(k)}, n_k^{(t)}, n_k and n_m;
3.3) jump to step 3.2 until all documents have been traversed; after the traversal is finished, jump to step 3.4 to start the iterations;
3.4) for each term n ∈ [1, N_m] of each document m ∈ [1, M], decrement the corresponding global variables n_m^{(k)}, n_k^{(t)}, n_k and n_m, then sample a new topic for the term from the full conditional distribution, and increment the global variables n_m^{(k)}, n_k^{(t)}, n_k and n_m accordingly;
3.5) jump to step 3.4 until the number of iterations reaches I.
Furthermore, the full conditional distribution sampled in step 3.4,

$$p(z_i=k \mid \mathbf{z}_{\neg i},\mathbf{w}) \propto \frac{n_{k,\neg i}^{(t)}+\beta_t}{\sum_{t=1}^{V}\left(n_{k,\neg i}^{(t)}+\beta_t\right)}\cdot\left(n_{m,\neg i}^{(k)}+\alpha_k\right),$$

is the Gibbs sampling formula of the LDA model, where the subscript ¬i indicates that the counts exclude the current position i = (m, n).
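A compact collapsed Gibbs sampler implementing steps 3.1 to 3.5 and the formula above could look as follows; this is an illustrative sketch with symmetric scalar hyper-parameters, not the reference implementation of the invention.

```python
# Collapsed Gibbs sampler following steps 3.1-3.5 and the formula above.
# corpus: list of documents, each a list of term ids in [0, V).
import numpy as np

def gibbs_lda(corpus, K, V, alpha=0.5, beta=0.1, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    M = len(corpus)
    n_mk = np.zeros((M, K))            # n_m^(k): topic counts per document
    n_kt = np.zeros((K, V))            # n_k^(t): term counts per topic
    n_k = np.zeros(K)                  # n_k: total terms assigned to each topic
    z = []                             # current topic assignment of every term
    # Steps 3.1-3.3: random initialization z_{m,n} ~ Mult(1/K) and count build-up.
    for m, doc in enumerate(corpus):
        z_m = rng.integers(K, size=len(doc))
        z.append(z_m)
        for t, k in zip(doc, z_m):
            n_mk[m, k] += 1; n_kt[k, t] += 1; n_k[k] += 1
    # Steps 3.4-3.5: resample every position from the full conditional.
    for _ in range(iters):
        for m, doc in enumerate(corpus):
            for n, t in enumerate(doc):
                k = z[m][n]
                n_mk[m, k] -= 1; n_kt[k, t] -= 1; n_k[k] -= 1
                p = (n_kt[:, t] + beta) / (n_k + V * beta) * (n_mk[m] + alpha)
                k = rng.choice(K, p=p / p.sum())
                z[m][n] = k
                n_mk[m, k] += 1; n_kt[k, t] += 1; n_k[k] += 1
    theta = (n_mk + alpha) / (n_mk + alpha).sum(axis=1, keepdims=True)  # Eq. (2)
    phi = (n_kt + beta) / (n_kt + beta).sum(axis=1, keepdims=True)      # Eq. (3)
    return theta, phi, z
```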
4) According to the obtained parameter θ_m, i.e. the topic probability distribution given a document, in both the LDA-T model and the LDA-F model the practical meaning of θ_m is the probability distribution over communities for a given user; the communities are therefore obtained in the form of probability distributions.
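One possible way to turn these probability distributions into concrete community memberships is sketched below; the decision rule (argmax assignment with an optional threshold) is an illustrative choice and is not prescribed above.

```python
# Derive community memberships from the estimated document-topic matrix theta.
import numpy as np

def communities_from_theta(users, theta, min_prob=0.0):
    """users: ids aligned with the rows of theta; returns community -> user list."""
    groups = {}
    for user, row in zip(users, theta):
        k = int(np.argmax(row))
        if row[k] >= min_prob:           # optional confidence threshold
            groups.setdefault(k, []).append(user)
    return groups
```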
Claims (6)
1. An OSN community discovery method based on an LDA topic model, characterized in that OSN community discovery is performed using the relationships between users and their friends in an online social network and the text spontaneously posted by the users, the method comprising the following steps:
1) preprocessing the data set: performing word segmentation, stop-word removal and noise removal on the original users' microblog documents, processing the followers data set, which records user relationships, so that user relationships are bidirectional, and removing users without friends;
2) building the LDA topic models according to the established community elements: a topic model LDA-T is built on the semantic similarity of microblog content inside a community, and a topic model LDA-F is built on the closeness of topological connections inside a community; in LDA-T, the term set is the set of terms appearing in all users' posts, the document set is the set of each user's posts, and the topics are the communities; in LDA-F, the term set is the set of all friends of the users, the document set is the set of all users, and the topics are the communities;
3) according to the models LDA-T and LDA-F obtained in step 2, placing Dirichlet priors on the per-document topic distribution and the per-topic term distribution, and generating the joint probability distribution p(w_m, z_m, θ_m, Φ | α, β) over the hyper-parameters, where α and β are the hyper-parameters of the Dirichlet distributions, w_m denotes the set of all terms in the m-th document, z_m denotes the set of topics assigned to the terms of the m-th document, θ_m denotes the topic probability distribution of the m-th document, and Φ denotes the set of term probability distributions of all topics;
4) estimating, from the joint probability distribution obtained in step 3 and using a Gibbs sampling algorithm, the topic probability distribution θ_m given a document and the term probability distribution φ_k given a topic;
5) acquiring the communities from the parameters obtained in step 4.
2. The OSN community discovery method based on the LDA topic model according to claim 1, wherein the noise removed in step 1 comprises URLs, punctuation marks, modal particles and emoticons.
3. The OSN community discovery method based on the LDA topic model according to claim 1, wherein the generative process and parameters of a document in the LDA model of step 2 are defined as follows:
1) for each topic k ∈ [1, K], sample the term probability distribution of topic k: φ_k ~ Dir(β);
2) for each document m ∈ [1, M], sample the topic probability distribution of document m: θ_m ~ Dir(α);
3) for each document m ∈ [1, M], sample the length of document m: N_m ~ Poiss(ξ);
4) for each term n ∈ [1, N_m] in document m, select a latent topic z_{m,n} ~ Mult(θ_m) and generate the term w_{m,n} ~ Mult(φ_{z_{m,n}});
where N_m denotes the number of terms contained in the m-th document, K is the number of topics, M is the number of documents, and α, β and ξ are the parameters of the corresponding distributions.
4. The OSN community discovery method based on the LDA topic model according to claim 3, wherein the joint probability distribution generated in step 3 is:

$$p(\mathbf{w}_m,\mathbf{z}_m,\theta_m,\Phi \mid \alpha,\beta)=\prod_{n=1}^{N_m} p(w_{m,n}\mid\varphi_{z_{m,n}})\,p(z_{m,n}\mid\theta_m)\cdot p(\theta_m\mid\alpha)\cdot p(\Phi\mid\beta)$$

where w_m denotes the set of all terms in the m-th document, z_m denotes the set of topics assigned to the terms of the m-th document, θ_m denotes the topic probability distribution of the m-th document, Φ denotes the set of term probability distributions of all topics, α and β are the hyper-parameters of the Dirichlet distributions, w_{m,n} denotes the n-th term of the m-th document, z_{m,n} denotes the topic assigned to the n-th term of the m-th document, and N_m denotes the number of terms contained in the m-th document.
5. The OSN community discovery method based on the LDA topic model according to claim 4, wherein the topic probability distribution given a document in step 4 is calculated as:

$$\theta_{m,k}=\frac{n_m^{(k)}+\alpha_k}{\sum_{k=1}^{K}\left(n_m^{(k)}+\alpha_k\right)}$$

where θ_{m,k} denotes the probability of topic k given document m, n_m^{(k)} denotes the number of times topic k appears in document m, α = <α_1, α_2, …, α_K> is the hyper-parameter of the K-dimensional Dirichlet prior, α_k is a positive real number reflecting the prior on the parameter θ_m, and K is the number of topics.
6. The OSN community discovery method based on the LDA topic model according to claim 4, wherein the term probability distribution given a topic in step 4 is calculated as:

$$\varphi_{k,t}=\frac{n_k^{(t)}+\beta_t}{\sum_{t=1}^{V}\left(n_k^{(t)}+\beta_t\right)}$$

where φ_{k,t} denotes the probability of term t given topic k, n_k^{(t)} denotes the number of times term t appears in topic k, β = <β_1, β_2, …, β_V> is the hyper-parameter of the V-dimensional Dirichlet prior, β_t is a positive real number reflecting the prior on the parameter φ_k, and V is the size of the term vocabulary.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510611455.1A CN105302866A (en) | 2015-09-23 | 2015-09-23 | OSN community discovery method based on LDA Theme model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510611455.1A CN105302866A (en) | 2015-09-23 | 2015-09-23 | OSN community discovery method based on LDA Theme model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105302866A true CN105302866A (en) | 2016-02-03 |
Family
ID=55200136
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510611455.1A Pending CN105302866A (en) | 2015-09-23 | 2015-09-23 | OSN community discovery method based on LDA Theme model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105302866A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106095976A (en) * | 2016-06-20 | 2016-11-09 | 杭州电子科技大学 | A kind of interest Dimensional level extracting method based on microblog data supporting OLAP to apply |
CN107122455A (en) * | 2017-04-26 | 2017-09-01 | 中国人民解放军国防科学技术大学 | A kind of network user's enhancing method for expressing based on microblogging |
CN107704460A (en) * | 2016-06-22 | 2018-02-16 | 北大方正集团有限公司 | Customer relationship abstracting method and customer relationship extraction system |
CN112487110A (en) * | 2020-12-07 | 2021-03-12 | 中国船舶重工集团公司第七一六研究所 | Overlapped community evolution analysis method and system based on network structure and node content |
CN112632215A (en) * | 2020-12-01 | 2021-04-09 | 重庆邮电大学 | Community discovery method and system based on word-pair semantic topic model |
CN114461879A (en) * | 2022-01-21 | 2022-05-10 | 哈尔滨理工大学 | Semantic social network multi-view community discovery method based on text feature integration |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488637A (en) * | 2012-06-11 | 2014-01-01 | 北京大学 | Method for carrying out expert search based on dynamic community mining |
CN104268271A (en) * | 2014-10-13 | 2015-01-07 | 北京建筑大学 | Interest and network structure double-cohesion social network community discovering method |
-
2015
- 2015-09-23 CN CN201510611455.1A patent/CN105302866A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488637A (en) * | 2012-06-11 | 2014-01-01 | 北京大学 | Method for carrying out expert search based on dynamic community mining |
CN104268271A (en) * | 2014-10-13 | 2015-01-07 | 北京建筑大学 | Interest and network structure double-cohesion social network community discovering method |
Non-Patent Citations (2)
Title |
---|
HAIZHENG ZHANG et al.: "An LDA-based Community Structure Discovery Approach for Large-Scale Social Networks", Intelligence and Security Informatics, 2007 IEEE * |
WU Xiaolan et al.: "Research on Community Discovery Methods Combining Content and Link Relations" (结合内容和链接关系的社区发现方法研究), Information Studies: Theory & Application (情报理论与实践) * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106095976A (en) * | 2016-06-20 | 2016-11-09 | 杭州电子科技大学 | A kind of interest Dimensional level extracting method based on microblog data supporting OLAP to apply |
CN106095976B (en) * | 2016-06-20 | 2019-09-24 | 杭州电子科技大学 | A kind of interest Dimensional level extracting method based on microblog data for supporting OLAP to apply |
CN107704460A (en) * | 2016-06-22 | 2018-02-16 | 北大方正集团有限公司 | Customer relationship abstracting method and customer relationship extraction system |
CN107122455A (en) * | 2017-04-26 | 2017-09-01 | 中国人民解放军国防科学技术大学 | A kind of network user's enhancing method for expressing based on microblogging |
CN107122455B (en) * | 2017-04-26 | 2019-12-31 | 中国人民解放军国防科学技术大学 | Network user enhanced representation method based on microblog |
CN112632215A (en) * | 2020-12-01 | 2021-04-09 | 重庆邮电大学 | Community discovery method and system based on word-pair semantic topic model |
CN112487110A (en) * | 2020-12-07 | 2021-03-12 | 中国船舶重工集团公司第七一六研究所 | Overlapped community evolution analysis method and system based on network structure and node content |
CN114461879A (en) * | 2022-01-21 | 2022-05-10 | 哈尔滨理工大学 | Semantic social network multi-view community discovery method based on text feature integration |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107729392B (en) | Text structuring method, device and system and non-volatile storage medium | |
CN105302866A (en) | OSN community discovery method based on LDA Theme model | |
CN107122455B (en) | Network user enhanced representation method based on microblog | |
CN103793501B (en) | Based on the theme Combo discovering method of social networks | |
CN108681557B (en) | Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint | |
CN106886580B (en) | Image emotion polarity analysis method based on deep learning | |
CN107798043B (en) | Text clustering method for long text auxiliary short text based on Dirichlet multinomial mixed model | |
CN109033320B (en) | Bilingual news aggregation method and system | |
CN113051932B (en) | Category detection method for network media event of semantic and knowledge expansion theme model | |
CN102270212A (en) | User interest feature extraction method based on hidden semi-Markov model | |
CN108733647B (en) | Word vector generation method based on Gaussian distribution | |
Nhlabano et al. | Impact of text pre-processing on the performance of sentiment analysis models for social media data | |
CN112487110A (en) | Overlapped community evolution analysis method and system based on network structure and node content | |
Khan et al. | Sentiment Analysis using Support Vector Machine and Random Forest | |
CN114065749A (en) | Text-oriented Guangdong language recognition model and training and recognition method of system | |
WO2016090625A1 (en) | Scalable web data extraction | |
Chen et al. | Learning user embedding representation for gender prediction | |
Shi et al. | SRTM: A Sparse RNN-Topic Model for Discovering Bursty Topics in Big Data of Social Networks. | |
Shi et al. | A sparse topic model for bursty topic discovery in social networks. | |
CN111339289B (en) | Topic model inference method based on commodity comments | |
Yan et al. | Multilayer network representation learning method based on random walk of multiple information | |
Mashayekhi et al. | Microblog topic detection using evolutionary clustering and social network information | |
Jayakumar et al. | Analyzing the development of complex social systems of characters in a work of literary fiction | |
Han et al. | An effective heterogeneous information network representation learning framework | |
Fan et al. | Topic modeling methods for short texts: A survey |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20160203 |
RJ01 | Rejection of invention patent application after publication |