CN105183909B - social network user interest predicting method based on Gaussian mixture model - Google Patents


Info

Publication number
CN105183909B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510646248.XA
Other languages
Chinese (zh)
Other versions
CN105183909A (en)
Inventor
郑相涵
赖太平
郭文忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method for predicting social network user interest based on a Gaussian mixture model. The method comprises the following steps: (1) user data are acquired from a social network; (2) features are extracted from the acquired user data to generate a series of feature vectors; (3) a prediction model is built using the Gaussian mixture model; (4) the parameters are optimized using the EM algorithm, and a prediction result is calculated. By adopting the Gaussian mixture model, the method achieves higher prediction accuracy, shortens the time required, and effectively predicts the short-term interests of a user.

Description

Social network user interest prediction method based on Gaussian mixture model
Technical Field
The invention relates to the technical field of social network information analysis, in particular to a social network user interest prediction method based on a Gaussian mixture model.
Background
The rapid diffusion of information and the convenience of social networks encourage large numbers of users to share their daily activities, exchange opinions, and build friendships with others. One report estimated that the number of social network users worldwide would reach 2.33 billion by 2017. Effective feature learning and interest prediction are therefore of great significance not only to users (e.g., for finding other users with similar interests) but also to service providers (e.g., for analyzing user behavior in a set of application scenarios to support personalized recommendation).
However, given the characteristics of social data (e.g., large volume, diversity, and data value), it is difficult to predict user interests with high accuracy while keeping computational complexity and latency within acceptable ranges. Furthermore, the short-term interests in a user's interest profile may change dynamically (e.g., under the influence of friends). A social network user interest prediction method based on a Gaussian mixture model is therefore provided, with which the short-term interests of the user can be effectively predicted.
Disclosure of Invention
In view of this, the present invention provides a social network user interest prediction method based on a Gaussian mixture model that achieves higher prediction accuracy, shortens the time required, and effectively predicts the short-term interests of the user.
The invention is realized by adopting the following scheme: a social network user interest prediction method based on a Gaussian mixture model comprises the following steps:
step S1: obtaining user data from a social network;
step S2: extracting features from the acquired user data to generate a series of feature vectors;
step S3: constructing a prediction model using a Gaussian mixture model;
step S4: optimizing the parameters using the EM algorithm and calculating a prediction result.
Further, step S1 is specifically: microblog posts published or forwarded by p microblog users are acquired as training data, microblog posts published or forwarded by q microblog users are acquired as test data, and r hot microblog categories, with s hot microblogs in each category, are acquired.
Further, step S2 is specifically: the hot microblogs are preprocessed by word segmentation, word frequency statistics, and deduplication, yielding t hot keywords that serve as the interest feature values of each hot microblog category, so that r t-dimensional hot microblog feature vectors are generated. Meanwhile, the training data and the test data are preprocessed per microblog user by Chinese word segmentation, stop word removal, and word frequency statistics. Based on the r t-dimensional hot microblog feature vectors, t interest feature values are extracted for each user from the microblog posts that the user published or forwarded, and these values are converted into the feature vector of that user.
Preferably, Chinese word segmentation is performed with a Chinese word segmentation system combined with a user-defined dictionary to segment the microblog text; stop word removal uses a HashMap-based fast index lookup to filter out useless information and reduce the noise in the microblog data.
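The preprocessing in step S2 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the posts have already been segmented into tokens (no particular segmentation system is named in the text), it applies stop word filtering to the hot microblogs as well, and the stop word list and all function names are hypothetical. A Python set stands in for the HashMap lookup, since both give constant-time membership tests.

```python
from collections import Counter

# Hypothetical stop word list; a set gives O(1) hash-based lookup,
# analogous to the HashMap index table described in the text.
STOP_WORDS = {"the", "a", "of", "and", "was"}


def remove_stop_words(tokens):
    """Drop every token that appears in the stop word set."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def hot_keywords(hot_posts, t):
    """Return the t most frequent tokens across pre-segmented hot posts."""
    counts = Counter()
    for tokens in hot_posts:
        counts.update(remove_stop_words(tokens))
    return [word for word, _ in counts.most_common(t)]


def user_feature_vector(user_posts, keywords):
    """Count how often each hot keyword occurs in one user's posts."""
    counts = Counter()
    for tokens in user_posts:
        counts.update(remove_stop_words(tokens))
    return [counts[word] for word in keywords]


# Toy data: each post is a list of already-segmented tokens.
hot = [["football", "match", "the"], ["football", "goal", "match"], ["football"]]
keywords = hot_keywords(hot, 2)
vector = user_feature_vector([["the", "match", "was", "great"],
                              ["football", "football"]], keywords)
```

Each user ends up with one t-dimensional count vector per keyword list, which is the input format the mixture model in step S3 expects.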
Further, the Gaussian mixture model in step S3 is defined as a linear superposition of Gaussians, as shown in formula (1):
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{1}$$
where each Gaussian density $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ is a mixture component with mean $\mu_k$ and covariance $\Sigma_k$, and $\pi_k$ is the mixing coefficient. Integrating both sides of formula (1) with respect to $x$, and noting that $p(x)$ and each individual Gaussian component are normalized, yields formula (2):
$$\sum_{k=1}^{K} \pi_k = 1 \tag{2}$$
Since $p(x) \ge 0$ is required and $\mathcal{N}(x \mid \mu_k, \Sigma_k) \ge 0$, it follows that $\pi_k \ge 0$.
Combining this with formula (2) gives formula (3):
$$0 \le \pi_k \le 1 \tag{3}$$
Therefore the mixing coefficients satisfy the requirements of probabilities, and the marginal density obtained from the sum and product rules is given by formula (4):
$$p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k) \tag{4}$$
Formula (4) is equivalent to formula (1), where $\pi_k = p(k)$ is the prior probability of the k-th component and the density $\mathcal{N}(x \mid \mu_k, \Sigma_k) = p(x \mid k)$ is the probability of $x$ conditioned on $k$. Therefore, according to Bayes' theorem, formula (5) follows:
$$p(k \mid x) = \frac{p(k)\, p(x \mid k)}{\sum_{l=1}^{K} p(l)\, p(x \mid l)} = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} \pi_l \, \mathcal{N}(x \mid \mu_l, \Sigma_l)} \tag{5}$$
Assume that the feature vector data set to be predicted is $\{x_1, \dots, x_N\}$, and represent it as an $N \times D$ matrix $X$ whose n-th row is $x_n^T$; the corresponding latent random variables are represented as an $N \times K$ matrix $Z$ whose rows are $z_n^T$.
The shape of the Gaussian mixture distribution is governed by the parameters $\pi$, $\mu$ and $\Sigma$, where $\pi \equiv \{\pi_1, \dots, \pi_K\}$, $\mu \equiv \{\mu_1, \dots, \mu_K\}$ and $\Sigma \equiv \{\Sigma_1, \dots, \Sigma_K\}$. Applying maximum likelihood estimation converts formula (1) into the log-likelihood function of formula (6):
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\} \tag{6}$$
where $X = \{x_1, \dots, x_N\}$.
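The quantities defined by formulas (1), (5) and (6) can be evaluated directly. The sketch below is illustrative only and makes one simplifying assumption that the text does not: each covariance is diagonal and is passed as a list of per-dimension variances, which keeps the example free of matrix algebra. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Gaussian density with a diagonal covariance (assumption: `var`
    holds per-dimension variances rather than a full matrix)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def mixture_density(x, weights, means, variances):
    """Formula (1): weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


def responsibilities(x, weights, means, variances):
    """Formula (5): posterior probability p(k | x) of each component."""
    joint = [w * gaussian_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def log_likelihood(X, weights, means, variances):
    """Formula (6): sum over all samples of ln p(x_n)."""
    return sum(math.log(mixture_density(x, weights, means, variances))
               for x in X)


# Two equally weighted one-dimensional components centred at 0 and 4.
weights, means, variances = [0.5, 0.5], [[0.0], [4.0]], [[1.0], [1.0]]
mid_point = responsibilities([2.0], weights, means, variances)
near_first = responsibilities([0.0], weights, means, variances)
```

A point halfway between the two means is shared equally by both components, while a point at the first mean is attributed almost entirely to the first component.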
Further, the step S4 specifically includes the following steps:
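step S41: initializing the means $\mu_k$, the covariances $\Sigma_k$ and the mixing coefficients $\pi_k$ for the EM algorithm, and evaluating the initial value of the log-likelihood function;
step S42: estimating the latent class variables using formula (7):
$$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \tag{7}$$
step S43: updating the parameters using formula (8), formula (9), formula (10) and formula (11):
$$\mu_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \tag{8}$$
$$\Sigma_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \left(x_n - \mu_k^{new}\right) \left(x_n - \mu_k^{new}\right)^T \tag{9}$$
$$\pi_k^{new} = \frac{N_k}{N} \tag{10}$$
wherein
$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \tag{11}$$
step S44: evaluating the log-likelihood function using formula (12):
$$\ln p(X \mid \mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\} \tag{12}$$
If the value of formula (12) does not satisfy the convergence criterion, the method returns to step S42.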
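Steps S41 to S44 form a loop that can be sketched in a few dozen lines. The following is a minimal illustration under stated assumptions, not the patented implementation: covariances are restricted to diagonal form, the initial means are supplied by the caller, the initial variances and weights are fixed to 1 and 1/K, and the convergence criterion is a threshold `tol` on the change in log-likelihood. The names `fit_gmm`, `tol` and `min_var` are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a Gaussian with diagonal covariance `var`."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
               for a, m, v in zip(x, mean, var))


def joint(x, weights, means, variances):
    """pi_k * N(x | mu_k, Sigma_k) for every component k."""
    return [w * math.exp(log_gauss(x, m, v))
            for w, m, v in zip(weights, means, variances)]


def log_likelihood(X, weights, means, variances):
    """Log-likelihood of the whole data set under the current parameters."""
    return sum(math.log(sum(joint(x, weights, means, variances))) for x in X)


def fit_gmm(X, init_means, n_iter=100, tol=1e-8, min_var=1e-6):
    N, D, K = len(X), len(X[0]), len(init_means)
    # S41: initialise parameters and evaluate the initial log-likelihood.
    means = [list(m) for m in init_means]
    variances = [[1.0] * D for _ in range(K)]
    weights = [1.0 / K] * K
    ll = log_likelihood(X, weights, means, variances)
    for _ in range(n_iter):
        # S42 (E-step): responsibility of each component for each sample.
        gamma = []
        for x in X:
            j = joint(x, weights, means, variances)
            total = sum(j)
            gamma.append([v / total for v in j])
        # S43 (M-step): re-estimate means, variances and mixing coefficients.
        for k in range(K):
            n_k = sum(g[k] for g in gamma)
            means[k] = [sum(g[k] * x[d] for g, x in zip(gamma, X)) / n_k
                        for d in range(D)]
            variances[k] = [
                max(sum(g[k] * (x[d] - means[k][d]) ** 2
                        for g, x in zip(gamma, X)) / n_k, min_var)
                for d in range(D)]
            weights[k] = n_k / N
        # S44: re-evaluate the log-likelihood and test for convergence.
        new_ll = log_likelihood(X, weights, means, variances)
        converged = abs(new_ll - ll) < tol
        ll = new_ll
        if converged:
            break
    return weights, means, variances, ll


# Two well-separated one-dimensional clusters around 0 and 5.
data = [[0.0], [0.2], [-0.2], [5.0], [5.2], [4.8]]
weights, means, variances, ll = fit_gmm(data, init_means=[[1.0], [4.0]])
```

The variance floor `min_var` is a practical guard added here so that a component cannot collapse onto a single point; the text does not mention such a safeguard.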
Compared with the prior art, the method adopts a Gaussian mixture model, which achieves higher accuracy in predicting the interests of social network users, shortens the time required, and effectively predicts the short-term interests of the user.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a system framework diagram of interest prediction in the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
This embodiment provides a social network user interest prediction method based on a Gaussian mixture model, as shown in FIG. 1 and FIG. 2, including the following steps:
step S1: obtaining user data from a social network;
step S2: extracting features from the acquired user data to generate a series of feature vectors;
step S3: constructing a prediction model using a Gaussian mixture model;
step S4: optimizing the parameters using the EM algorithm and calculating a prediction result.
In this embodiment, step S1 specifically includes: microblog posts published or forwarded by p microblog users are acquired as training data, microblog posts published or forwarded by q microblog users are acquired as test data, and r hot microblog categories, with s hot microblogs in each category, are acquired.
In this embodiment, step S2 specifically includes: the hot microblogs are preprocessed by word segmentation, word frequency statistics, and deduplication, yielding t hot keywords that serve as the interest feature values of each hot microblog category, so that r t-dimensional hot microblog feature vectors are generated. Meanwhile, the training data and the test data are preprocessed per microblog user by Chinese word segmentation, stop word removal, and word frequency statistics. Based on the r t-dimensional hot microblog feature vectors, t interest feature values are extracted for each user from the microblog posts that the user published or forwarded, and these values are converted into the feature vector of that user.
In this embodiment, preferably, Chinese word segmentation is performed with a Chinese word segmentation system combined with a user-defined dictionary to segment the microblog text; stop word removal uses a HashMap-based fast index lookup to filter out useless information and reduce the noise in the microblog data.
In this embodiment, deduplication is performed because different categories may contain the same keyword; the deduplication step is necessary to reduce redundant manual processing.
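One possible reading of the deduplication step is sketched below. The text does not say which category keeps a keyword that several categories share, so this example assumes a simple policy: a keyword stays in the first category that lists it and is dropped from all later ones. The function name and the sample categories are hypothetical.

```python
def deduplicate_keywords(category_keywords):
    """Keep each keyword only in the first category that lists it.

    `category_keywords` maps a category name to its keyword list; the
    first-come policy is an assumption made for this illustration.
    """
    seen = set()
    result = {}
    for category, words in category_keywords.items():
        unique = []
        for word in words:
            if word not in seen:
                seen.add(word)
                unique.append(word)
        result[category] = unique
    return result


raw = {
    "sports": ["football", "star", "match"],
    "entertainment": ["star", "film", "film"],
}
clean = deduplicate_keywords(raw)
```

After this pass every keyword maps to exactly one category, so no manual review is needed to resolve overlaps.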
In this embodiment, the Gaussian mixture model in step S3 is defined as a linear superposition of Gaussians, as shown in formula (1):
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{1}$$
where each Gaussian density $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ is a mixture component with mean $\mu_k$ and covariance $\Sigma_k$, and $\pi_k$ is the mixing coefficient. Integrating both sides of formula (1) with respect to $x$, and noting that $p(x)$ and each individual Gaussian component are normalized, yields formula (2):
$$\sum_{k=1}^{K} \pi_k = 1 \tag{2}$$
Since $p(x) \ge 0$ is required and $\mathcal{N}(x \mid \mu_k, \Sigma_k) \ge 0$, it follows that $\pi_k \ge 0$.
Combining this with formula (2) gives formula (3):
$$0 \le \pi_k \le 1 \tag{3}$$
Therefore the mixing coefficients satisfy the requirements of probabilities, and the marginal density obtained from the sum and product rules is given by formula (4):
$$p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k) \tag{4}$$
Formula (4) is equivalent to formula (1), where $\pi_k = p(k)$ is the prior probability of the k-th component and the density $\mathcal{N}(x \mid \mu_k, \Sigma_k) = p(x \mid k)$ is the probability of $x$ conditioned on $k$. Therefore, according to Bayes' theorem, formula (5) follows:
$$p(k \mid x) = \frac{p(k)\, p(x \mid k)}{\sum_{l=1}^{K} p(l)\, p(x \mid l)} = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} \pi_l \, \mathcal{N}(x \mid \mu_l, \Sigma_l)} \tag{5}$$
Assume that the feature vector data set to be predicted is $\{x_1, \dots, x_N\}$, and represent it as an $N \times D$ matrix $X$ whose n-th row is $x_n^T$; the corresponding latent random variables are represented as an $N \times K$ matrix $Z$ whose rows are $z_n^T$.
The shape of the Gaussian mixture distribution is governed by the parameters $\pi$, $\mu$ and $\Sigma$, where $\pi \equiv \{\pi_1, \dots, \pi_K\}$, $\mu \equiv \{\mu_1, \dots, \mu_K\}$ and $\Sigma \equiv \{\Sigma_1, \dots, \Sigma_K\}$. Applying maximum likelihood estimation converts formula (1) into the log-likelihood function of formula (6):
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\} \tag{6}$$
where $X = \{x_1, \dots, x_N\}$.
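The normalization argument that leads from formula (1) to formulas (2) and (3) can be written out step by step. The derivation below is a sketch; the upper summation limit $K$ (the number of mixture components) is notation assumed here.

```latex
\begin{aligned}
\int p(x)\,dx
  &= \int \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x \mid \mu_k, \Sigma_k)\,dx
   = \sum_{k=1}^{K} \pi_k \int \mathcal{N}(x \mid \mu_k, \Sigma_k)\,dx
   = \sum_{k=1}^{K} \pi_k , \\
\int p(x)\,dx = 1
  &\;\Longrightarrow\; \sum_{k=1}^{K} \pi_k = 1 , \\
\pi_k \ge 0 \ \text{for all } k
  &\;\Longrightarrow\; \pi_k = 1 - \sum_{j \ne k} \pi_j \le 1 .
\end{aligned}
```

The first line uses the fact that every Gaussian component integrates to one; the last line shows why non-negativity together with the sum constraint bounds each mixing coefficient above by one.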
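In this embodiment, step S4 specifically includes the following steps:
step S41: initializing the means $\mu_k$, the covariances $\Sigma_k$ and the mixing coefficients $\pi_k$ for the EM algorithm, and evaluating the initial value of the log-likelihood function;
step S42: estimating the latent class variables using formula (7):
$$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \tag{7}$$
step S43: updating the parameters using formula (8), formula (9), formula (10) and formula (11):
$$\mu_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \tag{8}$$
$$\Sigma_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \left(x_n - \mu_k^{new}\right) \left(x_n - \mu_k^{new}\right)^T \tag{9}$$
$$\pi_k^{new} = \frac{N_k}{N} \tag{10}$$
wherein
$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \tag{11}$$
step S44: evaluating the log-likelihood function using formula (12):
$$\ln p(X \mid \mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\} \tag{12}$$
If the value of formula (12) does not satisfy the convergence criterion, the method returns to step S42.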
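When the log-likelihood of step S44 is evaluated on high-dimensional feature vectors, the component densities can underflow to zero before the logarithm is taken. A common remedy, shown here as a hedged sketch and not described in the text, is to work in the log domain with the log-sum-exp trick. The example assumes diagonal covariances and strictly positive mixing coefficients; both function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Compute ln(sum(exp(v))) stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def stable_log_likelihood(X, weights, means, variances):
    """Log-likelihood of a diagonal-covariance mixture, evaluated
    entirely in the log domain (assumes every weight is > 0)."""
    total = 0.0
    for x in X:
        terms = []
        for w, mean, var in zip(weights, means, variances):
            log_n = sum(-0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
                        for a, m, v in zip(x, mean, var))
            terms.append(math.log(w) + log_n)  # ln(pi_k) + ln N(x | ...)
        total += log_sum_exp(terms)
    return total


# A sample far from both components: the plain densities underflow to 0.0,
# but the log-domain evaluation stays finite.
far_ll = stable_log_likelihood([[100.0]], [0.5, 0.5],
                               [[0.0], [1.0]], [[1.0], [1.0]])
```

Because the convergence test compares successive log-likelihood values, a finite result for outlying samples keeps the loop from failing on a `log(0)` error.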
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims (2)

1. A social network user interest prediction method based on a Gaussian mixture model, characterized by comprising the following steps:
step S1: obtaining user data from a social network;
step S2: extracting features from the acquired user data to generate a series of feature vectors;
step S3: constructing a prediction model using a Gaussian mixture model;
step S4: optimizing the parameters using the EM algorithm and calculating a prediction result;
step S1 specifically includes: acquiring microblog posts published or forwarded by p microblog users as training data, acquiring microblog posts published or forwarded by q microblog users as test data, and acquiring r hot microblog categories with s hot microblogs in each category;
step S2 specifically includes: preprocessing the hot microblogs by word segmentation, word frequency statistics, and deduplication, yielding t hot keywords that serve as the interest feature values of each hot microblog category, so that r t-dimensional hot microblog feature vectors are generated; meanwhile, preprocessing the training data and the test data per microblog user by Chinese word segmentation, stop word removal, and word frequency statistics; based on the r t-dimensional hot microblog feature vectors, extracting t interest feature values for each user from the microblog posts that the user published or forwarded, and converting these values into the feature vector of that user;
the Gaussian mixture model in step S3 is defined as a linear superposition of Gaussians, as shown in formula (1):
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{1}$$
where each Gaussian density $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ is a mixture component with mean $\mu_k$ and covariance $\Sigma_k$, and $\pi_k$ is the mixing coefficient; integrating both sides of formula (1) with respect to $x$, and noting that $p(x)$ and each individual Gaussian component are normalized, yields formula (2):
$$\sum_{k=1}^{K} \pi_k = 1 \tag{2}$$
since $p(x) \ge 0$ is required and $\mathcal{N}(x \mid \mu_k, \Sigma_k) \ge 0$, it follows that $\pi_k \ge 0$;
combining this with formula (2) gives formula (3):
$$0 \le \pi_k \le 1 \tag{3}$$
therefore the mixing coefficients satisfy the requirements of probabilities, and the marginal density obtained from the sum and product rules is given by formula (4):
$$p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k) \tag{4}$$
formula (4) is equivalent to formula (1), where $\pi_k = p(k)$ is the prior probability of the k-th component and the density $\mathcal{N}(x \mid \mu_k, \Sigma_k) = p(x \mid k)$ is the probability of $x$ conditioned on $k$; therefore, according to Bayes' theorem, formula (5) follows:
$$p(k \mid x) = \frac{p(k)\, p(x \mid k)}{\sum_{l=1}^{K} p(l)\, p(x \mid l)} = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} \pi_l \, \mathcal{N}(x \mid \mu_l, \Sigma_l)} \tag{5}$$
assume that the feature vector data set to be predicted is $\{x_1, \dots, x_N\}$, and represent it as an $N \times D$ matrix $X$ whose n-th row is $x_n^T$; the corresponding latent random variables are represented as an $N \times K$ matrix $Z$ whose rows are $z_n^T$;
the shape of the Gaussian mixture distribution is governed by the parameters $\pi$, $\mu$ and $\Sigma$, where $\pi \equiv \{\pi_1, \dots, \pi_K\}$, $\mu \equiv \{\mu_1, \dots, \mu_K\}$ and $\Sigma \equiv \{\Sigma_1, \dots, \Sigma_K\}$; applying maximum likelihood estimation converts formula (1) into the log-likelihood function of formula (6):
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\} \tag{6}$$
where $X = \{x_1, \dots, x_N\}$;
step S4 specifically includes the following steps:
step S41: initializing the means $\mu_k$, the covariances $\Sigma_k$ and the mixing coefficients $\pi_k$ for the EM algorithm, and evaluating the initial value of the log-likelihood function;
step S42: estimating the latent class variables using formula (7):
$$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \tag{7}$$
step S43: updating the parameters using formula (8), formula (9), formula (10) and formula (11):
$$\mu_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \tag{8}$$
$$\Sigma_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \left(x_n - \mu_k^{new}\right) \left(x_n - \mu_k^{new}\right)^T \tag{9}$$
$$\pi_k^{new} = \frac{N_k}{N} \tag{10}$$
wherein
$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \tag{11}$$
step S44: evaluating the log-likelihood function using formula (12):
$$\ln p(X \mid \mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\} \tag{12}$$
if the value of formula (12) does not satisfy the convergence criterion, the method returns to step S42.
2. The social network user interest prediction method based on a Gaussian mixture model according to claim 1, characterized in that: Chinese word segmentation is performed with a Chinese word segmentation system combined with a user-defined dictionary to segment the microblog text; and stop word removal uses a HashMap-based fast index lookup to filter out useless information and reduce the noise in the microblog data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510646248.XA CN105183909B (en) 2015-10-09 2015-10-09 social network user interest predicting method based on Gaussian mixture model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510646248.XA CN105183909B (en) 2015-10-09 2015-10-09 social network user interest predicting method based on Gaussian mixture model

Publications (2)

Publication Number Publication Date
CN105183909A (en) 2015-12-23
CN105183909B (en) 2017-04-12

Family

ID=54905990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510646248.XA Active CN105183909B (en) 2015-10-09 2015-10-09 social network user interest predicting method based on Gaussian mixture model

Country Status (1)

Country Link
CN (1) CN105183909B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220233A (en) * 2017-05-09 2017-09-29 北京理工大学 A kind of user knowledge demand model construction method based on gauss hybrid models

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786711A (en) * 2016-03-25 2016-07-20 广州华多网络科技有限公司 Data analysis method and device
CN109949938B (en) * 2017-12-20 2024-04-26 北京亚信数据有限公司 Method and device for standardizing medical non-standard names
CN110869953B (en) * 2018-02-06 2024-09-24 北京嘀嘀无限科技发展有限公司 System and method for recommending traffic travel service
CN110119827A (en) * 2018-02-06 2019-08-13 北京嘀嘀无限科技发展有限公司 With the prediction technique and device of vehicle type
CN108182339B (en) * 2018-03-20 2021-08-13 北京工业大学 Window state prediction method and system based on Gaussian distribution
CN109190040B (en) * 2018-08-31 2021-05-28 合肥工业大学 Collaborative evolution-based personalized recommendation method and device
CN111241821B (en) * 2018-11-28 2023-04-28 杭州海康威视数字技术股份有限公司 Method and device for determining behavior characteristics of user

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077412A (en) * 2014-07-14 2014-10-01 福州大学 Micro-blog user interest prediction method based on multiple Markov chains
CN104636496A (en) * 2015-03-04 2015-05-20 重庆理工大学 Hybrid clustering recommendation method based on Gaussian distribution and distance similarity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140358630A1 (en) * 2013-05-31 2014-12-04 Thomson Licensing Apparatus and process for conducting social media analytics

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220233A (en) * 2017-05-09 2017-09-29 北京理工大学 A kind of user knowledge demand model construction method based on gauss hybrid models
CN107220233B (en) * 2017-05-09 2020-06-16 北京理工大学 User knowledge demand model construction method based on Gaussian mixture model

Also Published As

Publication number Publication date
CN105183909A (en) 2015-12-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant