CN106021388A - Classifying method of WeChat official accounts based on LDA topic clustering - Google Patents


Info

Publication number
CN106021388A
Authority
CN
China
Prior art keywords
article
official account
WeChat official account
word
feature vector
Prior art date
Legal status
Pending
Application number
CN201610312725.3A
Other languages
Chinese (zh)
Inventor
郭泽豪
王振宇
李风环
戴瑾如
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
2016-05-11
Filing date
2016-05-11
Publication date
2016-10-12
Application filed by South China University of Technology SCUT
Priority to CN201610312725.3A
Publication of CN106021388A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for classifying WeChat official accounts based on LDA topic clustering. The method comprises the following steps: obtaining the articles pushed by each active WeChat official account; segmenting each obtained article into words with a word segmentation tool, filtering out stop words, and computing the term frequency-inverse document frequency (TF-IDF) of the remaining words; selecting the remaining words whose TF-IDF value exceeds a threshold as the feature words of the article; performing latent topic discovery on the feature words of all active articles with a document topic generation model, constructing article-topic feature vectors, and reducing their dimensionality with principal component analysis; clustering the dimension-reduced article-topic feature vectors with the Level-Panel algorithm to obtain clusters and the articles in each cluster; and determining the category of each WeChat official account from the cluster information of the articles it pushes. The method can accurately determine the category of a WeChat official account, making it easier for advertisers to choose the right official account to advertise on.

Description

Method for classifying WeChat official accounts based on LDA topic clustering
Technical field
The present invention relates to the field of text mining, and in particular to a method for classifying WeChat official accounts based on LDA topic clustering.
Background
WeChat official accounts were launched in August 2012, and by August 2015 their number had exceeded 10 million. Official accounts have penetrated deep into the long-tail market in a "small and beautiful" form. Accounts with many subscribers and high readership usually monetize through advertising or e-commerce partnerships, and advertising is one of the main sources of revenue. Official accounts have gradually come to cover every area of daily life, including news, sports, education and automobiles. Advertisers therefore need to choose official accounts in the appropriate field in order to maximize their returns.
As a mainstream online-to-offline interactive marketing channel on WeChat, official accounts let businesses communicate and interact with specific groups through text, images, voice and video on the WeChat platform. WeChat official accounts include service accounts, subscription accounts and enterprise accounts. Subscription accounts mainly provide users with information, and attract followers, by pushing articles, and the articles they push usually match their function. Correctly determining the function of a subscription account from the content of the articles it pushes can therefore effectively help advertisers choose suitable subscription accounts to advertise on.
In June 2014, the Sogou search engine launched a search service for WeChat official accounts, allowing users to search for articles or official accounts by keyword and to subscribe to official accounts directly on the Sogou platform. This made WeChat official accounts visible on the open web for the first time.
In summary, because WeChat official accounts are numerous and cover a wide range of fields, it is difficult for advertisers to choose a suitable account to advertise on. The function of an official account corresponds to the articles it pushes, and Sogou's indexing of official account data makes collecting and analyzing that data possible. However, there is as yet no efficient, high-accuracy method for classifying WeChat official accounts.
Summary of the invention
To overcome the shortcomings and deficiencies of the prior art, the present invention provides a method for classifying WeChat official accounts based on LDA topic clustering that classifies official accounts more efficiently and accurately.
To solve the above technical problem, the present invention provides the following technical solution: a method for classifying WeChat official accounts based on LDA topic clustering, comprising the following steps:
S1. For each active WeChat official account, obtain the articles pushed by that account;
S2. Segment each obtained article into words with a word segmentation tool, filter out stop words, and compute the term frequency-inverse document frequency (TF-IDF) of the remaining words;
S3. Select the remaining words whose TF-IDF value exceeds the threshold θ as the feature words of the article;
S4. Choose the number of topics K and use a document topic generation model to perform latent topic discovery on the feature words of all articles pushed by active official accounts, building article-topic feature vectors;
S5. Reduce the dimensionality of the article-topic feature vectors using principal component analysis;
S6. Cluster the dimension-reduced article-topic feature vectors with the Level-Panel algorithm to obtain clusters and the articles within each cluster;
S7. Determine the category of each WeChat official account from the cluster information of the articles it pushes.
Further, an active WeChat official account is one that is certified and pushes more than 3 articles every month.
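The activity rule can be written as a small predicate. This is a minimal sketch under stated assumptions: the `OfficialAccount` record and its field names are hypothetical, and "more than 3 articles every month" is read as a strict bound that must hold in each observed month (an account with no observed months is treated as inactive).

```python
from dataclasses import dataclass, field


@dataclass
class OfficialAccount:
    """Hypothetical record for one WeChat official account (fields are illustrative)."""
    name: str
    certified: bool  # whether the account has passed WeChat certification
    monthly_push_counts: list[int] = field(default_factory=list)  # articles pushed per observed month


def is_active(account: OfficialAccount, min_per_month: int = 3) -> bool:
    """Active = certified and pushing more than `min_per_month` articles in every month."""
    counts = account.monthly_push_counts
    return account.certified and bool(counts) and all(c > min_per_month for c in counts)
```

Only accounts for which `is_active` returns `True` would feed their articles into step S1.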
Further, the word segmentation tool is a tool based on the Chinese Academy of Sciences' Chinese word segmentation system.
Further, term frequency is the frequency with which a given word occurs in an article; inverse document frequency is a measure of a word's general importance, and is the inverse of document frequency.
Further, the article-topic feature vector takes the form $p^{(i)} = (p^{(i)}_1, p^{(i)}_2, \ldots, p^{(i)}_n)$, where i denotes the i-th article, $p^{(i)}_k$ is the probability of the k-th topic in the i-th article, and n is the number of topics.
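Steps S2 and S3 can be sketched in plain Python. The sketch assumes the articles have already been segmented into token lists (the Chinese segmenter itself is not reproduced), and it uses the common log-scaled form idf = ln(N / df) rather than the literal reciprocal of document frequency; both are assumptions, as is the hypothetical function name `select_feature_words`.

```python
import math
from collections import Counter


def select_feature_words(docs, stop_words, theta=0.025):
    """docs: list of already-segmented articles (lists of tokens).

    Returns one set of feature words per article (words with TF-IDF > theta).
    """
    # S2: drop stop words from each segmented article.
    cleaned = [[w for w in doc if w not in stop_words] for doc in docs]
    n_docs = len(cleaned)
    # Document frequency: number of articles that contain each word at least once.
    df = Counter(w for doc in cleaned for w in set(doc))

    features = []
    for doc in cleaned:
        counts = Counter(doc)
        total = len(doc)
        scores = {}
        for w, c in counts.items():
            tf = c / total                  # relative frequency of w in this article
            idf = math.log(n_docs / df[w])  # assumed log-scaled inverse document frequency
            scores[w] = tf * idf
        # S3: keep only words whose TF-IDF value exceeds the threshold.
        features.append({w for w, s in scores.items() if s > theta})
    return features
```

With this IDF form, a word that appears in every article scores 0 and can never become a feature word, which is the intended effect of filtering out non-distinctive words.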
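The patent does not say how the LDA model in step S4 is fitted. As one standard option, the sketch below uses collapsed Gibbs sampling (a swapped-in inference technique, not taken from the patent) to turn each article's feature words, encoded as integer word ids, into a vector $p^{(i)}$ whose k-th entry estimates the probability of topic k. The function name and the hyperparameters `alpha`, `beta` and `iters` are illustrative assumptions; production code would normally rely on an established LDA library.

```python
import numpy as np


def lda_gibbs(docs, vocab_size, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return the article-topic matrix.

    docs: list of articles, each a list of integer word ids (the S3 feature words).
    Returns an m x K array whose row i is p(i) = (p_1(i), ..., p_K(i)).
    """
    rng = np.random.default_rng(seed)
    m = len(docs)
    ndk = np.zeros((m, K))           # words in article d assigned to topic k
    nkw = np.zeros((K, vocab_size))  # times word w is assigned to topic k
    nk = np.zeros(K)                 # total words assigned to topic k
    z = []                           # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = rng.integers(K, size=len(doc))
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional p(z = k | all other assignments), up to a constant.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1

    # Smoothed per-article topic proportions; each row sums to 1.
    return (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + K * alpha)
```

Calling `lda_gibbs(docs, vocab_size, K=100)` would yield 100-dimensional article-topic vectors matching the patent's K = 100.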
Further, step S5 proceeds as follows:
(5a) Mean normalization: from all article-topic feature vectors, compute the mean $\mu_j$ and standard deviation $\sigma_j$ of each topic feature value, where j = 1, 2, ..., n and n is the number of topics; then normalize all article-topic feature vectors by setting $p^{(i)}_j \leftarrow (p^{(i)}_j - \mu_j)/\sigma_j$, where i = 1, 2, ..., m and m is the number of documents;
(5b) Compute the covariance matrix:
$$\mathrm{Cov} = \frac{1}{m}\sum_{i=1}^{m} p^{(i)}\left(p^{(i)}\right)^{T}$$
where $p^{(i)}$ is the normalized topic feature vector of the i-th document (a column vector) and $(p^{(i)})^{T}$ is its transpose;
(5c) Perform singular value decomposition to obtain the matrices U, S and V:
[U, S, V] = svd(Cov);
(5d) Choose a suitable reduced dimension g from the matrix S:
$$1 - \frac{\sum_{i=1}^{g} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \le 0.01$$
Take the smallest g satisfying this condition, $g_{\min}$; this is the most suitable dimension, for which the fraction of feature information lost lies in [0, 0.01];
(5e) Take the first g column vectors of U to form the n × g matrix $U_{\mathrm{reduce}}$, and compute the reduced article feature vectors with $U_{\mathrm{reduce}}$:
$$z^{(i)} = U_{\mathrm{reduce}}^{T}\, p^{(i)}$$
where $z^{(i)}$ is the reduced article feature vector and $U_{\mathrm{reduce}}^{T}$ is the transpose of $U_{\mathrm{reduce}}$.
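Steps (5a) to (5e) map onto a few NumPy operations. This sketch assumes the article-topic vectors are stacked as the rows of an m × n array, sums over all n singular values in (5d), and guards against zero-variance topics, which the patent does not discuss. The name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(P, max_loss=0.01):
    """Steps (5a)-(5e): reduce an m x n article-topic matrix P (one article per row)."""
    m = P.shape[0]
    # (5a) Mean normalization: zero mean and unit standard deviation per topic.
    mu = P.mean(axis=0)
    sigma = P.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid dividing by zero for constant topics
    Pn = (P - mu) / sigma
    # (5b) Cov = (1/m) * sum_i p(i) p(i)^T, an n x n matrix.
    cov = Pn.T @ Pn / m
    # (5c) Singular value decomposition; S holds the n singular values in descending order.
    U, S, Vt = np.linalg.svd(cov)
    # (5d) Smallest g with 1 - sum(S[:g]) / sum(S) <= max_loss.
    retained = np.cumsum(S) / S.sum()
    g = int(np.argmax(retained >= 1 - max_loss)) + 1
    # (5e) U_reduce = first g columns of U; Pn @ U_reduce is the row-wise form of z(i) = U_reduce^T p(i).
    U_reduce = U[:, :g]
    return Pn @ U_reduce, g
```

Because Cov is symmetric and positive semi-definite, its singular values coincide with its eigenvalues, so (5d) is the usual "retain 99% of the variance" rule.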
Further, the Level-Panel algorithm in step S6 is a text clustering method based on the vector space model. Its specific steps are as follows:
Given the article-topic feature vectors, let the article set be $D = \{d_1, d_2, \ldots, d_m\}$, where $d_i$ is the dimension-reduced article-topic feature vector of the i-th article.
(6a) Treat every article in D as a cluster containing a single member, $C_i = \{d_i\}$, where i = 1, 2, ..., m;
(6b) Pick any single-member cluster $C_k$ that has not yet been clustered as the starting point of a new cluster;
(6c) Among the samples not yet clustered, find every $C_i$ whose distance to $C_k$ is less than or equal to the threshold θ, i.e. whose similarity satisfies $\mathrm{sim}(C_k, C_i) \ge \theta$, and merge it with $C_k$ to form the new cluster $C_k = C_k \cup C_i$;
(6d) Repeat step (6c) until the distance from every unclustered sample to $C_k$ exceeds θ; one cluster is then complete;
(6e) Repeat from step (6b) until every single-member cluster $C_i$ has taken part in clustering.
Further, the threshold θ is set to θ = 0.025 and the number of topics K is set to K = 100.
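The Level-Panel steps (6a) to (6e) can be sketched as follows. The patent does not define sim(·,·) or say whether candidates are compared with the seed or with the growing cluster, so this sketch assumes cosine similarity against the current cluster centroid, recomputed after each merge round, and takes seeds in index order instead of arbitrarily. The function names are hypothetical, and the example threshold of 0.9 is chosen only to make the toy clusters visible.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))


def level_panel(Z, theta):
    """Z: m x g reduced article vectors. Returns a cluster label for each article."""
    m = len(Z)
    labels = np.full(m, -1)  # (6a) every article starts as an unclustered singleton
    cluster_id = 0
    for seed in range(m):    # (6b)/(6e) seed a new cluster from each unclustered singleton
        if labels[seed] != -1:
            continue
        members = [seed]
        labels[seed] = cluster_id
        while True:          # (6c)/(6d) grow the cluster until no sample qualifies
            centroid = Z[members].mean(axis=0)
            new = [i for i in range(m)
                   if labels[i] == -1 and cosine(centroid, Z[i]) >= theta]
            if not new:
                break
            for i in new:
                labels[i] = cluster_id
            members.extend(new)
        cluster_id += 1
    return labels
```

For instance, the five vectors `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.95], [1, 0.05]]` with θ = 0.9 fall into two clusters, `{0, 1, 4}` and `{2, 3}`. Each pass of the inner loop labels at least one new article or stops, so the procedure always terminates.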
With the above technical solution, the present invention has at least the following beneficial effects:
1. After computing the term frequency-inverse document frequency (TF-IDF) features of the articles pushed by WeChat official accounts, the invention filters out words whose TF-IDF value is below the threshold, retaining the main features of each article and avoiding interference from secondary features.
2. A document topic generation model (LDA) performs latent topic discovery on the feature words of the articles, yielding article-topic feature vectors that describe articles at the semantic level while reducing computational cost.
3. Principal component analysis (PCA) reduces the dimensionality of the article-topic feature vectors, exploiting correlations between topics to arrive at a more suitable number of topics.
4. The Level-Panel algorithm, which is based on the vector space model and performs well in text clustering, clusters the dimension-reduced article-topic feature vectors.
5. The invention classifies WeChat official accounts effectively and accurately, helping advertisers choose suitable official accounts for their advertisements, and thus has good practical value.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention for classifying WeChat official accounts based on LDA topic clustering;
Fig. 2 is a flowchart of the principal component analysis in the method of the present invention;
Fig. 3 is a flowchart of the Level-Panel algorithm in the method of the present invention.
Detailed description
It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another. The application is described in further detail below with reference to the drawings and a specific embodiment.
Embodiment
Fig. 1 is a flowchart of the method for classifying WeChat official accounts based on LDA topic clustering disclosed in this embodiment, showing each of its steps. As shown in Fig. 1, the method comprises the following steps:
S1. For each active WeChat official account, obtain the articles pushed by that account;
S2. Segment each article into words with a word segmentation tool, filter out stop words, and compute the term frequency-inverse document frequency (TF-IDF) of its words;
S3. Select the words whose TF-IDF value exceeds the threshold θ (θ = 0.025) as the feature words of the article;
S4. Choose the number of topics K (K = 100) and use a document topic generation model (LDA) to perform latent topic discovery on the feature words of all articles pushed by active official accounts, building article-topic feature vectors;
S5. Reduce the dimensionality of the article-topic feature vectors using principal component analysis (PCA);
S6. Cluster the dimension-reduced article-topic feature vectors with the Level-Panel algorithm to obtain clusters and the articles within each cluster;
S7. Determine the category of each WeChat official account from the cluster information of the articles it pushes.
The method computes the TF-IDF features of the articles pushed by WeChat official accounts, builds a word feature vector for each article, and normalizes the word feature vectors with a Gaussian function. A document topic generation model (LDA) performs latent topic discovery on the feature words of the articles and yields the word-topic probability distribution, from which article-topic feature vectors are built that describe articles at the semantic level while reducing computational cost. Principal component analysis (PCA) reduces the dimensionality of the article-topic feature vectors, exploiting correlations between topics to arrive at a more suitable number of topics. The Level-Panel algorithm clusters the dimension-reduced article-topic feature vectors, and the category of each official account is determined from the cluster information of the articles it pushes. The invention can accurately determine the category of a WeChat official account, making it easier for advertisers to choose the right official account to advertise on.
Here, an active WeChat official account is one that is certified and pushes more than 3 articles every month.
Further, the word segmentation tool is Ansj, a tool based on the Chinese Academy of Sciences' Chinese word segmentation system.
Further, term frequency is the frequency with which a word occurs in an article, and inverse document frequency measures a word's general importance: the more often a word occurs in other documents, the lower its inverse document frequency, and vice versa. Inverse document frequency is the inverse of document frequency.
Further, the article-topic feature vector takes the form $p^{(i)} = (p^{(i)}_1, p^{(i)}_2, \ldots, p^{(i)}_n)$, where i denotes the i-th article, $p^{(i)}_k$ is the probability of the k-th topic in the i-th article, and n is the number of topics.
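The patent leaves open how the clusters of an account's articles are turned into a category for the account in step S7. One plausible reading, shown here purely as an assumption, is a majority vote: the account takes the cluster that most of its articles fall into. Mapping cluster ids to readable category names (news, sports and so on) is also outside the patent's text and is not shown. The function name is hypothetical.

```python
from collections import Counter


def classify_accounts(article_clusters: dict[str, list[int]]) -> dict[str, int]:
    """Map each account to the cluster that most of its articles fall into.

    article_clusters: account name -> cluster labels of the articles it pushed
    (as produced by the clustering in step S6).
    """
    result = {}
    for account, labels in article_clusters.items():
        if labels:  # accounts with no clustered articles stay unclassified
            # most_common orders equal counts by first occurrence, so ties go to
            # the cluster of the earliest article.
            result[account] = Counter(labels).most_common(1)[0][0]
    return result
```

A weighted vote, for example by article readership, would be an equally reasonable design; the majority vote is simply the most direct reading of "cluster information of the articles".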
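A worked illustration with hypothetical numbers, assuming the common log-scaled form $\mathrm{idf}(t) = \ln(N/\mathrm{df}(t))$ rather than the literal reciprocal of document frequency: suppose an article of 200 words contains a word 5 times, and that word appears in 10 of N = 1000 articles.

```latex
\mathrm{tf} = \frac{5}{200} = 0.025, \qquad
\mathrm{idf} = \ln\frac{1000}{10} = \ln 100 \approx 4.605, \qquad
\mathrm{tfidf} = 0.025 \times 4.605 \approx 0.115 > \theta = 0.025
```

Under these assumptions the word is kept as a feature word, whereas a word that occurs in every article has $\mathrm{idf} = \ln 1 = 0$ and is always discarded.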
Further, as shown in Fig. 2, step S5 reduces the dimensionality of the article-topic feature vectors with principal component analysis (PCA) as follows:
(5a) The first step is mean normalization: from all article-topic feature vectors, compute the mean $\mu_j$ and standard deviation $\sigma_j$ of each topic feature value, where j = 1, 2, ..., n and n is the number of topics; then normalize all article-topic feature vectors by setting $p^{(i)}_j \leftarrow (p^{(i)}_j - \mu_j)/\sigma_j$, where i = 1, 2, ..., m and m is the number of documents;
(5b) The second step computes the covariance matrix Cov:
$$\mathrm{Cov} = \frac{1}{m}\sum_{i=1}^{m} p^{(i)}\left(p^{(i)}\right)^{T}$$
where $p^{(i)}$ is the normalized topic feature vector of the i-th document (a column vector) and $(p^{(i)})^{T}$ is its transpose;
(5c) The third step is singular value decomposition, giving the matrices U, S and V:
[U, S, V] = svd(Cov);
(5d) The fourth step selects a suitable reduced dimension g from the matrix S:
$$1 - \frac{\sum_{i=1}^{g} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \le 0.01$$
Take the smallest g satisfying this condition, $g_{\min}$, the most suitable dimension for which the fraction of feature information lost lies in [0, 0.01];
(5e) The fifth step takes the first g column vectors of U to form the n × g matrix $U_{\mathrm{reduce}}$ and computes the reduced article feature vectors with $U_{\mathrm{reduce}}$:
$$z^{(i)} = U_{\mathrm{reduce}}^{T}\, p^{(i)}$$
where $U_{\mathrm{reduce}}^{T}$ is the transpose of $U_{\mathrm{reduce}}$.
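A small illustration of the dimension choice in the fourth step, using hypothetical singular values $S = \mathrm{diag}(5, 3, 1.5, 0.4, 0.1)$ for n = 5 topics (so $\sum_i S_{ii} = 10$):

```latex
g = 3:\quad 1 - \frac{5 + 3 + 1.5}{10} = 0.05 > 0.01 \\
g = 4:\quad 1 - \frac{5 + 3 + 1.5 + 0.4}{10} = 0.01 \le 0.01
```

Hence $g_{\min} = 4$: the five topic dimensions shrink to four while 99% of the variance is retained.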
Further, as shown in Fig. 3, the Level-Panel algorithm is a text clustering method based on the vector space model. Its specific steps are as follows:
Given the article-topic feature vectors, let the article set be $D = \{d_1, d_2, \ldots, d_m\}$, where $d_i$ is the topic feature vector of the i-th article after dimensionality reduction.
(6a) Treat every article in D as a cluster containing a single member, $C_i = \{d_i\}$, where i = 1, 2, ..., m;
(6b) Pick any single-member cluster $C_k$ that has not yet been clustered as the starting point of a new cluster;
(6c) Among the samples not yet clustered, find every $C_i$ whose distance to $C_k$ is less than the threshold θ, i.e. whose similarity satisfies $\mathrm{sim}(C_k, C_i) \ge \theta$, and merge it with $C_k$ to form the new cluster $C_k = C_k \cup C_i$;
(6d) Repeat step (6c) until the distance from every unclustered sample to $C_k$ exceeds θ; one cluster is then complete;
(6e) Repeat from step (6b) until every single-member cluster $C_i$ has taken part in clustering.
The method classifies WeChat official accounts effectively and accurately and has good practical usability.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various equivalent changes, modifications, substitutions and variations may be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A method for classifying WeChat official accounts based on LDA topic clustering, characterized in that the method comprises the following steps:
S1. for each active WeChat official account, obtaining the articles pushed by that account;
S2. segmenting each obtained article into words with a word segmentation tool, filtering out stop words, and computing the term frequency-inverse document frequency (TF-IDF) of the remaining words;
S3. selecting the remaining words whose TF-IDF value exceeds the threshold θ as the feature words of the article;
S4. choosing the number of topics K and using a document topic generation model to perform latent topic discovery on the feature words of all articles pushed by active official accounts, building article-topic feature vectors;
S5. reducing the dimensionality of the article-topic feature vectors using principal component analysis;
S6. clustering the dimension-reduced article-topic feature vectors with the Level-Panel algorithm to obtain clusters and the articles within each cluster;
S7. determining the category of each WeChat official account from the cluster information of the articles it pushes.
2. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that an active WeChat official account is one that is certified and pushes more than 3 articles every month.
3. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that the word segmentation tool is a tool based on the Chinese Academy of Sciences' Chinese word segmentation system.
4. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that term frequency is the frequency with which a given word occurs in an article, inverse document frequency is a measure of a word's general importance, and inverse document frequency is the inverse of document frequency.
5. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that the article-topic feature vector takes the form $p^{(i)} = (p^{(i)}_1, p^{(i)}_2, \ldots, p^{(i)}_n)$, where i denotes the i-th article, $p^{(i)}_k$ is the probability of the k-th topic in the i-th article, and n is the number of topics.
6. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that step S5 proceeds as follows:
(5a) mean normalization: from all article-topic feature vectors, computing the mean $\mu_j$ and standard deviation $\sigma_j$ of each topic feature value, where j = 1, 2, ..., n and n is the number of topics, and normalizing all article-topic feature vectors by setting $p^{(i)}_j \leftarrow (p^{(i)}_j - \mu_j)/\sigma_j$, where i = 1, 2, ..., m and m is the number of documents;
(5b) computing the covariance matrix:
$$\mathrm{Cov} = \frac{1}{m}\sum_{i=1}^{m} p^{(i)}\left(p^{(i)}\right)^{T}$$
where $p^{(i)}$ is the normalized topic feature vector of the i-th document (a column vector) and $(p^{(i)})^{T}$ is its transpose;
(5c) performing singular value decomposition to obtain the matrices U, S and V:
[U, S, V] = svd(Cov);
(5d) choosing a suitable reduced dimension g from the matrix S:
$$1 - \frac{\sum_{i=1}^{g} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \le 0.01$$
taking the smallest g satisfying this condition, $g_{\min}$, the most suitable dimension for which the fraction of feature information lost lies in [0, 0.01];
(5e) taking the first g column vectors of U to form the n × g matrix $U_{\mathrm{reduce}}$ and computing the reduced article feature vectors with $U_{\mathrm{reduce}}$:
$$z^{(i)} = U_{\mathrm{reduce}}^{T}\, p^{(i)}$$
where $z^{(i)}$ is the reduced article feature vector and $U_{\mathrm{reduce}}^{T}$ is the transpose of $U_{\mathrm{reduce}}$.
7. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that the Level-Panel algorithm in step S6 is a text clustering method based on the vector space model, whose specific steps are as follows:
given the article-topic feature vectors, let the article set be $D = \{d_1, d_2, \ldots, d_m\}$, where $d_i$ is the dimension-reduced article-topic feature vector of the i-th article;
(6a) treating every article in D as a cluster containing a single member, $C_i = \{d_i\}$, where i = 1, 2, ..., m;
(6b) picking any single-member cluster $C_k$ that has not yet been clustered as the starting point of a new cluster;
(6c) among the samples not yet clustered, finding every $C_i$ whose distance to $C_k$ is less than or equal to the threshold θ, i.e. whose similarity satisfies $\mathrm{sim}(C_k, C_i) \ge \theta$, and merging it with $C_k$ to form the new cluster $C_k = C_k \cup C_i$;
(6d) repeating step (6c) until the distance from every unclustered sample to $C_k$ exceeds θ, at which point one cluster is complete;
(6e) repeating from step (6b) until every single-member cluster $C_i$ has taken part in clustering.
8. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that the threshold θ is set to θ = 0.025 and the number of topics K is set to K = 100.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610312725.3A CN106021388A (en) 2016-05-11 2016-05-11 Classifying method of WeChat official accounts based on LDA topic clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610312725.3A CN106021388A (en) 2016-05-11 2016-05-11 Classifying method of WeChat official accounts based on LDA topic clustering

Publications (1)

Publication Number Publication Date
CN106021388A (en) 2016-10-12

Family

ID=57100124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610312725.3A Pending CN106021388A (en) 2016-05-11 2016-05-11 Classifying method of WeChat official accounts based on LDA topic clustering

Country Status (1)

Country Link
CN (1) CN106021388A (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446287A (en) * 2016-11-08 2017-02-22 北京邮电大学 Answer aggregation method and system facing crowdsourcing scene question-answering system
CN106844424A (en) * 2016-12-09 2017-06-13 宁波大学 A kind of file classification method based on LDA
CN107229614A (en) * 2017-06-29 2017-10-03 百度在线网络技术(北京)有限公司 Method and apparatus for grouped data
CN107832418A (en) * 2017-11-08 2018-03-23 郑州云海信息技术有限公司 A kind of much-talked-about topic finds method, system and a kind of much-talked-about topic discovering device
CN108062610A (en) * 2016-11-08 2018-05-22 北京国双科技有限公司 The analysis method and device of job relatedness
CN108108346A (en) * 2016-11-25 2018-06-01 广东亿迅科技有限公司 The theme feature word abstracting method and device of document
CN108108345A (en) * 2016-11-25 2018-06-01 上海掌门科技有限公司 For determining the method and apparatus of theme of news
CN109308607A (en) * 2018-09-17 2019-02-05 田歌 The method and device of book of final entry event
CN109614476A (en) * 2018-12-11 2019-04-12 平安科技(深圳)有限公司 Customer service system answering method, device, computer equipment and storage medium
CN110275745A (en) * 2019-04-11 2019-09-24 上海盛付通电子支付服务有限公司 A kind of method and apparatus of overhead account
CN111324729A (en) * 2020-02-18 2020-06-23 上海华鑫股份有限公司 Visual analysis method for financial public number data
CN111339251A (en) * 2020-02-25 2020-06-26 上海昌投网络科技有限公司 Method and device for detecting whether WeChat public number has sensitive words or not
CN111353019A (en) * 2020-02-25 2020-06-30 上海昌投网络科技有限公司 WeChat public number topic classification method and device
CN111738341A (en) * 2020-06-24 2020-10-02 佳都新太科技股份有限公司 Distributed large-scale face clustering method and device
CN111832289A (en) * 2020-07-13 2020-10-27 重庆大学 Service discovery method based on clustering and Gaussian LDA
CN112215288A (en) * 2020-10-13 2021-01-12 中国光大银行股份有限公司 Target enterprise category determination method and device, storage medium and electronic device
CN113055481A (en) * 2021-03-17 2021-06-29 杭州遥望网络科技有限公司 Message pushing method, device, equipment and computer readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942340A (en) * 2014-05-09 2014-07-23 电子科技大学 Microblog user interest recognizing method based on text mining
CN104408440A (en) * 2014-12-10 2015-03-11 重庆邮电大学 Identification method for human facial expression based on two-step dimensionality reduction and parallel feature fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHUNCHU RAMBABU ET AL: "EEG Signal with Feature Extraction using SVM and ICA Classifiers", 《INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS》 *
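刘海旭: "Design and Implementation of a Text Classification System Based on PCA and LDA", 《China Master's Theses Full-text Database, Information Science and Technology》 *
姚清耘 et al.: "Text Clustering Algorithm Based on the Vector Space Model", 《Computer Engineering》 *
陆红艳: "Face Recognition Method Based on Singular Value Decomposition and Sparse Representation", 《China Master's Theses Full-text Database, Information Science and Technology》 *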

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446287A (en) * 2016-11-08 2017-02-22 北京邮电大学 Answer aggregation method and system facing crowdsourcing scene question-answering system
CN108062610A (en) * 2016-11-08 2018-05-22 北京国双科技有限公司 The analysis method and device of job relatedness
CN108108346A (en) * 2016-11-25 2018-06-01 广东亿迅科技有限公司 The theme feature word abstracting method and device of document
CN108108345A (en) * 2016-11-25 2018-06-01 上海掌门科技有限公司 For determining the method and apparatus of theme of news
CN106844424A (en) * 2016-12-09 2017-06-13 宁波大学 A kind of file classification method based on LDA
CN107229614A (en) * 2017-06-29 2017-10-03 百度在线网络技术(北京)有限公司 Method and apparatus for grouped data
CN107832418A (en) * 2017-11-08 2018-03-23 郑州云海信息技术有限公司 A kind of much-talked-about topic finds method, system and a kind of much-talked-about topic discovering device
CN109308607A (en) * 2018-09-17 2019-02-05 田歌 The method and device of book of final entry event
CN109614476A (en) * 2018-12-11 2019-04-12 平安科技(深圳)有限公司 Customer service system answering method, device, computer equipment and storage medium
CN110275745A (en) * 2019-04-11 2019-09-24 上海盛付通电子支付服务有限公司 A kind of method and apparatus of overhead account
CN111324729A (en) * 2020-02-18 2020-06-23 上海华鑫股份有限公司 Visual analysis method for financial public number data
CN111324729B (en) * 2020-02-18 2023-04-28 上海华鑫股份有限公司 Visual analysis method for financial public number data
CN111339251A (en) * 2020-02-25 2020-06-26 上海昌投网络科技有限公司 Method and device for detecting whether WeChat public number has sensitive words or not
CN111353019A (en) * 2020-02-25 2020-06-30 上海昌投网络科技有限公司 WeChat public number topic classification method and device
CN111738341A (en) * 2020-06-24 2020-10-02 佳都新太科技股份有限公司 Distributed large-scale face clustering method and device
CN111832289A (en) * 2020-07-13 2020-10-27 重庆大学 Service discovery method based on clustering and Gaussian LDA
CN111832289B (en) * 2020-07-13 2023-08-11 重庆大学 Service discovery method based on clustering and Gaussian LDA
CN112215288A (en) * 2020-10-13 2021-01-12 中国光大银行股份有限公司 Target enterprise category determination method and device, storage medium and electronic device
CN112215288B (en) * 2020-10-13 2024-04-30 中国光大银行股份有限公司 Method and device for determining category of target enterprise, storage medium and electronic device
CN113055481A (en) * 2021-03-17 2021-06-29 杭州遥望网络科技有限公司 Message pushing method, device, equipment and computer readable storage medium
CN113055481B (en) * 2021-03-17 2022-04-19 杭州遥望网络科技有限公司 Message pushing method, device, equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN106021388A (en) Classifying method of WeChat official accounts based on LDA topic clustering
CN101620596B (en) Multi-document auto-abstracting method facing to inquiry
Weng et al. Event detection in twitter
CN103886067B (en) Method for recommending books through label implied topic
CN106407484B (en) Video tag extraction method based on barrage semantic association
CN103207899B (en) Text recommends method and system
CN105787025B (en) Network platform public account classification method and device
CN103177024A (en) Method and device of topic information show
CN107862070B (en) Online classroom discussion short text instant grouping method and system based on text clustering
CN103186612B (en) A kind of method of classified vocabulary, system and implementation method
CN103812872B (en) A kind of network navy behavioral value method and system based on mixing Di Li Cray process
CN101980199A (en) Method and system for discovering network hot topic based on situation assessment
CN104239539A (en) Microblog information filtering method based on multi-information fusion
CN104484343A (en) Topic detection and tracking method for microblog
CN100511214C (en) Method and system for abstracting batch single document for document set
CN104317784A (en) Cross-platform user identification method and cross-platform user identification system
CN106168953A (en) Blog article towards weak relation social networks recommends method
CN106202053A (en) A kind of microblogging theme sentiment analysis method that social networks drives
CN103095849B (en) A method and a system of spervised web service finding based on attribution forecast and error correction of quality of service (QoS)
CN104573070A (en) Text clustering method special for mixed length text sets
CN103425686A (en) Information publishing method and device
Celikyilmaz et al. Leveraging web query logs to learn user intent via bayesian latent variable model
Sabbah et al. Hybrid support vector machine based feature selection method for text classification.
CN104572915A (en) User event relevance calculation method based on content environment enhancement
CN102360436A (en) Identification method for on-line handwritten Tibetan characters based on components

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161012