CN106021388A - Classifying method of WeChat official accounts based on LDA topic clustering - Google Patents
- Publication number
- CN106021388A
- Authority
- CN
- China
- Prior art keywords
- article
- public number
- wechat public
- word
- feature vector
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for classifying WeChat official accounts based on LDA topic clustering. The method comprises the following steps: acquiring the articles pushed by each active WeChat official account; segmenting each acquired article into words with a word segmentation tool, filtering out stop words, and computing the term frequency-inverse document frequency (TF-IDF) of the remaining words; selecting the remaining words whose TF-IDF value exceeds a threshold as the feature words of the article; performing latent topic discovery on the feature words of all active articles with a document topic generation model (LDA), constructing article-topic feature vectors, and reducing their dimension with principal component analysis; clustering the dimension-reduced article-topic feature vectors with the Level-Panel algorithm to obtain clusters and the articles in each cluster; and determining the category of each WeChat official account from the cluster information of the articles it pushes. With the method provided by the invention, the category of a WeChat official account can be determined accurately, making it easier for advertisers to select the right WeChat official accounts for their advertisements.
Description
Technical field
The present invention relates to the field of text mining, and in particular to a method for classifying WeChat official accounts based on LDA topic clustering.
Background
WeChat official accounts went online in August 2012, and by August 2015 there were already more than 10,000,000 of them. Official accounts have penetrated deeply into the long-tail market in a "small and beautiful" form, and accounts with many subscribers and high readership generate revenue through advertising or e-commerce partnerships, advertising being one of the main ways of monetizing. Gradually, WeChat official accounts have come to cover every area of daily life, including news, sports, education, and automobiles. For advertisers, choosing official accounts in the appropriate field for their advertisements is key to maximizing returns.
As a mainstream form of online and offline WeChat interactive marketing, official accounts let businesses communicate and interact with specific groups on the WeChat platform through text, pictures, voice, and video. WeChat official accounts include service accounts, subscription accounts, and enterprise accounts. Subscription accounts mainly provide users with information, and attract their attention, by pushing articles, and the articles pushed usually match the account's function. Correctly determining the function of a subscription account from the content of the articles it pushes can therefore effectively help advertisers select suitable subscription accounts for their advertisements.
In June 2014, the Sogou search engine launched a search service for WeChat official accounts, allowing users to search for articles or official accounts by keyword and to subscribe to official accounts on the Sogou platform. This made WeChat official accounts visible on the open web for the first time.
In summary, because WeChat official accounts are so numerous and cover such a wide range of fields, it is difficult for advertisers to choose a suitable account for their advertisements. The function of an official account corresponds to the articles it pushes. Sogou's search engine provides access to official account data, which makes collecting and analyzing that data possible. However, there is currently no efficient and accurate method for classifying WeChat official accounts.
Summary of the invention
To overcome the shortcomings and deficiencies of the prior art, the present invention provides a method for classifying WeChat official accounts based on LDA topic clustering, which classifies WeChat official accounts more efficiently and accurately.
To solve the above technical problem, the present invention provides the following technical solution: a method for classifying WeChat official accounts based on LDA topic clustering, comprising the following steps:
S1. For each active WeChat official account, obtain the articles pushed by that account;
S2. Segment each obtained article into words with a word segmentation tool, filter out stop words, and compute the term frequency-inverse document frequency (TF-IDF) of the remaining words;
S3. Select the remaining words whose TF-IDF value is greater than the threshold θ as the feature words of the article;
S4. Select the number of topics K and use a document topic generation model to perform latent topic discovery on the feature words of the articles pushed by all active official accounts, constructing article-topic feature vectors;
S5. Reduce the dimension of the article-topic feature vectors using principal component analysis;
S6. Cluster the dimension-reduced article-topic feature vectors using the Level-Panel algorithm to obtain clusters and the articles in each cluster;
S7. Determine the category of each WeChat official account according to the cluster information of the articles it pushes.
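Step S7 does not spell out how cluster information becomes an account category. A minimal sketch, assuming a simple majority vote over the clusters of an account's articles and a hypothetical `cluster_label` mapping from cluster IDs to human-readable categories (neither the vote nor the mapping is stated in the text):

```python
from collections import Counter


def classify_accounts(article_account, article_cluster, cluster_label):
    """Label each account with the category of the cluster holding most of its articles.

    article_account: dict article_id -> account_id
    article_cluster: dict article_id -> cluster_id (output of step S6)
    cluster_label:   dict cluster_id -> category name (hypothetical, assigned by hand)
    """
    votes = {}
    for article, account in article_account.items():
        cluster = article_cluster.get(article)
        if cluster is None:
            continue  # article was not clustered; it casts no vote
        votes.setdefault(account, Counter())[cluster_label[cluster]] += 1
    # most_common(1) resolves ties in favour of the category seen first
    return {acct: counts.most_common(1)[0][0] for acct, counts in votes.items()}
```

A majority vote is only one option; an account whose articles split evenly across clusters could instead be given several labels.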
Further, an active WeChat official account is a verified WeChat official account that pushes more than 3 articles per month.
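The activity criterion uses a strict inequality, so an account pushing exactly 3 articles a month does not qualify. A small filter makes this explicit; the `Account` record and its field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    verified: bool             # account has passed WeChat verification
    articles_per_month: float  # average number of pushed articles per month


def active_accounts(accounts, min_articles=3):
    # "Active" = verified AND strictly more than min_articles articles per month
    return [a for a in accounts if a.verified and a.articles_per_month > min_articles]
```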
Further, the word segmentation tool is a word segmentation tool based on the Chinese Academy of Sciences Chinese word segmentation system.
Further, term frequency is the frequency with which a given word occurs in an article; inverse document frequency is a measure of the general importance of a word and is the inverse of the document frequency.
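Steps S2 and S3 can be sketched in Python as follows, assuming the articles have already been segmented into token lists (the Chinese segmenter itself is not reproduced). One deliberate swap: the text describes IDF as simply the inverse of document frequency, whereas this sketch uses the common log-scaled form log(N / df), so the resulting scores and the meaning of θ = 0.025 may differ from the patent's:

```python
import math
from collections import Counter


def tfidf_feature_words(docs, stop_words=frozenset(), theta=0.025):
    """Return, for each tokenized article, the words whose TF-IDF exceeds theta."""
    # Step S2: drop stop words from the already-segmented token lists
    docs = [[w for w in doc if w not in stop_words] for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of articles containing each word
    df = Counter(w for doc in docs for w in set(doc))
    features = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # TF = count / article length; IDF = log(N / df) (log form assumed)
        scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()}
        # Step S3: keep only words strictly above the threshold
        features.append({w: s for w, s in scores.items() if s > theta})
    return features
```

A word that appears in every article gets IDF = log(1) = 0 and is therefore never selected as a feature word.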
Further, the article-topic feature vector is expressed as $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_n)$, where i denotes the i-th article, $x^{(i)}_k$ is the probability of the k-th topic of the i-th article, and n is the number of topics.
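The text does not say how the LDA model is fitted. As an illustration of what each article-topic vector contains, here is a minimal pure-Python sketch using collapsed Gibbs sampling, a standard inference method for LDA; the hyperparameters `alpha` and `beta` and the iteration count are assumptions, not values from the text:

```python
import random


def lda_gibbs(docs, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of feature-word lists (one per article).
    Returns one K-dimensional article-topic vector per article.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * K for _ in docs]      # topic counts per article
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total tokens assigned to each topic
    z = []                             # current topic of every token

    # Random initial topic assignment
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
        z.append(zd)

    topics = range(K)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = vocab[w], z[d][n]
                # Remove this token's current assignment from the counts
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other tokens)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][n] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed topic proportions x(i); each vector sums to 1
    return [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in topics]
            for d, doc in enumerate(docs)]
```

With K = 100 as in the text, a library implementation would normally be used instead; the loop above only shows how the k-th component of an article's vector comes from the share of its words assigned to topic k.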
Further, step S5 is carried out as follows:
(5a) Mean normalization: from all article-topic feature vectors, compute the mean $\mu_j$ and standard deviation $\sigma_j$ of each topic feature, where j = 1, 2, ..., n and n is the number of topics, and normalize every article-topic feature vector by setting $p^{(i)}_j = (x^{(i)}_j - \mu_j)/\sigma_j$, where $x^{(i)}_j$ is the j-th topic feature of the i-th article, i = 1, 2, ..., m, and m is the number of documents;
(5b) Compute the covariance matrix as follows:
$$\mathrm{Cov} = \frac{1}{m}\sum_{i=1}^{m} p^{(i)}\left(p^{(i)}\right)^{T}$$
where $p^{(i)}$ is the normalized topic feature vector of the i-th document and $(p^{(i)})^{T}$ is its transpose;
(5c) Perform singular value decomposition to obtain the matrices U, S and V:
[U, S, V] = svd(Cov);
(5d) Select a suitable reduced dimension g from the S matrix as follows:
$$1 - \frac{\sum_{k=1}^{g} S_{kk}}{\sum_{k=1}^{n} S_{kk}} \le 0.01$$
taking the minimum value $g_{\min}$ of g that satisfies this condition, i.e. the most suitable dimension g for which the feature loss lies within [0, 0.01];
(5e) Take the first g column vectors of U to obtain the n × g matrix $U_{\mathrm{reduce}}$, and compute the dimension-reduced article feature vector with $U_{\mathrm{reduce}}$ as follows:
$$z^{(i)} = U_{\mathrm{reduce}}^{T}\, p^{(i)}$$
where $z^{(i)}$ is the dimension-reduced article feature vector and $U_{\mathrm{reduce}}^{T}$ is the transpose of $U_{\mathrm{reduce}}$.
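Steps (5a) to (5e) follow the familiar recipe of standardizing features, taking the SVD of the covariance matrix, and keeping just enough components to retain at least 99% of the variance. A NumPy sketch, assuming the article-topic vectors are stacked row-wise into an m × n array; the guard against zero standard deviation is an addition not mentioned in the text:

```python
import numpy as np


def pca_reduce(X, max_loss=0.01):
    """Steps 5a-5e. X: m x n array, one article-topic vector per row."""
    m, n = X.shape
    # (5a) mean normalization per topic feature (avoid dividing by zero)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0
    P = (X - mu) / sigma
    # (5b) n x n covariance matrix: (1/m) * sum_i p(i) p(i)^T
    cov = (P.T @ P) / m
    # (5c) singular value decomposition; S is sorted in descending order
    U, S, Vt = np.linalg.svd(cov)
    # (5d) smallest g whose discarded variance fraction is <= max_loss
    retained = np.cumsum(S) / S.sum()
    g = int(np.argmax(retained >= 1 - max_loss)) + 1
    # (5e) U_reduce is n x g; row i of Z is z(i) = U_reduce^T p(i)
    U_reduce = U[:, :g]
    Z = P @ U_reduce
    return Z, g
```

Perfectly correlated topic columns collapse to a single component, while independent columns all have to be kept to reach the 99% level.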
Further, the Level-Panel algorithm in step S6 is a text clustering method based on the vector space model, and its specific steps are as follows:
Given the article set D = {d1, d2, ..., dm} built from the article-topic feature vectors, where di is the dimension-reduced article-topic feature vector of the i-th article:
(6a) Treat every article in the article set D as a cluster containing a single member, Ci = {di}, where i = 1, 2, ..., m;
(6b) Arbitrarily select one single-member cluster Ck that has not yet been clustered as the starting point of clustering;
(6c) Among the samples not yet clustered, find any Ci that is close enough to cluster Ck, i.e. whose similarity satisfies sim(Ck, Ci) ≥ θ, and merge it with Ck to form the new cluster Ck = Ck ∪ Ci;
(6d) Repeat step (6c) until no unclustered sample satisfies sim(Ck, Ci) ≥ θ; at this point one cluster has been formed;
(6e) Repeat from step (6b) until every single-member cluster Ci has taken part in clustering.
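Steps (6a) to (6e) can be sketched as a threshold-based, single-pass clustering loop. The text does not define sim(Ck, Ci) for a cluster with several members, so this sketch assumes cosine similarity between the cluster centroid and the candidate article. Because the scale of θ depends on that unstated similarity measure, θ = 0.025 from the text would merge almost everything under cosine similarity; the example below uses a larger value:

```python
import numpy as np


def level_panel(Z, theta=0.025):
    """Threshold clustering following steps 6a-6e (sketch).

    Z: m x g array, one dimension-reduced article vector per row.
    sim(Ck, Ci) is ASSUMED to be cosine(centroid of Ck, Z[i]).
    Returns a list of clusters, each a list of row indices.
    """
    def cos(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))

    unclustered = list(range(len(Z)))  # (6a) each article is a singleton cluster
    clusters = []
    while unclustered:                 # (6e) until every singleton has been used
        members = [unclustered.pop(0)] # (6b) pick a starting singleton
        merged = True
        while merged:                  # (6d) repeat 6c until nothing qualifies
            merged = False
            centroid = Z[members].mean(axis=0)
            for i in list(unclustered):  # (6c) merge any close-enough article
                if cos(centroid, Z[i]) >= theta:
                    members.append(i)
                    unclustered.remove(i)
                    centroid = Z[members].mean(axis=0)
                    merged = True
        clusters.append(members)
    return clusters
```

For example, `level_panel(np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]), theta=0.9)` groups the first two and the last two vectors into separate clusters.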
Further, the threshold θ is set to θ = 0.025, and the number of topics K is set to K = 100.
After adopting the above technical solution, the present invention has at least the following beneficial effects:
1. Building on the term frequency-inverse document frequency (TF-IDF) features computed for the articles pushed by WeChat official accounts, the invention filters out words whose TF-IDF value is below the threshold, which retains the main features of each article and avoids interference from secondary features.
2. A document topic generation model (LDA) performs latent topic discovery on the feature words of the articles to obtain article-topic feature vectors, which describe the articles at the semantic level while reducing computational cost.
3. Principal component analysis (PCA) reduces the dimension of the article-topic feature vectors, exposing the correlations between topics and thereby yielding a more suitable number of topics.
4. The Level-Panel algorithm, which is based on the vector space model and has clear advantages in text clustering, clusters the dimension-reduced article-topic feature vectors.
5. The invention classifies WeChat official accounts effectively and accurately and helps advertisers choose suitable official accounts for their advertisements, so it has good practical value.
Brief description of the drawings
Fig. 1 is a flow chart of the method for classifying WeChat official accounts based on LDA topic clustering according to the present invention;
Fig. 2 is a flow chart of the principal component analysis in the method for classifying WeChat official accounts based on LDA topic clustering according to the present invention;
Fig. 3 is a flow chart of the Level-Panel algorithm in the method for classifying WeChat official accounts based on LDA topic clustering according to the present invention.
Detailed description of the invention
It should be noted that, where they do not conflict, the embodiments of this application and the features in those embodiments may be combined with each other. The application is described in further detail below with reference to the accompanying drawings and a specific embodiment.
Embodiment
Fig. 1 is a flow chart of the method for classifying WeChat official accounts based on LDA topic clustering disclosed in this embodiment, showing each of its steps. As shown in Fig. 1, the method comprises the following steps:
S1. For each active WeChat official account, obtain the articles pushed by that account;
S2. Segment each article into words with a word segmentation tool, filter out stop words, and compute the term frequency-inverse document frequency (TF-IDF) of the words;
S3. Select the words whose TF-IDF value exceeds the threshold θ (θ = 0.025) as the feature words of the article;
S4. Select the number of topics K (K = 100) and use the document topic generation model (LDA) to perform latent topic discovery on the feature words of the articles pushed by all active official accounts, constructing article-topic feature vectors;
S5. Reduce the dimension of the article-topic feature vectors using principal component analysis (PCA);
S6. Cluster the dimension-reduced article-topic feature vectors using the Level-Panel algorithm to obtain clusters and the articles in each cluster;
S7. Determine the category of each WeChat official account according to the cluster information of the articles it pushes.
The above method computes the term frequency-inverse document frequency (TF-IDF) features of the articles pushed by WeChat official accounts, builds a word feature vector for each article, and normalizes the word feature vectors with a Gaussian function. It then uses the document topic generation model (LDA) to perform latent topic discovery on the feature words in the articles, obtaining the word-topic probability distribution, from which the article-topic feature vectors are built; these describe the articles at the semantic level while reducing computational cost. Principal component analysis (PCA) reduces the dimension of the article-topic feature vectors, exposing the correlations between topics and thus yielding a more suitable number of topics. The Level-Panel algorithm clusters the dimension-reduced article-topic feature vectors, and the category of each WeChat official account is determined from the clusters to which the articles it pushes belong. The invention can accurately determine the category of WeChat official accounts, making it easier for advertisers to select the right official accounts for their advertisements.
Here, an active WeChat official account is a verified WeChat official account that pushes more than 3 articles per month.
Further, the word segmentation tool is Ansj, a word segmentation tool based on the Chinese Academy of Sciences Chinese word segmentation system.
Further, term frequency is the frequency with which each word occurs in an article, and inverse document frequency is a measure of the general importance of a word: if a word occurs frequently in other documents, its inverse document frequency is low, and otherwise it is high. Inverse document frequency is the inverse of document frequency.
Further, the article-topic feature vector is expressed as $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_n)$, where i denotes the i-th article, $x^{(i)}_k$ is the probability of the k-th topic of the i-th article, and n is the number of topics.
Further, as shown in Fig. 2, step S5 reduces the dimension of the article-topic feature vectors by principal component analysis (PCA) as follows:
(5a) The first step is mean normalization: from all article-topic feature vectors, compute the mean $\mu_j$ and standard deviation $\sigma_j$ of each topic feature, where j = 1, 2, ..., n and n is the number of topics, and normalize every article-topic feature vector by setting $p^{(i)}_j = (x^{(i)}_j - \mu_j)/\sigma_j$, where $x^{(i)}_j$ is the j-th topic feature of the i-th article, i = 1, 2, ..., m, and m is the number of documents;
(5b) The second step is to compute the covariance matrix Cov as follows:
$$\mathrm{Cov} = \frac{1}{m}\sum_{i=1}^{m} p^{(i)}\left(p^{(i)}\right)^{T}$$
where $p^{(i)}$ is the normalized topic feature vector of the i-th document and $(p^{(i)})^{T}$ is its transpose;
(5c) The third step is singular value decomposition, which yields the matrices U, S and V:
[U, S, V] = svd(Cov);
(5d) The fourth step is to select a suitable reduced dimension g from the S matrix as follows:
$$1 - \frac{\sum_{k=1}^{g} S_{kk}}{\sum_{k=1}^{n} S_{kk}} \le 0.01$$
taking the minimum value $g_{\min}$ of g, where $g_{\min}$ is the most suitable dimension g for which the feature loss lies within [0, 0.01];
(5e) The fifth step takes the first g column vectors of U to obtain the n × g matrix $U_{\mathrm{reduce}}$ and computes the dimension-reduced article feature vector with $U_{\mathrm{reduce}}$ as follows:
$$z^{(i)} = U_{\mathrm{reduce}}^{T}\, p^{(i)}$$
where $U_{\mathrm{reduce}}^{T}$ is the transpose of $U_{\mathrm{reduce}}$.
Further, as shown in Fig. 3, the Level-Panel algorithm is a text clustering method based on the vector space model, and proceeds as follows:
Given the article set D = {d1, d2, ..., dm} built from the article-topic feature vectors, where di is the dimension-reduced topic feature vector of the i-th article:
(6a) Treat every article in D as a cluster containing a single member, Ci = {di}, where i = 1, 2, ..., m;
(6b) Arbitrarily select one single-member cluster Ck that has not yet been clustered as the starting point of clustering;
(6c) Among the samples not yet clustered, find any Ci that is close enough to Ck, i.e. whose similarity satisfies sim(Ck, Ci) ≥ θ, and merge it with Ck to form the new cluster Ck = Ck ∪ Ci;
(6d) Repeat step (6c) until no unclustered sample satisfies sim(Ck, Ci) ≥ θ; at this point one cluster has been formed;
(6e) Repeat from step (6b) until every single-member cluster Ci has taken part in clustering.
The above method classifies WeChat official accounts effectively and accurately and has good practical applicability.
Although an embodiment of the present invention has been shown and described, those of ordinary skill in the art will understand that various equivalent changes, modifications, substitutions, and variations can be made to the embodiments without departing from the principles and spirit of the present invention; the scope of the invention is defined by the appended claims and their equivalents.
Claims (8)
1. A method for classifying WeChat official accounts based on LDA topic clustering, characterized in that the method comprises the following steps:
S1. for each active WeChat official account, obtaining the articles pushed by that account;
S2. segmenting each obtained article into words with a word segmentation tool, filtering out stop words, and computing the term frequency-inverse document frequency of the remaining words;
S3. selecting the remaining words whose term frequency-inverse document frequency value is greater than the threshold θ as the feature words of the article;
S4. selecting a number of topics K and using a document topic generation model to perform latent topic discovery on the feature words of the articles pushed by all active official accounts, constructing article-topic feature vectors;
S5. reducing the dimension of the article-topic feature vectors using principal component analysis;
S6. clustering the dimension-reduced article-topic feature vectors using the Level-Panel algorithm to obtain clusters and the articles in each cluster;
S7. determining the category of each WeChat official account according to the cluster information of the articles it pushes.
2. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that an active WeChat official account is a verified WeChat official account that pushes more than 3 articles per month.
3. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that the word segmentation tool is a word segmentation tool based on the Chinese Academy of Sciences Chinese word segmentation system.
4. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that term frequency is the frequency with which a given word occurs in an article, and inverse document frequency is a measure of the general importance of a word and is the inverse of the document frequency.
5. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that the article-topic feature vector is expressed as $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_n)$, where i denotes the i-th article, $x^{(i)}_k$ is the probability of the k-th topic of the i-th article, and n is the number of topics.
6. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that step S5 is carried out as follows:
(5a) mean normalization: from all article-topic feature vectors, computing the mean $\mu_j$ and standard deviation $\sigma_j$ of each topic feature, where j = 1, 2, ..., n and n is the number of topics, and normalizing every article-topic feature vector by setting $p^{(i)}_j = (x^{(i)}_j - \mu_j)/\sigma_j$, where $x^{(i)}_j$ is the j-th topic feature of the i-th article, i = 1, 2, ..., m, and m is the number of documents;
(5b) computing the covariance matrix as follows:
$$\mathrm{Cov} = \frac{1}{m}\sum_{i=1}^{m} p^{(i)}\left(p^{(i)}\right)^{T}$$
where $p^{(i)}$ is the normalized topic feature vector of the i-th document and $(p^{(i)})^{T}$ is its transpose;
(5c) performing singular value decomposition to obtain the matrices U, S and V:
[U, S, V] = svd(Cov);
(5d) selecting a suitable reduced dimension g from the S matrix as follows:
$$1 - \frac{\sum_{k=1}^{g} S_{kk}}{\sum_{k=1}^{n} S_{kk}} \le 0.01$$
taking the minimum value $g_{\min}$ of g, where $g_{\min}$ is the most suitable dimension g for which the feature loss lies within [0, 0.01];
(5e) taking the first g column vectors of U to obtain the n × g matrix $U_{\mathrm{reduce}}$, and computing the dimension-reduced article feature vector with $U_{\mathrm{reduce}}$ as follows:
$$z^{(i)} = U_{\mathrm{reduce}}^{T}\, p^{(i)}$$
where $z^{(i)}$ is the dimension-reduced article feature vector and $U_{\mathrm{reduce}}^{T}$ is the transpose of $U_{\mathrm{reduce}}$.
7. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that the Level-Panel algorithm in step S6 is a text clustering method based on the vector space model, with the following specific steps:
given the article set D = {d1, d2, ..., dm} built from the article-topic feature vectors, where di is the dimension-reduced article-topic feature vector of the i-th article:
(6a) treating every article in the article set D as a cluster containing a single member, Ci = {di}, where i = 1, 2, ..., m;
(6b) arbitrarily selecting one single-member cluster Ck that has not yet been clustered as the starting point of clustering;
(6c) among the samples not yet clustered, finding any Ci that is close enough to cluster Ck, i.e. whose similarity satisfies sim(Ck, Ci) ≥ θ, and merging it with Ck to form the new cluster Ck = Ck ∪ Ci;
(6d) repeating step (6c) until no unclustered sample satisfies sim(Ck, Ci) ≥ θ, at which point one cluster has been formed;
(6e) repeating from step (6b) until every single-member cluster Ci has taken part in clustering.
8. The method for classifying WeChat official accounts based on LDA topic clustering according to claim 1, characterized in that the threshold θ is set to θ = 0.025 and the number of topics K is set to K = 100.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610312725.3A CN106021388A (en) | 2016-05-11 | 2016-05-11 | Classifying method of WeChat official accounts based on LDA topic clustering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106021388A | 2016-10-12 |
Family
ID=57100124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610312725.3A Pending CN106021388A (en) | 2016-05-11 | 2016-05-11 | Classifying method of WeChat official accounts based on LDA topic clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106021388A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942340A (en) * | 2014-05-09 | 2014-07-23 | 电子科技大学 | Microblog user interest recognizing method based on text mining |
CN104408440A (en) * | 2014-12-10 | 2015-03-11 | 重庆邮电大学 | Identification method for human facial expression based on two-step dimensionality reduction and parallel feature fusion |
Non-Patent Citations (4)
Title |
---|
CHUNCHU RAMBABU ET AL: "EEG Signal with Feature Extraction using SVM and ICA Classifiers", 《INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS》 * |
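刘海旭 (Liu Haixu): "基于PCA和LDA的文本分类系统设计与实现" (Design and implementation of a text classification system based on PCA and LDA), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) * |
姚清耘等 (Yao Qingyun et al.): "基于向量空间模型的文本聚类算法" (Text clustering algorithm based on the vector space model), 《计算机工程》 (Computer Engineering) * |
陆红艳 (Lu Hongyan): "基于奇异值分解与稀疏表示的人脸识别方法" (Face recognition method based on singular value decomposition and sparse representation), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) * |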
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106446287A (en) * | 2016-11-08 | 2017-02-22 | 北京邮电大学 | Answer aggregation method and system facing crowdsourcing scene question-answering system |
CN108062610A (en) * | 2016-11-08 | 2018-05-22 | 北京国双科技有限公司 | The analysis method and device of job relatedness |
CN108108346A (en) * | 2016-11-25 | 2018-06-01 | 广东亿迅科技有限公司 | The theme feature word abstracting method and device of document |
CN108108345A (en) * | 2016-11-25 | 2018-06-01 | 上海掌门科技有限公司 | For determining the method and apparatus of theme of news |
CN106844424A (en) * | 2016-12-09 | 2017-06-13 | 宁波大学 | A kind of file classification method based on LDA |
CN107229614A (en) * | 2017-06-29 | 2017-10-03 | 百度在线网络技术(北京)有限公司 | Method and apparatus for grouped data |
CN107832418A (en) * | 2017-11-08 | 2018-03-23 | 郑州云海信息技术有限公司 | A kind of much-talked-about topic finds method, system and a kind of much-talked-about topic discovering device |
CN109308607A (en) * | 2018-09-17 | 2019-02-05 | 田歌 | The method and device of book of final entry event |
CN109614476A (en) * | 2018-12-11 | 2019-04-12 | 平安科技(深圳)有限公司 | Customer service system answering method, device, computer equipment and storage medium |
CN110275745A (en) * | 2019-04-11 | 2019-09-24 | 上海盛付通电子支付服务有限公司 | A kind of method and apparatus of overhead account |
CN111324729A (en) * | 2020-02-18 | 2020-06-23 | 上海华鑫股份有限公司 | Visual analysis method for financial public number data |
CN111324729B (en) * | 2020-02-18 | 2023-04-28 | 上海华鑫股份有限公司 | Visual analysis method for financial public number data |
CN111339251A (en) * | 2020-02-25 | 2020-06-26 | 上海昌投网络科技有限公司 | Method and device for detecting whether WeChat public number has sensitive words or not |
CN111353019A (en) * | 2020-02-25 | 2020-06-30 | 上海昌投网络科技有限公司 | WeChat public number topic classification method and device |
CN111738341A (en) * | 2020-06-24 | 2020-10-02 | 佳都新太科技股份有限公司 | Distributed large-scale face clustering method and device |
CN111832289A (en) * | 2020-07-13 | 2020-10-27 | 重庆大学 | Service discovery method based on clustering and Gaussian LDA |
CN111832289B (en) * | 2020-07-13 | 2023-08-11 | 重庆大学 | Service discovery method based on clustering and Gaussian LDA |
CN112215288A (en) * | 2020-10-13 | 2021-01-12 | 中国光大银行股份有限公司 | Target enterprise category determination method and device, storage medium and electronic device |
CN112215288B (en) * | 2020-10-13 | 2024-04-30 | 中国光大银行股份有限公司 | Method and device for determining category of target enterprise, storage medium and electronic device |
CN113055481A (en) * | 2021-03-17 | 2021-06-29 | 杭州遥望网络科技有限公司 | Message pushing method, device, equipment and computer readable storage medium |
CN113055481B (en) * | 2021-03-17 | 2022-04-19 | 杭州遥望网络科技有限公司 | Message pushing method, device, equipment and computer readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161012 |