CN106599181B - Hot news detection method based on a topic model - Google Patents

Hot news detection method based on a topic model

Info

Publication number
CN106599181B
Authority
CN
China
Prior art keywords
theme
word
article
similarity
temperature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611145855.9A
Other languages
Chinese (zh)
Other versions
CN106599181A (en)
Inventor
庄郭冕
黄乔
彭志宇
付晗
王忆诗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Insigma Hengtian Software Ltd
Original Assignee
Insigma Hengtian Software Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Insigma Hengtian Software Ltd filed Critical Insigma Hengtian Software Ltd
Priority to CN201611145855.9A priority Critical patent/CN106599181B/en
Publication of CN106599181A publication Critical patent/CN106599181A/en
Application granted
Publication of CN106599181B publication Critical patent/CN106599181B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a hot news detection method based on a topic model. A news stream is crawled in a directed manner by a web crawler. The articles are first preprocessed: they are segmented into words, and stop words and meaningless character strings are removed. Features are then extracted from the preprocessed articles to build a text model, after which a text clustering algorithm adds highly similar texts to their most similar class, yielding a topic library. The similarity between new and old topics is then computed, highly similar topic pairs are merged, and finally the topic heat is computed and the hottest topics are selected by ranking. The invention innovatively applies the LDA algorithm to hot topic discovery and introduces the concept of burstiness, so that the newest hot news can be found promptly and effectively. It also introduces the concept of topic heat decay, so that topic heat can be recorded and tracked in real time, truly reflecting the development of hot news, which is of great significance for tracking and displaying hot news.

Description

Hot news detection method based on a topic model
Technical field
The present invention provides a hot news detection method based on a topic model. It involves core technologies and algorithms such as web crawling, cluster analysis, and text similarity computation, detects hot news promptly and effectively, and tracks the development of hot news.
Background art
With the development of Internet technology, the era of massive information has arrived and the Internet is flooded with all kinds of information, yet only a small amount of news causes a sensation, the so-called headline or hot news. Timely discovery of hot news can help people follow the state of society in real time.
On the other hand, a hot news story does not burst out and die in a flash; it is usually accompanied by a gradual development process and triggers other latent issues, so tracking the development of hot news is significant for research on social problems.
With the development of the Internet and the rise of big data, the Internet is awash in information, and discovering hot news within this large volume of low-quality information becomes crucially important.
Summary of the invention
In view of the complexity of today's Internet information, the object of the present invention is to provide a hot news detection method based on web crawling, cluster analysis, and a topic model.
The object of the invention is achieved through the following technical solution: a hot news detection method based on a topic model. A news stream is crawled in a directed manner by a web crawler. The articles are first preprocessed: segmented into words, with stop words and meaningless character strings removed. Features are then extracted from the preprocessed articles to build a text model, after which a text clustering algorithm adds highly similar texts to their most similar class, yielding a topic library. The similarity between new and old topics is then computed, highly similar topic pairs are merged, and finally the topic heat is computed and the hottest topics are selected by ranking. The method specifically includes the following steps:
(1) A news stream is crawled in a directed manner by a web crawler, and a batch is processed every time N new articles arrive; the crawled data are cleaned and the articles are segmented into words, yielding the preprocessed articles;
(2) Construct a vector space model: after preprocessing, an original document can be regarded as a bag of words. If the document is regarded as a vector, each word is one feature dimension; by converting documents into vectors, the text becomes structured data that a computer can process, and the similarity between two documents reduces to the similarity between two vectors. The weight of each dimension of a document vector is computed with an improved B-TFIDF algorithm, whose formulas are as follows:
In formulas (1) and (2), w denotes a word; A is the number of new articles containing w, B the number of new articles not containing w, C the number of historical articles containing w, and D the number of historical articles not containing w; d_i denotes the i-th new article, N the total number of new articles, tf(d, w) the term frequency of w in article d, and df(w) the number of articles containing w. The B-TFIDF algorithm accounts for the burstiness of a word, i.e., a word suddenly appearing in large quantities within a short period. The weight of every word in a document is computed by the algorithm above, yielding the document's vector space model D_i = (weight(d_i, w_1), weight(d_i, w_2), weight(d_i, w_3), ..., weight(d_i, w_n)), where n is the total number of words;
(3) Article clustering: after step 2 every text is represented as a vector, and the text vectors are clustered using the LDA topic model clustering algorithm, specifically:
LDA clustering: LDA is a three-layer Bayesian probability model comprising word, topic, and document layers. The generation of an article is regarded as the following process: a topic is chosen with a certain probability, and a word within that topic is chosen with a certain probability; the document-to-topic distribution and the topic-to-word distribution are both multinomial. LDA clustering yields a "topic-word" probability matrix phi and a "document-topic" probability matrix theta. From theta, m topics and the probability of each of the N articles under each topic are obtained: row i of theta represents an article and column j a topic, and the matrix entry theta_ij is the probability that article i belongs to topic j. A screening threshold thresholdT is set; if theta_ij > thresholdT, article i is considered to belong to topic j, and the articles of each topic are selected accordingly;
Determining the LDA cluster number m: LDA clustering is run repeatedly with the cluster number set from N/10 to N/5; for each run the average inter-topic similarity is computed, and the cluster number of the run with the lowest inter-topic similarity is selected. Inter-topic similarity is computed from the "topic-word" probability matrix phi produced by LDA clustering: row j of phi represents a topic T_j, column k represents a word w_k, and phi_jk is the probability that topic T_j contains word w_k. A row of phi can thus be regarded as the vector form of topic T_j, T_j = (w_1, w_2, w_3, ..., w_k, ..., w_n), where n is the total number of words. The pairwise similarities of all topics are computed and averaged, and the minimum over the runs is taken as the final inter-topic similarity. Similarity is computed as cosine similarity:
sim(T_i, T_j) = Σ_{k=1..n} ω_k(T_i)·ω_k(T_j) / ( sqrt(Σ_{k=1..n} ω_k(T_i)²) · sqrt(Σ_{k=1..n} ω_k(T_j)²) ) (3)
In formula (3), T_i and T_j denote two topics, ω_k(T_i) is the value of topic T_i on dimension k, and n is the total number of words;
(4) Topic keyword extraction: keywords are extracted from the titles of all articles under a topic. The titles are first segmented into words; stop words, meaningless words, and punctuation marks are filtered out, and the remaining words serve as the topic keywords;
(5) Topic merging: step 3 yields m topics and their corresponding articles; the m new topics are then merged with the old topics. The inter-topic similarity f1 is computed; if f1 > 0.5 the two topics are considered similar and are merged. The inter-topic similarity f1 is computed as:
f1 = 2·vectorSim·keywordSim / (vectorSim + keywordSim) (4)
In formula (4), vectorSim denotes the topic cosine similarity computed with all the words a topic contains as dimensions, and keywordSim denotes the topic cosine similarity computed with the topic keywords as dimensions; cosine similarity is computed as in formula (3);
(6) Heat computation: step 5 yields the final set of topics; the topic heat h is then computed, topics with high heat are retained, and topics with low heat, i.e., outdated topics, are removed. Based on the characteristic that hot topics have a high news concentration s, the heat is computed as:
h_t = Σ_i sim(d_i, t) (5)
In formula (5), d_i denotes an article contained in topic T; the heat h_t of topic T equals the sum of the similarities between the topic and the articles under it, where sim is as in formula (3);
As time goes on, the heat of a topic keeps decaying until it falls below the threshold and the topic is discarded. Heat decay: in each batch, if new articles arrive under topic T, its heat h_t increases accordingly, h_t = h_t · Up; if no new article joins topic T, its heat decays, h_t = h_t · Down, where Up > 1 and Down < 1.
The beneficial effects of the present invention are: the invention innovatively applies the LDA algorithm to hot topic discovery and introduces the concept of burstiness, so that the newest hot news can be found promptly and effectively. It also introduces the concept of topic heat decay, so that topic heat can be recorded and tracked in real time, truly reflecting the development of hot news, which is of great significance for tracking and displaying hot news.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hot news detection process based on a topic model;
Fig. 2 is a schematic diagram of the article modeling process;
Fig. 3 is a schematic diagram of the LDA clustering process;
Fig. 4 is a schematic diagram of merging new and old topics;
Fig. 5 is a schematic diagram of topic heat computation.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, the hot news detection method based on a topic model proposed by the present invention includes the following steps:
(1) A news stream is crawled in a directed manner by a web crawler, and a batch is processed every time N articles arrive; the crawled data are cleaned and the articles are segmented into words, yielding the preprocessed articles;
(2) Construct a vector space model: as shown in Fig. 2, after preprocessing, an original document can be regarded as a bag of words. If the document is regarded as a vector, each word is one feature dimension; by converting documents into vectors, the text becomes structured data that a computer can process, and the similarity between two documents reduces to the similarity between two vectors. The weight of each dimension of a document vector is computed with an improved B-TFIDF algorithm, whose formulas are as follows:
In formulas 1 and 2, w denotes a word; A is the number of new articles containing w, B the number of new articles not containing w, C the number of historical articles containing w, and D the number of historical articles not containing w; d_i denotes the i-th new article, N the total number of new articles, tf(d, w) the term frequency of w in article d, and df(w) the number of articles containing w. The algorithm accounts for the burstiness of a word, i.e., a word suddenly appearing in large quantities within a short period. The weight of every word in a document is computed by the algorithm above, yielding the document's vector space model D_i = (weight(d_i, w_1), weight(d_i, w_2), weight(d_i, w_3), ..., weight(d_i, w_n)), where n is the total number of words.
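The exact B-TFIDF formulas 1 and 2 appear only in the patent drawings and are not reproduced here. As a rough illustration of burst-aware weighting, the sketch below multiplies a standard TF-IDF core by a hypothetical burst factor built from the counts A and C defined above; the `burst` ratio is an assumption for illustration, not the patent's formula:

```python
import math

def b_tfidf_weight(tf, df, n_new, a, c, n_hist):
    """Hypothetical burst-aware TF-IDF weight for one word in one article.

    tf: term frequency of the word in the article
    df: number of articles containing the word
    n_new: total number of new articles in the batch
    a: new articles containing the word (A); c: historical articles containing it (C)
    n_hist: total number of historical articles
    """
    tfidf = tf * math.log((n_new + 1) / (df + 1))      # standard TF-IDF core
    # Assumed burstiness: rate of the word in the new batch vs. its historical rate.
    burst = (a / n_new) / ((c + 1) / (n_hist + 1))
    return tfidf * burst

def doc_vector(word_stats, n_new, n_hist):
    """Build the vector space model D_i = (weight(d_i, w_1), ..., weight(d_i, w_n)).

    word_stats: one (tf, df, a, c) tuple per vocabulary word for this article.
    """
    return [b_tfidf_weight(tf, df, n_new, a, c, n_hist)
            for (tf, df, a, c) in word_stats]
```

Under this assumed factor, a word that suddenly appears in many new articles but few historical ones receives a much larger weight than an equally frequent non-bursty word.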
(3) Article clustering: after step 2 every text is represented as a vector, and the text vectors are clustered. As shown in Fig. 3, the LDA topic model clustering algorithm is used here. LDA is a three-layer Bayesian probability model comprising word, topic, and document layers; the generation of an article is regarded as the following process: a topic is chosen with a certain probability, and a word within that topic is chosen with a certain probability; the document-to-topic distribution and the topic-to-word distribution are both multinomial. LDA clustering yields a "topic-word" probability matrix and a "document-topic" probability matrix; the detailed process is described below.
LDA clustering: LDA clustering yields the "topic-word" probability matrix phi and the "document-topic" probability matrix theta. From theta, m topics and the probability of each of the N articles under each topic are obtained: row i of theta represents an article and column j a topic, and the matrix entry theta_ij is the probability that article i belongs to topic j. A screening threshold thresholdT is set (preferred value 0.32); if theta_ij > thresholdT, article i is considered to belong to topic j, and the articles of each topic are selected accordingly.
Determining the LDA cluster number m: since for N articles a cluster number between N/10 and N/5 fits reality well (for example, with a total of N = 150 new articles, a cluster number between 15 and 30 fits reality), LDA clustering is run repeatedly with the cluster number set from N/10 to N/5; for each run the average inter-topic similarity is computed, and the cluster number of the run with the lowest inter-topic similarity is selected. Inter-topic similarity is computed from the "topic-word" probability matrix phi produced by LDA clustering: row j of phi represents a topic T_j, column k represents a word w_k, and phi_jk is the probability that topic T_j contains word w_k. A row of phi can thus be regarded as the vector form of topic T_j, T_j = (w_1, w_2, w_3, ..., w_k, ..., w_n), where n is the total number of words. The pairwise similarities of all topics are computed and averaged, and the minimum over the runs is taken as the final inter-topic similarity. Similarity is computed as cosine similarity:
sim(T_i, T_j) = Σ_{k=1..n} ω_k(T_i)·ω_k(T_j) / ( sqrt(Σ_{k=1..n} ω_k(T_i)²) · sqrt(Σ_{k=1..n} ω_k(T_j)²) ) (3)
In formula 3, T_i and T_j denote two topics, ω_k(T_i) is the value of topic T_i on dimension k, and n is the total number of words.
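The model-selection loop above amounts to: compute the cosine similarity of formula 3 for every pair of topic rows of phi, average the pairwise values, and keep the cluster count whose run has the smallest average. A minimal sketch:

```python
import math
from itertools import combinations

def cosine(t_i, t_j):
    """Formula 3: cosine similarity of two topic vectors (rows of phi)."""
    dot = sum(a * b for a, b in zip(t_i, t_j))
    norm = math.sqrt(sum(a * a for a in t_i)) * math.sqrt(sum(b * b for b in t_j))
    return dot / norm if norm else 0.0

def avg_topic_similarity(phi):
    """Average pairwise cosine similarity over all topic rows of phi."""
    pairs = list(combinations(phi, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

def best_cluster_count(runs):
    """runs: {m: phi matrix from the LDA run with m clusters}.

    Picks the m whose topics are least similar to one another.
    """
    return min(runs, key=lambda m: avg_topic_similarity(runs[m]))
```

Low average inter-topic similarity means the topics are well separated, which is why the minimum is preferred.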
(4) Topic keyword extraction: keywords are extracted from the titles of all articles under a topic. The titles are first segmented into words; stop words, meaningless words, and punctuation marks are filtered out, and the remaining words serve as the topic keywords.
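Topic keyword extraction then reduces to title segmentation plus a stop-word and punctuation filter. A sketch with a placeholder tokenizer and stop-word list; a real system would use a Chinese word segmenter and a proper stop-word dictionary:

```python
import re

STOP_WORDS = {"the", "a", "of", "in", "to", "and"}  # placeholder stop-word list

def topic_keywords(titles):
    """Segment every article title under a topic and keep the content words,
    deduplicated in first-seen order."""
    keywords = []
    for title in titles:
        for tok in re.findall(r"\w+", title.lower()):  # crude tokenizer stand-in
            if tok not in STOP_WORDS and tok not in keywords:
                keywords.append(tok)
    return keywords
```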
(5) Topic merging: step 3 yields m topics and their corresponding articles; the m new topics are then merged with the old topics. As shown in Fig. 4, the inter-topic similarity f1 is computed; if f1 > 0.5 the two topics are considered similar and are merged. The inter-topic similarity f1 is computed as:
f1 = 2·vectorSim·keywordSim / (vectorSim + keywordSim) (4)
In formula 4, vectorSim denotes the topic cosine similarity computed with all the words a topic contains as dimensions, and keywordSim denotes the topic cosine similarity computed with the topic keywords as dimensions; cosine similarity is computed as in formula 3.
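Formula 4 is the harmonic mean of the two cosine similarities, so the merge decision can be sketched as:

```python
def merge_score(vector_sim, keyword_sim):
    """Formula 4: harmonic mean of the full-vocabulary cosine similarity
    and the keyword-only cosine similarity."""
    if vector_sim + keyword_sim == 0:
        return 0.0
    return 2 * vector_sim * keyword_sim / (vector_sim + keyword_sim)

def should_merge(vector_sim, keyword_sim, threshold=0.5):
    """Two topics are merged when f1 exceeds the 0.5 threshold stated above."""
    return merge_score(vector_sim, keyword_sim) > threshold
```

The harmonic mean penalizes disagreement: both similarities must be reasonably high for a merge, so one high value cannot mask a low one.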
(6) Heat computation: as shown in Fig. 5, step 5 yields the final set of topics; the topic heat h is then computed, topics with high heat are retained, and topics with low heat, i.e., outdated topics, are removed. Based on the characteristic that hot topics have a high news concentration s, the heat is computed as:
h_t = Σ_i sim(d_i, t) (5)
In formula 5, d_i denotes an article contained in topic T; the heat h_t of topic T equals the sum of the similarities between the topic and the articles under it, where sim is as in formula 3.
As time goes on, the heat of a topic keeps decaying until it falls below the threshold and the topic is discarded. Heat decay: in each batch, if new articles arrive under topic T, its heat h_t increases accordingly, h_t = h_t · Up (preferred value 1.05); if no new article joins topic T, its heat decays, h_t = h_t · Down (preferred value 0.9).
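The per-batch heat bookkeeping, i.e. formula 5 plus the multiplicative Up/Down update with the preferred values 1.05 and 0.9, can be sketched as:

```python
def initial_heat(similarities):
    """Formula 5: h_t = sum of similarities between the topic and its articles."""
    return sum(similarities)

def update_heat(heat, got_new_articles, up=1.05, down=0.9):
    """Per-batch update: boost the heat when new articles join the topic,
    decay it otherwise (Up > 1, Down < 1)."""
    return heat * (up if got_new_articles else down)

def prune(topics_heat, threshold):
    """Drop topics whose heat has fallen below the threshold."""
    return {t: h for t, h in topics_heat.items() if h >= threshold}
```

With Down = 0.9, a topic that stops receiving articles loses about two thirds of its heat within ten batches, which is how outdated topics eventually fall below the rejection threshold.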

Claims (1)

1. A hot news detection method based on a topic model, characterized by comprising the following steps:
(1) A news stream is crawled in a directed manner by a web crawler, and a batch is processed every time N new articles arrive; the crawled data are cleaned and the articles are segmented into words, yielding the preprocessed articles;
(2) Construct a vector space model: after preprocessing, an original document can be regarded as a bag of words. If the document is regarded as a vector, each word is one feature dimension; by converting documents into vectors, the text becomes structured data that a computer can process, and the similarity between two documents reduces to the similarity between two vectors. The weight of each dimension of a document vector is computed with an improved B-TFIDF algorithm, whose formulas are as follows:
In formulas (1) and (2), w denotes a word; A is the number of new articles containing w, B the number of new articles not containing w, C the number of historical articles containing w, and D the number of historical articles not containing w; d_i denotes the i-th new article, N the total number of new articles, tf(d, w) the term frequency of w in article d, and df(w) the number of articles containing w; the B-TFIDF algorithm accounts for the burstiness of a word, i.e., a word suddenly appearing in large quantities within a short period; the weight of every word in a document is computed by the algorithm above, yielding the document's vector space model D_i = (weight(d_i, w_1), weight(d_i, w_2), weight(d_i, w_3), ..., weight(d_i, w_n)), where n is the total number of words;
(3) Article clustering: after step 2 every text is represented as a vector, and the text vectors are clustered using the LDA topic model clustering algorithm, specifically:
LDA clustering: LDA is a three-layer Bayesian probability model comprising word, topic, and document layers; the generation of an article is regarded as the following process: a topic is chosen with a certain probability, and a word within that topic is chosen with a certain probability; the document-to-topic distribution and the topic-to-word distribution are both multinomial; LDA clustering yields a "topic-word" probability matrix phi and a "document-topic" probability matrix theta; from theta, m topics and the probability of each of the N articles under each topic are obtained: row i of theta represents an article and column j a topic, and the matrix entry theta_ij is the probability that article i belongs to topic j; a screening threshold thresholdT is set; if theta_ij > thresholdT, article i is considered to belong to topic j, and the articles of each topic are selected accordingly;
Determining the LDA cluster number m: LDA clustering is run repeatedly with the cluster number set from N/10 to N/5; for each run the average inter-topic similarity is computed, and the cluster number of the run with the lowest inter-topic similarity is selected; inter-topic similarity is computed from the "topic-word" probability matrix phi produced by LDA clustering: row j of phi represents a topic T_j, column k represents a word w_k, and phi_jk is the probability that topic T_j contains word w_k; a row of phi can thus be regarded as the vector form of topic T_j, T_j = (w_1, w_2, w_3, ..., w_k, ..., w_n), where n is the total number of words; the pairwise similarities of all topics are computed and averaged, and the minimum over the runs is taken as the final inter-topic similarity; similarity is computed as cosine similarity:
sim(T_i, T_j) = Σ_{k=1..n} ω_k(T_i)·ω_k(T_j) / ( sqrt(Σ_{k=1..n} ω_k(T_i)²) · sqrt(Σ_{k=1..n} ω_k(T_j)²) ) (3)
In formula (3), T_i and T_j denote two topics, ω_k(T_i) is the value of topic T_i on dimension k, and n is the total number of words;
(4) Topic keyword extraction: keywords are extracted from the titles of all articles under a topic; the titles are first segmented into words, stop words, meaningless words, and punctuation marks are filtered out, and the remaining words serve as the topic keywords;
(5) Topic merging: step 3 yields m topics and their corresponding articles; the m new topics are then merged with the old topics; the inter-topic similarity f1 is computed; if f1 > 0.5 the two topics are considered similar and are merged; the inter-topic similarity f1 is computed as:
f1 = 2·vectorSim·keywordSim / (vectorSim + keywordSim) (4)
In formula (4), vectorSim denotes the topic cosine similarity computed with all the words a topic contains as dimensions, and keywordSim denotes the topic cosine similarity computed with the topic keywords as dimensions; cosine similarity is computed as in formula (3);
(6) Heat computation: step 5 yields the final set of topics; the topic heat h is then computed, topics with high heat are retained, and topics with low heat, i.e., outdated topics, are removed; based on the characteristic that hot topics have a high news concentration s, the heat is computed as:
h_t = Σ_i sim(d_i, t) (5)
In formula (5), d_i denotes an article contained in topic T; the heat h_t of topic T equals the sum of the similarities between the topic and the articles under it, where sim is as in formula (3);
As time goes on, the heat of a topic keeps decaying until it falls below the threshold and the topic is discarded; heat decay: in each batch, if new articles arrive under topic T, its heat h_t increases accordingly, h_t = h_t · Up; if no new article joins topic T, its heat decays, h_t = h_t · Down, where Up > 1 and Down < 1.
CN201611145855.9A 2016-12-13 2016-12-13 A kind of hot news detection method based on topic model Active CN106599181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611145855.9A CN106599181B (en) 2016-12-13 2016-12-13 A kind of hot news detection method based on topic model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611145855.9A CN106599181B (en) 2016-12-13 2016-12-13 A kind of hot news detection method based on topic model

Publications (2)

Publication Number Publication Date
CN106599181A CN106599181A (en) 2017-04-26
CN106599181B true CN106599181B (en) 2019-06-18

Family

ID=58802054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611145855.9A Active CN106599181B (en) 2016-12-13 2016-12-13 A kind of hot news detection method based on topic model

Country Status (1)

Country Link
CN (1) CN106599181B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423337A (en) * 2017-04-27 2017-12-01 天津大学 News topic detection method based on LDA Fusion Models and multi-level clustering
CN107239497B (en) * 2017-05-02 2020-11-03 广东万丈金数信息技术股份有限公司 Hot content search method and system
CN107203632B (en) * 2017-06-01 2019-08-16 中国人民解放军国防科学技术大学 Topic Popularity prediction method based on similarity relation and cooccurrence relation
CN107330049B (en) * 2017-06-28 2020-05-22 北京搜狐新媒体信息技术有限公司 News popularity estimation method and system
CN107835113B (en) * 2017-07-05 2020-09-08 中山大学 Method for detecting abnormal user in social network based on network mapping
CN107451224A (en) * 2017-07-17 2017-12-08 广州特道信息科技有限公司 A kind of clustering method and system based on big data parallel computation
CN107563725B (en) * 2017-08-25 2021-04-06 浙江网新恒天软件有限公司 Recruitment system for optimizing fussy talent recruitment process
CN107656919B (en) * 2017-09-12 2018-10-26 中国软件与技术服务股份有限公司 A kind of optimal L DA Automatic Model Selection methods based on minimum average B configuration similarity between theme
CN107918644B (en) * 2017-10-31 2020-12-08 北京锐思爱特咨询股份有限公司 News topic analysis method and implementation system in reputation management framework
CN107832418A (en) * 2017-11-08 2018-03-23 郑州云海信息技术有限公司 A kind of much-talked-about topic finds method, system and a kind of much-talked-about topic discovering device
CN107992542A (en) * 2017-11-27 2018-05-04 中山大学 A kind of similar article based on topic model recommends method
CN108153818B (en) * 2017-11-29 2021-08-10 成都东方盛行电子有限责任公司 Big data based clustering method
CN107784127A (en) * 2017-11-30 2018-03-09 杭州数梦工场科技有限公司 A kind of focus localization method and device
CN107862089B (en) * 2017-12-02 2020-03-13 北京工业大学 Label extraction method based on perception data
CN108090157B (en) * 2017-12-12 2018-11-06 百度在线网络技术(北京)有限公司 A kind of hot news method for digging, device and server
CN110888978A (en) * 2018-09-06 2020-03-17 北京京东金融科技控股有限公司 Article clustering method and device, electronic equipment and storage medium
CN110096649B (en) * 2019-05-14 2021-07-30 武汉斗鱼网络科技有限公司 Post extraction method, device, equipment and storage medium
CN110532388B (en) * 2019-08-15 2022-07-01 企查查科技有限公司 Text clustering method, equipment and storage medium
CN110609938A (en) * 2019-08-15 2019-12-24 平安科技(深圳)有限公司 Text hotspot discovery method and device and computer-readable storage medium
CN113127611B (en) * 2019-12-31 2024-05-14 北京中关村科金技术有限公司 Method, device and storage medium for processing question corpus
CN111343467B (en) * 2020-02-10 2021-10-26 腾讯科技(深圳)有限公司 Live broadcast data processing method and device, electronic equipment and storage medium
CN112100372B (en) * 2020-08-20 2022-08-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Head news prediction classification method
US11436287B2 (en) 2020-12-07 2022-09-06 International Business Machines Corporation Computerized grouping of news articles by activity and associated phase of focus
CN112612889B (en) * 2020-12-28 2021-10-29 中科院计算技术研究所大数据研究院 Multilingual document classification method and device and storage medium
CN112784042A (en) * 2021-01-12 2021-05-11 北京明略软件系统有限公司 Text similarity calculation method and system combining article structure and aggregated word vector
CN113360600A (en) * 2021-06-03 2021-09-07 中国科学院计算机网络信息中心 Method and system for screening enterprise performance prediction indexes based on signal attenuation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140019460A1 (en) * 2012-07-12 2014-01-16 Yahoo! Inc. Targeted search suggestions
CN104699814A (en) * 2015-03-24 2015-06-10 清华大学 Searching method and system of hot spot information
CN106156276B (en) * 2016-06-25 2019-07-19 贵州大学 Hot news based on Pitman-Yor process finds method

Also Published As

Publication number Publication date
CN106599181A (en) 2017-04-26

Similar Documents

Publication Publication Date Title
CN106599181B (en) A kind of hot news detection method based on topic model
Wu et al. A posterior-neighborhood-regularized latent factor model for highly accurate web service QoS prediction
Zhang et al. Multiresolution graph attention networks for relevance matching
Lee Unsupervised and supervised learning to evaluate event relatedness based on content mining from social-media streams
Kumar et al. ESUMM: event summarization on scale-free networks
Tzelepis et al. Learning to detect video events from zero or very few video examples
Trattner et al. Exploring the differences and similarities between hierarchical decentralized search and human navigation in information networks
KR20190124986A (en) Searching Method for Related Law
Trokhymovych et al. Wikicheck: An end-to-end open source automatic fact-checking api based on wikipedia
CN110309355A (en) Generation method, device, equipment and the storage medium of content tab
Huang et al. Tag refinement of micro-videos by learning from multiple data sources
Guo [Retracted] Intelligent Sports Video Classification Based on Deep Neural Network (DNN) Algorithm and Transfer Learning
Zhao et al. Lsif: A system for large-scale information flow detection based on topic-related semantic similarity measurement
Yang et al. Web service clustering method based on word vector and biterm topic model
Yang et al. A hot topic detection approach on Chinese microblogging
Ren et al. Role-explicit query extraction and utilization for quantifying user intents
Yu et al. Learning cross space mapping via DNN using large scale click-through logs
Toriah et al. Semantic-based video retrieval survey
Xu et al. BigVid at MediaEval 2016: predicting interestingness in images and videos
Ksibi et al. Flickr-based semantic context to refine automatic photo annotation
Feng et al. Implementation of Short Video Click‐Through Rate Estimation Model Based on Cross‐Media Collaborative Filtering Neural Network
Zezula Similarity searching for database applications
Yu et al. Interpretative topic categorization via deep multiple instance learning
Xu et al. A transformer based multimodal fine-fusion model for false information detection
Hai et al. Improving The Efficiency of Semantic Image Retrieval using A Combined Graph and SOM Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant