CN106599181B - Hot news detection method based on a topic model - Google Patents
Hot news detection method based on a topic model
- Publication number
- CN106599181B (grant publication of application CN201611145855.9A)
- Authority
- CN
- China
- Prior art keywords
- theme
- word
- article
- similarity
- heat (popularity)
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a hot news detection method based on a topic model. A news stream is crawled in a focused manner by a web crawler. Each article is first preprocessed: it is segmented into words, and stop words and meaningless character strings are removed. Features are then extracted from the preprocessed articles to construct a text model, and a text clustering algorithm adds highly similar texts to their most similar category, producing a topic library. Next, the similarity between new and old topics is computed, and pairs of new and old topics with high similarity are merged. Finally, the heat of each topic is calculated, and the hottest topics are selected by ranking. The invention innovatively applies the LDA algorithm to hot topic discovery and introduces the concept of burstiness, so that the newest hot news can be found promptly and effectively. It also introduces the concept of topic heat decay, which records and tracks topic heat in real time and truthfully reflects the development and change of hot news, which is of great significance for tracking and displaying hot news.
Description
Technical field
The present invention provides a hot news detection method based on a topic model. It involves core technologies and algorithms such as web crawling, cluster analysis, and text similarity computation, detects hot news promptly and effectively, and tracks how hot news develops.
Background art
With the development of Internet technology, the era of massive information has arrived, and all kinds of information flood the Internet; yet only a small amount of news causes a great stir (the so-called headline news or hot news), and timely discovery of hot news helps people follow the state of society in real time.
On the other hand, a hot news story does not erupt and die in a flash; it is usually accompanied by a development process with its own rhythm, and it triggers other potential issues, so tracking the development process of hot news is of great significance for studying social concerns.
With the development of the Internet and the rise of big data, the Internet is flooded with bulk information, and discovering hot news among this low-quality information has become crucially important.
Summary of the invention
In view of the complexity and diversity of today's Internet information, it is an object of the invention to provide a hot news detection method based on web crawling, cluster analysis, and a topic model.
The object of the invention is achieved through the following technical solution: a hot news detection method based on a topic model. A news stream is crawled in a focused manner by a web crawler; each article is first preprocessed (word segmentation, removal of stop words and meaningless character strings); features are then extracted from the preprocessed articles to construct a text model; a text clustering algorithm then adds highly similar texts to their most similar category, producing a topic library; next, the similarity between new and old topics is computed, and pairs of new and old topics with high similarity are merged; finally, topic heat is calculated, and the hottest topics are selected by ranking. The method specifically includes the following steps:
(1) A news stream is crawled in a focused manner by a web crawler; a batch is processed every time N new articles arrive; the crawled data are cleaned and the articles are word-segmented to obtain preprocessed articles.
(2) A vector space model is constructed. After preprocessing, an original document can be regarded as a bag of words; if the document is regarded as a vector, each word is one feature dimension. By converting documents into vectors, the text becomes structured data that a computer can process, and the similarity between two documents becomes the similarity between two vectors. The weight of each dimension of a document vector is computed with an improved B-TFIDF algorithm, whose formulas are as follows:
In formula (1), w denotes a word, A is the number of new articles containing word w, B is the number of new articles not containing w, C is the number of history articles containing w, and D is the number of history articles not containing w. In formula (2), di denotes the i-th new article, N is the total number of new articles, tf(d, w) is the term frequency of word w in article d, and df(w) is the number of articles containing w. The B-TFIDF algorithm takes the burstiness of words into account; a burst means that a word suddenly appears in large numbers within a short time. The weight of each word of a document is computed by the algorithm above, yielding the vector space model of the article Di = (weight(di, w1), weight(di, w2), weight(di, w3), …, weight(di, wn)), where n is the total number of words.
(3) Article clustering. Through step 2, each text is represented as a vector, and the text vectors are clustered with an LDA topic model clustering algorithm. Specifically:
LDA clustering process: LDA is a three-layer Bayesian probability model comprising the word, topic, and document layers. The generation of an article is regarded as the following process: a topic is chosen with a certain probability, and a word is chosen from that topic with a certain probability; document-to-topic follows a multinomial distribution, and topic-to-word follows a multinomial distribution. LDA clustering yields a "topic-word" probability matrix phi and a "document-topic" probability matrix theta. From theta, m topics and the probabilities of the N articles belonging to the m topics are obtained: each row i of theta represents an article, each column j represents a topic, and the matrix entry theta_ij is the probability that article i belongs to topic j. A screening threshold thresholdT is set; if theta_ij > thresholdT, article i is considered to belong to topic j, and the articles corresponding to each topic are thereby selected.
Determining the LDA cluster number m: the LDA clustering algorithm is run repeatedly with cluster numbers from N/10 to N/5; the inter-topic similarity of each run is computed, and the topic number of the run with the lowest inter-topic similarity is chosen. Inter-topic similarity is computed from the "topic-word" probability matrix phi produced by LDA clustering: each row j of phi represents a topic Tj, each column k represents a word wk, and phi_jk is the probability that topic Tj contains word wk. A row of phi can thus be regarded as the vector form of topic Tj, Tj = (w1, w2, w3, … wk … wn), where n is the total number of words. The pairwise similarities of the topics are computed and averaged, and the minimum average is taken as the final inter-topic similarity. Similarity is computed with the cosine similarity, as follows:
sim(Ti, Tj) = (Σk ωk(Ti)·ωk(Tj)) / (√(Σk ωk(Ti)²) · √(Σk ωk(Tj)²)) (3)
In formula (3), Ti and Tj denote two topics, ωk(Ti) denotes the value of topic Ti on dimension k, and n is the total number of words.
(4) Topic keyword extraction: keywords are extracted from the titles of all articles under a topic. The titles are first word-segmented; stop words, meaningless words, and punctuation marks are filtered out, and the remaining words serve as the topic keywords.
(5) Topic merging: step 3 yields m topics and their corresponding articles; next, the m new topics are merged with the old topics. The inter-topic similarity f1 is computed; if f1 > 0.5, the two topics are considered similar and are merged. The inter-topic similarity f1 is computed as follows:
f1 = 2·vectorSim·keywordSim / (vectorSim + keywordSim) (4)
In formula (4), vectorSim denotes the topic cosine similarity computed with all words of a topic as dimensions, and keywordSim denotes the topic cosine similarity computed with the topic keywords as dimensions; the cosine similarity is computed as in formula (3).
(6) Heat calculation: step 5 yields the final set of topics; next, the topic heat h is calculated, topics with high heat are kept, and topics with low heat, i.e. outdated topics, are removed. Based on the characteristic that hot topic news has a high concentration s, the heat is computed as follows:
ht = Σ sim(di, t) (5)
In formula (5), di denotes an article that topic T contains; the heat ht of topic T equals the sum of the similarities between the articles under the topic and the topic, the sim function being as in formula (3).
As time goes on, the heat of a topic keeps decaying until it falls below the threshold and the topic is discarded. Heat decay: in each batch process, if a new article arrives under topic T, the heat ht of topic T increases accordingly, ht = ht·Up; if no new article is added to topic T, the heat decays, ht = ht·Down, where Up > 1 and Down < 1.
The beneficial effects of the invention are: the invention innovatively applies the LDA algorithm to hot topic discovery and introduces the concept of burstiness, so that the newest hot news can be found promptly and effectively; it also introduces the concept of topic heat decay, which records and tracks topic heat in real time, truthfully reflects the development and change of hot news, and is of great significance for tracking and displaying hot news.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of hot news detection based on a topic model;
Fig. 2 is a schematic diagram of the article modeling process;
Fig. 3 is a schematic diagram of the LDA clustering process;
Fig. 4 is a schematic diagram of merging new and old topics;
Fig. 5 is a schematic diagram of topic heat calculation.
Specific embodiment
The invention is further described in detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the hot news detection method based on a topic model proposed by the invention includes the following steps:
(1) A news stream is crawled in a focused manner by a web crawler; a batch is processed every time N articles arrive; the crawled data are cleaned and the articles are word-segmented to obtain preprocessed articles.
(2) A vector space model is constructed. As shown in Fig. 2, after preprocessing an original document can be regarded as a bag of words; if the document is regarded as a vector, each word is one feature dimension. By converting documents into vectors, the text becomes structured data that a computer can process, and the similarity between two documents becomes the similarity between two vectors. The weight of each dimension of a document vector is computed with the improved B-TFIDF algorithm, whose formulas are as follows:
In formula (1), w denotes a word, A is the number of new articles containing word w, B is the number of new articles not containing w, C is the number of history articles containing w, and D is the number of history articles not containing w. In formula (2), di denotes the i-th new article, N is the total number of new articles, tf(d, w) is the term frequency of word w in article d, and df(w) is the number of articles containing w. The algorithm takes the burstiness of words into account; a burst means that a word suddenly appears in large numbers within a short time. The weight of each word of a document is computed by the algorithm above, yielding the vector space model of the article Di = (weight(di, w1), weight(di, w2), weight(di, w3), …, weight(di, wn)), where n is the total number of words.
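Formulas (1) and (2) appear only as images in the original patent and are not reproduced above, so the sketch below is one plausible reading rather than the patented formula: it assumes the burst term is a chi-square-style statistic over the (A, B, C, D) new/history contingency table, used to scale a standard TF-IDF weight. The function names `burstiness` and `b_tfidf` are mine.

```python
import math

def burstiness(A, B, C, D):
    """Chi-square statistic over the new/history contingency table.
    An assumed form of the burst term: large when word w is much more
    frequent in new articles (A) than in history articles (C)."""
    n = A + B + C + D
    denom = (A + B) * (C + D) * (A + C) * (B + D)
    if denom == 0:
        return 0.0
    return n * (A * D - B * C) ** 2 / denom

def b_tfidf(tf, df, n_articles, A, B, C, D):
    """TF-IDF weight scaled by the assumed burstiness of the word."""
    idf = math.log(n_articles / (1 + df))
    return tf * idf * (1.0 + burstiness(A, B, C, D))
```

A word evenly distributed between new and history articles gets zero burst and plain TF-IDF weight, while a word concentrated in the new batch is boosted, matching the stated intent of B-TFIDF.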
(3) Article clustering. Through step 2, each text is represented as a vector, and the text vectors are clustered. As shown in Fig. 3, an LDA topic model clustering algorithm is employed here. LDA is a three-layer Bayesian probability model comprising the word, topic, and document layers. The generation of an article is regarded as the following process: a topic is chosen with a certain probability, and a word is chosen from that topic with a certain probability; document-to-topic follows a multinomial distribution, and topic-to-word follows a multinomial distribution. LDA clustering yields a "topic-word" probability matrix and a "document-topic" probability matrix; the detailed process is described below.
LDA clustering process: LDA clustering yields the "topic-word" probability matrix phi and the "document-topic" probability matrix theta. From theta, m topics and the probabilities of the N articles belonging to the m topics are obtained: each row i of theta represents an article, each column j represents a topic, and the matrix entry theta_ij is the probability that article i belongs to topic j. A screening threshold thresholdT is set (preferred value 0.32); if theta_ij > thresholdT, article i is considered to belong to topic j, and the articles corresponding to each topic are thereby selected.
Determining the LDA cluster number m: since for N articles a cluster number between N/10 and N/5 fits reality well (for example, when the total number of new articles N is 150, a cluster number between 15 and 30 fits reality well), the LDA clustering algorithm is run repeatedly with cluster numbers from N/10 to N/5; the inter-topic similarity of each run is computed, and the topic number of the run with the lowest inter-topic similarity is chosen. Inter-topic similarity is computed from the "topic-word" probability matrix phi produced by LDA clustering: each row j of phi represents a topic Tj, each column k represents a word wk, and phi_jk is the probability that topic Tj contains word wk. A row of phi can thus be regarded as the vector form of topic Tj, Tj = (w1, w2, w3, … wk … wn), where n is the total number of words. The pairwise similarities of the topics are computed and averaged, and the minimum average is taken as the final inter-topic similarity. Similarity is computed with the cosine similarity, as follows:
sim(Ti, Tj) = (Σk ωk(Ti)·ωk(Tj)) / (√(Σk ωk(Ti)²) · √(Σk ωk(Tj)²)) (3)
In formula (3), Ti and Tj denote two topics, ωk(Ti) denotes the value of topic Ti on dimension k, and n is the total number of words.
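The selection logic of step (3) can be sketched as follows, with plain Python lists standing in for the theta and phi matrices produced by an LDA run; the helper names `assign_articles` and `avg_topic_similarity` are mine, not from the patent.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors, formula (3)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def assign_articles(theta, threshold_t=0.32):
    """Article i belongs to topic j when theta[i][j] > thresholdT
    (preferred value 0.32); returns {topic: [article indices]}."""
    topics = {}
    for i, row in enumerate(theta):
        for j, p in enumerate(row):
            if p > threshold_t:
                topics.setdefault(j, []).append(i)
    return topics

def avg_topic_similarity(phi):
    """Average pairwise cosine similarity of the topic rows of phi;
    among runs with m = N/10 .. N/5, the m minimizing this is chosen."""
    m = len(phi)
    sims = [cosine(phi[a], phi[b]) for a in range(m) for b in range(a + 1, m)]
    return sum(sims) / len(sims) if sims else 0.0
```

Minimizing the average pairwise topic similarity favors the cluster number whose topics are most distinct from one another, which is the patent's stated criterion for choosing m.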
(4) Topic keyword extraction: keywords are extracted from the titles of all articles under a topic. The titles are first word-segmented; stop words, meaningless words, and punctuation marks are filtered out, and the remaining words serve as the topic keywords.
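A minimal sketch of step (4), assuming a whitespace tokenizer and a caller-supplied stop-word set (the patent segments Chinese titles with a word segmenter, which is not reproduced here; `extract_keywords` is my name for the step):

```python
import string

def extract_keywords(titles, stopwords):
    """Segment all article titles under a topic and keep the words
    that are neither stop words nor punctuation."""
    keywords = set()
    for title in titles:
        for token in title.split():
            token = token.strip(string.punctuation)
            if token and token.lower() not in stopwords:
                keywords.add(token.lower())
    return keywords
```

The resulting keyword set is what step (5) uses as the dimensions for the keywordSim cosine similarity.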
(5) Topic merging: step 3 yields m topics and their corresponding articles; next, the m new topics are merged with the old topics. As shown in Fig. 4, the inter-topic similarity f1 is computed; if f1 > 0.5, the two topics are considered similar and are merged. f1 is computed as follows:
f1 = 2·vectorSim·keywordSim / (vectorSim + keywordSim) (4)
In formula (4), vectorSim denotes the topic cosine similarity computed with all words of a topic as dimensions, and keywordSim denotes the topic cosine similarity computed with the topic keywords as dimensions; the cosine similarity is computed as in formula (3).
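Formula (4) is the harmonic mean of the two cosine similarities, so both the full-vocabulary view and the keyword view must agree before two topics are merged. A small sketch (the helper name `merge_score` is mine):

```python
def merge_score(vector_sim, keyword_sim):
    """f1 = 2*vectorSim*keywordSim / (vectorSim + keywordSim), formula (4).
    Topics with f1 > 0.5 are considered similar and merged."""
    if vector_sim + keyword_sim == 0:
        return 0.0
    return 2 * vector_sim * keyword_sim / (vector_sim + keyword_sim)
```

Because the harmonic mean is dominated by the smaller operand, a topic pair that looks alike over all words but shares no keywords (or vice versa) scores low and stays unmerged.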
(6) Heat calculation: as shown in Fig. 5, step 5 yields the final set of topics; next, the topic heat h is calculated, topics with high heat are kept, and topics with low heat, i.e. outdated topics, are removed. Based on the characteristic that hot topic news has a high concentration s, the heat is computed as follows:
ht = Σ sim(di, t) (5)
In formula (5), di denotes an article that topic T contains; the heat ht of topic T equals the sum of the similarities between the articles under the topic and the topic, the sim function being as in formula (3).
As time goes on, the heat of a topic keeps decaying until it falls below the threshold and the topic is discarded. Heat decay: in each batch process, if a new article arrives under topic T, the heat ht of topic T increases accordingly, ht = ht·Up (preferred value 1.05); if no new article is added to topic T, the heat decays, ht = ht·Down (preferred value 0.9).
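The per-batch heat bookkeeping of step (6) can be sketched as below, using the preferred values Up = 1.05 and Down = 0.9 from the text; the dictionary layout and the discard threshold value are my own illustration, not fixed by the patent.

```python
def update_heat(heats, topics_with_new_articles, up=1.05, down=0.9, threshold=0.1):
    """One batch of heat decay: topics that received new articles are
    boosted (ht *= Up), the rest decay (ht *= Down); topics whose heat
    falls below the threshold are discarded as outdated."""
    updated = {}
    for topic, h in heats.items():
        h = h * up if topic in topics_with_new_articles else h * down
        if h >= threshold:
            updated[topic] = h
    return updated
```

Run once per batch after topic merging; ranking the surviving entries of the returned dictionary by value yields the hottest topics of that batch.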
Claims (1)
1. A hot news detection method based on a topic model, characterized by comprising the following steps:
(1) a news stream is crawled in a focused manner by a web crawler; a batch is processed every time N new articles arrive; the crawled data are cleaned and the articles are word-segmented to obtain preprocessed articles;
(2) a vector space model is constructed: after preprocessing, an original document can be regarded as a bag of words; if the document is regarded as a vector, each word is one feature dimension; by converting documents into vectors, the text becomes structured data that a computer can process, and the similarity between two documents becomes the similarity between two vectors; the weight of each dimension of a document vector is computed with an improved B-TFIDF algorithm, whose formulas are as follows:
in formula (1), w denotes a word, A is the number of new articles containing word w, B is the number of new articles not containing w, C is the number of history articles containing w, and D is the number of history articles not containing w; in formula (2), di denotes the i-th new article, N is the total number of new articles, tf(d, w) is the term frequency of word w in article d, and df(w) is the number of articles containing w; the B-TFIDF algorithm takes the burstiness of words into account, a burst meaning that a word suddenly appears in large numbers within a short time; the weight of each word of a document is computed by the algorithm above, yielding the vector space model of the article Di = (weight(di, w1), weight(di, w2), weight(di, w3), …, weight(di, wn)), where n is the total number of words;
(3) article clustering: through step 2, each text is represented as a vector, and the text vectors are clustered with an LDA topic model clustering algorithm, specifically:
LDA clustering process: LDA is a three-layer Bayesian probability model comprising the word, topic, and document layers; the generation of an article is regarded as the following process: a topic is chosen with a certain probability, and a word is chosen from that topic with a certain probability; document-to-topic follows a multinomial distribution, and topic-to-word follows a multinomial distribution; LDA clustering yields a "topic-word" probability matrix phi and a "document-topic" probability matrix theta; from theta, m topics and the probabilities of the N articles belonging to the m topics are obtained; each row i of theta represents an article, each column j represents a topic, and the matrix entry theta_ij is the probability that article i belongs to topic j; a screening threshold thresholdT is set; if theta_ij > thresholdT, article i is considered to belong to topic j, and the articles corresponding to each topic are thereby selected;
determining the LDA cluster number m: the LDA clustering algorithm is run repeatedly with cluster numbers from N/10 to N/5; the inter-topic similarity of each run is computed, and the topic number of the run with the lowest inter-topic similarity is chosen; inter-topic similarity is computed from the "topic-word" probability matrix phi produced by LDA clustering; each row j of phi represents a topic Tj, each column k represents a word wk, and phi_jk is the probability that topic Tj contains word wk; a row of phi can thus be regarded as the vector form of topic Tj, Tj = (w1, w2, w3, … wk … wn), where n is the total number of words; the pairwise similarities of the topics are computed and averaged, and the minimum average is taken as the final inter-topic similarity; similarity is computed with the cosine similarity, as follows:
sim(Ti, Tj) = (Σk ωk(Ti)·ωk(Tj)) / (√(Σk ωk(Ti)²) · √(Σk ωk(Tj)²)) (3)
in formula (3), Ti and Tj denote two topics, ωk(Ti) denotes the value of topic Ti on dimension k, and n is the total number of words;
(4) topic keyword extraction: keywords are extracted from the titles of all articles under a topic; the titles are first word-segmented; stop words, meaningless words, and punctuation marks are filtered out, and the remaining words serve as the topic keywords;
(5) topic merging: step 3 yields m topics and their corresponding articles; next, the m new topics are merged with the old topics; the inter-topic similarity f1 is computed; if f1 > 0.5, the two topics are considered similar and are merged; f1 is computed as follows:
f1 = 2·vectorSim·keywordSim / (vectorSim + keywordSim) (4)
in formula (4), vectorSim denotes the topic cosine similarity computed with all words of a topic as dimensions, and keywordSim denotes the topic cosine similarity computed with the topic keywords as dimensions, the cosine similarity being computed as in formula (3);
(6) heat calculation: step 5 yields the final set of topics; next, the topic heat h is calculated, topics with high heat are kept, and topics with low heat, i.e. outdated topics, are removed; based on the characteristic that hot topic news has a high concentration s, the heat is computed as follows:
ht = Σ sim(di, t) (5)
in formula (5), di denotes an article that topic T contains; the heat ht of topic T equals the sum of the similarities between the articles under the topic and the topic, the sim function being as in formula (3);
as time goes on, the heat of a topic keeps decaying until it falls below the threshold and the topic is discarded; heat decay: in each batch process, if a new article arrives under topic T, the heat ht of topic T increases accordingly, ht = ht·Up; if no new article is added to topic T, the heat decays, ht = ht·Down, where Up > 1 and Down < 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611145855.9A CN106599181B (en) | 2016-12-13 | 2016-12-13 | A kind of hot news detection method based on topic model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611145855.9A CN106599181B (en) | 2016-12-13 | 2016-12-13 | A kind of hot news detection method based on topic model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106599181A CN106599181A (en) | 2017-04-26 |
CN106599181B true CN106599181B (en) | 2019-06-18 |
Family
ID=58802054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611145855.9A Active CN106599181B (en) | 2016-12-13 | 2016-12-13 | A kind of hot news detection method based on topic model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599181B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423337A (en) * | 2017-04-27 | 2017-12-01 | 天津大学 | News topic detection method based on LDA Fusion Models and multi-level clustering |
CN107239497B (en) * | 2017-05-02 | 2020-11-03 | 广东万丈金数信息技术股份有限公司 | Hot content search method and system |
CN107203632B (en) * | 2017-06-01 | 2019-08-16 | 中国人民解放军国防科学技术大学 | Topic Popularity prediction method based on similarity relation and cooccurrence relation |
CN107330049B (en) * | 2017-06-28 | 2020-05-22 | 北京搜狐新媒体信息技术有限公司 | News popularity estimation method and system |
CN107835113B (en) * | 2017-07-05 | 2020-09-08 | 中山大学 | Method for detecting abnormal user in social network based on network mapping |
CN107451224A (en) * | 2017-07-17 | 2017-12-08 | 广州特道信息科技有限公司 | A kind of clustering method and system based on big data parallel computation |
CN107563725B (en) * | 2017-08-25 | 2021-04-06 | 浙江网新恒天软件有限公司 | Recruitment system for optimizing fussy talent recruitment process |
CN107656919B (en) * | 2017-09-12 | 2018-10-26 | 中国软件与技术服务股份有限公司 | A kind of optimal L DA Automatic Model Selection methods based on minimum average B configuration similarity between theme |
CN107918644B (en) * | 2017-10-31 | 2020-12-08 | 北京锐思爱特咨询股份有限公司 | News topic analysis method and implementation system in reputation management framework |
CN107832418A (en) * | 2017-11-08 | 2018-03-23 | 郑州云海信息技术有限公司 | A kind of much-talked-about topic finds method, system and a kind of much-talked-about topic discovering device |
CN107992542A (en) * | 2017-11-27 | 2018-05-04 | 中山大学 | A kind of similar article based on topic model recommends method |
CN108153818B (en) * | 2017-11-29 | 2021-08-10 | 成都东方盛行电子有限责任公司 | Big data based clustering method |
CN107784127A (en) * | 2017-11-30 | 2018-03-09 | 杭州数梦工场科技有限公司 | A kind of focus localization method and device |
CN107862089B (en) * | 2017-12-02 | 2020-03-13 | 北京工业大学 | Label extraction method based on perception data |
CN108090157B (en) * | 2017-12-12 | 2018-11-06 | 百度在线网络技术(北京)有限公司 | A kind of hot news method for digging, device and server |
CN110888978A (en) * | 2018-09-06 | 2020-03-17 | 北京京东金融科技控股有限公司 | Article clustering method and device, electronic equipment and storage medium |
CN110096649B (en) * | 2019-05-14 | 2021-07-30 | 武汉斗鱼网络科技有限公司 | Post extraction method, device, equipment and storage medium |
CN110532388B (en) * | 2019-08-15 | 2022-07-01 | 企查查科技有限公司 | Text clustering method, equipment and storage medium |
CN110609938A (en) * | 2019-08-15 | 2019-12-24 | 平安科技(深圳)有限公司 | Text hotspot discovery method and device and computer-readable storage medium |
CN113127611B (en) * | 2019-12-31 | 2024-05-14 | 北京中关村科金技术有限公司 | Method, device and storage medium for processing question corpus |
CN111343467B (en) * | 2020-02-10 | 2021-10-26 | 腾讯科技(深圳)有限公司 | Live broadcast data processing method and device, electronic equipment and storage medium |
CN112100372B (en) * | 2020-08-20 | 2022-08-30 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Head news prediction classification method |
US11436287B2 (en) | 2020-12-07 | 2022-09-06 | International Business Machines Corporation | Computerized grouping of news articles by activity and associated phase of focus |
CN112612889B (en) * | 2020-12-28 | 2021-10-29 | 中科院计算技术研究所大数据研究院 | Multilingual document classification method and device and storage medium |
CN112784042A (en) * | 2021-01-12 | 2021-05-11 | 北京明略软件系统有限公司 | Text similarity calculation method and system combining article structure and aggregated word vector |
CN113360600A (en) * | 2021-06-03 | 2021-09-07 | 中国科学院计算机网络信息中心 | Method and system for screening enterprise performance prediction indexes based on signal attenuation |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140019460A1 (en) * | 2012-07-12 | 2014-01-16 | Yahoo! Inc. | Targeted search suggestions |
CN104699814A (en) * | 2015-03-24 | 2015-06-10 | 清华大学 | Searching method and system of hot spot information |
CN106156276B (en) * | 2016-06-25 | 2019-07-19 | 贵州大学 | Hot news based on Pitman-Yor process finds method |
- 2016-12-13: CN CN201611145855.9A patent CN106599181B (active)
Also Published As
Publication number | Publication date |
---|---|
CN106599181A (en) | 2017-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||