CN102779190A - Rapid detection method for hot issues of timing sequence massive network news - Google Patents

Publication number
CN102779190A
Authority
CN
China
Prior art keywords
text
clustering cluster
cluster
clustering
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012102293775A
Other languages
Chinese (zh)
Other versions
CN102779190B (en)
Inventor
王厚峰
彭楠赟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201210229377.5A
Publication of CN102779190A
Application granted
Publication of CN102779190B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a rapid detection method for hot events in time-series massive network news, comprising the following steps: dividing a network news text sequence into a sequence of blocks by time interval; clustering the news texts of the first block by a Dirichlet process to form a set of clusters; decaying and filtering the clustering result of the previous block and using it as the prior distribution of the subsequent block; clustering the subsequent blocks by the Dirichlet process; ranking the events by popularity according to the report volume of each cluster; taking the T clusters with the highest ranking values as the hot events; and selecting the M features with the highest tf-idf values in each cluster as the keywords of the hot spots and displaying the hot spots. The method greatly improves the efficiency of network news clustering; at the same time, memory usage does not grow linearly with the data volume, so the method is suitable for large-scale text data analysis.

Description

A rapid detection method for hot events in time-series massive network news
Technical field
The invention provides a method for discovering hot events in online news. Specifically, it rapidly discovers hot events in massive volumes of time-ordered news texts reported on the network and ranks the events by popularity. It belongs to the fields of natural language processing and data mining.
Background art
With the rapid development of network technology and the resulting information explosion, people can obtain the latest and most complete news at any time; on the other hand, the time a reader needs to obtain key information has grown accordingly. Automatically extracting useful information from massive online network news has become an urgent task. Hot event detection for online network news helps people obtain important information from time-ordered massive network news and read more efficiently, and it can also help government departments monitor online public opinion and emergencies.
At present, many methods use topic models (Topic Model) or the affinity propagation (Affinity Propagation) algorithm for network news hot spot detection and news recommendation. The problem with these two classes of methods is that the number of hot spots k in the news must be given in advance and that only static data can be processed. In practice, the number of major news events that occur each day is not fixed, and news reporting is dynamic and real-time.
In addition, an event itself goes through emergence, development and decay, and hot event discovery should take this natural pattern into account.
Summary of the invention
In the present invention, a network news hot event is an event that exists in a time-ordered stream of network news texts and that is continuously and widely reported and receives great attention within a particular period of time. Unless otherwise specified, the time unit of the present invention is assumed to be the "day" and the time interval is "1 day". The method of the present invention is nevertheless applicable to any time unit.
The purpose of the present invention is to provide a new method that rapidly processes massive network news text data, detects the hot events in it and ranks the events by popularity. For time-ordered massive news texts, the algorithm must be time-efficient, its space complexity must not grow linearly with the amount of news data, and it must also be able to model the emergence, development and decay of hot events.
The principle of the present invention is as follows. A Dirichlet process with a time factor is used to cluster the network news. On the one hand, it represents the dynamic evolution of hot news well; on the other hand, it turns the ordinary Dirichlet process into an incremental model whose memory usage does not grow linearly with the data volume, which makes it suitable for processing large-scale network text data. In addition, to further improve time efficiency, the invention proposes a fast inference algorithm based on greedy search to replace Gibbs sampling, which greatly speeds up the algorithm. Afterwards, the discovered hot events (i.e. the news text clusters) are ranked and the hottest events are extracted.
Several terms are explained first:
- Cluster: each class formed by a clustering method is called a cluster. In the present invention, each cluster represents a possible event.
- Cluster size: the number of elements in a cluster; for text clustering, the size of a cluster is the number of texts in it.
- Dirichlet process (Dirichlet Process): also referred to as the Chinese restaurant process (Chinese Restaurant Process); a detailed explanation is available on Wikipedia (http://en.wikipedia.org/wiki/Dirichlet_process).
- tf-idf value: a common notion in information retrieval that measures how well a word (or phrase) characterizes the content of a text. Suppose a word (or phrase) term occurs tf times in a text Text and occurs in df texts of the text collection, and the text collection contains Num texts in total. The tf-idf value of term in Text is then computed by the following formula (the logarithm log is base 10):
$$tf\text{-}idf = tf \times \log_{10}\frac{Num}{df}$$
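As a quick numeric check of the formula, the following Python sketch evaluates it for made-up counts. The function name `tf_idf` and the sample numbers are illustrative assumptions and do not come from the patent.

```python
import math

def tf_idf(tf: int, df: int, num: int) -> float:
    """tf-idf = tf * log10(Num / df), using the base-10 logarithm stated in the text."""
    if df <= 0 or num <= 0:
        raise ValueError("df and num must be positive")
    return tf * math.log10(num / df)

# Hypothetical counts: a term occurs 3 times in one text and appears in
# 10 of the 1000 texts of the collection.
score = tf_idf(tf=3, df=10, num=1000)
```

A term that occurs in every text of the collection gets the value 0, because log10(Num/df) is then log10(1).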
The operation of the present invention is illustrated in Fig. 1. In the figure, [f_1 f_2 ...] denotes the feature set; the features are in fact the set of words occurring in the text collection. Each cluster represents an event, and the set of clusters forms the hot event library.
The technical scheme provided by the invention is as follows:
A rapid detection method for hot events in time-series massive network news (see Fig. 2 for the flow), comprising:
A. Clustering the network news texts online using a Dirichlet process with a time factor, in the following three steps:
A1. Divide the network news text sequence into a sequence of blocks by time interval; each block contains the news texts of one time interval (for example, with an interval of "1 day", each block contains one day of news texts).
A2. Cluster the news texts of the first block (for example, the first day) by the Dirichlet process to form a set of clusters.
A3. Cluster each subsequent block by the Dirichlet process as well, using the clustering result of the previous block. Before clustering, the clustering result of the previous block is first decayed and then filtered. The basic idea of decay is that, after the previous block has been processed, every resulting cluster is decayed by a decay factor a: if the size of a cluster is r (that is, it contains r texts), its size after decay becomes r' = a*r, where a ∈ (0,1). The feature distribution inside the cluster stays unchanged. The basic idea of filtering is to delete clusters whose size is below a threshold t (for example t=30; other values may be used) and also to delete clusters whose reporting has lasted longer than a certain duration (for example 150 days; other values may be used).
B. Ranking and displaying the hot events, in the following two steps:
B1. For each cluster, compute its average report volume per time unit over its reporting period, then rank the events by popularity according to the report volume;
B2. Take the T clusters with the highest ranking values as the T hot events (T is a user-defined value); in each of these clusters, select the M features (i.e. representative words) with the highest tf-idf values (M is user-definable, e.g. M=20) as the keywords of the hot spot, and display the hot spots.
With the technical scheme provided by the invention, the efficiency of network news clustering can be greatly improved. At the same time, memory usage does not grow linearly with the data volume, so the method is suitable for large-scale text data analysis. In addition, the Dirichlet process mixture model improved by adding a time factor can simulate the emergence, development and decay of hot news, which matches reality. Filtering the hot news improves system efficiency on the one hand and removes noise on the other, which improves the accuracy of the system.
Description of drawings
Fig. 1 is a schematic diagram of the operation of the method of the invention.
Fig. 2 is a flow chart of the method of the invention.
Embodiment
The present invention is further explained below through an example.
Suppose there are network news reports for three consecutive days. On the first day there are 100 articles about an earthquake, 60 about the college entrance examination, 30 about national defense, 10 about diplomacy and 5 about the economy. On the second day there are 70 articles about national defense, 30 about health care, 50 about the earthquake and 20 with no clear theme. On the third day there are 80 articles about health care, 50 about tourism and 10 with no clear theme. We know neither the total number of hot events in the news nor which event each article belongs to.
First, some notation is introduced:
(1) m is the size of the block, i.e. the number of texts. The texts are written as a time-ordered sequence x_{1:m} = (x_1, x_2, ..., x_m), where x_i denotes the i-th text, i = 1, ..., m.
(2) The clusters corresponding to the m texts in the block are written as the sequence
assign_{1:m} = (assign_1, assign_2, ..., assign_m), where assign_j ∈ C and C denotes the set of clusters, i.e. C = {c_1, c_2, ..., c_K}; the number of clusters is K = |C|;
(3) N_j denotes the number of texts belonging to cluster c_j;
(4) L denotes the total number of distinct words in the text collection (each word has a sequence number);
(5) $n_j^l$ denotes the total number of times the word with sequence number l occurs in the texts belonging to cluster c_j;
(6) $\beta_j^l$ is the hyperparameter corresponding to $n_j^l$; the hyperparameters are given as initial constants (for example, every $\beta_j^l$ is set to 1), and $\beta_0 = \sum_{l=1}^{L} \beta_j^l$; α is also a hyperparameter (its value may likewise be set to 1);
(7) Γ(a) is the gamma function, whose general form is $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\,dt$. When the variable a is a positive integer, its value is a factorial: Γ(a+1) = aΓ(a) = a! (for details see the Mathematics Handbook, Higher Education Press, 1st edition, pp. 587-589).
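The factorial property of the gamma function can be checked directly with Python's standard `math` module. The sketch below is only an illustration; the choice a = 4 is arbitrary.

```python
import math

# Gamma(a + 1) = a! for a positive integer a.
a = 4
gamma_value = math.gamma(a + 1)      # Gamma(5)
factorial_value = math.factorial(a)  # 4!

# lgamma returns log(Gamma(x)); working in log space avoids overflow when the
# word counts passed to Gamma become large.
log_gamma_value = math.lgamma(a + 1)
```

For news-sized word counts, products of gamma values overflow quickly, so an implementation would normally sum `math.lgamma` terms rather than multiply `math.gamma` values.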
Part A is implemented as follows:
A1. Divide the network news text sequence into a sequence of blocks by time interval; each block contains multiple news texts (below, the time unit is the "day" and the block interval is "1 day", so each block contains one day of news texts; other intervals such as "3 days", "1 week" or "1 month" may also be used).
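Step A1 can be sketched in Python as follows, assuming each article arrives as a (timestamp, text) pair. The function name `split_into_blocks`, the input layout and the sample articles are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def split_into_blocks(articles, interval=timedelta(days=1)):
    """Group (timestamp, text) pairs into consecutive blocks of length `interval`.

    Returns a list of blocks ordered by time; each block is a list of texts in
    chronological order. Intervals without any article produce no block.
    """
    if not articles:
        return []
    ordered = sorted(articles, key=lambda item: item[0])
    # Align the first block with midnight of the earliest article's day.
    start = ordered[0][0].replace(hour=0, minute=0, second=0, microsecond=0)
    buckets = defaultdict(list)
    for timestamp, text in ordered:
        index = (timestamp - start) // interval  # timedelta // timedelta is an int
        buckets[index].append(text)
    return [buckets[i] for i in sorted(buckets)]

articles = [
    (datetime(2012, 7, 1, 9, 0), "earthquake report"),
    (datetime(2012, 7, 2, 8, 30), "defense report"),
    (datetime(2012, 7, 1, 18, 0), "exam report"),
    (datetime(2012, 7, 3, 12, 0), "tourism report"),
]
blocks = split_into_blocks(articles)
```

Passing `interval=timedelta(days=3)` or `timedelta(weeks=1)` gives the coarser blocks mentioned in the text.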
A2. For each text of the first block (i.e. the first day), cluster in chronological order by the following algorithm:
Input: m ordered texts, written x_{1:m}
Output: the cluster corresponding to each text, i.e. the sequence assign_{1:m}
Step 1: Initialize the cluster set to empty, i.e. C = {}; the number of clusters is 0, K = 0;
Step 2: Set an initial value p_max = 0;
Step 3: For each text in the block (let the current one be the i-th text x_i), repeat Steps 3.1 to 3.3:
Step 3.1: Add a new cluster c_new, i.e. C' = C ∪ {c_new};
Step 3.2: For each cluster c_j ∈ C', repeat Steps 3.2.1 to 3.2.2:
Step 3.2.1: With the text assigned to c_j, compute the probability p of the current block as a whole as follows:
1. For the current text x_i and each text x_r before it (1 ≤ r ≤ i), compute the probability of belonging to its corresponding cluster:
$$p(x_r) = \frac{N_{assign_r}}{\sum_{k=1}^{K} N_k + \alpha} \times \frac{\prod_{l=1}^{L} \Gamma\left(n_{assign_r}^{l} + \beta_{assign_r}^{l}\right)}{\Gamma\left(\sum_{l=1}^{L} n_{assign_r}^{l} + \beta_0\right)}$$
2. Assume that each text x_r after x_i (i < r ≤ m) belongs to a separate new cluster of its own; its probability is:
$$p(x_r) = \frac{\alpha}{\sum_{k=1}^{K+1} N_k + \alpha} \times \frac{\prod_{l=1}^{L} \Gamma\left(n_{r}^{l} + \beta_{r}^{l}\right)}{\Gamma\left(\sum_{l=1}^{L} n_{r}^{l} + \beta_0\right)}$$
3. The probability of the block as a whole is the product of the probabilities computed above for each text and its cluster:
$$p = \prod_{r=1}^{m} p(x_r)$$
Step 3.2.2: If the probability p is greater than the maximum probability p_max, i.e. p > p_max:
Step 3.2.2.1: Assign the i-th text x_i to cluster c_j: assign_i = c_j;
Step 3.2.2.2: Update the maximum probability: p_max = p;
Step 3.3: If the cluster of the i-th text x_i does not belong to the set C, i.e. assign_i = c_{K+1}:
Step 3.3.1: Add the new cluster c_{K+1} to the cluster set C: C = C ∪ {c_{K+1}};
Step 3.3.2: Increase the number of clusters by 1, i.e. K = K+1;
Step 4: Return the cluster corresponding to each text, i.e. assign_{1:m}.
The above process yields 5 clusters, representing the earthquake, the college entrance examination, national defense, diplomacy and the economy, containing 100, 60, 30, 10 and 5 texts respectively.
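The greedy idea behind Steps 1 to 4 can be sketched in Python under a simplifying assumption: instead of recomputing the whole-block product p for every candidate, the sketch scores only the current text, using the Chinese-restaurant prior (cluster size, or α for a new cluster) multiplied by a standard Dirichlet-multinomial predictive likelihood, and assigns the text to the best-scoring cluster. This is a stand-in for the patent's exact formula, not a reproduction of it. The names `log_likelihood` and `greedy_cluster`, the dictionary layout of a cluster and the toy texts are all hypothetical.

```python
import math
from collections import Counter

def log_likelihood(doc, counts, total, vocab_size, beta):
    """Log Dirichlet-multinomial predictive probability of `doc` given a
    cluster's word counts (the factor that is equal for all clusters is dropped)."""
    beta0 = beta * vocab_size
    score = math.lgamma(total + beta0) - math.lgamma(total + sum(doc.values()) + beta0)
    for word, n in doc.items():
        base = counts.get(word, 0) + beta
        score += math.lgamma(base + n) - math.lgamma(base)
    return score

def greedy_cluster(texts, alpha=1.0, beta=1.0, clusters=None):
    """Assign each tokenized text, in order, to its highest-scoring cluster.

    `clusters` is an optional list of prior clusters (modified in place), each a
    dict with "size" (possibly decayed, hence a float) and "counts" (a Counter).
    Returns (assignments, clusters); assignments[i] is an index into clusters.
    """
    clusters = [] if clusters is None else clusters
    docs = [Counter(tokens) for tokens in texts]
    vocab = set().union(*docs, *(c["counts"] for c in clusters))
    vocab_size = max(len(vocab), 1)
    assignments = []
    for doc in docs:
        n_total = sum(c["size"] for c in clusters)
        # Candidate "new cluster": prior mass alpha and no word counts yet.
        best_index = len(clusters)
        best_score = math.log(alpha / (n_total + alpha)) + log_likelihood(
            doc, Counter(), 0, vocab_size, beta)
        for index, cluster in enumerate(clusters):
            score = math.log(cluster["size"] / (n_total + alpha)) + log_likelihood(
                doc, cluster["counts"], sum(cluster["counts"].values()),
                vocab_size, beta)
            if score > best_score:
                best_index, best_score = index, score
        if best_index == len(clusters):
            clusters.append({"size": 0.0, "counts": Counter()})
        clusters[best_index]["size"] += 1
        clusters[best_index]["counts"].update(doc)
        assignments.append(best_index)
    return assignments, clusters

texts = [
    ["earthquake", "rescue", "earthquake"],
    ["exam", "college", "exam"],
    ["earthquake", "rescue"],
    ["exam", "college"],
]
assignments, found = greedy_cluster(texts)
```

Because every text is visited once and scored against the existing clusters plus one new candidate, the cost per text is proportional to the number of clusters, which is the kind of saving over repeated Gibbs sampling sweeps that the description points to.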
A3. When clustering the second day, the clustering result of the first day is decayed first. Suppose the decay factor a is 0.5; after decay, the cluster sizes become 50, 30, 15, 5 and 2.5 respectively. Filtering follows. Suppose the filtering threshold is t=30; after filtering, only the first two clusters remain, namely the earthquake (cluster c_1) and the college entrance examination (cluster c_2), and they serve as the hot-spot prior distribution for clustering the second day (i.e. the second block). The second block is clustered in the same way as above; the only difference is that the initialization in Step 1 is changed to:
Step 1': Initialize the cluster set C = {c_1, c_2}; the number of clusters is 2, K = 2;
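The decay and filtering of this example can be reproduced with a short Python sketch. The function name `decay_and_filter`, the dictionary keys and the `days_reported` field are assumptions made for illustration; the numbers are those of the example above.

```python
def decay_and_filter(clusters, a=0.5, t=30, max_days=150):
    """Decay every cluster size by the factor `a`, then drop clusters that are
    too small (size < t) or have been reported for more than `max_days`."""
    if not 0 < a < 1:
        raise ValueError("decay factor a must lie in (0, 1)")
    kept = []
    for cluster in clusters:
        # Only the size is scaled; any word counts in the dict stay unchanged.
        decayed = dict(cluster, size=cluster["size"] * a)
        if decayed["size"] >= t and decayed["days_reported"] <= max_days:
            kept.append(decayed)
    return kept

day_one = [
    {"name": "earthquake", "size": 100, "days_reported": 1},
    {"name": "college entrance examination", "size": 60, "days_reported": 1},
    {"name": "national defense", "size": 30, "days_reported": 1},
    {"name": "diplomacy", "size": 10, "days_reported": 1},
    {"name": "economy", "size": 5, "days_reported": 1},
]
prior = decay_and_filter(day_one)
```

The college entrance examination cluster survives with a decayed size of exactly 30 because the rule deletes only clusters whose size is strictly below t.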
Part B: popularity ranking and display of events are implemented as follows:
B1. Popularity ranking of events:
Step 1: Count the number of texts in each cluster c_j, denoted N_j;
Step 2: Compute the time span D_j of the texts in each cluster c_j (in time units such as the "day": the interval between the latest and the earliest report time, e.g. the number of days of continued coverage);
Step 3: Compute the average report volume of cluster c_j per time unit: Score_j = N_j / D_j;
Step 4: Sort the clusters by Score_j in descending order and take the first T (e.g. T=10) clusters as the T hottest events.
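The ranking in B1 amounts to a few lines of Python. In the sketch below, the function name `rank_hot_events` and the `report_dates` field are hypothetical, and D_j is counted inclusively so that a cluster reported on a single day has D_j = 1; the patent does not spell out that convention. The clusters are hypothetical, with article counts borrowed from the three-day example and with decay and filtering ignored.

```python
from datetime import date

def rank_hot_events(clusters, top_t=10):
    """Rank clusters by Score_j = N_j / D_j and return the top T (name, score) pairs."""
    scored = []
    for cluster in clusters:
        dates = cluster["report_dates"]
        n_j = len(dates)
        # D_j in days, counted inclusively (an assumption that also avoids
        # dividing by zero for single-day clusters).
        d_j = (max(dates) - min(dates)).days + 1
        scored.append((cluster["name"], n_j / d_j))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_t]

d1, d2, d3 = date(2012, 7, 1), date(2012, 7, 2), date(2012, 7, 3)
clusters = [
    {"name": "tourism", "report_dates": [d3] * 50},
    {"name": "earthquake", "report_dates": [d1] * 100 + [d2] * 50},
    {"name": "health care", "report_dates": [d2] * 30 + [d3] * 80},
]
ranking = rank_hot_events(clusters, top_t=2)
```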
B2. Display of hot events:
Step 1: Treat each cluster as one "big text", so that all clusters together form a collection of "big texts";
Step 2: With the "big text" collection as background, compute the tf-idf value of each feature f (a feature being a word occurring in the texts) in the T hottest events (clusters);
Step 3: Take the M (e.g. M=20) features with the highest tf-idf values as the keywords of the hot spot and display the hot spot.
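The keyword selection in B2 can be sketched as follows, treating each cluster as one "big text" exactly as the steps describe. The function name `top_keywords`, the input layout (cluster name mapped to a flat token list) and the toy tokens are assumptions for illustration.

```python
import math
from collections import Counter

def top_keywords(cluster_tokens, m=20):
    """For each cluster, return the m words with the highest tf-idf, treating
    every cluster as one "big text" and the set of clusters as the collection."""
    big_texts = {name: Counter(tokens) for name, tokens in cluster_tokens.items()}
    num = len(big_texts)
    df = Counter()
    for counts in big_texts.values():
        df.update(counts.keys())  # each big text counts once per distinct word
    keywords = {}
    for name, counts in big_texts.items():
        scores = {w: tf * math.log10(num / df[w]) for w, tf in counts.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        keywords[name] = sorted(scores, key=lambda w: (-scores[w], w))[:m]
    return keywords

hot_clusters = {
    "c1": ["earthquake", "rescue", "earthquake", "news"],
    "c2": ["exam", "college", "exam", "news"],
}
keywords = top_keywords(hot_clusters, m=2)
```

A word such as "news" that occurs in every cluster scores 0 and therefore never outranks a word specific to one event.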

Claims (5)

1. A rapid detection method for hot events in time-series massive network news, comprising:
A. clustering the network news texts online using a Dirichlet process with a time factor, in the following three steps:
A1. dividing the network news text sequence into a sequence of blocks by time interval, each block containing the news texts of one time interval;
A2. clustering the news texts of the first block by the Dirichlet process to form a set of clusters;
A3. decaying and filtering the result of clustering the previous block, using it as the prior distribution of the subsequent block, and then clustering the subsequent block by the Dirichlet process;
B. ranking and displaying the hot events, comprising:
B1. for each cluster, computing its average report volume per time unit over its reporting period, and then ranking the events by popularity according to the report volume;
B2. taking the T clusters with the highest ranking values as the hot events, selecting in each cluster the M features with the highest tf-idf values as the keywords of the hot spot, and displaying the hot spots,
wherein T and M are user-defined values;
$$tf\text{-}idf = tf \times \log_{10}\frac{Num}{df}$$
where tf is the frequency with which a word or phrase term occurs in a text Text, df is the number of texts in the text collection in which the word or phrase occurs, Num is the total number of texts in the text collection, and the logarithm log is base 10.
2. The rapid hot event detection method of claim 1, wherein in step A1 the time interval is 1 day and each block contains one day of news texts.
3. The rapid hot event detection method of claim 1, wherein in step A3 the decay is processed as follows: after the previous block has been processed, each resulting cluster is decayed by a decay factor a; if the size of a cluster is r, its size after decay becomes r' = a*r, where a ∈ (0,1), and the feature distribution inside the cluster stays unchanged.
4. The rapid hot event detection method of claim 1, wherein in step A3 the filtering is processed as follows: clusters whose size is below a threshold t are deleted, and clusters whose reporting has lasted longer than a certain duration are also deleted.
5. The rapid hot event detection method of claim 1, wherein step A2 is implemented as follows:
Step 1: Initialize the cluster set C to empty; the number of clusters K is 0;
Step 2: Set an initial value of the maximum probability, p_max = 0;
Step 3: For each text x_i in the block, repeat Steps 3.1 to 3.3:
Step 3.1: Add a new cluster c_new, and let C' = C ∪ {c_new};
Step 3.2: For each cluster c_j ∈ C', repeat Steps 3.2.1 to 3.2.2:
Step 3.2.1: With the text assigned to c_j, compute the probability p of the current block as a whole as follows:
1. For the current text x_i and each text x_r before it, 1 ≤ r ≤ i, compute the probability of belonging to its corresponding cluster:
$$p(x_r) = \frac{N_{assign_r}}{\sum_{k=1}^{K} N_k + \alpha} \times \frac{\prod_{l=1}^{L} \Gamma\left(n_{assign_r}^{l} + \beta_{assign_r}^{l}\right)}{\Gamma\left(\sum_{l=1}^{L} n_{assign_r}^{l} + \beta_0\right)}$$
2. Assume that each text x_r after x_i, i < r ≤ m, belongs to a separate new cluster of its own; its probability is then:
$$p(x_r) = \frac{\alpha}{\sum_{k=1}^{K+1} N_k + \alpha} \times \frac{\prod_{l=1}^{L} \Gamma\left(n_{r}^{l} + \beta_{r}^{l}\right)}{\Gamma\left(\sum_{l=1}^{L} n_{r}^{l} + \beta_0\right)}$$
3. The probability of the current block as a whole is the product of the probabilities computed above for each text and its cluster:
$$p = \prod_{r=1}^{m} p(x_r)$$
Step 3.2.2: If the probability p is greater than the maximum probability p_max, i.e. p > p_max:
Step 3.2.2.1: Assign the i-th text x_i to cluster c_j: assign_i = c_j;
Step 3.2.2.2: Update the maximum probability: p_max = p;
Step 3.3: If the cluster of the i-th text x_i does not belong to the set C, i.e. assign_i = c_{K+1}:
Step 3.3.1: Add the new cluster c_{K+1} to the cluster set C: C = C ∪ {c_{K+1}};
Step 3.3.2: Increase the number of clusters by 1, i.e. K = K+1;
Step 4: Return the cluster corresponding to each text, i.e. assign_{1:m},
wherein the clusters corresponding to the m texts in the block are written as the sequence assign_{1:m} = (assign_1, assign_2, ..., assign_m), where assign_j ∈ C and C denotes the set of clusters, i.e. C = {c_1, c_2, ..., c_K}; the number of clusters is K = |C|; N_j denotes the number of texts belonging to cluster c_j; L denotes the total number of distinct words in the text collection;
$n_j^l$ denotes the total number of times the word with sequence number l occurs in the texts belonging to cluster c_j; $\beta_j^l$ is the hyperparameter corresponding to $n_j^l$, and $\beta_0 = \sum_{l=1}^{L} \beta_j^l$; α is also a hyperparameter, and the hyperparameters are given as initial constants.
CN201210229377.5A 2012-07-03 2012-07-03 Rapid detection method for hot issues of timing sequence massive network news Expired - Fee Related CN102779190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210229377.5A CN102779190B (en) 2012-07-03 2012-07-03 Rapid detection method for hot issues of timing sequence massive network news

Publications (2)

Publication Number Publication Date
CN102779190A true CN102779190A (en) 2012-11-14
CN102779190B CN102779190B (en) 2014-12-03

Family

ID=47124102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210229377.5A Expired - Fee Related CN102779190B (en) 2012-07-03 2012-07-03 Rapid detection method for hot issues of timing sequence massive network news

Country Status (1)

Country Link
CN (1) CN102779190B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101174273A (en) * 2007-12-04 2008-05-07 清华大学 News event detecting method based on metadata analysis
CN101826114A (en) * 2010-05-26 2010-09-08 南京大学 Multi Markov chain-based content recommendation method
CN101887573A (en) * 2010-06-11 2010-11-17 北京邮电大学 Social network clustering correlation analysis method and system based on core point

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
彭楠赟,王厚峰,凌晨添: "基于层次聚类的网络新闻热点发现", 《中国计算语言学研究前沿发展(2009-2011)会议论文》, 31 December 2011 (2011-12-31) *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870474B (en) * 2012-12-11 2018-06-08 北京百度网讯科技有限公司 A kind of news topic method for organizing and device
CN103870474A (en) * 2012-12-11 2014-06-18 北京百度网讯科技有限公司 News topic organizing method and device
CN103336847A (en) * 2013-07-22 2013-10-02 厦门市美亚柏科信息股份有限公司 Generation method and system for hot news tag
CN103336847B (en) * 2013-07-22 2016-11-30 厦门市美亚柏科信息股份有限公司 A kind of generation method and system of hot news label
CN104281663A (en) * 2014-09-24 2015-01-14 北京航空航天大学 Method and system for analyzing events on basis of non-negative matrix factorization
CN104268297A (en) * 2014-10-28 2015-01-07 江苏惠居乐信息科技有限公司 Big data analysis system on basis of news
CN104516975A (en) * 2014-12-29 2015-04-15 中国科学院电子学研究所 Automatic correlation method facing multivariate data
CN104516975B (en) * 2014-12-29 2019-03-22 中国科学院电子学研究所 Automatic correlation method towards multivariate data
CN105335476A (en) * 2015-10-08 2016-02-17 北京邮电大学 Method and device for classifying hot event
CN105335476B (en) * 2015-10-08 2019-06-04 北京邮电大学 A kind of focus incident classification method and device
CN105574185A (en) * 2015-12-22 2016-05-11 北京奇虎科技有限公司 Method and device for providing clustering type intelligent summaries
CN106202487A (en) * 2016-07-19 2016-12-07 西北工业大学 Based on user post behavioral pattern multi thread social events sum up method
CN106202487B (en) * 2016-07-19 2019-06-21 西北工业大学 Based on user post behavior pattern multi thread social event summarize method
CN106294861A (en) * 2016-08-23 2017-01-04 武汉烽火普天信息技术有限公司 Intelligence channel Chinese version towards large-scale data is polymerized and exhibiting method and system
CN106294861B (en) * 2016-08-23 2019-08-09 武汉烽火普天信息技术有限公司 Text polymerize and shows method and system in intelligence channel towards large-scale data
CN107784010A (en) * 2016-08-29 2018-03-09 上海掌门科技有限公司 A kind of method and apparatus for being used to determine the temperature information of theme of news
CN107784010B (en) * 2016-08-29 2021-12-17 南京尚网网络科技有限公司 Method and equipment for determining popularity information of news theme
CN107992619A (en) * 2017-12-21 2018-05-04 联想(北京)有限公司 A kind of clustering method, server cluster and virtual bench
CN109325198A (en) * 2018-08-17 2019-02-12 腾讯科技(深圳)有限公司 A kind of resource exhibition method, device and storage medium
CN109299461A (en) * 2018-09-19 2019-02-01 昆明理工大学 A method of the bilingual parallel segment of comparable corpus based on Dirichlet process extracts
CN109299461B (en) * 2018-09-19 2021-07-16 昆明理工大学 Method for extracting bilingual parallel segments of comparable corpus based on Dirichlet process
CN111694949A (en) * 2019-03-14 2020-09-22 京东数字科技控股有限公司 Multi-text classification method and device
CN111694949B (en) * 2019-03-14 2023-12-05 京东科技控股股份有限公司 Multi-text classification method and device

Also Published As

Publication number Publication date
CN102779190B (en) 2014-12-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20141203

Termination date: 20170703