CN108874974A - Parallelization Topic Tracking method based on frequent term set - Google Patents


Info

Publication number
CN108874974A
Authority
CN
China
Prior art keywords
text
frequent
term vector
topic
frequent term
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810585627.6A
Other languages
Chinese (zh)
Inventor
孙健
许强
陆川
张明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Cloud Future Information Science Co Ltd
Original Assignee
Chengdu Cloud Future Information Science Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Cloud Future Information Science Co Ltd filed Critical Chengdu Cloud Future Information Science Co Ltd
Priority to CN201810585627.6A priority Critical patent/CN108874974A/en
Publication of CN108874974A publication Critical patent/CN108874974A/en
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/216: Parsing using statistical methods

Abstract

The invention discloses a parallelized topic tracking method based on frequent term sets, comprising: clustering a given number of texts, or the texts within a given period, from a report stream into multiple topic text sets with a text clustering algorithm; mining frequent term sets from the topic text sets with parallel computation; converting the frequent term sets into frequent term-vector sets in parallel with a word-vector model, and computing the similarity between the frequent term-vector sets of the report stream and those of the prior reports; and comparing the similarity against a preset topic tracking threshold to decide topic membership and complete topic tracking. The invention represents a topic text set by a term set, which reduces the cost of similarity computation; it proposes a similarity measure based on the Word2vec word-vector model to compute the similarity between term sets, which improves the accuracy of the comparison; and it performs frequent term set mining and word-vector conversion in parallel, exploiting the advantages of parallel computation to improve the efficiency of topic tracking.

Description

Parallelization Topic Tracking method based on frequent term set
Technical field
The present invention relates to the field of network information processing, and in particular to a parallelized topic tracking method based on frequent term sets.
Background technique
With the rapid development of information network technology and the continued spread of the internet, the data on the network grows geometrically; the data "explosion" has become one of the defining features of the current network era. The massive amount of internet information makes it difficult for users to quickly sift out useful content and to keep following specific information of interest. To alleviate this information overload, a more efficient mode of information acquisition is urgently needed, one that lets people quickly obtain the current hot topics and the follow-up reports on the content they care about.
Topic tracking technology can collect content related to a known topic from a stream of follow-up texts, and thus help people obtain the follow-up reports of that topic. Topic tracking divides into traditional topic tracking and adaptive topic tracking.
Traditional topic tracking follows the subsequent related reports mainly according to a prior topic model, and splits into two research directions, knowledge-based and statistics-based. The former finds the topic a report belongs to using domain-specific knowledge; the latter mainly judges the degree of similarity between a report and a topic through the probability distribution of report features and statistical methods, and thereby decides topic membership.
Adaptive topic tracking, building on traditional topic tracking, dynamically updates the topic model according to the reports already traced, and then uses the adjusted model to process the follow-up reports.
As topic tracking technology has matured, it has played a major role in many fields such as network public-opinion monitoring, hot news recommendation, and financial market analysis. A traditional topic tracking task provides only one to four reports related to a topic as prior data. With no other correlated features of the topic available, topic tracking must build a topic model and a tracing model from such sparse prior data, then compute the similarity between each newly arrived report in the follow-up stream and the existing topic model, compare it against a threshold, and so identify the reports of the related topic. Current topic tracking commonly uses text classification algorithms, such as KNN, decision trees, and support vector machines, to decide the topic membership of a newly arrived report. Among these, the KNN classifier is theoretically mature and yields relatively satisfactory results even with little prior data. But when facing large-scale data, traditional topic tracking runs into the following problems:
(1) Under large-scale data, the number of reports related to a single topic is far more than one to four, which opens application scenarios for other classification algorithms that need comparatively large training data sets. For the traditional KNN algorithm, a tracked report entering the system must first run similarity computations against all prior data before the K nearest neighbors can be selected and the topic membership decided. When the system runs for a long time, or multiple topics are tracked simultaneously, the large volume of prior data makes the computational complexity rise steeply and the tracing task execute slowly.
(2) When facing a large-scale report stream, far more than one report enters the system within a short span of time. If these data are processed serially in the traditional manner, then first, the efficiency is low, since each item must wait for the previous one to finish before it can be handled; and second, the items processed earlier yield less information than those processed later, because the earlier items are handled without the information carried by the later ones, which makes the tracking result sensitive to the order in which reports arrive.
Summary of the invention
The object of the invention is to solve the above problems by providing a parallelized topic tracking method based on frequent term sets.
The present invention achieves the above object through the following technical solution:
A parallelized topic tracking method based on frequent term sets comprises the following steps:
S1, clustering a given number of texts, or multiple texts within a given period, from the report stream into multiple topic text sets with a text clustering algorithm;
S2, mining frequent term sets from the topic text sets with parallel computation;
S3, converting the frequent term sets into frequent term-vector sets in parallel with a word-vector model, and computing the similarity between the frequent term-vector sets of the report stream and those of the prior reports;
S4, comparing the similarity against the preset topic tracking threshold, deciding topic membership, and completing topic tracking.
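The four stages S1-S4 can be sketched as a small pipeline. The following is our illustrative Python sketch, not the patented implementation: the clustering and mining functions are trivial stand-ins for Single-Pass and FP-Growth, and a thread pool stands in for the distributed nodes; all names are ours.

```python
from concurrent.futures import ThreadPoolExecutor

def cluster_reports(reports):
    """S1 stand-in: group reports by their first word (a real system
    would use Single-Pass clustering over TF-IDF vectors)."""
    topics = {}
    for r in reports:
        topics.setdefault(r.split()[0], []).append(r)
    return list(topics.values())

def mine_frequent_terms(text_set):
    """S2 stand-in: keep words that appear in at least half of the texts."""
    counts = {}
    for text in text_set:
        for w in set(text.split()):
            counts[w] = counts.get(w, 0) + 1
    threshold = max(1, len(text_set) // 2)
    return {w for w, c in counts.items() if c >= threshold}

def track(reports):
    topic_sets = cluster_reports(reports)
    # S2 runs once per topic set, in parallel; threads stand in for nodes.
    with ThreadPoolExecutor() as pool:
        term_sets = list(pool.map(mine_frequent_terms, topic_sets))
    return topic_sets, term_sets

topic_sets, term_sets = track([
    "earthquake strikes city center",
    "earthquake rescue teams arrive",
    "election results announced today",
])
print(len(topic_sets))  # -> 2
```

The point of the design is that S2 (and likewise the S3 vector conversion) is independent per topic set, so it maps cleanly onto parallel workers.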
Specifically, the text clustering algorithm in step S1 comprises the following steps:
A1, segmenting the texts into words, and removing from the segmentation result punctuation marks, onomatopoeia, interjections, auxiliary words, conjunctions, prepositions, adverbs, numerals, and measure words;
A2, computing the feature vector of each text with a vector space model based on TF-IDF weighting;
A3, clustering the text feature vectors with the Single-Pass algorithm to obtain multiple topic text sets.
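A minimal sketch of step A2's feature vectors, assuming the standard tf x idf weighting (the patent only names "TF-IDF weight", so the exact formula here is our assumption):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF weight vectors over the shared vocabulary.

    tf = term count in the document; idf = log(N / df) + 1.  This standard
    weighting is an assumption; the patent does not fix the exact formula.
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * (math.log(n / df[w]) + 1) for w in vocab])
    return vocab, vectors

vocab, vecs = tfidf_vectors([
    "flood warning issued",
    "flood waters rise",
    "market opens higher",
])
```

Each text becomes one row aligned to `vocab`; these rows are what the Single-Pass clustering of step A3 compares by cosine similarity.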
Specifically, obtaining the topic text sets in step A3 comprises the following steps:
a1, traversing all texts; if a text is the first text, creating a text set for it;
a2, if it is not the first text, computing the cosine similarity between the text and every already-processed text, with the formula:
sim(d_i, d_j) = (Σ_k ω_{k,i}·ω_{k,j}) / (√(Σ_k ω_{k,i}²)·√(Σ_k ω_{k,j}²))
where d_j = (ω_{1,j}, ω_{2,j}, …, ω_{i,j}, …, ω_{t,j}) and ω_{i,j} denotes the TF-IDF weight of the i-th feature word in text d_j;
a3, taking the maximum cosine similarity Max and comparing it with the preset clustering threshold;
a4, if Max exceeds the clustering threshold, assigning the two texts that yield Max to the same text set, otherwise creating a new text set for the text;
a5, repeating steps a1-a4 to obtain multiple topic text sets.
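The loop of steps a1-a5 can be sketched in pure Python as follows; the function names and the toy vectors are illustrative only:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def single_pass(vectors, threshold):
    """Steps a1-a5: attach each text to the cluster of its most similar
    processed text when that similarity exceeds the threshold, otherwise
    open a new cluster.  Returns clusters as lists of vector indices."""
    clusters = []
    for i, v in enumerate(vectors):
        best_sim, best_cluster = -1.0, None
        for c in clusters:
            for j in c:
                s = cosine(v, vectors[j])
                if s > best_sim:
                    best_sim, best_cluster = s, c
        if best_cluster is not None and best_sim > threshold:
            best_cluster.append(i)
        else:
            clusters.append([i])
    return clusters

vecs = [[1, 0, 0], [0.9, 0.1, 0], [0, 0, 1]]
print(single_pass(vecs, 0.8))  # -> [[0, 1], [2]]
```

Single-Pass is order-dependent by construction, which is consistent with the order-sensitivity the Background section attributes to serial processing.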
Specifically, mining the frequent term sets in step S2 comprises the following steps:
B1, distributing the text sets evenly across the distributed nodes;
B2, counting the word frequencies in each text; if a word also appears in the title, expanding its frequency to 1.5 times the original; finally retaining the n words with the highest frequencies;
B3, judging the number of texts in the text set; if the number is 1, taking the retained high-frequency words as the frequent term set;
B4, if the text set holds more than one text, treating each text as a transaction and its retained high-frequency words as the items of that transaction, and mining the frequent itemsets of the text set with the FP-Growth algorithm to obtain its frequent term set, specifically:
if at least five frequent itemsets are obtained, taking the product of the number of items in a frequent itemset and its support count as the measure, extracting from all frequent itemsets the three with the largest measure, and merging them into a single set as the final frequent term set representing the topic text set; otherwise rejecting the text set and excluding it from subsequent topic tracking.
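A hedged sketch of step B4's mining and scoring. FP-Growth itself is not reproduced here; a brute-force miner that returns the same frequent itemsets stands in for it, and the "at least five itemsets" rejection check is omitted for brevity:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force frequent itemset mining.  Stands in for FP-Growth:
    same output on small inputs, far worse scaling."""
    n = len(transactions)
    items = sorted({w for t in transactions for w in t})
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= t)
            if count / n >= min_support:
                result[combo] = count
                found = True
        if not found:          # no frequent itemset at this size: stop
            break
    return result

def topic_term_set(transactions, min_support=0.5, top_k=3):
    """Score each itemset by (number of items) x (support count), keep the
    top_k, and merge them into one frequent term set, as in step B4."""
    sets = frequent_itemsets(transactions, min_support)
    ranked = sorted(sets, key=lambda s: len(s) * sets[s], reverse=True)
    merged = set()
    for s in ranked[:top_k]:
        merged |= set(s)
    return merged

docs = [{"storm", "coast", "evacuate"},
        {"storm", "coast", "damage"},
        {"storm", "rescue", "coast"}]
print(topic_term_set(docs))
```

With the embodiment's minimum support of 0.5, only {storm}, {coast}, and {coast, storm} survive here, and the merged term set is {storm, coast}.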
Specifically, computing the similarity in step S3 comprises the following steps:
C1, expressing each word of a frequent term set as a word vector with the Word2vec model, thereby converting the frequent term set into a frequent term-vector set;
C2, computing the cosine similarities between the frequent term-vector set of the report stream and that of a prior report, with the formula:
S_{i,j} = cos(x_i, y_j) = (x_i · y_j) / (‖x_i‖·‖y_j‖)
where X = (x_1, x_2, …, x_n) is the frequent term-vector set of the report stream, x_i denoting the i-th word vector in X, and Y = (y_1, y_2, …, y_m) is the frequent term-vector set of the prior report, y_j denoting the j-th word vector in Y;
C3, obtaining the similarity matrix S, where S_{i,j} is the cosine similarity of x_i and y_j;
C4, taking the maximum of each row of S and summing these maxima weighted by the word-vector weights of X to obtain l1:
l1 = Σ_{i=1}^{n} X_i · max_j S_{i,j}
where X_i denotes the weight of x_i within the frequent term-vector set X;
C5, taking the maximum of each column of S and summing these maxima weighted by the word-vector weights of Y to obtain l2:
l2 = Σ_{j=1}^{m} Y_j · max_i S_{i,j}
where Y_j denotes the weight of y_j within the frequent term-vector set Y;
C6, averaging l1 and l2 to obtain the similarity between the frequent term-vector set of the report stream and that of the prior report.
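Steps C2-C6 reduce to a similarity matrix plus weighted row and column maxima. A minimal sketch, assuming uniform term weights when none are supplied (the patent does not specify how the weights X_i and Y_j are chosen):

```python
import math

def cos(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def set_similarity(X, Y, wX=None, wY=None):
    """Steps C2-C6: pairwise cosine matrix S, row maxima weighted by X's
    term weights (l1), column maxima weighted by Y's term weights (l2),
    final score = (l1 + l2) / 2.  Uniform weights are assumed by default."""
    wX = wX or [1 / len(X)] * len(X)
    wY = wY or [1 / len(Y)] * len(Y)
    S = [[cos(x, y) for y in Y] for x in X]            # C2-C3
    l1 = sum(w * max(row) for w, row in zip(wX, S))    # C4: row maxima
    l2 = sum(w * max(S[i][j] for i in range(len(X)))   # C5: column maxima
             for j, w in enumerate(wY))
    return (l1 + l2) / 2                               # C6

X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1.0, 0.0]]
print(set_similarity(X, Y))  # row maxima 1 and 0 -> l1 = 0.5; l2 = 1 -> 0.75
```

Averaging the two directions makes the score symmetric in spirit: l1 asks how well each word of X is covered by Y, and l2 asks the converse.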
Preferably, the Word2vec word-vector model used in step C1 is obtained by training a large corpus with the Skip-gram model and hierarchical softmax.
Specifically, step S4 comprises the following steps:
D1, computing the similarity between one frequent term-vector set X from the report stream and every frequent term-vector set Y of the prior reports, and taking out the maximum similarity max_{X,Y};
D2, comparing max_{X,Y} with the topic tracking threshold; when max_{X,Y} exceeds the threshold, assigning the text set corresponding to X in the report stream to the text set of the prior report corresponding to Y, completing topic tracking.
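Steps D1-D2 are a max-and-threshold decision. A sketch with illustrative similarity values; returning None for "no matching topic" is our convention, since the patent leaves the no-match case to the adaptive variants:

```python
def assign_topic(similarities, threshold):
    """Steps D1-D2: pick the prior topic Y with the highest similarity to
    the incoming set X; merge only if that maximum clears the tracking
    threshold, otherwise report no match (None)."""
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return best if similarities[best] > threshold else None

print(assign_topic([0.2, 0.7, 0.4], threshold=0.5))  # -> 1
print(assign_topic([0.2, 0.3], threshold=0.5))       # -> None
```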
The beneficial effects of the parallelized topic tracking method based on frequent term sets are:
1. A topic text set is represented by a term set, which greatly reduces the cost of similarity computation.
2. A similarity measure based on the Word2vec word-vector model is proposed to compute the similarity between term sets, which improves the accuracy of the comparison between term sets.
3. Frequent term set mining and word-vector conversion are carried out in parallel, making full use of the advantages of parallel computation and improving the efficiency of topic tracking.
Detailed description of the invention
Fig. 1 is the flow chart of the parallelized topic tracking method based on frequent term sets of the present invention;
Fig. 2 is the flow chart of step A3 of the present invention.
Specific embodiment
The present invention will be further explained below with reference to the attached drawings:
As shown in Fig. 1, a parallelized topic tracking method based on frequent term sets of the present invention comprises the following steps:
1. Select a certain number of reports from the stream, or the reports within a period of time; the quantity or the period is set by the user and is not mandated here.
2. Segment the texts into words; remove punctuation marks, onomatopoeia, interjections, auxiliary words, conjunctions, prepositions, adverbs, numerals, and measure words from the segmentation result; then compute the feature vector of each text with a vector space model based on TF-IDF weighting.
3. Traverse all texts. If a text is the first text, create a text set for it; otherwise compute the cosine similarity between the text and every already-processed text, with the formula:
sim(d_i, d_j) = (Σ_k ω_{k,i}·ω_{k,j}) / (√(Σ_k ω_{k,i}²)·√(Σ_k ω_{k,j}²))
where d_j = (ω_{1,j}, ω_{2,j}, …, ω_{i,j}, …, ω_{t,j}) and ω_{i,j} denotes the TF-IDF weight of the i-th feature word in text d_j.
4. Take the maximum cosine similarity Max and compare it with the preset clustering threshold. If Max exceeds the threshold, assign the two texts that yield Max to the same text set; otherwise create a new text set for the text, as shown in Fig. 2.
5. Distribute the text sets evenly across the distributed nodes; each node counts the word frequencies of the texts assigned to it, realizing parallel processing. If a word also appears in the title, expand its frequency to 1.5 times the original; finally retain the n words with the highest frequencies, with n set to 10 in this embodiment.
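The title-boosted frequency count of step 5 can be sketched as follows; whitespace tokenization and the function name are our simplifications:

```python
from collections import Counter

def top_terms(title, body, n=10, title_boost=1.5):
    """Step 5 / B2: count word frequencies over title and body, multiply
    the count of any word that also appears in the title by 1.5, and keep
    the n highest-scoring words."""
    counts = Counter(title.split() + body.split())
    title_words = set(title.split())
    scored = {w: c * title_boost if w in title_words else c
              for w, c in counts.items()}
    return [w for w, _ in sorted(scored.items(),
                                 key=lambda kv: kv[1], reverse=True)[:n]]

print(top_terms("flood hits valley",
                "flood waters rose as the valley town was evacuated",
                n=3))
```

In this example "flood" and "valley" outrank every body-only word because the title boost lifts their scores above the raw counts.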
6. Judge the number of texts in the text set. If the number is 1, take the retained high-frequency words as the frequent term set. If the text set holds more than one text, treat each text as a transaction and its retained high-frequency words as the items of that transaction; with the minimum support set to 0.5 in this embodiment, mine the frequent itemsets of the text set with the FP-Growth algorithm to obtain its frequent term set, specifically: if at least five frequent itemsets are obtained, take the product of the number of items in a frequent itemset and its support count as the measure, extract from all frequent itemsets the three with the largest measure, and merge them into a single set as the final frequent term set representing the topic text set; otherwise reject the text set and exclude it from subsequent topic tracking.
7. Express each word of a frequent term set as a word vector with the Word2vec model, converting the frequent term set into a frequent term-vector set; the Word2vec model is obtained by training a large corpus with the Skip-gram model and hierarchical softmax.
8. Compute the cosine similarities between the frequent term-vector set of the report stream and that of a prior report, with the formula:
S_{i,j} = cos(x_i, y_j) = (x_i · y_j) / (‖x_i‖·‖y_j‖)
where X = (x_1, x_2, …, x_n) is the frequent term-vector set of the report stream, x_i denoting the i-th word vector in X, and Y = (y_1, y_2, …, y_m) is the frequent term-vector set of the prior report, y_j denoting the j-th word vector in Y.
9. Obtain the similarity matrix S, where S_{i,j} is the cosine similarity of x_i and y_j. Take the maximum of each row of S and sum these maxima weighted by the word-vector weights of X to obtain l1; take the maximum of each column of S and sum these maxima weighted by the word-vector weights of Y to obtain l2:
l1 = Σ_{i=1}^{n} X_i · max_j S_{i,j},  l2 = Σ_{j=1}^{m} Y_j · max_i S_{i,j}
where X_i denotes the weight of x_i within the frequent term-vector set X, and Y_j the weight of y_j within the frequent term-vector set Y.
10. Average l1 and l2 to obtain the similarity between the frequent term-vector set of the report stream and that of the prior report.
11. Compute the similarity between one frequent term-vector set X from the report stream and every frequent term-vector set Y of the prior reports, and take out the maximum similarity max_{X,Y}. Compare max_{X,Y} with the topic tracking threshold (set by the user as needed); when max_{X,Y} exceeds the threshold, assign the text set corresponding to X in the report stream to the text set of the prior report corresponding to Y, completing topic tracking.
The technical solution of the present invention is not limited to the above specific embodiments; every technical variation made according to the technical solution of the present invention falls within the scope of protection of the present invention.

Claims (7)

1. A parallelized topic tracking method based on frequent term sets, characterized by comprising the following steps:
S1, clustering a given number of texts, or multiple texts within a given period, from the report stream into multiple topic text sets with a text clustering algorithm;
S2, mining frequent term sets from the topic text sets with parallel computation;
S3, converting the frequent term sets into frequent term-vector sets in parallel with a word-vector model, and computing the similarity between the frequent term-vector sets of the report stream and those of the prior reports;
S4, comparing the similarity against the preset topic tracking threshold, deciding topic membership, and completing topic tracking.
2. The parallelized topic tracking method based on frequent term sets according to claim 1, characterized in that the text clustering algorithm in step S1 comprises the following steps:
A1, segmenting the texts into words, and removing from the segmentation result punctuation marks, onomatopoeia, interjections, auxiliary words, conjunctions, prepositions, adverbs, numerals, and measure words;
A2, computing the feature vector of each text with a vector space model based on TF-IDF weighting;
A3, clustering the text feature vectors with the Single-Pass algorithm to obtain multiple topic text sets.
3. The parallelized topic tracking method based on frequent term sets according to claim 2, characterized in that obtaining the topic text sets in step A3 comprises the following steps:
a1, traversing all texts; if a text is the first text, creating a text set for it;
a2, if it is not the first text, computing the cosine similarity between the text and every already-processed text, with the formula:
sim(d_i, d_j) = (Σ_k ω_{k,i}·ω_{k,j}) / (√(Σ_k ω_{k,i}²)·√(Σ_k ω_{k,j}²))
where d_j = (ω_{1,j}, ω_{2,j}, …, ω_{i,j}, …, ω_{t,j}) and ω_{i,j} denotes the TF-IDF weight of the i-th feature word in text d_j;
a3, taking the maximum cosine similarity Max and comparing it with the preset clustering threshold;
a4, if Max exceeds the clustering threshold, assigning the two texts that yield Max to the same text set, otherwise creating a new text set for the text;
a5, repeating steps a1-a4 to obtain multiple topic text sets.
4. The parallelized topic tracking method based on frequent term sets according to claim 1, characterized in that mining the frequent term sets in step S2 comprises the following steps:
B1, distributing the text sets evenly across the distributed nodes;
B2, counting the word frequencies in each text; if a word also appears in the title, expanding its frequency to 1.5 times the original; finally retaining the n words with the highest frequencies;
B3, judging the number of texts in the text set; if the number is 1, taking the retained high-frequency words as the frequent term set;
B4, if the text set holds more than one text, treating each text as a transaction and its retained high-frequency words as the items of that transaction, and mining the frequent itemsets of the text set with the FP-Growth algorithm to obtain its frequent term set, specifically:
if at least five frequent itemsets are obtained, taking the product of the number of items in a frequent itemset and its support count as the measure, extracting from all frequent itemsets the three with the largest measure, and merging them into a single set as the final frequent term set representing the topic text set; otherwise rejecting the text set and excluding it from subsequent topic tracking.
5. The parallelized topic tracking method based on frequent term sets according to claim 1, characterized in that computing the similarity in step S3 comprises the following steps:
C1, expressing each word of a frequent term set as a word vector with the Word2vec model, thereby converting the frequent term set into a frequent term-vector set;
C2, computing the cosine similarities between the frequent term-vector set of the report stream and that of a prior report, with the formula:
S_{i,j} = cos(x_i, y_j) = (x_i · y_j) / (‖x_i‖·‖y_j‖)
where X = (x_1, x_2, …, x_n) is the frequent term-vector set of the report stream, x_i denoting the i-th word vector in X, and Y = (y_1, y_2, …, y_m) is the frequent term-vector set of the prior report, y_j denoting the j-th word vector in Y;
C3, obtaining the similarity matrix S, where S_{i,j} is the cosine similarity of x_i and y_j;
C4, taking the maximum of each row of S and summing these maxima weighted by the word-vector weights of X to obtain l1:
l1 = Σ_{i=1}^{n} X_i · max_j S_{i,j}
where X_i denotes the weight of x_i within the frequent term-vector set X;
C5, taking the maximum of each column of S and summing these maxima weighted by the word-vector weights of Y to obtain l2:
l2 = Σ_{j=1}^{m} Y_j · max_i S_{i,j}
where Y_j denotes the weight of y_j within the frequent term-vector set Y;
C6, averaging l1 and l2 to obtain the similarity between the frequent term-vector set of the report stream and that of the prior report.
6. The parallelized topic tracking method based on frequent term sets according to claim 5, characterized in that the Word2vec word-vector model used in step C1 is obtained by training a large corpus with the Skip-gram model and hierarchical softmax.
7. The parallelized topic tracking method based on frequent term sets according to claim 6, characterized in that step S4 comprises the following steps:
D1, computing the similarity between one frequent term-vector set X from the report stream and every frequent term-vector set Y of the prior reports, and taking out the maximum similarity max_{X,Y};
D2, comparing max_{X,Y} with the topic tracking threshold; when max_{X,Y} exceeds the threshold, assigning the text set corresponding to X in the report stream to the text set of the prior report corresponding to Y, completing topic tracking.
CN201810585627.6A 2018-06-08 2018-06-08 Parallelization Topic Tracking method based on frequent term set Pending CN108874974A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810585627.6A CN108874974A (en) 2018-06-08 2018-06-08 Parallelization Topic Tracking method based on frequent term set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810585627.6A CN108874974A (en) 2018-06-08 2018-06-08 Parallelization Topic Tracking method based on frequent term set

Publications (1)

Publication Number Publication Date
CN108874974A true CN108874974A (en) 2018-11-23

Family

ID=64338708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810585627.6A Pending CN108874974A (en) 2018-06-08 2018-06-08 Parallelization Topic Tracking method based on frequent term set

Country Status (1)

Country Link
CN (1) CN108874974A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444337A (en) * 2020-02-27 2020-07-24 桂林电子科技大学 Topic tracking method based on improved K L divergence
CN111767730A (en) * 2020-07-07 2020-10-13 腾讯科技(深圳)有限公司 Event type identification method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202518A (en) * 2016-07-22 2016-12-07 桂林电子科技大学 Based on CHI and the short text classification method of sub-category association rule algorithm
CN106886613A (en) * 2017-05-03 2017-06-23 成都云数未来信息科学有限公司 A kind of Text Clustering Method of parallelization
CN107862070A (en) * 2017-11-22 2018-03-30 华南理工大学 Online class based on text cluster discusses the instant group technology of short text and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202518A (en) * 2016-07-22 2016-12-07 桂林电子科技大学 Based on CHI and the short text classification method of sub-category association rule algorithm
CN106886613A (en) * 2017-05-03 2017-06-23 成都云数未来信息科学有限公司 A kind of Text Clustering Method of parallelization
CN107862070A (en) * 2017-11-22 2018-03-30 华南理工大学 Online class based on text cluster discusses the instant group technology of short text and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吕伟: "微博热点话题检测与跟踪技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444337A (en) * 2020-02-27 2020-07-24 桂林电子科技大学 Topic tracking method based on improved K L divergence
CN111444337B (en) * 2020-02-27 2022-07-19 桂林电子科技大学 Topic tracking method based on improved KL divergence
CN111767730A (en) * 2020-07-07 2020-10-13 腾讯科技(深圳)有限公司 Event type identification method and device
CN111767730B (en) * 2020-07-07 2023-09-22 腾讯科技(深圳)有限公司 Event type identification method and device

Similar Documents

Publication Publication Date Title
CN110825877A (en) Semantic similarity analysis method based on text clustering
CN104462184B (en) A kind of large-scale data abnormality recognition method based on two-way sampling combination
CN109359172B (en) Entity alignment optimization method based on graph partitioning
CN109886294A (en) Knowledge fusion method, apparatus, computer equipment and storage medium
CN100416560C (en) Method and apparatus for clustered evolving data flow through on-line and off-line assembly
CN111639497A (en) Abnormal behavior discovery method based on big data machine learning
CN103294817A (en) Text feature extraction method based on categorical distribution probability
CN110633371A (en) Log classification method and system
CN102184186A (en) Multi-feature adaptive fusion-based image retrieval method
CN104239553A (en) Entity recognition method based on Map-Reduce framework
CN110619084B (en) Method for recommending books according to borrowing behaviors of library readers
CN106599915A (en) Vehicle-mounted laser point cloud classification method
CN112087316B (en) Network anomaly root cause positioning method based on anomaly data analysis
CN105320764A (en) 3D model retrieval method and 3D model retrieval apparatus based on slow increment features
CN103778206A (en) Method for providing network service resources
CN114997344B (en) Multi-source data planning method and system based on urban brain
CN113052225A (en) Alarm convergence method and device based on clustering algorithm and time sequence association rule
CN114817575B (en) Large-scale electric power affair map processing method based on extended model
CN111090811A (en) Method and system for extracting massive news hot topics
Ye et al. Hydrologic time series anomaly detection based on flink
CN108874974A (en) Parallelization Topic Tracking method based on frequent term set
CN109947948B (en) Knowledge graph representation learning method and system based on tensor
CN109977131A (en) A kind of house type matching system
CN110929509B (en) Domain event trigger word clustering method based on louvain community discovery algorithm
CN113344128A (en) Micro-cluster-based industrial Internet of things adaptive stream clustering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181123