CN108874974A - Parallelization Topic Tracking method based on frequent term set - Google Patents
Parallelization Topic Tracking method based on frequent term set
- Publication number
- CN108874974A CN108874974A CN201810585627.6A CN201810585627A CN108874974A CN 108874974 A CN108874974 A CN 108874974A CN 201810585627 A CN201810585627 A CN 201810585627A CN 108874974 A CN108874974 A CN 108874974A
- Authority
- CN
- China
- Prior art keywords
- text
- frequent
- term vector
- topic
- frequent term
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Abstract
The invention discloses a parallelized topic tracking method based on frequent term sets. The method includes: clustering a certain number of texts, or the texts within a period of time, from a report stream into multiple topic text sets by a text clustering algorithm; mining frequent term sets from the topic text sets by parallel computation; converting the frequent term sets into frequent term vector sets in parallel through a word vector model, and calculating the similarity between the frequent term vector sets of the report stream and the frequent term vector sets of the prior reports; and comparing the similarity against a preset topic tracking threshold to determine topic membership and complete the topic tracking. The invention represents a topic text set by a word set, which reduces the cost of similarity calculation; it proposes a similarity calculation method based on the Word2vec word vector model to compute the similarity between word sets, which improves the accuracy of similarity comparison between word sets; and it performs frequent term set mining and word vector conversion in a parallelized manner, exploiting the advantages of parallel computation to improve the efficiency of topic tracking.
Description
Technical field
The present invention relates to the field of network information processing, and more particularly to a parallelized topic tracking method based on frequent term sets.
Background technique
With the rapid development of information network technology and the continued spread of the Internet, the volume of data on the network grows geometrically, and the data "explosion" has become a defining feature of the current network era. Faced with massive Internet information, users find it difficult to quickly pick out useful content at a glance or to keep following specific information of interest. To relieve this information overload, people urgently need a more efficient way of acquiring information, one that can quickly retrieve current hot topics and the follow-up reports related to the content they care about.
Topic tracking technology collects, from a follow-up stream of text information, the content related to a known topic, helping people obtain the follow-up reports of that topic. Topic tracking is divided into traditional topic tracking and adaptive topic tracking.
Traditional topic tracking follows the subsequent reports of a topic according to a prior topic model, and splits into two research directions, knowledge-based and statistics-based. The former finds the topic a report belongs to using domain-specific knowledge; the latter judges the degree of similarity between a report and a topic mainly through the probability distribution of report features and statistical methods, and determines topic membership accordingly. Adaptive topic tracking builds on traditional topic tracking by dynamically updating the topic model according to the reports already traced, and then uses the adjusted model to process the follow-up reports.
As topic tracking technology matures, it has come to play a major role in fields such as network public-opinion monitoring, hot news recommendation, and financial market analysis. A traditional topic tracking task supplies only one to four reports related to a topic as prior data. With no other correlated features of the topic available, topic tracking has to build a topic model and a tracing model from such sparse prior data, then compute the similarity between each newly arrived report in the follow-up stream and the existing topic models, and compare it against a threshold to identify the reports of the related topic. Current topic tracking commonly performs topic assignment for newly arrived reports with text classification algorithms such as KNN, decision trees, and support vector machines; among these, the KNN classifier is theoretically mature and achieves relatively satisfactory results even with little prior data. When facing large-scale data, however, traditional topic tracking runs into the following problems:
(1) Under large-scale data, a single topic has far more than one to four related report samples, which provides an application scenario for classification algorithms that need larger training sets. For the traditional KNN algorithm, each tracked report entering the system must have its similarity computed against all prior data before the K nearest neighbors can be selected and the topic membership determined. When the system runs for a long time, or multiple topics are tracked simultaneously, the large amount of prior data causes the computational complexity to rise steeply and the tracking task to execute slowly.
(2) When facing a large-scale report stream, far more than one report enters the system within a short period of time. Processing such data serially in the traditional way has two drawbacks: first, the processing efficiency is low, since each item must wait for the previous one to finish before it can be handled; second, the items processed first yield less information than those processed later, because they are handled without the information contributed by the later items, which makes the tracking result sensitive to the order in which reports arrive.
Summary of the invention
The object of the invention is to provide a parallelized topic tracking method based on frequent term sets that solves the above problems.
The present invention achieves the above object through the following technical solution:
A parallelized topic tracking method based on frequent term sets includes the following steps:
S1, clustering a certain number of texts, or multiple texts within a period of time, from the report stream into multiple topic text sets by a text clustering algorithm;
S2, mining frequent term sets from the topic text sets by parallel computation;
S3, converting the frequent term sets into frequent term vector sets in parallel through a word vector model, and calculating the similarity between the frequent term vector sets of the report stream and the frequent term vector sets of the prior reports;
S4, comparing the similarity with a preset topic tracking threshold to determine topic membership and complete the topic tracking.
Specifically, the text clustering algorithm in step S1 includes the following steps:
A1, performing word segmentation on the texts, and removing punctuation marks, onomatopoeias, interjections, auxiliary words, conjunctions, prepositions, adverbs, numerals, and measure words from the segmentation result;
A2, calculating the feature vector of each text using a vector space model based on TF-IDF weights;
A3, clustering the text feature vectors with the Single-Pass algorithm to obtain multiple topic text sets.
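As an illustration of step A2, the TF-IDF feature vectors can be sketched in Python as follows. This is a minimal sketch over a hypothetical toy corpus; the patent does not fix a particular TF or IDF normalization, so the common tf × log(N/df) form is assumed here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weight vectors over a shared vocabulary.

    docs: list of token lists (already segmented, stop words removed).
    Returns (vocab, vectors) where vectors[j][i] is the TF-IDF weight
    of vocab[i] in document j.
    """
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # document frequency of each term
    df = Counter(w for d in docs for w in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [(tf[w] / len(d)) * math.log(n / df[w]) for w in vocab]
        vectors.append(vec)
    return vocab, vectors

# Hypothetical segmented texts standing in for the report stream.
docs = [["topic", "tracking", "news"],
        ["topic", "model", "news", "news"],
        ["stock", "market", "report"]]
vocab, vecs = tfidf_vectors(docs)
```

Note that with this IDF form, a word occurring in every document gets weight zero, which is the usual behavior of the vector space model.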
Specifically, obtaining the topic text sets in step A3 includes the following steps:
a1, traversing all texts, and creating a new text set if the current text is the first text;
a2, if it is not the first text, calculating the cosine similarity between the current text and every processed text as

sim(d_i, d_j) = (d_i · d_j) / (|d_i| |d_j|)

where d_j = (ω_{1,j}, ω_{2,j}, …, ω_{i,j}, …, ω_{t,j}) and ω_{i,j} denotes the TF-IDF weight of the i-th feature word in text d_j;
a3, taking the maximum cosine similarity Max and comparing it with a preset clustering threshold;
a4, if Max is greater than the clustering threshold, assigning the two texts that produced Max to the same text set, and otherwise creating a new text set for the current text;
a5, repeating steps a1-a4 to obtain multiple topic text sets.
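The clustering loop of steps a1-a5 can be sketched as follows: a minimal Single-Pass implementation over precomputed TF-IDF vectors. The threshold value and the toy vectors are illustrative assumptions, not values fixed by the patent.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def single_pass(vectors, threshold):
    """Single-Pass clustering as in steps a1-a5.

    Each text is compared against all previously processed texts; if
    the best similarity exceeds the threshold, the text joins that
    text's cluster, otherwise it starts a new topic text set.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []   # list of lists of doc indices
    owner = {}      # doc index -> cluster index
    for j, vec in enumerate(vectors):
        if j == 0:
            clusters.append([0])
            owner[0] = 0
            continue
        best_doc, best_sim = None, -1.0
        for i in range(j):
            s = cosine(vec, vectors[i])
            if s > best_sim:
                best_doc, best_sim = i, s
        if best_sim > threshold:
            c = owner[best_doc]
            clusters[c].append(j)
            owner[j] = c
        else:
            clusters.append([j])
            owner[j] = len(clusters) - 1
    return clusters

vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(single_pass(vecs, 0.8))  # → [[0, 1], [2]]
```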
Specifically, the mining of frequent term sets in step S2 includes the following steps:
B1, distributing the text sets evenly across the distributed nodes;
B2, counting the word frequencies in each text, multiplying the frequency of a word by 1.5 if the word also appears in the title, and finally retaining the n words with the highest frequencies;
B3, judging the number of texts in a text set: if the number is 1, taking the retained high-frequency words directly as the frequent term set;
B4, if the number of texts in the text set is greater than 1, treating each text as a transaction and its retained high-frequency words as the items of the transaction, and mining the frequent itemsets of the text set with the FP-Growth algorithm to obtain the frequent term set of the text set. Specifically: if 5 or more frequent itemsets are obtained, the product of the number of terms in an itemset and its support count is used as the measure, the 3 itemsets with the largest measure are extracted from all frequent itemsets, and their union is taken as the final frequent term set representing the topic text set; otherwise the text set is discarded and excluded from subsequent topic tracking.
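The selection rule of step B4 can be illustrated as below. For brevity the sketch enumerates frequent itemsets by brute force instead of building an FP-tree; FP-Growth would return the same itemsets more efficiently on large data. The transaction data and any defaults beyond the patent's stated values (minimum of 5 itemsets, top 3 by measure) are assumptions.

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support):
    """Enumerate all frequent itemsets by brute force.

    A stand-in for FP-Growth (same output, worse complexity): each
    transaction is one text's retained high-frequency words, and an
    itemset is frequent when its support ratio >= min_support.
    Returns {frozenset_of_words: support_count}.
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c / n >= min_support}

def topic_term_set(transactions, min_support=0.5, min_sets=5, keep=3):
    """Step B4: score itemsets by |itemset| * support_count, keep the
    top `keep`, and union them into the topic's frequent term set.
    Returns None when fewer than `min_sets` itemsets are found, in
    which case the text set is dropped from tracking.
    """
    freq = frequent_itemsets(transactions, min_support)
    if len(freq) < min_sets:
        return None
    ranked = sorted(freq, key=lambda s: len(s) * freq[s], reverse=True)
    result = set()
    for s in ranked[:keep]:
        result |= s
    return result

# Three hypothetical texts about the same event.
txns = [["flood", "rain", "city"],
        ["flood", "rain", "river"],
        ["flood", "rain", "city"]]
print(topic_term_set(txns))  # → {'flood', 'rain', 'city'}
```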
Specifically, the similarity calculation in step S3 includes the following steps:
C1, expressing each word in a frequent term set as a word vector with the Word2vec word vector model, thereby converting the frequent term set into a frequent term vector set;
C2, calculating the cosine similarity between the word vectors of the frequent term vector set of the report stream and those of a prior report as

S_{i,j} = (x_i · y_j) / (|x_i| |y_j|)

where X = (x_1, x_2, …, x_n) is the frequent term vector set of the report stream and x_i denotes the i-th word vector in X; Y = (y_1, y_2, …, y_m) is the frequent term vector set of the prior report and y_j denotes the j-th word vector in Y;
C3, obtaining the similarity matrix S, where S_{i,j} is the cosine similarity between x_i and y_j;
C4, taking the maximum of each row of the similarity matrix S and summing the maxima weighted by the word vector weights of X to obtain

l1 = Σ_{i=1..n} X_i · max_j S_{i,j}

where X_i denotes the weight of x_i in the frequent term vector set X;
C5, taking the maximum of each column of S and summing the maxima weighted by the word vector weights of Y to obtain

l2 = Σ_{j=1..m} Y_j · max_i S_{i,j}

where Y_j denotes the weight of y_j in the frequent term vector set Y;
C6, averaging l1 and l2 to obtain the similarity between the frequent term vector set of the report stream and the frequent term vector set of the prior report.
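Steps C2-C6 can be sketched as follows in pure Python. The per-word weights X_i and Y_j are taken as uniform here, which is an assumption: the patent states that the maxima are weighted but leaves the weighting scheme open.

```python
import math

def cos(u, v):
    """Cosine similarity of two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def word_set_similarity(X, Y, wx=None, wy=None):
    """Similarity between two frequent term vector sets (steps C2-C6).

    X, Y: lists of word vectors; wx, wy: per-word weights, uniform by
    default (an assumption). Builds the cosine matrix S, takes the
    weighted row maxima (l1) and column maxima (l2), and returns
    their average.
    """
    n, m = len(X), len(Y)
    wx = wx or [1.0 / n] * n
    wy = wy or [1.0 / m] * m
    S = [[cos(x, y) for y in Y] for x in X]        # step C3
    l1 = sum(wx[i] * max(S[i]) for i in range(n))  # step C4
    l2 = sum(wy[j] * max(S[i][j] for i in range(n))
             for j in range(m))                    # step C5
    return (l1 + l2) / 2                           # step C6

X = [[1.0, 0.0], [0.0, 1.0]]
print(word_set_similarity(X, X))  # → 1.0 (identical sets)
```

Averaging the two directions makes the measure symmetric even when the two word sets have different sizes.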
Preferably, the Word2vec word vector model used in step C1 is obtained by training a large corpus with the Skip-gram model and the hierarchical Softmax method.
Specifically, step S4 includes the following steps:
D1, calculating the similarity between a frequent term vector set X of the report stream and every frequent term vector set Y of the prior reports, and taking the maximum similarity max_{X,Y};
D2, comparing max_{X,Y} with the topic tracking threshold: when max_{X,Y} is greater than the topic tracking threshold, merging the text set corresponding to X into the text set of the prior report corresponding to Y, completing the topic tracking.
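Steps D1-D2 reduce to a max-and-threshold decision, sketched below. The topic identifiers and the threshold value are illustrative assumptions; only the comparison logic comes from the patent.

```python
def assign_topic(sim_by_topic, threshold):
    """Steps D1-D2: pick the prior topic with the highest similarity
    to the incoming frequent term vector set X. Return its key when
    the maximum exceeds the tracking threshold, else None (the report
    stream's text set is not merged into any prior topic).

    sim_by_topic: {topic_id: similarity of X to that topic's Y}.
    """
    if not sim_by_topic:
        return None
    best = max(sim_by_topic, key=sim_by_topic.get)
    return best if sim_by_topic[best] > threshold else None

print(assign_topic({"t1": 0.42, "t2": 0.81}, 0.6))  # → t2
```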
The beneficial effects of the parallelized topic tracking method based on frequent term sets of the present invention are:
1. A topic text set is represented by a word set, which greatly reduces the cost of similarity calculation.
2. A similarity calculation method based on the Word2vec word vector model is proposed to compute the similarity between word sets, which improves the accuracy of similarity comparison between word sets.
3. Frequent term set mining and word vector conversion are performed in a parallelized manner, taking full advantage of parallel computation and improving the efficiency of topic tracking.
Detailed description of the invention
Fig. 1 is a flow chart of the parallelized topic tracking method based on frequent term sets according to the present invention;
Fig. 2 is a flow chart of step A3 of the present invention.
Specific embodiment
The present invention will be further explained below with reference to the attached drawings:
As shown in Fig. 1, a parallelized topic tracking method based on frequent term sets according to the present invention includes the following steps:
1. Select a certain number of reports from the report stream, or the reports within a period of time; the quantity or the time span is set by the user as desired and is not mandatory.
2. Perform word segmentation on each text, remove punctuation marks, onomatopoeias, interjections, auxiliary words, conjunctions, prepositions, adverbs, numerals, and measure words from the segmentation result, and calculate the feature vector of each text using a vector space model based on TF-IDF weights.
3. Traverse all texts. If the current text is the first text, create a new text set; otherwise calculate the cosine similarity between the current text and every processed text as

sim(d_i, d_j) = (d_i · d_j) / (|d_i| |d_j|)

where d_j = (ω_{1,j}, ω_{2,j}, …, ω_{i,j}, …, ω_{t,j}) and ω_{i,j} denotes the TF-IDF weight of the i-th feature word in text d_j.
4. Take the maximum cosine similarity Max and compare it with the preset clustering threshold. If Max is greater than the clustering threshold, assign the two texts that produced Max to the same text set; otherwise create a new text set for the current text, as shown in Fig. 2.
5. Distribute the text sets evenly across the distributed nodes. Each node counts the word frequencies in each text assigned to it, realizing parallel processing. If a word also appears in the title, its frequency is multiplied by 1.5. Finally the n words with the highest frequencies are retained; n is set to 10 in this embodiment.
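The title-boosted frequency count of this step can be sketched as follows. The token lists are assumed to be the output of the word segmentation of step 2; the boost factor 1.5 and n = 10 come from the embodiment, while the example tokens are hypothetical.

```python
from collections import Counter

def top_words(title_tokens, body_tokens, n=10, title_boost=1.5):
    """Count word frequencies in one text, boost words that also
    appear in the title by a factor of 1.5, and keep the n
    highest-scoring words (n = 10 in the embodiment).
    """
    freq = Counter(title_tokens) + Counter(body_tokens)
    title_set = set(title_tokens)
    scored = {w: c * title_boost if w in title_set else c
              for w, c in freq.items()}
    return [w for w, _ in sorted(scored.items(),
                                 key=lambda kv: kv[1],
                                 reverse=True)[:n]]

title = ["flood", "warning"]
body = ["flood", "rain", "rain", "river"]
print(top_words(title, body, n=3))  # → ['flood', 'rain', 'warning']
```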
6. Judge the number of texts in a text set. If the number is 1, take the retained high-frequency words directly as the frequent term set. If the number is greater than 1, treat each text as a transaction and its retained high-frequency words as the items of the transaction; the minimum support is set to 0.5 in this embodiment, and the frequent itemsets of the text set are mined with the FP-Growth algorithm to obtain its frequent term set. Specifically: if 5 or more frequent itemsets are obtained, the product of the number of terms in an itemset and its support count is used as the measure, the 3 itemsets with the largest measure are extracted from all frequent itemsets, and their union is taken as the final frequent term set representing the topic text set; otherwise the text set is discarded and excluded from subsequent topic tracking.
7. Express each word in a frequent term set as a word vector with the Word2vec word vector model, converting the frequent term set into a frequent term vector set. The Word2vec model is obtained by training a large corpus with the Skip-gram model and the hierarchical Softmax method.
8. Calculate the cosine similarity between the word vectors of the frequent term vector set of the report stream and those of a prior report as

S_{i,j} = (x_i · y_j) / (|x_i| |y_j|)

where X = (x_1, x_2, …, x_n) is the frequent term vector set of the report stream and x_i denotes the i-th word vector in X; Y = (y_1, y_2, …, y_m) is the frequent term vector set of the prior report and y_j denotes the j-th word vector in Y.
9. Obtain the similarity matrix S, where S_{i,j} is the cosine similarity between x_i and y_j. Take the maximum of each row of S and sum the maxima weighted by the word vector weights of X to obtain

l1 = Σ_{i=1..n} X_i · max_j S_{i,j}

and take the maximum of each column of S and sum the maxima weighted by the word vector weights of Y to obtain

l2 = Σ_{j=1..m} Y_j · max_i S_{i,j}

where X_i denotes the weight of x_i in the frequent term vector set X and Y_j denotes the weight of y_j in the frequent term vector set Y.
10. Average l1 and l2 to obtain the similarity between the frequent term vector set of the report stream and the frequent term vector set of the prior report.
11. Calculate the similarity between a frequent term vector set X of the report stream and every frequent term vector set Y of the prior reports, and take the maximum similarity max_{X,Y}. Compare max_{X,Y} with the topic tracking threshold (set by the user as required); when max_{X,Y} is greater than the topic tracking threshold, merge the text set corresponding to X into the text set of the prior report corresponding to Y, completing the topic tracking.
The technical solution of the present invention is not limited to the above specific embodiment; any technical variation made according to the technical solution of the present invention falls within the protection scope of the present invention.
Claims (7)
1. A parallelized topic tracking method based on frequent term sets, characterized by including the following steps:
S1, clustering a certain number of texts, or multiple texts within a period of time, from a report stream into multiple topic text sets by a text clustering algorithm;
S2, mining frequent term sets from the topic text sets by parallel computation;
S3, converting the frequent term sets into frequent term vector sets in parallel through a word vector model, and calculating the similarity between the frequent term vector sets of the report stream and the frequent term vector sets of the prior reports;
S4, comparing the similarity with a preset topic tracking threshold to determine topic membership and complete the topic tracking.
2. The parallelized topic tracking method based on frequent term sets according to claim 1, characterized in that the text clustering algorithm in step S1 includes the following steps:
A1, performing word segmentation on the texts, and removing punctuation marks, onomatopoeias, interjections, auxiliary words, conjunctions, prepositions, adverbs, numerals, and measure words from the segmentation result;
A2, calculating the feature vector of each text using a vector space model based on TF-IDF weights;
A3, clustering the text feature vectors with the Single-Pass algorithm to obtain multiple topic text sets.
3. The parallelized topic tracking method based on frequent term sets according to claim 2, characterized in that obtaining the topic text sets in step A3 includes the following steps:
a1, traversing all texts, and creating a new text set if the current text is the first text;
a2, if it is not the first text, calculating the cosine similarity between the current text and every processed text as sim(d_i, d_j) = (d_i · d_j) / (|d_i| |d_j|), where d_j = (ω_{1,j}, ω_{2,j}, …, ω_{i,j}, …, ω_{t,j}) and ω_{i,j} denotes the TF-IDF weight of the i-th feature word in text d_j;
a3, taking the maximum cosine similarity Max and comparing it with a preset clustering threshold;
a4, if Max is greater than the clustering threshold, assigning the two texts that produced Max to the same text set, and otherwise creating a new text set for the current text;
a5, repeating steps a1-a4 to obtain multiple topic text sets.
4. The parallelized topic tracking method based on frequent term sets according to claim 1, characterized in that the mining of frequent term sets in step S2 includes the following steps:
B1, distributing the text sets evenly across the distributed nodes;
B2, counting the word frequencies in each text, multiplying the frequency of a word by 1.5 if the word also appears in the title, and finally retaining the n words with the highest frequencies;
B3, judging the number of texts in a text set: if the number is 1, taking the retained high-frequency words directly as the frequent term set;
B4, if the number of texts in the text set is greater than 1, treating each text as a transaction and its retained high-frequency words as the items of the transaction, and mining the frequent itemsets of the text set with the FP-Growth algorithm to obtain the frequent term set of the text set; specifically, if 5 or more frequent itemsets are obtained, the product of the number of terms in an itemset and its support count is used as the measure, the 3 itemsets with the largest measure are extracted from all frequent itemsets, and their union is taken as the final frequent term set representing the topic text set; otherwise the text set is discarded and excluded from subsequent topic tracking.
5. The parallelized topic tracking method based on frequent term sets according to claim 1, characterized in that the similarity calculation in step S3 includes the following steps:
C1, expressing each word in a frequent term set as a word vector with the Word2vec word vector model, thereby converting the frequent term set into a frequent term vector set;
C2, calculating the cosine similarity between the word vectors of the frequent term vector set of the report stream and those of a prior report as S_{i,j} = (x_i · y_j) / (|x_i| |y_j|), where X = (x_1, x_2, …, x_n) is the frequent term vector set of the report stream and x_i denotes the i-th word vector in X, and Y = (y_1, y_2, …, y_m) is the frequent term vector set of the prior report and y_j denotes the j-th word vector in Y;
C3, obtaining the similarity matrix S, where S_{i,j} is the cosine similarity between x_i and y_j;
C4, taking the maximum of each row of the similarity matrix S and summing the maxima weighted by the word vector weights of X to obtain l1 = Σ_{i=1..n} X_i · max_j S_{i,j}, where X_i denotes the weight of x_i in the frequent term vector set X;
C5, taking the maximum of each column of S and summing the maxima weighted by the word vector weights of Y to obtain l2 = Σ_{j=1..m} Y_j · max_i S_{i,j}, where Y_j denotes the weight of y_j in the frequent term vector set Y;
C6, averaging l1 and l2 to obtain the similarity between the frequent term vector set of the report stream and the frequent term vector set of the prior report.
6. The parallelized topic tracking method based on frequent term sets according to claim 5, characterized in that the Word2vec word vector model used in step C1 is obtained by training a large corpus with the Skip-gram model and the hierarchical Softmax method.
7. The parallelized topic tracking method based on frequent term sets according to claim 6, characterized in that step S4 includes the following steps:
D1, calculating the similarity between a frequent term vector set X of the report stream and every frequent term vector set Y of the prior reports, and taking the maximum similarity max_{X,Y};
D2, comparing max_{X,Y} with the topic tracking threshold: when max_{X,Y} is greater than the topic tracking threshold, merging the text set corresponding to X into the text set of the prior report corresponding to Y, completing the topic tracking.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810585627.6A CN108874974A (en) | 2018-06-08 | 2018-06-08 | Parallelization Topic Tracking method based on frequent term set |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108874974A true CN108874974A (en) | 2018-11-23 |
Family
ID=64338708
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810585627.6A Pending CN108874974A (en) | 2018-06-08 | 2018-06-08 | Parallelization Topic Tracking method based on frequent term set |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108874974A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202518A (en) * | 2016-07-22 | 2016-12-07 | Guilin University of Electronic Technology | Short text classification method based on CHI and per-category association rule algorithm |
CN106886613A (en) * | 2017-05-03 | 2017-06-23 | Chengdu Yunshu Future Information Science Co., Ltd. | A parallelized text clustering method |
CN107862070A (en) * | 2017-11-22 | 2018-03-30 | South China University of Technology | Real-time grouping method and system for short texts in online class discussion based on text clustering |
Non-Patent Citations (1)
Title |
---|
Lü Wei, "Research on Hot Topic Detection and Tracking for Microblogs", China Masters' Theses Full-text Database, Information Science and Technology Series * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111444337A (en) * | 2020-02-27 | 2020-07-24 | Guilin University of Electronic Technology | Topic tracking method based on improved KL divergence |
CN111444337B (en) * | 2020-02-27 | 2022-07-19 | 桂林电子科技大学 | Topic tracking method based on improved KL divergence |
CN111767730A (en) * | 2020-07-07 | 2020-10-13 | Tencent Technology (Shenzhen) Co., Ltd. | Event type identification method and device |
CN111767730B (en) * | 2020-07-07 | 2023-09-22 | 腾讯科技(深圳)有限公司 | Event type identification method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110825877A (en) | Semantic similarity analysis method based on text clustering | |
CN104462184B (en) | A kind of large-scale data abnormality recognition method based on two-way sampling combination | |
CN109359172B (en) | Entity alignment optimization method based on graph partitioning | |
CN109886294A (en) | Knowledge fusion method, apparatus, computer equipment and storage medium | |
CN100416560C (en) | Method and apparatus for clustered evolving data flow through on-line and off-line assembly | |
CN111639497A (en) | Abnormal behavior discovery method based on big data machine learning | |
CN103294817A (en) | Text feature extraction method based on categorical distribution probability | |
CN110633371A (en) | Log classification method and system | |
CN102184186A (en) | Multi-feature adaptive fusion-based image retrieval method | |
CN104239553A (en) | Entity recognition method based on Map-Reduce framework | |
CN110619084B (en) | Method for recommending books according to borrowing behaviors of library readers | |
CN106599915A (en) | Vehicle-mounted laser point cloud classification method | |
CN112087316B (en) | Network anomaly root cause positioning method based on anomaly data analysis | |
CN105320764A (en) | 3D model retrieval method and 3D model retrieval apparatus based on slow increment features | |
CN103778206A (en) | Method for providing network service resources | |
CN114997344B (en) | Multi-source data planning method and system based on urban brain | |
CN113052225A (en) | Alarm convergence method and device based on clustering algorithm and time sequence association rule | |
CN114817575B (en) | Large-scale electric power affair map processing method based on extended model | |
CN111090811A (en) | Method and system for extracting massive news hot topics | |
Ye et al. | Hydrologic time series anomaly detection based on flink | |
CN108874974A (en) | Parallelization Topic Tracking method based on frequent term set | |
CN109947948B (en) | Knowledge graph representation learning method and system based on tensor | |
CN109977131A (en) | A kind of house type matching system | |
CN110929509B (en) | Domain event trigger word clustering method based on louvain community discovery algorithm | |
CN113344128A (en) | Micro-cluster-based industrial Internet of things adaptive stream clustering method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181123 |