CN108170799A - A frequent sequence mining method for massive data - Google Patents

A frequent sequence mining method for massive data

Info

Publication number
CN108170799A
Authority
CN
China
Prior art keywords
sequence
frequent
mining
item
frequent sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711457785.5A
Other languages
Chinese (zh)
Inventor
王宏志
秦谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Mingtong Tech Co Ltd
Original Assignee
Jiangsu Mingtong Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Mingtong Tech Co Ltd
Priority to CN201711457785.5A
Publication of CN108170799A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The invention discloses a frequent sequence mining method for massive data. First, the user inputs time-series data, the frequency of each item is computed, and the set of frequent items is constructed. Second, for all frequent items, ω-equivalent partitions are constructed in the Map phase. The constructed partitions are then mined independently in the Reduce phase to obtain frequent sequences. Finally, all sets of frequent sequences are merged, and repeated sequences are filtered out before output. The invention provides a method for partitioning the input database that can effectively improve the efficiency of the algorithm. In the mining phase, any existing mature mining algorithm may be used, so the invention is easy to implement.

Description

A frequent sequence mining method for massive data
Technical field
The present invention relates to a frequent sequence mining method for massive data and belongs to the technical field of data processing.
Background
When the concept of sequential pattern mining first appeared, improved algorithms based on Apriori were proposed with it, such as AprioriSome, AprioriAll and DynamicSome. Later, as the algorithms were refined, the GSP algorithm was proposed, still based on the Apriori idea. GSP introduces time constraints, sliding time windows and user-defined taxonomies for sequences, so that the mined frequent sequences are more meaningful in practice. The MFS and PSP algorithms followed, both improving the execution efficiency of GSP. All of these improved algorithms derive from the idea of the Apriori algorithm. Apriori, however, has inherent drawbacks: it must scan the database multiple times, which costs a great deal of time on massive data, and it generates a large number of candidates. If the support threshold is small or the frequent patterns are long, this problem becomes very hard to handle.
M. Zaki et al. proposed SPADE, a sequential pattern mining algorithm based on a vertical storage format. Its basic idea is to first convert the input sequence database into equivalence classes, and then mine frequent sequential patterns using simple join operations and ideas from lattice theory. Its advantage is that, compared with the Apriori family, the number of database scans is greatly reduced: the whole mining process needs only 3 scans. SPADE also has drawbacks. Converting a database from horizontal to vertical format requires extra storage space and computation time, and the algorithm still traverses breadth-first, which incurs a large cost for candidate generation.
In recent years, J. Han, J. Pei et al. proposed FreeSpan, an algorithm based on projection-based frequent pattern growth. It was later improved into the PrefixSpan algorithm, whose performance is substantially better. The advantage of FreeSpan is that it greatly reduces the generation of candidate sequences, and hence the cost of generating them, while still finding the complete set of frequent sequential patterns. The algorithm also has drawbacks. It can generate a large number of projected databases: in the special case where a pattern appears in every sequence of the input database, the projected database of that pattern is no smaller than the original database. In addition, if a subsequence of length K may grow at any position, then finding the candidate sequences of length K+1 requires considering every possible combination, which adds considerable time cost.
Multidimensional sequential pattern mining extracts, from multidimensional information, the information that is interesting and meaningful to the user. It takes other dimensions into account on top of ordinary sequential pattern mining. For example, in data on consumer spending habits, information such as the consumer's gender, age and occupation forms a multidimensional sequential pattern. Such patterns contain more valuable information and have higher application value. Several multidimensional sequential pattern mining algorithms exist, such as UniSeq, Seq-Dim and Dim-Seq. The main idea of UniSeq is to embed the multidimensional information of the database into each sequence, forming a new extended sequence database. PrefixSpan can then be run on this extended database to obtain multidimensional frequent sequential patterns.
Frequent sequence mining is the basis of a range of important data mining tasks. In text mining, for example, frequent sequences are used to build statistical language models for machine translation, data recovery, information extraction and spam detection, and word associations can also be used for relation extraction. In web usage mining and session analysis, frequent sequences can represent common or typical user behavior (such as frequent sequences in web access logs). In these cases, and even in some simple applications, the data to be mined is huge and contains sequences numbering in the hundreds of millions. For example, Microsoft provides access to an n-gram data set based on hundreds of billions of web pages, and Google has published a corpus of more than one billion n-grams. In such settings, a frequent sequence mining algorithm that can handle massive data becomes increasingly important. With existing methods, when a single data set is this large, the computational overhead and memory usage remain very high.
Summary of the invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a frequent sequence mining method for massive data that can effectively improve the efficiency of the algorithm.
To solve the above technical problem, the invention provides a frequent sequence mining method for massive data, comprising the following steps:
1) The user inputs time-series data. Basic statistics of the data are obtained, the frequency of each ω ∈ Σ is computed, and the set of frequent items F_{σ,0,1}(D) is constructed. Here ω denotes an item (a subsequence of the input), Σ is the universe, i.e. the set of all input time series, D is the input time-series database, the subscript σ is the support threshold, 0 is the gap threshold, and 1 is the length threshold. Frequent is defined as follows: for σ > 0, a sequence S is (σ, γ)-frequent if f_γ(S, D) ≥ σ, where f_γ(S, D) denotes the frequency of S;
2) For all frequent items in Σ, the ω-equivalent partitions P_ω are constructed in Map;
3) The partitions P_ω constructed in step 2) are mined independently in Reduce to obtain F_{σ,γ,λ}(P_ω), where P_ω is the partition whose central item is ω, and F_{σ,γ,λ}(P_ω) is the set of all sequences in P_ω that have length at most λ and are (σ, γ)-frequent;
4) The sets F_{σ,γ,λ}(P_ω) obtained in step 3) for each frequent item are merged, and repeated sequences are filtered out to obtain the final output.
In step 1) above, the basic statistics of the data include the average length of the time-series data, the maximum length, the total number of sequences, the total number of items, the number of distinct items, and the total number of bytes.
Step 1) above is completed by a single MapReduce job.
In step 1) above, an integer identifier is assigned to each item, and every sequence is represented entirely as an array of integer identifiers. The integer identifiers are first ordered by descending item frequency, and the items are then compressed into integers using variable-byte encoding.
In step 2) above, the ω-equivalent partition is constructed as follows:
2-1) Check for minimal relevance, i.e. whether the input time series is relevant to the central item. If it is not relevant, let P_ω(T) = ∅. If it is relevant, perform one backward scan of the input time series to obtain the right distance of every index;
2-2) Then perform one forward scan, during which the following are carried out together:
(a) compute the left distances;
(b) perform unreachability reduction;
(c) replace irrelevant items with blanks;
(d) perform prefix/suffix reduction and blank reduction;
(e) use runs of γ+1 blanks to split the input sequence into several subsequences (the blank-splitting method), which together form the final output P_ω.
Before partitioning, the set F_{σ,0,1}(D), whose items are arranged in descending order of frequency, is first scanned: adjacent items are placed in one group until the sum of their frequencies exceeds a set value m, and every item is traversed in this way to complete the grouping. Then one separate partition is constructed for each group.
The PrefixSpan algorithm is used to mine the partition P_ω.
The beneficial effects achieved by the invention are:
(1) The invention is the first distributed algorithm that supports gap constraints;
(2) The invention provides a method for partitioning the input database that can effectively improve the efficiency of the algorithm;
(3) The invention compresses the intermediate sequences it generates, which greatly reduces the time cost of the algorithm. Items are divided into groups instead of generating one projected database per central item, which improves efficiency. In the mining phase, any existing mature mining algorithm may be used.
Brief description of the drawings
Fig. 1 is a schematic diagram of the MapReduce model;
Fig. 2 is an example of processing with the MapReduce programming model as used by the invention.
Detailed description of the embodiments
The invention is further described below. The following embodiments serve only to illustrate the technical solution of the invention clearly and are not intended to limit its scope of protection.
The invention uses the MapReduce programming model, which comprises a Map function and a Reduce function. The Map function maps a set of key-value pairs to a new set of key-value pairs, and the Reduce function ensures that all mapped key-value pairs sharing the same key are grouped under that key. The basic idea is shown in Fig. 1.
MapReduce can process massive data sets. The Map function is specified by the user; it processes key-value pairs and produces a series of intermediate key-value pairs. The Reduce function then merges the values of all intermediate key-value pairs that have the same key. Fig. 2 shows a sample problem processed with the MapReduce programming model: counting the words in a document. The Map function first counts the occurrences of each word within each chunk of the document, and the Reduce function then sums the per-chunk counts of each word.
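The word-count example can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not a real MapReduce deployment: the names `run_mapreduce`, `map_word_count` and `reduce_word_count` are hypothetical, and the grouping step merely stands in for the shuffle a framework would perform.

```python
from collections import Counter, defaultdict

def map_word_count(chunk):
    """Map: count each word inside one chunk and emit (word, count) pairs."""
    return list(Counter(chunk.split()).items())

def reduce_word_count(word, counts):
    """Reduce: sum the per-chunk counts that share the same key."""
    return word, sum(counts)

def run_mapreduce(chunks, map_fn, reduce_fn):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for chunk in chunks:
        for key, value in map_fn(chunk):
            groups[key].append(value)      # all values of one key end up together
    return dict(reduce_fn(key, values) for key, values in groups.items())

# Hypothetical document split into three chunks
chunks = ["a b a", "b c", "a c c"]
word_counts = run_mapreduce(chunks, map_word_count, reduce_word_count)
```

The same driver shape (map, group by key, reduce) is what the partition and mining phases of the method rely on, with items as keys instead of words.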
The terms used in the invention are as follows:
A sequence database D = {S_1, ..., S_|D|} is a multiset of input sequences. A sequence is an ordered list of items, and each item belongs to the universe Σ = {ω_1, ..., ω_|Σ|}. S = s_1 s_2 ... s_|S| denotes a sequence of length |S|, where s_i ∈ Σ (1 ≤ i ≤ |S|). Σ+ denotes the set of all nonempty sequences formed from the elements of Σ.
In general, the symbol T denotes an input sequence of the input database, and the symbol S denotes an arbitrary sequence.
The variable γ ≥ 0 denotes the maximum gap. If S is a subsequence of T in which consecutive items of S are separated in T by gaps of length at most γ, then S is called a γ-subsequence of T, written S ⊆_γ T. Standard n-gram mining corresponds to γ = 0. Formally, with n = |S|, S ⊆_γ T if and only if there exist indices i_1 < ... < i_n such that (1) S_k = T_{i_k} for 1 ≤ k ≤ n, and (2) i_{k+1} − i_k − 1 ≤ γ for 1 ≤ k < n. For example, if T = abcd, S_1 = acd and S_2 = bc, then S_1 ⊆_1 T and S_2 ⊆_0 T.
The γ-support Sup_γ(S, D) of a sequence S in database D is the multiset of input sequences in which S occurs as a γ-subsequence:
Sup_γ(S, D) = {T ∈ D : S ⊆_γ T}
f_γ(S, D) = |Sup_γ(S, D)| denotes the frequency of sequence S. Frequency here corresponds to the notion of document frequency in text mining: it counts the number of input sequences in which S occurs, not the total number of occurrences of S. For σ > 0, a sequence S is (σ, γ)-frequent if f_γ(S, D) ≥ σ.
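The definitions of γ-subsequence and frequency translate directly into code. The sketch below is an illustration only: the function names and the small database `db` are assumptions made here, and sequences are modeled as Python strings whose characters are items.

```python
def is_gamma_subsequence(s, t, gamma):
    """True if s occurs in t with at most `gamma` items skipped between
    consecutive matched positions (i_{k+1} - i_k - 1 <= gamma)."""
    if not s:
        return True  # convention chosen here; the text only uses nonempty S
    # Positions of t at which a match of the current prefix of s can end
    ends = {i for i, item in enumerate(t) if item == s[0]}
    for item in s[1:]:
        ends = {
            j
            for i in ends
            for j in range(i + 1, min(i + gamma + 2, len(t)))
            if t[j] == item
        }
        if not ends:
            return False
    return True

def frequency(s, database, gamma):
    """f_gamma(S, D): number of input sequences that contain s at least once."""
    return sum(1 for t in database if is_gamma_subsequence(s, t, gamma))

# Example from the text: T = abcd, S1 = acd, S2 = bc
t = "abcd"
s1_with_gap_1 = is_gamma_subsequence("acd", t, 1)   # a, (skip b), c, d
s1_with_gap_0 = is_gamma_subsequence("acd", t, 0)   # needs a gap, so fails
s2_with_gap_0 = is_gamma_subsequence("bc", t, 0)    # contiguous

# Hypothetical database used only for illustration
db = ["abcd", "acd", "bd"]
f_cd = frequency("cd", db, 0)
f_ad = frequency("ad", db, 1)
```

Tracking the set of possible end positions, rather than matching greedily, matters because with a gap limit the leftmost match of a prefix is not always the one that can be extended.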
The MG-FSM (frequent sequence mining) algorithm of the invention is divided into three phases: 1. the preprocessing phase, which obtains basic statistics of the data; 2. the partition phase, which constructs the ω-equivalent partitions for all frequent items in Σ; 3. the mining phase, which mines the partitions constructed in the second phase independently. Any existing mature frequent sequence mining algorithm can be used in the mining phase. Mining each partition produces output, and these outputs are finally filtered to obtain the final result.
The phases are described in detail below:
(1) Preprocessing phase:
The user inputs time-series data, and the average length of the time-series data, the maximum length, the total number of sequences, the total number of items, the number of distinct items and the total number of bytes are obtained.
This phase computes the frequency of each ω ∈ Σ and constructs the set of frequent items F_{σ,0,1}(D), commonly called the f-list. An item in the invention refers to a subsequence of the input. In F_{σ,0,1}(D), the subscript σ is the support threshold, 0 is the gap threshold and 1 is the length threshold. This step can be completed by a single MapReduce job (by running a variant of the WordCount algorithm that ignores repeated occurrences of an item within one input sequence). From it, the set of frequent sequences of length 1 can be output. For sequences longer than 1, the f-list is used to define an order "<" on the set Σ:
ω < ω′ if and only if f_0(ω, D) > f_0(ω′, D)
where f_0(ω, D) denotes the frequency of item ω. In other words, "small" items have higher frequency and "large" items have lower frequency.
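A minimal sketch of the f-list construction follows. It assumes sequences are strings of single-character items, and it breaks frequency ties alphabetically, which is an assumption made here because the text does not say how ties are ordered. The function name `build_f_list` and the sample database are hypothetical.

```python
from collections import Counter

def build_f_list(database, sigma):
    """Return (item, frequency) pairs for items with frequency >= sigma,
    sorted by descending frequency, i.e. from "small" to "large" items."""
    doc_freq = Counter()
    for sequence in database:
        # set(): an item repeated inside one sequence is counted only once
        doc_freq.update(set(sequence))
    frequent = [(item, f) for item, f in doc_freq.items() if f >= sigma]
    # Tie-break on the item itself (assumption, for a deterministic order)
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical database: "a" occurs twice in "abca" but counts once there
database = ["abca", "bcd", "ab", "b"]
f_list = build_f_list(database, sigma=2)
```

Here `d` is dropped because it occurs in only one sequence, and `b` comes first because it is the most frequent and therefore the "smallest" item.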
If ω′ ≤ ω holds for all ω′ ∈ S, we write S ≤ ω. Consider the set of all sequences that contain ω and in which every item other than ω is no greater than ω.
Finally, the central item of a sequence S is p(S) = min{ω ∈ S : S ≤ ω}, i.e. the largest item in S. Note that p(S) = ω exactly when S belongs to the set just described for ω.
For example, when S = abc, we have S ≤ c and p(S) = c.
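The central item can be computed from an f-list in one line once each item has a rank. The sketch below uses hypothetical names (`item_rank`, `central_item`) and made-up frequencies chosen so that the resulting order is a < b < c, matching the example.

```python
def item_rank(f_list):
    """Map each item to its rank; rank 0 is the most frequent ("smallest") item."""
    return {item: rank for rank, (item, _freq) in enumerate(f_list)}

def central_item(sequence, rank):
    """p(S): the largest item of S, i.e. the least frequent item it contains."""
    return max(sequence, key=lambda item: rank[item])

# Hypothetical f-list, sorted by descending frequency, so a < b < c
f_list = [("a", 9), ("b", 7), ("c", 4)]
rank = item_rank(f_list)

p_abc = central_item("abc", rank)
p_ab = central_item("ab", rank)
```

Every frequent sequence has exactly one central item, which is what lets each sequence be reported by exactly one partition later on.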
The invention assigns an integer identifier to each item and represents every sequence entirely as an array of integer identifiers. The array can be compressed with variable-byte encoding. Compression here means representing items as integers, for example in a manner similar to Huffman coding. To make the compression more effective, the integer identifiers are ordered by descending item frequency, so that frequent items receive small identifiers. In addition, irrelevant items (which can be regarded as noise) are replaced with a blank (identifier −1), and runs of consecutive blanks are represented using the idea of run-length encoding (for example, two consecutive blanks are represented by the identifier −2).
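The two compression ideas can be sketched as follows. The byte layout used by `vbyte_encode` (7 payload bits per byte, low-order group first, high bit as a continuation flag) is one common variable-byte scheme and is an assumption here, since the text does not specify a layout. Function names are hypothetical, and item identifiers are assumed to be non-negative.

```python
BLANK = -1

def collapse_blanks(ids):
    """Run-length encode blanks: k consecutive -1 entries become one entry -k."""
    out = []
    for x in ids:
        if x == BLANK and out and out[-1] < 0:
            out[-1] -= 1          # extend the current run of blanks
        else:
            out.append(x)
    return out

def vbyte_encode(n):
    """Variable-byte encode a non-negative integer, 7 payload bits per byte."""
    if n < 0:
        raise ValueError("vbyte_encode expects a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# A sequence of item ids in which irrelevant items were replaced by -1
collapsed = collapse_blanks([3, -1, -1, 5, -1, 0, -1, -1, -1])

small_id = vbyte_encode(5)      # frequent item, small id, single byte
large_id = vbyte_encode(300)    # rare item, larger id, two bytes
```

Because identifiers are assigned in descending order of frequency, the items that occur most often are the ones that fit in a single byte.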
In all examples of the invention, the order of the items is taken to be alphabetical order: a < b < c < ...
(2) Partition phase (Map):
The partition phase and the mining phase are executed in one single MapReduce job. The Map part constructs the partitions P_ω(T): for each distinct item ω in an input sequence T ∈ D, a small sequence database P_ω(T) is constructed, and its sequences are output together with the key ω. P_ω(T) is required to be (σ, γ, λ)-equivalent to T, where σ is the support threshold, γ is the gap threshold and λ is the length threshold.
For now, assume P_ω(T) = {T}. The focus of the invention lies in the construction of the partition P_ω(T).
The partition P_ω(T) is obtained from an input sequence T as follows:
First, check for minimal relevance, i.e. whether the input sequence is relevant to the central item;
If it is not relevant, let P_ω(T) = ∅;
If it is relevant, perform one backward scan of the input sequence to obtain the right distance of every index, and then perform one forward scan, during which the following are carried out together:
(1) compute the left distances;
(2) perform unreachability reduction;
(3) replace irrelevant items with blanks;
(4) perform prefix/suffix reduction and blank reduction;
(5) use runs of γ+1 blanks to split the input sequence into several subsequences (the blank-splitting method), which together form the final output P_ω(T).
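Part of this construction can be sketched in Python. The sketch covers only blank replacement, prefix/suffix blank removal and blank splitting; it omits the left/right distance computation and the unreachability reduction. It also makes two assumptions that the text does not state explicitly: an item is treated as irrelevant when it is larger than the central item, and pieces that no longer contain the central item are dropped. The name `rewrite_for_center` is hypothetical, and `None` stands for a blank.

```python
def rewrite_for_center(t, center, rank, gamma):
    """Simplified sketch of P_center(T): blank out, trim, and split."""
    if center not in t:
        return []                      # irrelevant input: empty partition
    # Replace items larger than the central item with a blank (assumption)
    blanked = [x if rank[x] <= rank[center] else None for x in t]

    pieces, current, blanks = [], [], 0
    for x in blanked:
        if x is None:
            blanks += 1                # leading and trailing blanks are never emitted
            continue
        if current and blanks > gamma:
            pieces.append(current)     # gamma + 1 or more blanks: cut here
            current = []
        elif current:
            current.extend([None] * blanks)   # short runs of blanks are kept
        current.append(x)
        blanks = 0
    if current:
        pieces.append(current)
    # Keep only the pieces that can still yield sequences containing the center
    return [p for p in pieces if center in p]

# Alphabetical item order, as in the examples of the text
rank = {item: i for i, item in enumerate("abcd")}

split_g1 = rewrite_for_center("dacbddbca", "c", rank, gamma=1)
split_g2 = rewrite_for_center("dacbddbca", "c", rank, gamma=2)
```

With γ = 1 the two blanks that replace `dd` cannot be bridged by any match, so the sequence falls apart into two shorter pieces; with γ = 2 they can be bridged and the piece stays whole.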
The invention does not construct a separate partition for each distinct central item. Instead, it constructs one separate partition for several central items taken together, so that every partition contains roughly m or more sequences. This is called grouping. Grouping is done by scanning the f-list, whose items are arranged in descending order of frequency: adjacent items are placed in one group until the sum of their frequencies exceeds m. Once every item has been traversed in this way, the grouping is complete.
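The grouping rule can be sketched as a single pass over the f-list. The function name `group_items`, the sample frequencies and the handling of leftover items at the end of the list (they form a final, smaller group) are assumptions made for illustration.

```python
def group_items(f_list, m):
    """Group adjacent f-list items until the group's frequency sum exceeds m."""
    groups, current, total = [], [], 0
    for item, freq in f_list:
        current.append(item)
        total += freq
        if total > m:
            groups.append(current)     # group is large enough: close it
            current, total = [], 0
    if current:
        groups.append(current)         # leftover items form a last, smaller group
    return groups

# Hypothetical f-list in descending order of frequency
f_list = [("a", 10), ("b", 6), ("c", 3), ("d", 2), ("e", 1)]
groups = group_items(f_list, m=8)
```

Very frequent items end up in groups of their own, while rare items are bundled together, which evens out the amount of work per partition.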
(3) Mining phase (Reduce):
The input of the mining phase is the partitions P_ω produced by the partition phase. A general FSM algorithm is applied to P_ω to obtain F_{σ,γ,λ}(P_ω), where P_ω is the partition whose central item is ω, and F_{σ,γ,λ}(P_ω) is the set of all sequences in P_ω that have length at most λ and are (σ, γ)-frequent. The invention uses the PrefixSpan algorithm. PrefixSpan treats the leading part of a sequence as a prefix, projects the input database onto the prefix, mines the frequent items in the projected database, extends the prefix with them, and continues mining until all frequent sequences are found. It improves greatly on Apriori-like algorithms in both time and space efficiency.
Finally, F_{σ,γ,λ}(P_ω) is obtained for each frequent item. The union of these sets contains all frequent sequences, but with duplicates, so in the end only the duplicated sequences need to be filtered out.
Under the assumption P_ω(T) = {T}, every sequence S with ω ∈ S satisfies f_γ(S, P_ω) = f_γ(S, D), so the algorithm is clearly correct.
An example follows. Assume the input database D = {acb, dacbd, dacbddbca, bd, bcaddbd, addcd} and the central item c.
Let P_c(T) = {T} if c ∈ T, and P_c(T) = ∅ otherwise. This clearly gives:
P_c = {acb, dacbd, dacbddbca, bcaddbd, addcd}
With this way of partitioning, P_c is very large, which leads to a huge communication cost. Moreover, a frequent sequence mining algorithm run on such a P_c generates a large number of sequences in the mining phase that are eventually filtered out and are therefore useless. For example, F_{1,1,3}(P_c) contains the sequences da, dab, add and so on, all of which are discarded when the results are filtered at the end of the mining phase. From an efficiency standpoint, these extra computations are wasted. This motivates the definition of ω-equivalence, which greatly reduces both computation cost and communication cost.
Finally, note that the frequent sequences mined by the invention need not be contiguous: a gap threshold can be set in order to mine non-contiguous frequent sequences whose gaps do not exceed this threshold. This feature also extends to databases with redundant data or with missing or erroneous data, which can be mined as long as the length of consecutive redundant data does not exceed the configured gap threshold.
The procedure of the MG-FSM algorithm is as follows:
Input: sequence database D, σ, γ, λ, f-list F_{σ,0,1}(D)
Output: all sequences S that satisfy the conditions described in the first section, together with their frequencies.
Map(T):
    for all distinct ω ∈ T such that ω ∈ F_{σ,0,1}(D) do
        Construct a sequence database P_ω(T) that is (ω, γ, λ)-equivalent to {T}
        For each S ∈ P_ω(T), output (ω, S)
    end for

Reduce(ω, P_ω):
    F_{σ,γ,λ}(P_ω) ← FSM_{σ,γ,λ}(P_ω)
    for all S ∈ F_{σ,γ,λ}(P_ω) do
        if p(S) = ω and S ≠ ω then
            Output (S, f_γ(S, P_ω))
        end if
    end for
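The Map and Reduce steps can be simulated in a single Python process to check that the central-item filter reports each frequent sequence exactly once. This is a sketch under stated assumptions: it uses the naive partition P_ω(T) = {T}, it replaces PrefixSpan with a brute-force enumeration (`fsm`), it breaks frequency ties alphabetically, and all function names are hypothetical.

```python
from collections import defaultdict

def gamma_subsequences(t, gamma, lam):
    """All distinct gamma-subsequences of t with length 1..lam."""
    found = set()
    def extend(prefix, last):
        found.add(prefix)
        if len(prefix) < lam:
            for j in range(last + 1, min(last + gamma + 2, len(t))):
                extend(prefix + t[j], j)
    for i in range(len(t)):
        extend(t[i], i)
    return found

def fsm(partition, sigma, gamma, lam):
    """Brute-force stand-in for the FSM step (the text uses PrefixSpan)."""
    freq = defaultdict(int)
    for t in partition:
        for s in gamma_subsequences(t, gamma, lam):
            freq[s] += 1               # counted once per input sequence
    return {s: f for s, f in freq.items() if f >= sigma}

def mg_fsm(database, sigma, gamma, lam):
    # Preprocessing: item frequencies, frequent items, and the item order
    item_freq = defaultdict(int)
    for t in database:
        for w in set(t):
            item_freq[w] += 1
    frequent = {w for w, f in item_freq.items() if f >= sigma}
    order = sorted(frequent, key=lambda w: (-item_freq[w], w))  # ties: assumption
    rank = {w: r for r, w in enumerate(order)}

    # Map: emit (w, T) for every frequent item w of T, i.e. P_w(T) = {T}
    partitions = defaultdict(list)
    for t in database:
        for w in set(t) & frequent:
            partitions[w].append(t)

    # Reduce: mine each partition; keep S only if p(S) = w and S != w
    result = {}
    for w, p_w in partitions.items():
        for s, f in fsm(p_w, sigma, gamma, lam).items():
            if max(s, key=rank.get) == w and s != w:
                result[s] = f
    return result

database = ["acb", "dacbd", "dacbddbca", "bd", "bcaddbd", "addcd"]
result = mg_fsm(database, sigma=2, gamma=1, lam=3)
```

On this database the result agrees with mining the whole database directly and discarding the length-1 sequences, which are already available from the f-list. The sequence `da` is found while mining the partition of c, but it is reported only by the partition of d, its central item.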
The above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art can make several improvements and variations without departing from the technical principles of the invention, and these improvements and variations should also be regarded as falling within the scope of protection of the invention.

Claims (7)

1. A frequent sequence mining method for massive data, characterized by comprising the following steps:
1) The user inputs time-series data. Basic statistics of the data are obtained, the frequency of each ω ∈ Σ is computed, and the set of frequent items F_{σ,0,1}(D) is constructed. Here ω denotes an item (a subsequence of the input), Σ is the universe, i.e. the set of all input time series, D is the input time-series database, the subscript σ is the support threshold, 0 is the gap threshold, and 1 is the length threshold. Frequent is defined as follows: for σ > 0, a sequence S is (σ, γ)-frequent if f_γ(S, D) ≥ σ, where f_γ(S, D) denotes the frequency of S;
2) For all frequent items in Σ, the ω-equivalent partitions P_ω are constructed in Map;
3) The partitions P_ω constructed in step 2) are mined independently in Reduce to obtain F_{σ,γ,λ}(P_ω), where P_ω is the partition whose central item is ω, and F_{σ,γ,λ}(P_ω) is the set of all sequences in P_ω that have length at most λ and are (σ, γ)-frequent;
4) The sets F_{σ,γ,λ}(P_ω) obtained in step 3) for each frequent item are merged, and repeated sequences are filtered out to obtain the final output.
2. The frequent sequence mining method for massive data according to claim 1, characterized in that, in step 1), the basic statistics of the data include the average length of the time-series data, the maximum length, the total number of sequences, the total number of items, the number of distinct items, and the total number of bytes.
3. The frequent sequence mining method for massive data according to claim 1, characterized in that step 1) is completed by a single MapReduce job.
4. The frequent sequence mining method for massive data according to claim 1, characterized in that, in step 1), an integer identifier is assigned to each item, and every sequence is represented entirely as an array of integer identifiers; the integer identifiers are first ordered by descending item frequency, and the items are then compressed into integers using variable-byte encoding.
5. The frequent sequence mining method for massive data according to claim 1, characterized in that, in step 2), the ω-equivalent partition is constructed as follows:
2-1) Check for minimal relevance, i.e. whether the input time series is relevant to the central item. If it is not relevant, let P_ω(T) = ∅. If it is relevant, perform one backward scan of the input time series to obtain the right distance of every index;
2-2) Then perform one forward scan, during which the following are carried out together:
(a) compute the left distances;
(b) perform unreachability reduction;
(c) replace irrelevant items with blanks;
(d) perform prefix/suffix reduction and blank reduction;
(e) use runs of γ+1 blanks to split the input sequence into several subsequences (the blank-splitting method), which together form the final output P_ω.
6. The frequent sequence mining method for massive data according to claim 5, characterized in that, before partitioning, the set F_{σ,0,1}(D), whose items are arranged in descending order of frequency, is first scanned: adjacent items are placed in one group until the sum of their frequencies exceeds a set value m, and every item is traversed in this way to complete the grouping; then one separate partition is constructed for each group.
7. The frequent sequence mining method for massive data according to claim 5, characterized in that the PrefixSpan algorithm is used to mine the partition P_ω.
CN201711457785.5A 2017-12-28 2017-12-28 A frequent sequence mining method for massive data Pending CN108170799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711457785.5A CN108170799A (en) 2017-12-28 2017-12-28 A frequent sequence mining method for massive data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711457785.5A CN108170799A (en) 2017-12-28 2017-12-28 A frequent sequence mining method for massive data

Publications (1)

Publication Number Publication Date
CN108170799A true CN108170799A (en) 2018-06-15

Family

ID=62519328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711457785.5A Pending CN108170799A (en) 2017-12-28 2017-12-28 A frequent sequence mining method for massive data

Country Status (1)

Country Link
CN (1) CN108170799A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299254A (en) * 2018-09-03 2019-02-01 中新网络信息安全股份有限公司 A kind of sorting algorithm based on time series data
CN110275911A (en) * 2019-06-24 2019-09-24 重庆大学 Private car trip hotspot path method for digging based on Frequent Sequential Patterns
CN110275911B (en) * 2019-06-24 2023-05-23 重庆大学 Private car travel hot spot path mining method based on frequent sequence mode
CN111078754A (en) * 2019-12-19 2020-04-28 南京柏跃软件有限公司 Frequent trajectory extraction method based on massive space-time data and mining system thereof
CN111078754B (en) * 2019-12-19 2020-08-25 南京柏跃软件有限公司 Frequent trajectory extraction method based on massive space-time data and mining system thereof
CN111309858A (en) * 2020-01-20 2020-06-19 腾讯科技(深圳)有限公司 Information identification method, device, equipment and medium
CN111309858B (en) * 2020-01-20 2023-03-07 腾讯科技(深圳)有限公司 Information identification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180615