CN111581262A - Order-preserving sequence pattern mining method - Google Patents


Info

Publication number
CN111581262A
Authority
CN
China
Prior art keywords
pattern
candidate
frequent
length
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010544303.5A
Other languages
Chinese (zh)
Inventor
武优西
户倩
郭媛
王晓慧
赵晓倩
王珠林
崔文峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Technology
Original Assignee
Hebei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei University of Technology
Priority to CN202010544303.5A
Publication of CN111581262A
Legal status: Withdrawn (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an order-preserving sequence pattern mining method in the technical field of electric digital data processing. Candidate patterns are generated by a pattern fusion method, which reduces the number of candidate patterns, and the support of each candidate pattern is computed through a series of conversion and verification steps. The method overcomes the shortcomings of the prior art in mining frequent patterns from time series: accuracy, generality and completeness are difficult to achieve at the same time, important information is easily lost when the time series is processed, and key trends are hard to analyze by mining frequent patterns.

Description

Order-preserving sequence pattern mining method
Technical Field
The technical solution of the invention relates to the technical field of electric digital data processing, and in particular to an order-preserving sequence pattern mining method.
Background
Sequence pattern mining has become one of the important tasks in data mining and is widely applied in sequence analysis, classification, prediction and related areas; its goal is to find frequently occurring patterns in massive sequence data. Sequence data are generally divided into two categories: character sequences and time series. Common character sequences include DNA sequences and protein sequences, and the frequent patterns in them can help solve problems in biology. A time series is numerical data measured and recorded over time; daily stock prices, oil production and daily temperatures are common examples. Because the individual values carry little meaning on their own, people are more interested in the trends the data present. For example, a stock market analyst may want to know whether there was a period in which a company's stock price fell for 10 consecutive days and then rose during the next 5 days; here the pattern of price changes matters more than the actual prices. Finding frequent trends in time series therefore helps people understand how things develop and provides a theoretical basis for prediction and decision-making.
A frequent pattern is a pattern whose support is not less than the minimum support threshold $minsup$, that is, a pattern that occurs at least $minsup$ times in the data set. Many frequent pattern mining methods have been proposed for character sequences, but they cannot be applied directly to time series. Time series are high-dimensional, continuous and large in volume, so a preprocessing step is usually applied before mining to convert the numerical data into another domain. A common approach is symbolization, such as the widely used SAX method, which converts numerical time series into character data that are then mined. This preprocessing has drawbacks: parameters must be set manually, important information is easily lost, and the continuity of the time series is broken to some extent. Taking SAX as an example, two time series with different trends can be symbolized into the same character sequence: as shown in FIG. 1 (a) and (b), two time series with clearly different trends are both symbolized as "beccde" by SAX, which is very unfavorable for trend analysis. A more complete mining method is therefore needed.
The concept of order preservation offers a new approach to the trend analysis of time series. It has been applied to the order-preserving matching problem, whose idea is to characterize patterns by the relative order of their values rather than by their absolute values: a subsequence matches a given pattern when its relative order is the same as that of the pattern. Example A below illustrates the concept of relative order and the order-preserving pattern matching problem.
Example A. Given the time series $S = (s_1, s_2, \ldots, s_{17}) = (9,12,11,17,16,21,14,18,15,19,21,19,26,18,25,26,27)$ and the pattern $P = (p_1, p_2, p_3, p_4, p_5) = (6,5,8,4,7)$.
In Example A, the relative order of the pattern $P = (6,5,8,4,7)$ is $(3,2,5,1,4)$. In $P$, which has length 5, $p_4 = 4$ is the smallest of the five numbers, so the relative order of $p_4$ is 1; likewise, $p_5 = 7$ is the fourth smallest, so the relative order of $p_5$ is 4. The task of order-preserving pattern matching is to find all subsequences of $S$ that have the same relative order as $P$. As shown in FIG. 2, $(s_4, s_5, s_6, s_7, s_8) = (17,16,21,14,18)$ is one occurrence, because its relative order is also $(3,2,5,1,4)$; for the same reason, $(s_{11}, s_{12}, s_{13}, s_{14}, s_{15}) = (21,19,26,18,25)$ is another occurrence. FIG. 2 also shows that the fluctuations of the two matched subsequences closely follow those of pattern $P$. This is the characteristic property of order-preserving patterns: they represent the trends of a time series well.
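The relative-order idea in Example A can be illustrated with a minimal Python sketch. It assumes distinct values within each window (ties are broken by position, an assumption the text does not address), and the names `relative_order` and `op_match` are hypothetical, not taken from the patent. `op_match` is a naive scan over every window, shown only to make the matching task concrete; it is not the binary-filter procedure of the invention.

```python
def relative_order(values):
    """Rank of each element (1 = smallest); ties broken by position (assumption)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for rank, pos in enumerate(order, start=1):
        ranks[pos] = rank
    return tuple(ranks)


def op_match(series, pattern):
    """Naive order-preserving matching: 1-based start positions of every window
    whose relative order equals that of `pattern`."""
    m = len(pattern)
    target = relative_order(pattern)
    return [j + 1 for j in range(len(series) - m + 1)
            if relative_order(series[j:j + m]) == target]


S_A = [9, 12, 11, 17, 16, 21, 14, 18, 15, 19, 21, 19, 26, 18, 25, 26, 27]
P_A = (6, 5, 8, 4, 7)
print(relative_order(P_A))  # (3, 2, 5, 1, 4)
print(op_match(S_A, P_A))   # [4, 11]: (s_4..s_8) and (s_11..s_15)
```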
Example A shows that existing order-preserving pattern matching can find the subsequences of a time series whose trend matches a given pattern $P$. This does not fully meet users' needs, however, because users often lack prior knowledge, cannot specify a pattern in advance, and are more interested in patterns that occur frequently but are not yet known. The invention provides an order-preserving sequence pattern mining method that mines the order-preserving sequence patterns occurring frequently in a time series. Each frequent order-preserving sequence pattern represents a frequent trend, so from the mining results a user can learn how the data changed over a period of time and predict future trends, which gives the method practical significance and value. Example B below describes the order-preserving sequence pattern mining problem in detail.
Example B. Given the time series $S = (s_1, s_2, \ldots, s_{16}) = (12,11,22,26,13,15,19,20,27,14,17,21,25,31,16,18)$ and the minimum support threshold $minsup = 3$.
The subsequence $(s_3, s_4, s_5, s_6) = (22,26,13,15)$ has relative order $(3,4,1,2)$. In the same way, the subsequences $(s_8, s_9, s_{10}, s_{11}) = (20,27,14,17)$ and $(s_{13}, s_{14}, s_{15}, s_{16}) = (25,31,16,18)$ also have relative order $(3,4,1,2)$, so subsequences with relative order $(3,4,1,2)$ occur 3 times in total; the pattern expressed by this relative order is called an order-preserving sequence pattern. FIG. 3 shows that the trends of these three subsequences are very similar and can all be expressed as $(3,4,1,2)$. The goal of order-preserving sequence pattern mining is to find all frequent order-preserving sequence patterns in a given time series. In Example B there are 7 frequent order-preserving sequence patterns in $S$, namely $(1,2)$, $(2,1)$, $(1,2,3)$, $(2,3,1)$, $(3,1,2)$, $(1,2,3,4)$ and $(3,4,1,2)$. These are the important trends that occur frequently in $S$, and the user can base further prediction and decision-making on them, which is of great practical significance.
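For small inputs, the mining goal in Example B can be checked by brute force: count the relative order of every window of every length and keep the relative orders that occur at least $minsup$ times. The Python sketch below is only a reference baseline under that assumption, not the invention's method (it enumerates every window instead of fusing candidates), and its function names are hypothetical. Stopping at the first length with no frequent pattern is safe: every occurrence of a pattern of length $L+1$ contains an occurrence of the relative order of its first $L$ elements, so a frequent longer pattern always has a frequent shorter one.

```python
from collections import Counter


def relative_order(values):
    """Rank of each element (1 = smallest); ties broken by position (assumption)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for rank, pos in enumerate(order, start=1):
        ranks[pos] = rank
    return tuple(ranks)


def brute_force_mine(series, minsup):
    """Reference baseline: count the relative order of every window, length by length."""
    frequent = []
    length = 2
    while length <= len(series):
        counts = Counter(relative_order(series[j:j + length])
                         for j in range(len(series) - length + 1))
        level = sorted(p for p, c in counts.items() if c >= minsup)
        if not level:  # nothing frequent at this length, so nothing longer either
            break
        frequent.extend(level)
        length += 1
    return frequent


S_B = [12, 11, 22, 26, 13, 15, 19, 20, 27, 14, 17, 21, 25, 31, 16, 18]
print(brute_force_mine(S_B, 3))
# [(1, 2), (2, 1), (1, 2, 3), (2, 3, 1), (3, 1, 2), (1, 2, 3, 4), (3, 4, 1, 2)]
```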
Time series pattern mining generally requires accuracy, generality and completeness. When processing high-dimensional time series, the loss of valuable information and excessive time and space complexity must be avoided, while the frequent patterns found must still reveal the key trends in the series; existing techniques find it difficult to satisfy all of these conditions at once. CN107451293A discloses a method and apparatus for mining contrast patterns in multi-class sequence data sets, but it targets character data; because time series are high-dimensional, applying it directly to time series mining would lead to excessive time and space complexity. The paper by Chen et al., "Text emotion feature extraction method based on order-preserving submatrix and frequent sequence pattern mining" (Shandong University), mines order-preserving submatrices from feature vectors converted from Chinese online review data. That method vectorizes text data and builds a matrix whose rows and columns must be considered simultaneously, which does not fit one-dimensional time series; it therefore cannot be applied to time series analysis and lacks generality.
The paper by Keogh et al., "HOT SAX: Efficiently finding the most unusual time series subsequence", IEEE International Conference on Data Mining, studies how to find anomalous patterns in time series, but it requires SAX preprocessing before mining, which loses important information and destroys the continuity of the original time series to some extent. The paper by Kim et al., "Order-preserving matching", Theoretical Computer Science, studies how to find subsequences of a time series that have the same relative order as a known pattern. However, it can only compute the support of a given order-preserving pattern and cannot discover frequent but unknown order-preserving patterns in a data set, so it cannot reveal the key trends in a time series and is limited in both the problems it solves and its range of application.
In summary, when mining frequent patterns from time series, existing techniques find it difficult to achieve accuracy, generality and completeness at the same time, easily lose important information while processing the time series, and have difficulty revealing key trends through frequent pattern mining.
Disclosure of Invention
The technical problem solved by the invention is to provide an order-preserving sequence pattern mining method that generates candidate patterns with a pattern fusion method, thereby reducing the number of candidate patterns, and computes the support of the candidate patterns through a series of conversion and verification steps. This overcomes the shortcomings of the prior art in mining frequent patterns from time series: accuracy, generality and completeness are difficult to achieve at the same time, important information is easily lost when the time series is processed, and key trends are difficult to analyze by mining frequent patterns.
The technical solution adopted by the invention is as follows: the order-preserving sequence pattern mining method generates candidate patterns with a pattern fusion method, which reduces the number of candidate patterns, and computes the support of the candidate patterns through a series of conversion and verification steps. The specific steps are as follows:
First, input the time series $S$ and the minimum support threshold $minsup$:
Input the time series $S$, determine its length $n$, and denote its elements $s_1, s_2, \ldots, s_n$. Input the minimum support threshold $minsup$, specified by the user, which is the minimum number of times a pattern must occur in $S$.
Second, obtain the frequent pattern set $fre_2$ of pattern length 2:
The candidate pattern set of pattern length 2 is $cand_2 = \{(1,2), (2,1)\}$. Using the support calculation procedure below, compute in turn the support in $S$ of each candidate pattern $P_d$ in $cand_2$. If the support of $P_d$ is greater than or equal to $minsup$, $P_d$ is a frequent pattern of length 2 and is added to the frequent pattern set $fre_2$; this yields $fre_2$.
The pattern support is calculated as follows:
First, sort the elements of the currently processed candidate pattern $P_d$ in ascending order, and let $index[i]$ be the position in $P_d$ of the $i$-th smallest element, so that $p_{index[i]} < p_{index[i+1]}$ for $1 \le i \le m-1$. Here $p_{index[i]}$ is the $i$-th smallest element of $P_d$, $p_{index[i+1]}$ is the $(i+1)$-th smallest element, and $m$ is the length of $P_d$.
Then convert the candidate pattern $P_d$ into the binary string $P'$ according to formula (1), denoting its elements $a_1, \ldots, a_i, \ldots, a_{m-1}$, and convert the time series $S$ into the binary string $S'$ according to formula (2), denoting its elements $b_1, \ldots, b_j, \ldots, b_{n-1}$. Formulas (1) and (2) are:
$$a_i = \begin{cases} 1, & p_i < p_{i+1} \\ 0, & p_i > p_{i+1} \end{cases} \qquad 1 \le i \le m-1 \tag{1}$$
$$b_j = \begin{cases} 1, & s_j < s_{j+1} \\ 0, & s_j > s_{j+1} \end{cases} \qquad 1 \le j \le n-1 \tag{2}$$
In formulas (1) and (2), $m$ is the length of the currently processed candidate pattern $P_d$ (the initial value of $m$ is 2) and $n$ is the length of $S$. Each $a_i$ compares two consecutive elements $p_i$ and $p_{i+1}$ of $P_d$: $a_i = 1$ if $p_i < p_{i+1}$, and $a_i = 0$ if $p_i > p_{i+1}$. Likewise, each $b_j$ compares two consecutive elements $s_j$ and $s_{j+1}$ of $S$: $b_j = 1$ if $s_j < s_{j+1}$, and $b_j = 0$ if $s_j > s_{j+1}$.
Use a classical pattern matching algorithm to find the occurrences of the binary string $P'$ in the binary string $S'$. An occurrence of $P'$ starting at position $j$ of $S'$ corresponds to the subsequence $(s_j, \ldots, s_{j+m-1})$ of $S$; each time an occurrence is found, keep this subsequence as a candidate subsequence, let $l_1 = j$ be the position of its first element, and verify whether the condition
$$s_{l_1 + index[i] - 1} < s_{l_1 + index[i+1] - 1}, \qquad 1 \le i \le m-1$$
holds. If it holds, the support of $P_d$ is increased by one; otherwise it is unchanged. Here $s_{l_1 + index[i] - 1}$ is the element of the candidate subsequence at the position of element $p_{index[i]}$ in $P_d$, and $s_{l_1 + index[i+1] - 1}$ is the element at the position of $p_{index[i+1]}$. When all occurrences have been found and all candidate subsequences verified, the support of $P_d$ is obtained.
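The support calculation above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: KMP stands in for the unnamed "classical pattern matching algorithm", equal neighbouring values are mapped to 0 in formula (2) (the text defines only strict < and >, and the strict verification then rejects any window containing equal values), and all function names are hypothetical. Positions are 1-based as in the text; only Python list indexing is 0-based.

```python
def rank_positions(pattern):
    """index list: 1-based positions of the pattern's elements in ascending value order."""
    return sorted(range(1, len(pattern) + 1), key=lambda pos: pattern[pos - 1])


def to_binary(values):
    """Formulas (1)/(2): 1 if the next value is larger, else 0 (ties -> 0, an assumption)."""
    return [1 if values[k] < values[k + 1] else 0 for k in range(len(values) - 1)]


def kmp_find_all(text, pat):
    """KMP search: 1-based start positions of all (possibly overlapping) occurrences."""
    fail = [0] * len(pat)
    k = 0
    for q in range(1, len(pat)):
        while k > 0 and pat[q] != pat[k]:
            k = fail[k - 1]
        if pat[q] == pat[k]:
            k += 1
        fail[q] = k
    hits, k = [], 0
    for q, sym in enumerate(text):
        while k > 0 and sym != pat[k]:
            k = fail[k - 1]
        if sym == pat[k]:
            k += 1
        if k == len(pat):
            hits.append(q - len(pat) + 2)  # convert 0-based end to 1-based start
            k = fail[k - 1]
    return hits


def support(series, pattern):
    """Number of windows of `series` whose relative order equals that of `pattern`."""
    m = len(pattern)
    index = rank_positions(pattern)  # index[0] = position of the smallest element
    count = 0
    for l1 in kmp_find_all(to_binary(series), to_binary(pattern)):
        # An occurrence of P' at position l1 of S' covers (s_{l1}, ..., s_{l1+m-1}).
        window = series[l1 - 1:l1 - 1 + m]  # window[pos - 1] is s_{l1 + pos - 1}
        if all(window[index[i] - 1] < window[index[i + 1] - 1] for i in range(m - 1)):
            count += 1
    return count


S_A = [9, 12, 11, 17, 16, 21, 14, 18, 15, 19, 21, 19, 26, 18, 25, 26, 27]
S_B = [12, 11, 22, 26, 13, 15, 19, 20, 27, 14, 17, 21, 25, 31, 16, 18]
print(support(S_A, (6, 5, 8, 4, 7)))  # 2
print(support(S_B, (3, 4, 1, 2)))     # 3
```

On Example A, the binary filter alone reports occurrences at positions 2, 4, 6 and 11 of $S'$, but only the windows starting at $s_4$ and $s_{11}$ pass verification, which is why the verification step is needed.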
Third, generate the candidate pattern set $cand_{L+1}$ of pattern length $L+1$:
Use the pattern fusion method to generate the candidate pattern set $cand_{L+1}$ of length $L+1$ from the frequent pattern set $fre_L$ of length $L$, where $L$ is the length of the currently processed frequent patterns and its initial value is 2. For a frequent pattern $P = (p_1, p_2, \ldots, p_L)$, the part that remains after removing the last element $p_L$ is called the prefix of $P$, denoted prefix(P); the part that remains after removing the first element $p_1$ is called the suffix of $P$, denoted suffix(P). Each has its own relative order, used in the fusion rules below.
The pattern fusion method has the following fusion rules for two different cases:
1) General case: let $P = (p_1, p_2, \ldots, p_L)$ and $Q = (q_1, q_2, \ldots, q_L)$ be frequent patterns of length $L$. If the relative order of suffix(P) equals the relative order of prefix(Q), but suffix(P) itself is not equal to prefix(Q), then $P$ and $Q$ can be fused into one candidate pattern of length $L+1$, denoted $X = (x_1, x_2, \ldots, x_{L+1})$. The specific fusion rule is as follows:
Compare the first element $p_1$ of $P$ with the last element $q_L$ of $Q$:
① If $p_1 < q_L$, let $x_1 = p_1$ and $x_{L+1} = q_L + 1$. Then compare each remaining element $p_u$ of $P$ ($2 \le u \le L$) with $q_L$: if $p_u > q_L$, set $x_u = p_u + 1$; otherwise set $x_u = p_u$.
② If $p_1 > q_L$, let $x_1 = p_1 + 1$ and $x_{L+1} = q_L$. Then compare each element $q_v$ of $Q$ other than the last ($1 \le v \le L-1$) with $p_1$: if $q_v > p_1$, set $x_{v+1} = q_v + 1$; otherwise set $x_{v+1} = q_v$.
2) Special case: let $P = (p_1, p_2, \ldots, p_L)$ and $Q = (q_1, q_2, \ldots, q_L)$ be frequent patterns of length $L$. If not only the relative order of suffix(P) equals that of prefix(Q), but suffix(P) itself also equals prefix(Q), then $P$ and $Q$ can be fused into two candidate patterns of length $L+1$, denoted $T = (t_1, t_2, \ldots, t_{L+1})$ and $K = (k_1, k_2, \ldots, k_{L+1})$. The specific fusion rule is as follows:
To generate $T$, let $t_1 = p_1 + 1$ and $t_{L+1} = p_1$. Then compare each remaining element $p_u$ of $P$ ($2 \le u \le L$) with $p_1$: if $p_u > p_1$, set $t_u = p_u + 1$; otherwise set $t_u = p_u$.
To generate $K$, let $k_1 = p_1$ and $k_{L+1} = p_1 + 1$. Then compare each remaining element $p_u$ of $P$ ($2 \le u \le L$) with $p_1$: if $p_u > p_1$, set $k_u = p_u + 1$; otherwise set $k_u = p_u$.
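The two fusion cases can be sketched in Python as below. This is a minimal illustration in which patterns are tuples holding a relative order (a permutation of 1..L); `relative_order` and `fuse` are hypothetical names, and the special case returns $T$ before $K$. One way to see why the rules are unambiguous: suffix(P) and prefix(Q) describe the same middle elements $x_2, \ldots, x_L$, so $p_1 - 1$ and $q_L - 1$ count how many middle elements lie below $x_1$ and below $x_{L+1}$. When $p_1 \ne q_L$ this fixes which of $x_1$ and $x_{L+1}$ is larger; when $p_1 = q_L$ (which, given equal relative orders, is exactly the case suffix(P) = prefix(Q)) both orders are possible, hence two candidates.

```python
def relative_order(values):
    """Rank of each element (1 = smallest); ties broken by position (assumption)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for rank, pos in enumerate(order, start=1):
        ranks[pos] = rank
    return tuple(ranks)


def fuse(p, q):
    """Fuse frequent patterns p (as P) and q (as Q), both of length L, into
    candidate pattern(s) of length L+1; returns [] if they cannot be fused."""
    if relative_order(p[1:]) != relative_order(q[:-1]):
        return []  # suffix(P) and prefix(Q) have different relative orders
    p1, qL = p[0], q[-1]
    if p[1:] == q[:-1]:
        # Special case: T (x_1 > x_{L+1}) and K (x_1 < x_{L+1}).
        middle = [pu + 1 if pu > p1 else pu for pu in p[1:]]
        return [(p1 + 1, *middle, p1), (p1, *middle, p1 + 1)]
    if p1 < qL:
        # General case, rule 1: x_1 = p_1, x_{L+1} = q_L + 1, bump p_u > q_L.
        return [(p1, *[pu + 1 if pu > qL else pu for pu in p[1:]], qL + 1)]
    # General case, rule 2 (p_1 > q_L): x_1 = p_1 + 1, x_{L+1} = q_L, bump q_v > p_1.
    return [(p1 + 1, *[qv + 1 if qv > p1 else qv for qv in q[:-1]], qL)]


print(fuse((1, 2), (1, 2)))        # [(1, 2, 3)]
print(fuse((1, 2), (2, 1)))        # [(2, 3, 1), (1, 3, 2)]
print(fuse((2, 3, 1), (3, 1, 2)))  # [(3, 4, 1, 2), (2, 4, 1, 3)]
print(fuse((3, 1, 2), (2, 3, 1)))  # [(4, 2, 3, 1)]
```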
Using the pattern fusion method, the candidate pattern set $cand_{L+1}$ of length $L+1$ is generated from the frequent pattern set $fre_L$ of length $L$ as follows:
If $fre_L$ is not empty, first take its first frequent pattern $P_a$ and compute the relative order of its suffix. Then traverse every frequent pattern $P_b$ in $fre_L$ from left to right and check whether $P_a$ (in the role of $P$) and $P_b$ (in the role of $Q$) satisfy either case of the pattern fusion method. Whenever a case is satisfied, fuse them by the corresponding rule and add the resulting candidate pattern(s) of length $L+1$ to $cand_{L+1}$. When all $P_b$ have been traversed, the fusion for $P_a$ is finished; then take the next frequent pattern in $fre_L$ as $P_a$ and repeat the above steps. When the last frequent pattern in $fre_L$ has been processed, the generation of $cand_{L+1}$ is complete.
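The traversal that builds $cand_{L+1}$ from $fre_L$ can be sketched as a double loop over ordered pairs. In this minimal Python illustration, `fuse` implements the two fusion cases listed above in compact form so that the block runs on its own; all names are hypothetical. No duplicate check is included, on the reasoning that a candidate of length $L+1$ determines the relative orders of its first and last $L$ elements and therefore arises from only one pair ($P_a$, $P_b$). Starting from $fre_2 = \{(1,2), (2,1)\}$ the loop yields all six permutations of length 3, while for $fre_3$ of Example B it yields only 8 of the 24 permutations of length 4.

```python
def relative_order(values):
    """Rank of each element (1 = smallest); ties broken by position (assumption)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for rank, pos in enumerate(order, start=1):
        ranks[pos] = rank
    return tuple(ranks)


def fuse(p, q):
    """Pattern fusion of p (as P) and q (as Q) into candidate(s) of length L+1."""
    if relative_order(p[1:]) != relative_order(q[:-1]):
        return []
    p1, qL = p[0], q[-1]
    if p[1:] == q[:-1]:  # special case: T, then K
        mid = [x + 1 if x > p1 else x for x in p[1:]]
        return [(p1 + 1, *mid, p1), (p1, *mid, p1 + 1)]
    if p1 < qL:  # general case, rule 1
        return [(p1, *[x + 1 if x > qL else x for x in p[1:]], qL + 1)]
    return [(p1 + 1, *[x + 1 if x > p1 else x for x in q[:-1]], qL)]  # rule 2


def generate_candidates(fre_L):
    """Build cand_{L+1}: for each P_a (left to right), try every P_b in fre_L."""
    cand = []
    for pa in fre_L:
        for pb in fre_L:
            cand.extend(fuse(pa, pb))
    return cand


fre_3 = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]  # fre_3 of Example B
print(generate_candidates(fre_3))
# [(1, 2, 3, 4), (2, 3, 4, 1), (1, 3, 4, 2), (3, 4, 1, 2),
#  (2, 4, 1, 3), (4, 1, 2, 3), (3, 1, 2, 4), (4, 2, 3, 1)]
```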
Fourth, obtain the frequent pattern set $fre_{L+1}$ of pattern length $L+1$:
Using the support calculation method of the second step, compute in turn the support $sup(P_d, S)$ of each candidate pattern $P_d$ in $cand_{L+1}$. If $sup(P_d, S) \ge minsup$, add $P_d$ to the frequent pattern set $fre_{L+1}$. Once the supports of all candidate patterns in $cand_{L+1}$ have been computed, $fre_{L+1}$ is obtained.
Fifth, finish mining the order-preserving sequence patterns:
If $fre_{L+1}$ is not empty, increase $L$ by 1 and repeat the third and fourth steps, until the candidate pattern set $cand_{L+1}$ or the frequent pattern set $fre_{L+1}$ is empty; the mining of order-preserving sequence patterns is then complete.
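Putting the steps together, a compact end-to-end sketch of the mining loop might look as follows. Assumptions: a naive scan over $S'$ replaces the unnamed "classical pattern matching algorithm" for brevity (KMP or any other exact matcher finds the same occurrences), ties in formula (2) map to 0, and all names are hypothetical. On Example B it returns the seven frequent patterns listed there, in the order the levels are produced.

```python
def relative_order(values):
    """Rank of each element (1 = smallest); ties broken by position (assumption)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for rank, pos in enumerate(order, start=1):
        ranks[pos] = rank
    return tuple(ranks)


def to_binary(values):
    """Formulas (1)/(2): 1 for a rise, 0 otherwise (ties -> 0, an assumption)."""
    return [1 if values[k] < values[k + 1] else 0 for k in range(len(values) - 1)]


def support(series, pattern):
    """Second step: binary filter on S', then rank-chain verification of each window."""
    m = len(pattern)
    index = sorted(range(m), key=lambda k: pattern[k])  # 0-based, ascending value
    s_bin, p_bin = to_binary(series), to_binary(pattern)
    count = 0
    for j in range(len(s_bin) - len(p_bin) + 1):  # naive matcher over S'
        if s_bin[j:j + m - 1] == p_bin:
            window = series[j:j + m]
            if all(window[index[i]] < window[index[i + 1]] for i in range(m - 1)):
                count += 1
    return count


def fuse(p, q):
    """Third step: pattern fusion of p (as P) and q (as Q)."""
    if relative_order(p[1:]) != relative_order(q[:-1]):
        return []
    p1, qL = p[0], q[-1]
    if p[1:] == q[:-1]:  # special case: T, then K
        mid = [x + 1 if x > p1 else x for x in p[1:]]
        return [(p1 + 1, *mid, p1), (p1, *mid, p1 + 1)]
    if p1 < qL:  # general case, rule 1
        return [(p1, *[x + 1 if x > qL else x for x in p[1:]], qL + 1)]
    return [(p1 + 1, *[x + 1 if x > p1 else x for x in q[:-1]], qL)]  # rule 2


def mine(series, minsup):
    """Steps two to five: fre_2, then alternate fusion and support counting."""
    result = []
    level = [p for p in [(1, 2), (2, 1)] if support(series, p) >= minsup]  # fre_2
    while level:  # stops once cand_{L+1} or fre_{L+1} is empty
        result.extend(level)
        candidates = [c for pa in level for pb in level for c in fuse(pa, pb)]
        level = [c for c in candidates if support(series, c) >= minsup]
    return result


S_B = [12, 11, 22, 26, 13, 15, 19, 20, 27, 14, 17, 21, 25, 31, 16, 18]
print(mine(S_B, 3))
# [(1, 2), (2, 1), (1, 2, 3), (2, 3, 1), (3, 1, 2), (1, 2, 3, 4), (3, 4, 1, 2)]
```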
In the order-preserving sequence pattern mining method, the programming software used is VC++ 6.0, the drawing tool is Visio 2013, the processor is a Pentium(R) Dual-Core 32 Processor or higher, and the operating system is Windows 7 or a later version. The software and hardware environment and the classical pattern matching algorithm used are well known to those skilled in the art.
Beneficial effects of the invention: compared with the prior art, the invention has the following prominent substantive features:
(1) The invention solves the order-preserving sequence pattern mining problem. It first reads in the time series $S$ and the minimum support threshold $minsup$ and determines the frequent pattern set $fre_2$ of length 2. The pattern fusion method then generates the candidate pattern set $cand_3$ of length 3 from $fre_2$; the support of each candidate pattern in $cand_3$ is computed, and the candidates whose support is not less than $minsup$ are added to the frequent pattern set $fre_3$. When the support of a candidate pattern is computed, its number of occurrences in $S$ is obtained through a series of conversion and verification steps, which guarantees a complete support result while reducing the number of invalid subsequences that must be checked. This process is iterated until $cand_{L+1}$ or $fre_{L+1}$ is empty, at which point the mining of order-preserving sequence patterns is complete. In this way the number of support calculations is reduced and the time and space complexity are lowered.
(2) CN106339609A discloses an optimal alignment sequence pattern mining method with free space constraint, which is a method for mining alignment patterns, inputting two types of sequence sets of positive and negative examples, and allowing existence of space constraint, while the method is a preserving pattern, all the inputs are one type of sequence sets, and there is no positive or negative score, so that the method is more general, and there is no space constraint, so that the mined pattern is more accurate, which is the most substantial difference between the two.
(3) CN105868314A discloses a method for mining weighted negative sequence patterns under multiple support degrees, which is a negative sequence pattern, sets two parameters of weighted support degree and minimum support degree, and if the set parameters are not proper, the mining result is not accurate, but the invention can mine the desired frequent sequence-preserving sequence pattern only by setting a threshold value of the minimum support number, which is the largest substantial difference between the two.
(4) CN109101530A discloses a high-utility event sequence pattern mining algorithm, which is used for mining high-utility event sequence patterns, while the method is used for mining order-preserving sequence patterns, which are concerned about frequent change trends and do not need to consider the utility value of the patterns, which is the greatest substantial difference between the two.
(5) CN104750830A discloses a method for mining time series data periodically, which is to mine time series, but the mined time series is periodic patterns in the time series, and has little significance for analyzing trends in the time series, while the mined time series is order-preserving sequence patterns, which can capture critical trends frequently occurring in the time series, which is the most substantial difference between the two.
(6) CN104182461A discloses a time series data mining system, in the mining process, a time series clustering analysis module divides time series data into various categories according to the degree of association, and finally calculates the frequent pattern of the time series, but the invention analyzes the time series from the angle of mining the sequence pattern, when mining the frequent pattern, a candidate pattern is generated by using a pattern fusion method, and the pattern support degree of the candidate pattern in the time series is calculated through a series of conversion and verification steps, so that all the frequent patterns are obtained, and the time series does not need to be classified, compared with CN104182461A, the invention has obvious progress.
(7) CN108874952A discloses a method for mining the most frequent sequence mode based on distributed logs, the method is used for mining the most frequent sequence mode, a plurality of frequent modes are removed in a result set, and a plurality of useful information can be missed, and all frequent modes are mined by the method, so that the completeness of the result is guaranteed, and the method is remarkably improved compared with CN 108874952A.
(8) CN106469171A discloses a method for parallel mining of frequent time-series patterns. It first converts the time series into a character-sequence matrix and mines frequent patterns from that representation; the conversion easily loses valuable information, and the result is poorly suited to trend analysis of the time series. The present invention mines the time series directly without conversion, and its results help people understand important trends in how things develop. Compared with CN106469171A, the invention is a clear improvement.
(9) CN109344179A discloses a frequent adjacent sequence pattern mining method that stores patterns of different lengths in different sparse tensors and finds the frequent patterns in each separately, resulting in high time and space complexity. The present invention first finds short frequent patterns and then fuses them, using the pattern fusion method, into longer candidate patterns. This effectively reduces the number of candidates, lowers time and space complexity and runs fast, a significant improvement over CN109344179A.
(10) CN107844540A discloses a time-series mining method for power data that first preprocesses the data and then partitions the database to generate a set of sequence patterns. The required preprocessing loses valuable information and, to some extent, breaks the continuity of the time series. The present invention mines without preprocessing the time series and finds its key trends without losing useful information, a significant improvement over CN107844540A.
(11) CN109033341A discloses a concurrency-based Top-k contrast sequence pattern mining algorithm with gap constraints. It mines contrast sequence patterns from character sequences and requires gap constraints to be set, which makes the mining process more complex and the results less accurate. The present invention mines order-preserving sequence patterns from time series without any gap constraint, so the mined results are accurate and effective; compared with CN109033341A, the invention is a clear improvement.
(12) CN108073701A discloses a method for mining rare patterns, i.e. patterns that do not occur frequently, in multidimensional time-series data. Its range of application is narrower, because in practice people are more concerned with patterns that occur frequently. The present method mines one-dimensional time series, is more widely applicable, and is a significant improvement over CN108073701A.
(13) CN107451293A discloses a method for mining contrast patterns from multi-class sequence data. It applies to character sequences and cannot be applied directly to time series, whereas the mining method of the present invention is designed mainly for time-series data and is substantively characterized by analyzing their trends. Its notable advantage is that frequent order-preserving sequence patterns are mined from the time series, so that frequently occurring trend changes in numerical data can be found, providing a theoretical basis for prediction and decision-making.
(14) The text sentiment feature extraction method based on order-preserving submatrices and frequent sequence pattern mining, published in the Journal of Shandong University, represents data as a matrix (the order-preserving submatrix) and uses this model to analyze the sentiment tendency of text. The mining method of the present invention instead represents data as a sequence (the order-preserving sequence pattern) and uses sequence patterns to analyze key trends in a time series. Its notable advantage is that frequent patterns can be mined from the time series without preprocessing the data, which avoids losing valuable information while still achieving the goal of analyzing key trends in the series.
Compared with the prior art, the method offers the following significant advances:
(1) The method mines order-preserving sequence patterns in time series and finds frequent trends without preprocessing. This overcomes the information loss that additional preprocessing steps cause in the prior art, and fills a gap left by existing techniques, which can only locate the occurrences of a known order-preserving pattern in a sequence and cannot discover unknown frequent order-preserving patterns.
(2) The method introduces the concept of order preservation into sequence pattern mining. Most existing methods focus on the absolute values in a pattern and ignore its overall trend, so in time-series analysis they cannot effectively reflect the differences between time series. The present method focuses on the relative magnitudes of values and expresses the trend characteristics of numerical data through the relative order within a pattern, which better matches the nature of numerical data and gives the method generality and practical significance for time-series analysis;
(3) The invention discovers frequently occurring trend changes in a time series. Existing mining techniques must preprocess the time series before mining, for example by SAX symbolization, whereas the invention can mine without this step. The continuity of the time series is therefore preserved and no valuable information is missed, so the mining results are more complete, more widely applicable, and better suited to practical needs;
(4) Mining order-preserving sequence patterns involves two core problems: computing pattern support and generating candidate patterns. Existing techniques mainly address the first, computing pattern support through order-preserving pattern matching, but mining cannot be completed with that technique alone: a method for generating candidate patterns is also needed, and no technique for generating order-preserving candidate patterns currently exists. The invention therefore proposes a new pattern fusion method, based on the characteristics of order-preserving sequence patterns, for generating candidate patterns. This allows the mining process to run to completion and fills the gap left by existing techniques, which can only locate the occurrences of a known order-preserving pattern and cannot discover unknown frequent ones; it therefore has great practical significance.
(5) Applied to time series, the method can help users obtain the rules governing how data change over a period of time and provide a theoretical basis for predicting future trends and making decisions, so it has important research value. The order-preserving sequence pattern mining method of the invention not only helps users extract valuable information and knowledge but also reduces the difficulty of data processing and analysis, and it has great development potential.
Drawings
The invention is further illustrated with reference to the figures and examples.
FIG. 1 compares two clearly different time series that SAX symbolizes into the same character sequence.
FIG. 2 shows all occurrences of pattern P of Example A in time series S.
FIG. 3 is a trend graph of the time series S of Example B; the subsequences drawn with dotted lines are the occurrences of the order-preserving sequence pattern (3,4,1,2) in S.
FIG. 4 is a schematic flow chart of a computer process used in the method of the present invention.
Detailed Description
As shown in FIG. 1, prior-art methods must symbolize a time series with the SAX method before mining it, converting numerical data into character data. Because SAX first divides the series into segments by piecewise aggregate approximation (PAA) and then replaces each segment by its mean, two time series carrying different trend information can be symbolized into the same symbol sequence. Panels (a) and (b) of FIG. 1 show two time series with clearly different trends, yet SAX symbolizes both as "beccde". This shows that existing time-series processing loses important information in the data, which hinders trend analysis.
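A minimal sketch of why this happens, using plain PAA only: the two short series and the segment count below are made up for illustration (they are not the data of FIG. 1), and the final SAX step that maps each segment mean to a letter through Gaussian breakpoints is omitted, since equal means necessarily receive equal letters. The helper names `paa` and `relative_order` are hypothetical.

```python
from typing import List, Sequence, Tuple


def paa(series: Sequence[float], segments: int) -> List[float]:
    """Piecewise aggregate approximation: the mean of each equal-width segment.

    For simplicity this sketch requires len(series) to be a multiple of `segments`.
    """
    if segments <= 0 or len(series) % segments != 0:
        raise ValueError("series length must be a positive multiple of segments")
    width = len(series) // segments
    return [sum(series[k * width:(k + 1) * width]) / width for k in range(segments)]


def relative_order(values: Sequence[float]) -> Tuple[int, ...]:
    """Rank of each value within the sequence (1 = smallest)."""
    ranks = [0] * len(values)
    by_value = sorted(range(len(values)), key=lambda i: values[i])
    for rank, pos in enumerate(by_value, start=1):
        ranks[pos] = rank
    return tuple(ranks)


# Hypothetical series: rising inside each segment vs. falling inside each segment.
series_a = [1.0, 3.0, 5.0, 7.0]
series_b = [3.0, 1.0, 7.0, 5.0]

print(paa(series_a, 2), paa(series_b, 2))                  # [2.0, 6.0] [2.0, 6.0]
print(relative_order(series_a), relative_order(series_b))  # (1, 2, 3, 4) (2, 1, 4, 3)
```

The two series move in opposite directions inside every segment, yet their PAA means coincide, while their relative orders (the information an order-preserving pattern keeps) remain distinct.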
The embodiment shown in FIG. 2 shows that pattern P of Example A has 2 occurrences in time series S. S has length 17, and its 17 values are indexed by positions 1 to 17. Writing each occurrence of P = (6,5,8,4,7) as the corresponding position indices in S, the 2 order-preserving occurrences of P in S are <4,5,6,7,8> and <11,12,13,14,15>. This illustrates that existing order-preserving pattern matching can only find where a given pattern P occurs in a time series S and cannot discover patterns that occur frequently but are not known in advance.
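Order-preserving matching of a known pattern, as described above, can be sketched naively by testing every window of S. This is an illustrative brute-force version, not a specific published matcher; the function name `op_occurrences` is hypothetical, and pattern values are assumed to be distinct (windows containing equal values simply fail the strict comparison). Because only relative order matters, P may be given with raw values such as (6,5,8,4,7).

```python
from typing import List, Sequence


def op_occurrences(pattern: Sequence[float], series: Sequence[float]) -> List[List[int]]:
    """Return every window of `series` with the same relative order as `pattern`,
    each written as its 1-based positions in the series, e.g. [4, 5, 6, 7, 8]."""
    m = len(pattern)
    # Positions of the pattern's elements, from smallest value to largest.
    order = sorted(range(m), key=lambda i: pattern[i])
    hits = []
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        # The window matches if its values strictly increase along `order`.
        if all(window[order[i]] < window[order[i + 1]] for i in range(m - 1)):
            hits.append(list(range(start + 1, start + m + 1)))
    return hits
```

For the series S of Example 1 below, searching for (1,2,3,4,5) returns the occurrences <1,2,3,4,5>, <6,7,8,9,10> and <11,12,13,14,15>. Such a matcher only locates a pattern supplied in advance; it does not propose which patterns to look for.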
The example shown in FIG. 3 shows that in Example B the subsequences (s_3,s_4,s_5,s_6), (s_8,s_9,s_10,s_11) and (s_13,s_14,s_15,s_16) all have the relative order (3,4,1,2), so the order-preserving sequence pattern (3,4,1,2) occurs 3 times in time series S. Its pattern support is therefore not less than the minimum support threshold minsup, so (3,4,1,2) is a frequent order-preserving sequence pattern. As this example shows, by mining frequent order-preserving sequence patterns the method overcomes the shortcomings of the prior art: no valuable information is missed, and key trends in the time series can be analyzed.
FIG. 4 shows the computer processing flow used by the method of the invention:
1) Start;
2) Input the time series S and the minimum support threshold minsup;
3) Obtain the frequent pattern set fre_2 of pattern length 2;
4) Generate the candidate pattern set cand_{L+1} of pattern length L+1;
5) If cand_{L+1} is empty, go to step 10; otherwise, go to step 6;
6) Compute the pattern support sup(P_d, S) in time series S of each candidate pattern P_d in cand_{L+1};
7) If sup(P_d, S) ≥ minsup, go to step 8; otherwise P_d is not added;
8) Add P_d to the frequent pattern set fre_{L+1} of pattern length L+1;
9) If fre_{L+1} is empty, go to step 10; otherwise, increase L by 1 and go to step 4;
10) End.
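The level-wise loop of FIG. 4 can be sketched as follows. To keep the block short and self-contained, two stand-ins are used and labelled as such: support is counted by brute force over all windows of S rather than by the binary-string filter and verification of the second step, and candidates are generated by extending each frequent pattern with every possible last rank rather than by the pattern fusion method of the third step. Because a pattern's support can never exceed the support of its prefix's relative order, this extension still reaches every frequent pattern. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]


def support(p: Pattern, s: Sequence[float]) -> int:
    """Stand-in support: count windows of s whose values follow p's relative order."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])  # positions of p, smallest value first
    return sum(
        all(s[j + order[i]] < s[j + order[i + 1]] for i in range(m - 1))
        for j in range(len(s) - m + 1)
    )


def extend(p: Pattern) -> List[Pattern]:
    """Stand-in generator (NOT the pattern fusion method): every length-(L+1)
    pattern whose first L elements have the relative order p."""
    L = len(p)
    # New last rank r; existing ranks >= r shift up by one to make room.
    return [tuple(v + 1 if v >= r else v for v in p) + (r,) for r in range(1, L + 2)]


def mine(s: Sequence[float], minsup: int) -> Dict[int, List[Pattern]]:
    # Steps 2-3: fre_2 from the two length-2 candidates (1,2) and (2,1).
    fre = {2: [p for p in [(1, 2), (2, 1)] if support(p, s) >= minsup]}
    L = 2
    while fre[L]:
        # Step 4: generate cand_{L+1} from fre_L.
        cand = [c for p in fre[L] for c in extend(p)]
        # Steps 6-8: keep the candidates whose support reaches minsup.
        nxt = [c for c in cand if support(c, s) >= minsup]
        # Steps 5 and 9: stop when nothing of length L+1 survives.
        if not nxt:
            break
        L += 1
        fre[L] = nxt
    return fre


S = [1.1, 1.2, 1.3, 1.4, 1.5, 1.1, 1.2, 1.3, 1.4, 1.5,
     1.1, 1.2, 1.3, 1.4, 1.5, 1.3, 1.4]
result = mine(S, 3)
for length, patterns in result.items():
    print(length, patterns)
```

On the data of Example 1 below (minsup = 3) this yields frequent patterns up to length 5, the longest being (1,2,3,4,5).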
Example 1
Given the time series S = (1.1,1.2,1.3,1.4,1.5,1.1,1.2,1.3,1.4,1.5,1.1,1.2,1.3,1.4,1.5,1.3,1.4) and the minimum support threshold minsup = 3.
First, input the time series S and the minimum support threshold minsup:
the input time series is S = (1.1,1.2,1.3,1.4,1.5,1.1,1.2,1.3,1.4,1.5,1.1,1.2,1.3,1.4,1.5,1.3,1.4), and the minimum support threshold is minsup = 3;
Second, obtain the frequent pattern set fre_2 of pattern length 2:
The candidate pattern set of pattern length 2 is cand_2 = {(1,2),(2,1)}. The pattern support in time series S of each candidate pattern in cand_2 is computed in turn by the support calculation steps below. When the support of a candidate pattern P_d is greater than or equal to the minimum support threshold minsup, P_d is a frequent pattern of length 2 and is added to the frequent pattern set fre_2 of pattern length 2;
The pattern support is calculated as follows:
First, the elements of the currently processed candidate pattern P_d are sorted in ascending order, and the position in P_d of the element ranked i-th is denoted index[i], so that p_{index[i]} < p_{index[i+1]} holds in P_d, where p_{index[i]} is the element of P_d ranked i-th, p_{index[i+1]} is the element ranked (i+1)-th, 1 ≤ i ≤ m-1, and m is the pattern length of the currently processed candidate pattern P_d;
Then candidate pattern P_d is converted into a binary string P' according to formula (1) below, whose elements are denoted a_1, …, a_i, …, a_{m-1}, and time series S is converted into a binary string S' according to formula (2) below, whose elements are denoted b_1, …, b_j, …, b_{n-1}. Formulas (1) and (2) are:
$$a_i = \begin{cases} 1, & p_i < p_{i+1} \\ 0, & p_i > p_{i+1} \end{cases} \qquad 1 \le i \le m-1 \tag{1}$$
$$b_j = \begin{cases} 1, & s_j < s_{j+1} \\ 0, & s_j > s_{j+1} \end{cases} \qquad 1 \le j \le n-1 \tag{2}$$
In formulas (1) and (2), m is the pattern length of the currently processed candidate pattern P_d, with initial value 2, and n is the length of time series S. a_i (1 ≤ i ≤ m-1) is the value of each element of the binary string P', obtained by comparing two consecutive elements p_i and p_{i+1} of P_d: a_i = 1 when p_i < p_{i+1}, and a_i = 0 when p_i > p_{i+1}. b_j (1 ≤ j ≤ n-1) is the value of each element of the binary string S', obtained by comparing two consecutive elements s_j and s_{j+1} of S: b_j = 1 when s_j < s_{j+1}, and b_j = 0 when s_j > s_{j+1};
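Formulas (1) and (2) amount to writing a '1' for each rise and a '0' for each fall. A minimal sketch with a hypothetical helper `to_binary`; the formulas leave equal neighbours undefined, and mapping them to '0' here is an assumption. Applied to the pattern (3,4,1,2) and to the series S of Example 1, it gives P' = 101 and S' = 1111011110111101.

```python
from typing import Sequence


def to_binary(values: Sequence[float]) -> str:
    """Formulas (1)/(2): '1' where the next value is larger, '0' where it is smaller.

    Equal neighbours are not covered by the formulas; this sketch maps them to '0'.
    """
    return "".join("1" if a < b else "0" for a, b in zip(values, values[1:]))


S = [1.1, 1.2, 1.3, 1.4, 1.5, 1.1, 1.2, 1.3, 1.4, 1.5,
     1.1, 1.2, 1.3, 1.4, 1.5, 1.3, 1.4]
print(to_binary((3, 4, 1, 2)))  # 101
print(to_binary(S))             # 1111011110111101
```

The binary strings have lengths m-1 and n-1, so an occurrence of P' starting at b_j covers the m values s_j, …, s_{j+m-1}.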
The occurrences of the binary string P' in the binary string S' are found with a classical pattern matching algorithm. An occurrence of P' starting at b_j corresponds to the subsequence (s_j, …, s_{j+m-1}) of time series S; each time an occurrence is found, this subsequence is kept as a candidate subsequence, and it is verified whether the position index l_1 of its first element satisfies the condition
$$s_{l_1+\mathrm{index}[i]-1} < s_{l_1+\mathrm{index}[i+1]-1}, \qquad 1 \le i \le m-1.$$
If the condition is satisfied, the support of candidate pattern P_d is increased by one; otherwise it is unchanged. Here s_{l_1+index[i]-1} is the element of the candidate subsequence at the position corresponding to element p_{index[i]} of P_d, and s_{l_1+index[i+1]-1} is the element at the position corresponding to p_{index[i+1]}, 1 ≤ i ≤ m-1. Once all occurrences have been found and all candidate subsequences verified, the pattern support of P_d is obtained;
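The filter-and-verify procedure can be sketched as follows, under stated assumptions: Python's `str.find`, restarted one position after each hit so that overlapping occurrences are kept, stands in for the unnamed classical pattern matching algorithm; equal neighbours are encoded as '0'; and positions are 0-based in the code, so the check `s[start + index[i]] < s[start + index[i + 1]]` is the condition on l_1 shifted by one. All names are hypothetical.

```python
from typing import List, Sequence


def to_binary(values: Sequence[float]) -> str:
    # Formulas (1)/(2); equal neighbours -> '0' (assumption; the strict check rejects ties).
    return "".join("1" if a < b else "0" for a, b in zip(values, values[1:]))


def rank_index(pattern: Sequence[float]) -> List[int]:
    # index[i]: 0-based position in the pattern of its (i+1)-th smallest element.
    return sorted(range(len(pattern)), key=lambda pos: pattern[pos])


def op_support(pattern: Sequence[float], s: Sequence[float]) -> int:
    m = len(pattern)
    if m < 2:
        raise ValueError("patterns shorter than 2 are not handled here")
    index = rank_index(pattern)
    p_bin, s_bin = to_binary(pattern), to_binary(s)
    count = 0
    start = s_bin.find(p_bin)
    while start != -1:
        # P' found at s_bin[start:]: candidate subsequence is s[start:start + m],
        # i.e. l_1 = start + 1 in the 1-based notation of the text.
        if all(s[start + index[i]] < s[start + index[i + 1]] for i in range(m - 1)):
            count += 1
        start = s_bin.find(p_bin, start + 1)
    return count


S = [1.1, 1.2, 1.3, 1.4, 1.5, 1.1, 1.2, 1.3, 1.4, 1.5,
     1.1, 1.2, 1.3, 1.4, 1.5, 1.3, 1.4]
print(op_support((1, 2), S), op_support((2, 1), S))  # 13 3
```

With the series S of Example 1 this returns 13 for (1,2) and 3 for (2,1), matching the worked example. It also shows why verification is needed: for (3,4,1,2), P' = 101 occurs 3 times in S', but the candidate subsequence starting at s_14 (values 1.4, 1.5, 1.3, 1.4) fails the strict check, so its support is 2.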
The specific operation of this embodiment is as follows:
1) Compute the pattern support in time series S of the 1st candidate pattern P_1 = (1,2) in cand_2: sup(P_1, S) = 13, since S has 13 positions j with s_j < s_{j+1}. Because sup(P_1, S) ≥ minsup, P_1 is added to fre_2, giving fre_2 = {(1,2)};
2) Compute the pattern support in time series S of the 2nd candidate pattern P_2 = (2,1) in cand_2: sup(P_2, S) = 3, since S has 3 positions j with s_j > s_{j+1}. Because sup(P_2, S) ≥ minsup, P_2 is added to fre_2, giving fre_2 = {(1,2),(2,1)};
In summary, the frequent pattern set of pattern length 2 is fre_2 = {(1,2),(2,1)};
Third, generate the candidate pattern set cand_{L+1} of pattern length L+1:
Using the pattern fusion method, the candidate pattern set cand_{L+1} of pattern length L+1 is generated from the frequent pattern set fre_L of pattern length L, where L is the pattern length of the currently processed frequent patterns, with initial value 2. During candidate generation, for a frequent pattern P with elements p_1, p_2, …, p_L, the part of P remaining after removing its last element p_L is called the prefix of P, denoted prefix(P), and the relative order of the prefix is denoted prefixorder(P); the part remaining after removing its first element p_1 is called the suffix of P, denoted suffix(P), and the relative order of the suffix is denoted suffixorder(P).
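The prefix, suffix and relative-order notions can be sketched as small helpers with hypothetical names mirroring the notation above; values within a pattern are assumed to be distinct. For P = (2,3,1) they give prefix (2,3) with relative order (1,2) and suffix (3,1) with relative order (2,1), as in the embodiment below.

```python
from typing import Sequence, Tuple

Pattern = Tuple[int, ...]


def relative_order(values: Sequence[int]) -> Pattern:
    """Rank of each element among `values` (1 = smallest); values assumed distinct."""
    ranked = sorted(values)
    return tuple(ranked.index(v) + 1 for v in values)


def prefix(p: Pattern) -> Pattern:
    """prefix(P): P without its last element p_L."""
    return tuple(p[:-1])


def suffix(p: Pattern) -> Pattern:
    """suffix(P): P without its first element p_1."""
    return tuple(p[1:])


def prefixorder(p: Pattern) -> Pattern:
    return relative_order(prefix(p))


def suffixorder(p: Pattern) -> Pattern:
    return relative_order(suffix(p))


P = (2, 3, 1)
print(prefix(P), prefixorder(P))  # (2, 3) (1, 2)
print(suffix(P), suffixorder(P))  # (3, 1) (2, 1)
```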
The pattern fusion method has fusion rules for two different cases:
1) General case: let P and Q be frequent patterns of the same length L, with elements p_1, p_2, …, p_L and q_1, q_2, …, q_L respectively. When the relative order of the suffix of P equals the relative order of the prefix of Q (suffixorder(P) = prefixorder(Q)) but the suffix of P does not equal the prefix of Q (suffix(P) ≠ prefix(Q)), P and Q can be fused into one candidate pattern of length L+1, denoted X, with elements x_1, x_2, …, x_{L+1}. The fusion rule for this case is as follows:
Compare the first element p_1 of P with the last element q_L of Q:
① When p_1 < q_L, let the first element of X be x_1 = p_1 and its last element x_{L+1} = q_L + 1; then compare each element p_u of P other than the first with q_L: when p_u > q_L, the element of X at the corresponding position is x_u = p_u + 1, otherwise x_u = p_u, where 2 ≤ u ≤ L;
② When p_1 > q_L, let x_1 = p_1 + 1 and x_{L+1} = q_L; then compare each element q_v of Q other than the last with p_1: when q_v > p_1, the element of X at the corresponding position is x_{v+1} = q_v + 1, otherwise x_{v+1} = q_v, where 1 ≤ v ≤ L-1;
2) Special case: let P and Q be frequent patterns of the same length L, with elements p_1, p_2, …, p_L and q_1, q_2, …, q_L respectively. When not only suffixorder(P) = prefixorder(Q) but also suffix(P) = prefix(Q), P and Q can be fused into two candidate patterns of length L+1, denoted T and K, with elements t_1, t_2, …, t_{L+1} and k_1, k_2, …, k_{L+1} respectively. The fusion rule for this case is as follows:
To generate T, let t_1 = p_1 + 1 and t_{L+1} = p_1; then compare each element p_u of P other than the first with p_1: when p_u > p_1, the element of T at the corresponding position is t_u = p_u + 1, otherwise t_u = p_u, where 2 ≤ u ≤ L;
To generate K, let k_1 = p_1 and k_{L+1} = p_1 + 1; then compare each element p_u of P other than the first with p_1: when p_u > p_1, the element of K at the corresponding position is k_u = p_u + 1, otherwise k_u = p_u, where 2 ≤ u ≤ L;
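The two fusion rules can be sketched as a single function, assuming, as in the embodiment, that every frequent pattern of length L is a permutation of 1, …, L. Under that assumption p_1 = q_L can only occur when suffix(P) = prefix(Q), so the general case never needs a tie rule, and in the special case p_1 and q_L coincide, which is why those rules refer only to p_1. The name `fuse` is hypothetical; it returns no candidate, [X], or [T, K].

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[int, ...]


def relative_order(values: Sequence[int]) -> Pattern:
    ranked = sorted(values)
    return tuple(ranked.index(v) + 1 for v in values)


def fuse(p: Pattern, q: Pattern) -> List[Pattern]:
    """Pattern fusion of two length-L patterns, both permutations of 1..L."""
    suf, pre = p[1:], q[:-1]
    if relative_order(suf) != relative_order(pre):
        return []  # neither case applies
    p1, qL = p[0], q[-1]
    if suf == pre:
        # Special case: shared middle; elements above p_1 move up by one.
        middle = tuple(v + 1 if v > p1 else v for v in suf)
        t = (p1 + 1,) + middle + (p1,)
        k = (p1,) + middle + (p1 + 1,)
        return [t, k]
    # General case (p1 == qL is impossible here, see the note above).
    if p1 < qL:
        # Rule 1: keep p_1, build the body from P, append q_L + 1.
        body = tuple(v + 1 if v > qL else v for v in suf)
        return [(p1,) + body + (qL + 1,)]
    # Rule 2: start with p_1 + 1, build the body from Q, append q_L.
    body = tuple(v + 1 if v > p1 else v for v in pre)
    return [(p1 + 1,) + body + (qL,)]


print(fuse((2, 3, 1), (3, 1, 2)))  # [(3, 4, 1, 2), (2, 4, 1, 3)]
print(fuse((3, 1, 2), (2, 3, 1)))  # [(4, 2, 3, 1)]
```

These two calls reproduce the special-case and general-case fusions worked out for fre_3 in the embodiment below.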
Using this pattern fusion method, the candidate pattern set cand_{L+1} of pattern length L+1 is generated from the frequent pattern set fre_L of pattern length L as follows:
While fre_L is not empty, take its first frequent pattern P_a and compute the relative order of its suffix. Then traverse fre_L from left to right, and for each frequent pattern P_b check whether P_a and P_b satisfy either of the two cases of the pattern fusion method. When a case is satisfied, fuse them into candidate pattern(s) of length L+1 according to the corresponding rule and add the result to cand_{L+1}. When all P_b have been traversed, the fusion step for P_a is finished; the next frequent pattern in fre_L is then taken as P_a and the above steps are repeated. Once the last frequent pattern in fre_L has been processed, generation of cand_{L+1} is complete.
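The traversal can be sketched as a double loop over fre_L that reuses a compact restatement of the two fusion rules (same permutation assumption; all names hypothetical). Run on fre_2 = {(1,2),(2,1)}, fre_3 = {(1,2,3),(2,3,1),(3,1,2)} and fre_4 = {(1,2,3,4)}, it reproduces, in the same order, the candidate sets cand_3, cand_4 and cand_5 derived in the embodiment below.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[int, ...]


def relative_order(values: Sequence[int]) -> Pattern:
    ranked = sorted(values)
    return tuple(ranked.index(v) + 1 for v in values)


def fuse(p: Pattern, q: Pattern) -> List[Pattern]:
    # Compact restatement of the two fusion rules (patterns are permutations of 1..L).
    suf, pre, p1, qL = p[1:], q[:-1], p[0], q[-1]
    if relative_order(suf) != relative_order(pre):
        return []
    if suf == pre:  # special case -> [T, K]
        mid = tuple(v + 1 if v > p1 else v for v in suf)
        return [(p1 + 1,) + mid + (p1,), (p1,) + mid + (p1 + 1,)]
    if p1 < qL:  # general case, rule 1
        return [(p1,) + tuple(v + 1 if v > qL else v for v in suf) + (qL + 1,)]
    return [(p1 + 1,) + tuple(v + 1 if v > p1 else v for v in pre) + (qL,)]  # rule 2


def generate_candidates(fre_L: List[Pattern]) -> List[Pattern]:
    """cand_{L+1}: fuse each P_a with every P_b of fre_L, scanning left to right."""
    cand: List[Pattern] = []
    for p_a in fre_L:
        for p_b in fre_L:
            cand.extend(fuse(p_a, p_b))
    return cand


print(generate_candidates([(1, 2), (2, 1)]))
print(generate_candidates([(1, 2, 3), (2, 3, 1), (3, 1, 2)]))
print(generate_candidates([(1, 2, 3, 4)]))
```

The sketch keeps the source's order of generation and does not remove duplicate candidates, since the description does not mention a deduplication step.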
The operation of this embodiment is as follows:
1. Generate the candidate pattern set cand_3 of pattern length 3 from the frequent pattern set fre_2 of pattern length 2.
The second step gave fre_2 = {(1,2),(2,1)}.
1) Process the 1st frequent pattern P_1 = (1,2) in fre_2:
suffix(P_1) = (2) and suffixorder(P_1) = (1);
① Take the 1st frequent pattern P_1 = (1,2) in fre_2: prefix(P_1) = (1) and prefixorder(P_1) = (1). Because suffixorder(P_1) = prefixorder(P_1) but suffix(P_1) ≠ prefix(P_1), this is the general case of the pattern fusion method, so fusing P_1 with P_1 generates the candidate pattern (1,2,3) of length 3, which is added to cand_3, giving cand_3 = {(1,2,3)};
② Take the 2nd frequent pattern P_2 = (2,1) in fre_2: prefix(P_2) = (2) and prefixorder(P_2) = (1). Because suffixorder(P_1) = prefixorder(P_2) and suffix(P_1) = prefix(P_2), this is the special case of the pattern fusion method, so fusing P_1 with P_2 generates the two candidate patterns (2,3,1) and (1,3,2) of length 3, which are added to cand_3, giving cand_3 = {(1,2,3),(2,3,1),(1,3,2)};
This completes the processing of the 1st frequent pattern P_1 in fre_2;
2) Process the 2nd frequent pattern P_2 = (2,1) in fre_2:
suffix(P_2) = (1) and suffixorder(P_2) = (1);
① Take the 1st frequent pattern P_1 = (1,2) in fre_2: prefix(P_1) = (1) and prefixorder(P_1) = (1). Because suffixorder(P_2) = prefixorder(P_1) and suffix(P_2) = prefix(P_1), this is the special case, so fusing P_2 with P_1 generates the two candidate patterns (3,1,2) and (2,1,3) of length 3, which are added to cand_3, giving cand_3 = {(1,2,3),(2,3,1),(1,3,2),(3,1,2),(2,1,3)};
② Take the 2nd frequent pattern P_2 = (2,1) in fre_2: prefix(P_2) = (2) and prefixorder(P_2) = (1). Because suffixorder(P_2) = prefixorder(P_2) but suffix(P_2) ≠ prefix(P_2), this is the general case, so fusing P_2 with P_2 generates the candidate pattern (3,2,1) of length 3, which is added to cand_3, giving cand_3 = {(1,2,3),(2,3,1),(1,3,2),(3,1,2),(2,1,3),(3,2,1)};
This completes the processing of the 2nd frequent pattern P_2 in fre_2;
In summary, the candidate pattern set of pattern length 3 is cand_3 = {(1,2,3),(2,3,1),(1,3,2),(3,1,2),(2,1,3),(3,2,1)};
After cand_3 has been generated, the pattern support in time series S of each candidate pattern in cand_3 is computed by 1) of the fourth step below, giving the frequent pattern set fre_3 of pattern length 3.
2. From a frequent pattern set fre of pattern length 33Generating a candidate pattern set cand with a pattern length of 44
Since the frequent pattern set fre with a pattern length of 3 is obtained from 1) of the fourth step3={(1,2,3),(2,3,1),(3,1,2)},
1) Processing frequent pattern set fre with pattern length of 331 st frequent pattern P in1=(1,2,3):
Frequent pattern P1Suffix of (A), (B), (C1) (2,3), frequent pattern P1Relative order of suffixes of (P)1)=(1,2),
① fetch frequent pattern set fre with pattern length of 331 st frequent pattern P in1(1,2,3), frequent pattern P1Prefix (P) of1) (1,2), frequent pattern P1Relative order of prefixes of prefixorder (P)1) (1,2) because of suffixorder (P)1)=prefixorder(P1) (1,2) but suffix (P)1)≠prefix(P1) This case is common to the mode fusion method, so the frequent mode P is passed1And a frequent pattern P1A candidate pattern (1,2,3,4) with a pattern length of 4 can be generated by fusion, and added to the candidate pattern set cand with a pattern length of 44In (c), from this cand4={(1,2,3,4)};
② fetch frequent pattern set fre with pattern length of 332 nd frequent pattern P in2(2,3,1), frequent pattern P2Prefix (P) of2) (2,3), frequent pattern P2Relative order of prefixes of prefixorder (P)2) (1,2) because of suffixorder (P)1)=prefixorder(P2) And suffix (P)1)=prefix(P2) This case is a special case of the mode fusion method, so that the frequent mode P is passed1And a frequent pattern P2Two candidate patterns (2,3,4,1) and (1,3,4,2) with the pattern length of 4 can be generated by fusion and added into a candidate pattern set with the pattern length of 4cand4In (c), from this cand4={(1,2,3,4),(2,3,4,1),(1,3,4,2)};
③ fetch frequent pattern set fre with pattern length of 33The 3 rd frequent pattern P in3(3,1,2), frequent pattern P3Prefix (P) of3) (3,1), frequent pattern P3Relative order of prefixes of prefixorder (P)3) (2,1) because of suffixorder (P)1)≠prefixorder(P3) So that the two cases of the pattern fusion method are not satisfied, so that the pattern P is frequent1And a frequent pattern P3Candidate patterns with a pattern length of 4 cannot be generated by fusion.
Thus for a frequent pattern set fre with a pattern length of 331 st frequent pattern P in1Finishing the treatment;
2) processing frequent pattern set fre with pattern length of 332 nd frequent pattern P in2=(2,3,1):
Frequent pattern P2Suffix of (A), (B), (C2) (3,1), frequent pattern P2Relative order of suffixes of (P)2)=(2,1),
① fetch frequent pattern set fre with pattern length of 331 st frequent pattern P in1(1,2,3), frequent pattern P1Prefix (P) of1) (1,2), frequent pattern P1Relative order of prefixes of prefixorder (P)1) (1,2) because of suffixorder (P)2)≠prefixorder(P1) So that the two cases of the pattern fusion method are not satisfied, so that the pattern P is frequent2And a frequent pattern P1Candidate patterns with a pattern length of 4 cannot be generated by fusion.
② fetch frequent pattern set fre with pattern length of 332 nd frequent pattern P in2(2,3,1), frequent pattern P2Prefix (P) of2) (2,3), frequent pattern P2Relative order of prefixes of prefixorder (P)1) (1,2) because of suffixorder (P)2)≠prefixorder(P2) Therefore, the two cases of the pattern fusion method are not satisfied, so the pattern is frequentP2And a frequent pattern P2Candidate patterns with a pattern length of 4 cannot be generated by fusion.
③ fetch frequent pattern set fre with pattern length of 33The 3 rd frequent pattern P in3(3,1,2), frequent pattern P3Prefix (P) of3) (3,1), frequent pattern P3Relative order of prefixes of prefixorder (P)3) (2,1) because of suffixorder (P)2)=prefixorder(P3) And suffix (P)2)=prefix(P3) This case is a special case of the mode fusion method, so that the frequent mode P is passed2And a frequent pattern P3Two candidate patterns (3,4,1,2) and (2,4,1,3) with a pattern length of 4 can be generated by fusion, and added to the candidate pattern set cand with a pattern length of 44In (c), from this cand4={(1,2,3,4),(2,3,4,1),(1,3,4,2),(3,4,1,2),(2,4,1,3)};
Thus for a frequent pattern set fre with a pattern length of 332 nd frequent pattern P in2Finishing the treatment;
3) Processing the 3rd frequent pattern P_3 = (3,1,2) in the frequent pattern set fre_3 of pattern length 3:
The suffix of P_3 is suffix(P_3) = (1,2), and its relative order is suffixorder(P_3) = (1,2).
① Take the 1st frequent pattern P_1 = (1,2,3) in fre_3. Its prefix is prefix(P_1) = (1,2), with relative order prefixorder(P_1) = (1,2). Because suffixorder(P_3) = prefixorder(P_1) and suffix(P_3) = prefix(P_1), this is the special case of the pattern fusion method, so fusing P_3 with P_1 generates two candidate patterns of length 4, (4,1,2,3) and (3,1,2,4). These are added to the candidate pattern set cand_4 of pattern length 4, giving cand_4 = {(1,2,3,4),(2,3,4,1),(1,3,4,2),(3,4,1,2),(2,4,1,3),(4,1,2,3),(3,1,2,4)};
② Take the 2nd frequent pattern P_2 = (2,3,1) in fre_3. Its prefix is prefix(P_2) = (2,3), with relative order prefixorder(P_2) = (1,2). Because suffixorder(P_3) = prefixorder(P_2) but suffix(P_3) ≠ prefix(P_2), this is the general case of the pattern fusion method, so fusing P_3 with P_2 generates one candidate pattern of length 4, (4,2,3,1). It is added to cand_4, giving cand_4 = {(1,2,3,4),(2,3,4,1),(1,3,4,2),(3,4,1,2),(2,4,1,3),(4,1,2,3),(3,1,2,4),(4,2,3,1)};
③ Take the 3rd frequent pattern P_3 = (3,1,2) in fre_3. Its prefix is prefix(P_3) = (3,1), with relative order prefixorder(P_3) = (2,1). Because suffixorder(P_3) ≠ prefixorder(P_3), neither case of the pattern fusion method applies, so fusing P_3 with itself generates no candidate pattern of length 4.
This completes the processing of the 3rd frequent pattern P_3 in fre_3.
In summary, the candidate pattern set of pattern length 4 is cand_4 = {(1,2,3,4),(2,3,4,1),(1,3,4,2),(3,4,1,2),(2,4,1,3),(4,1,2,3),(3,1,2,4),(4,2,3,1)};
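The walkthrough above can be replayed mechanically. The following Python sketch (Python is our choice; the embodiment was written in VC++ 6.0) encodes the two fusion rules of the third step and the claimed traversal, in which each P_a is fused with every P_b from left to right. The names `relative_order`, `fuse` and `generate_candidates` are hypothetical, patterns are assumed to be tuples of distinct ranks 1..L, and emitting T before K in the special case is an assumption inferred from the order of cand_4 above.

```python
def relative_order(seq):
    """Ranks of the elements (1 = smallest); assumes distinct values."""
    ranked = sorted(seq)
    return tuple(ranked.index(v) + 1 for v in seq)


def fuse(P, Q):
    """Pattern fusion of two length-L patterns; [] when neither case applies."""
    # suffixorder(P) must equal prefixorder(Q)
    if relative_order(P[1:]) != relative_order(Q[:-1]):
        return []
    p1, qL = P[0], Q[-1]
    if P[1:] == Q[:-1]:                                    # special case
        mid = [p + 1 if p > p1 else p for p in P[1:]]
        return [(p1 + 1, *mid, p1), (p1, *mid, p1 + 1)]    # T, then K
    if p1 < qL:                                            # general case, rule 1
        return [(p1, *[p + 1 if p > qL else p for p in P[1:]], qL + 1)]
    # general case, rule 2 (p1 > qL)
    return [(p1 + 1, *[q + 1 if q > p1 else q for q in Q[:-1]], qL)]


def generate_candidates(fre):
    """cand_{L+1}: every P_a (left to right) fused with every P_b (left to right)."""
    return [c for Pa in fre for Pb in fre for c in fuse(Pa, Pb)]


fre3 = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
cand4 = generate_candidates(fre3)
print(cand4)
```

Applied to fre_2 = {(1,2),(2,1)}, the same function yields the six length-3 candidates in the order used in 1) of the fourth step, and applied to fre_4 = {(1,2,3,4)} it yields only (1,2,3,4,5).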
Once cand_4 has been generated, the support in the time series S of each candidate pattern in cand_4 is calculated as in 2) of the fourth step below, which yields the frequent pattern set fre_4 of pattern length 4.
3. Generating the candidate pattern set cand_5 of pattern length 5 from the frequent pattern set fre_4 of pattern length 4
2) of the fourth step yields fre_4 = {(1,2,3,4)}, so:
1) Processing the 1st frequent pattern P_1 = (1,2,3,4) in fre_4:
The suffix of P_1 is suffix(P_1) = (2,3,4), and its relative order is suffixorder(P_1) = (1,2,3).
① Take the 1st frequent pattern P_1 = (1,2,3,4) in fre_4. Its prefix is prefix(P_1) = (1,2,3), with relative order prefixorder(P_1) = (1,2,3). Because suffixorder(P_1) = prefixorder(P_1) but suffix(P_1) ≠ prefix(P_1), this is the general case of the pattern fusion method, so fusing P_1 with itself generates the candidate pattern (1,2,3,4,5) of length 5. It is added to the candidate pattern set cand_5, giving cand_5 = {(1,2,3,4,5)};
This completes the processing of the 1st frequent pattern P_1 in fre_4.
In summary, the candidate pattern set of pattern length 5 is cand_5 = {(1,2,3,4,5)};
Once cand_5 has been generated, the support in S of each candidate pattern in cand_5 is calculated as in 3) of the fourth step below, which yields the frequent pattern set fre_5 of pattern length 5.
4. Generating the candidate pattern set cand_6 of pattern length 6 from the frequent pattern set fre_5 of pattern length 5
3) of the fourth step yields fre_5 = {(1,2,3,4,5)}, so:
1) Processing the 1st frequent pattern P_1 = (1,2,3,4,5) in fre_5:
The suffix of P_1 is suffix(P_1) = (2,3,4,5), and its relative order is suffixorder(P_1) = (1,2,3,4).
① Take the 1st frequent pattern P_1 = (1,2,3,4,5) in fre_5. Its prefix is prefix(P_1) = (1,2,3,4), with relative order prefixorder(P_1) = (1,2,3,4). Because suffixorder(P_1) = prefixorder(P_1) but suffix(P_1) ≠ prefix(P_1), this is the general case of the pattern fusion method, so fusing P_1 with itself generates the candidate pattern (1,2,3,4,5,6) of length 6. It is added to the candidate pattern set cand_6, giving cand_6 = {(1,2,3,4,5,6)};
This completes the processing of the 1st frequent pattern P_1 in fre_5.
In summary, the candidate pattern set of pattern length 6 is cand_6 = {(1,2,3,4,5,6)};
Once cand_6 has been generated, the support in S of each candidate pattern in cand_6 is calculated as in 4) of the fourth step below, which yields the frequent pattern set fre_6 of pattern length 6.
Fourth step: obtaining the frequent pattern set fre_{L+1} of pattern length L+1
Using the support calculation method of the second step, calculate in turn the support sup(P_d, S) of each candidate pattern P_d in the candidate pattern set cand_{L+1} of pattern length L+1. Whenever sup(P_d, S) ≥ minsup, add P_d to fre_{L+1}. Once the supports of all candidate patterns in cand_{L+1} have been calculated, fre_{L+1} is complete.
The operation of this embodiment is as follows:
1) Obtaining the frequent pattern set fre_3 of pattern length 3
① Calculate the support in the time series S of the 1st candidate pattern P_1 = (1,2,3) in the candidate pattern set cand_3: sup(P_1, S) = 9. Because sup(P_1, S) ≥ minsup, add P_1 = (1,2,3) to fre_3, giving fre_3 = {(1,2,3)};
② Calculate the support in S of the 2nd candidate pattern P_2 = (2,3,1) in cand_3: sup(P_2, S) = 3. Because sup(P_2, S) ≥ minsup, add P_2 = (2,3,1) to fre_3, giving fre_3 = {(1,2,3),(2,3,1)};
③ Calculate the support in S of the 3rd candidate pattern P_3 = (1,3,2) in cand_3: sup(P_3, S) = 0. Because sup(P_3, S) < minsup, P_3 = (1,3,2) is not frequent;
④ Calculate the support in S of the 4th candidate pattern P_4 = (3,1,2) in cand_3: sup(P_4, S) = 3. Because sup(P_4, S) ≥ minsup, add P_4 = (3,1,2) to fre_3, giving fre_3 = {(1,2,3),(2,3,1),(3,1,2)};
⑤ Calculate the support in S of the 5th candidate pattern P_5 = (2,1,3) in cand_3: sup(P_5, S) = 0. Because sup(P_5, S) < minsup, P_5 = (2,1,3) is not frequent;
⑥ Calculate the support in S of the 6th candidate pattern P_6 = (3,2,1) in cand_3: sup(P_6, S) = 0. Because sup(P_6, S) < minsup, P_6 = (3,2,1) is not frequent;
In summary, the frequent pattern set of pattern length 3 is fre_3 = {(1,2,3),(2,3,1),(3,1,2)};
2) Obtaining the frequent pattern set fre_4 of pattern length 4
① Calculate the support in S of the 1st candidate pattern P_1 = (1,2,3,4) in the candidate pattern set cand_4: sup(P_1, S) = 6. Because sup(P_1, S) ≥ minsup, add P_1 = (1,2,3,4) to fre_4, giving fre_4 = {(1,2,3,4)};
② Calculate the support in S of the 2nd candidate pattern P_2 = (2,3,4,1) in cand_4: sup(P_2, S) = 2. Because sup(P_2, S) < minsup, P_2 = (2,3,4,1) is not frequent;
③ Calculate the support in S of the 3rd candidate pattern P_3 = (1,3,4,2) in cand_4: sup(P_3, S) = 0. Because sup(P_3, S) < minsup, P_3 = (1,3,4,2) is not frequent;
④ Calculate the support in S of the 4th candidate pattern P_4 = (3,4,1,2) in cand_4: sup(P_4, S) = 2. Because sup(P_4, S) < minsup, P_4 = (3,4,1,2) is not frequent;
⑤ Calculate the support in S of the 5th candidate pattern P_5 = (2,4,1,3) in cand_4: sup(P_5, S) = 0. Because sup(P_5, S) < minsup, P_5 = (2,4,1,3) is not frequent;
⑥ Calculate the support in S of the 6th candidate pattern P_6 = (4,1,2,3) in cand_4: sup(P_6, S) = 2. Because sup(P_6, S) < minsup, P_6 = (4,1,2,3) is not frequent;
⑦ Calculate the support in S of the 7th candidate pattern P_7 = (3,1,2,4) in cand_4: sup(P_7, S) = 0. Because sup(P_7, S) < minsup, P_7 = (3,1,2,4) is not frequent;
⑧ Calculate the support in S of the 8th candidate pattern P_8 = (4,2,3,1) in cand_4: sup(P_8, S) = 0. Because sup(P_8, S) < minsup, P_8 = (4,2,3,1) is not frequent;
In summary, the frequent pattern set of pattern length 4 is fre_4 = {(1,2,3,4)};
3) Obtaining the frequent pattern set fre_5 of pattern length 5
① Calculate the support in S of the 1st candidate pattern P_1 = (1,2,3,4,5) in the candidate pattern set cand_5: sup(P_1, S) = 3. Because sup(P_1, S) ≥ minsup, add P_1 = (1,2,3,4,5) to fre_5, giving fre_5 = {(1,2,3,4,5)};
In summary, the frequent pattern set of pattern length 5 is fre_5 = {(1,2,3,4,5)};
4) Obtaining the frequent pattern set fre_6 of pattern length 6
① Calculate the support in S of the candidate pattern P_1 = (1,2,3,4,5,6) in the candidate pattern set cand_6: sup(P_1, S) = 0. Because sup(P_1, S) < minsup, P_1 = (1,2,3,4,5,6) is not frequent;
In summary, the frequent pattern set of pattern length 6 is empty: fre_6 = ∅.
Fifth step: ending the order-preserving sequence pattern mining:
When the frequent pattern set fre_{L+1} of pattern length L+1 is empty, the order-preserving sequence pattern mining ends.
Because the fourth step yields fre_6 = ∅, that is, the frequent pattern set of pattern length 6 is empty, the order-preserving sequence pattern mining ends.
Example 2
The time series S = (2,1,3,4,8,9,7,12,14,13,15,17) and the minimum support threshold minsup = 3 are given.
The fifth step is changed to: "Fifth step: when the candidate pattern set cand_{L+1} of pattern length L+1 is empty, the order-preserving sequence pattern mining ends.
Because the candidate pattern set of pattern length 5 generated in the third step is empty, cand_5 = ∅, the order-preserving sequence pattern mining ends."
Except for the above differences, the procedure is the same as in Example 1.
In the above embodiments, the programming software is VC++ 6.0, the drawing tool is Visio 2013, the processor is a Pentium(R) Dual-Core 32 Processor+, the operating system is Windows 7 or later, and occurrences are found with a classic pattern matching algorithm; these software and hardware environments are well known to those skilled in the art.

Claims (1)

1. An order-preserving sequence pattern mining method, characterized in that candidate patterns are generated by a pattern fusion method, which reduces the number of candidate patterns, and the support of each candidate pattern is calculated through a series of conversion and verification steps; the method comprises the following specific steps:
First step: inputting the time series S and the minimum support threshold minsup:
Input the time series S, let n be its length, and denote its elements s_1, s_2, …, s_n; input the minimum support threshold minsup, specified by the user, which is the minimum number of occurrences in S required of a desired pattern;
Second step: obtaining the frequent pattern set fre_2 of pattern length 2:
The candidate pattern set of pattern length 2 is cand_2 = {(1,2),(2,1)}. Following the support calculation steps below, calculate in turn the support in S of each candidate pattern P_d in cand_2. When the support of P_d is greater than or equal to minsup, P_d is a frequent pattern of length 2 and is added to the frequent pattern set fre_2; this yields fre_2.
The support calculation steps are as follows:
First, sort the elements of the candidate pattern P_d currently being processed in ascending order, and let index[i] be the position in P_d of the element ranked i-th, so that p_{index[i]} < p_{index[i+1]} holds, where p_{index[i]} and p_{index[i+1]} are the elements of P_d ranked i-th and (i+1)-th, 1 ≤ i ≤ m−1, and m is the pattern length of P_d;
Then convert P_d into the binary string P' according to formula (1) below, denoting its elements a_1, …, a_i, …, a_{m−1}, and convert the time series S into the binary string S' according to formula (2), denoting its elements b_1, …, b_j, …, b_{n−1}. Formulas (1) and (2) are as follows:
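$$a_i = \begin{cases} 1, & p_i < p_{i+1} \\ 0, & p_i > p_{i+1} \end{cases} \qquad 1 \le i \le m-1 \tag{1}$$
$$b_j = \begin{cases} 1, & s_j < s_{j+1} \\ 0, & s_j > s_{j+1} \end{cases} \qquad 1 \le j \le n-1 \tag{2}$$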
In formulas (1) and (2), m is the pattern length of the candidate pattern P_d currently being processed (initially m = 2) and n is the length of S. Each element a_i of P' (1 ≤ i ≤ m−1) is obtained by comparing two consecutive elements p_i and p_{i+1} of P_d: a_i = 1 when p_i < p_{i+1}, and a_i = 0 when p_i > p_{i+1}. Each element b_j of S' (1 ≤ j ≤ n−1) is obtained by comparing two consecutive elements s_j and s_{j+1} of S: b_j = 1 when s_j < s_{j+1}, and b_j = 0 when s_j > s_{j+1};
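As a minimal sketch (Python, our choice), formulas (1) and (2) amount to one comparison per adjacent pair. The function name `to_binary` is hypothetical, and mapping equal neighbours to 0 is an assumption, since the formulas only cover strict inequalities. The Example 2 series is used as sample input.

```python
def to_binary(seq):
    """Binary string of formulas (1)/(2): 1 if the next element is larger, else 0.

    Equal neighbours are not covered by the formulas; mapping them to 0 is an assumption.
    """
    return [1 if seq[k] < seq[k + 1] else 0 for k in range(len(seq) - 1)]


P = (3, 1, 2)                                    # a candidate pattern P_d
S = (2, 1, 3, 4, 8, 9, 7, 12, 14, 13, 15, 17)    # the time series of Example 2
P_prime = to_binary(P)   # [0, 1]
S_prime = to_binary(S)   # [0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
print(P_prime, S_prime)
```

P' has m−1 entries and S' has n−1 entries, so every occurrence of P' in S' marks a window of m consecutive elements of S whose up/down shape matches P_d.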
Find the occurrences of P' in S' with a classical pattern matching algorithm. Each time an occurrence is found, keep the corresponding subsequence of S as a candidate subsequence, let l_1 be the position in S of its first element, and verify whether
$$s_{l_1+\mathrm{index}[i]-1} < s_{l_1+\mathrm{index}[i+1]-1}, \qquad 1 \le i \le m-1$$
holds. If it holds, the support of P_d is increased by one; otherwise it is unchanged. Here s_{l_1+index[i]−1} is the element of the candidate subsequence at the position of p_{index[i]} in P_d, and s_{l_1+index[i+1]−1} is the element at the position of p_{index[i+1]}. When all occurrences have been found and all candidate subsequences verified, the support of P_d is obtained;
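A minimal sketch of the whole support calculation follows, under stated assumptions: KMP stands in for the unnamed "classical pattern matching algorithm", indices are 0-based (so `start` corresponds to l_1 − 1), and the helper names are hypothetical. The binary strings only filter windows; the rank check against index[i] decides whether a window really has the relative order of P_d. For instance, in the Example 2 series the patterns (2,1,3) and (3,1,2) share P' = 01, but only (2,1,3) survives verification.

```python
def to_binary(seq):
    """Formulas (1)/(2); equal neighbours map to 0 (assumption)."""
    return [1 if seq[k] < seq[k + 1] else 0 for k in range(len(seq) - 1)]


def kmp_occurrences(text, pattern):
    """0-based start positions of all (possibly overlapping) matches of pattern in text."""
    # failure[q] = length of the longest proper border of pattern[:q + 1]
    failure = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[q] != pattern[k]:
            k = failure[k - 1]
        if pattern[q] == pattern[k]:
            k += 1
        failure[q] = k
    hits, k = [], 0
    for j, symbol in enumerate(text):
        while k > 0 and symbol != pattern[k]:
            k = failure[k - 1]
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):          # full match ending at position j
            hits.append(j - k + 1)
            k = failure[k - 1]         # keep scanning, allowing overlaps
    return hits


def support(pattern, series):
    """sup(P_d, S): filter windows via P' in S', then verify the rank order."""
    m = len(pattern)
    if m < 2:
        raise ValueError("patterns must have length >= 2")
    # index[i] = 0-based position in the pattern of the element ranked i-th
    index = sorted(range(m), key=lambda pos: pattern[pos])
    count = 0
    for start in kmp_occurrences(to_binary(series), to_binary(pattern)):
        window = series[start:start + m]   # candidate subsequence, l_1 = start + 1
        if all(window[index[i]] < window[index[i + 1]] for i in range(m - 1)):
            count += 1
    return count


S = (2, 1, 3, 4, 8, 9, 7, 12, 14, 13, 15, 17)   # time series of Example 2
for P in [(1, 2, 3), (2, 1, 3), (3, 1, 2)]:
    print(P, support(P, S))
```

Because the verification compares all m elements along their rank chain with strict inequalities, a window containing equal values can never be counted, whichever way formula (2) treats ties.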
Third step: generating the candidate pattern set cand_{L+1} of pattern length L+1:
Using the pattern fusion method, generate cand_{L+1} from the frequent pattern set fre_L of pattern length L, where L is the pattern length of the frequent patterns currently being processed, with initial value 2. For a frequent pattern P with elements p_1, p_2, …, p_L, the part remaining after removing the last element p_L is called the prefix of P, denoted prefix(P), and the relative order of the prefix is denoted prefixorder(P); the part remaining after removing the first element p_1 is called the suffix of P, denoted suffix(P), and the relative order of the suffix is denoted suffixorder(P).
The pattern fusion method has the following fusion rules for two different cases:
1) General case: let P and Q be frequent patterns, both of length L, with elements p_1, p_2, …, p_L and q_1, q_2, …, q_L respectively. When suffixorder(P) = prefixorder(Q) but suffix(P) ≠ prefix(Q), P and Q can be fused into one candidate pattern of length L+1, denoted X, with elements x_1, x_2, …, x_{L+1}. The specific fusion rule is as follows:
Compare the first element p_1 of P with the last element q_L of Q:
① When p_1 < q_L, let x_1 = p_1 and x_{L+1} = q_L + 1; then compare each element p_u of P other than the first (2 ≤ u ≤ L) with q_L: when p_u > q_L, x_u = p_u + 1; otherwise x_u = p_u;
② When p_1 > q_L, let x_1 = p_1 + 1 and x_{L+1} = q_L; then compare each element q_v of Q other than the last (1 ≤ v ≤ L−1) with p_1: when q_v > p_1, x_{v+1} = q_v + 1; otherwise x_{v+1} = q_v;
2) Special case: let P and Q be frequent patterns, both of length L, with elements p_1, p_2, …, p_L and q_1, q_2, …, q_L respectively. When suffixorder(P) = prefixorder(Q) and, in addition, suffix(P) = prefix(Q), P and Q can be fused into two candidate patterns of length L+1, denoted T and K, with elements t_1, t_2, …, t_{L+1} and k_1, k_2, …, k_{L+1} respectively. The specific fusion rule is as follows:
When generating T, let t_1 = p_1 + 1 and t_{L+1} = p_1; then compare each element p_u of P other than the first (2 ≤ u ≤ L) with p_1: when p_u > p_1, t_u = p_u + 1; otherwise t_u = p_u;
When generating K, let k_1 = p_1 and k_{L+1} = p_1 + 1; then compare each element p_u of P other than the first (2 ≤ u ≤ L) with p_1: when p_u > p_1, k_u = p_u + 1; otherwise k_u = p_u;
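One way to gain confidence in the two fusion rules is a brute-force check, sketched below under our own assumptions (Python, hypothetical names; the check is ours, not part of the patent). It fuses every ordered pair of length-L patterns, confirms that each result keeps the relative order of P in its first L elements and of Q in its last L elements, and confirms that the results together enumerate every permutation of 1..L+1 exactly once, i.e. for the lengths tested the rules lose no candidate and produce no duplicate.

```python
from itertools import permutations


def relative_order(seq):
    """Ranks of the elements (1 = smallest); assumes distinct values."""
    ranked = sorted(seq)
    return tuple(ranked.index(v) + 1 for v in seq)


def fuse(P, Q):
    """Both fusion rules of the third step; [] when neither case applies."""
    if relative_order(P[1:]) != relative_order(Q[:-1]):
        return []
    p1, qL = P[0], Q[-1]
    if P[1:] == Q[:-1]:                                       # special case: T and K
        mid = [p + 1 if p > p1 else p for p in P[1:]]
        return [(p1 + 1, *mid, p1), (p1, *mid, p1 + 1)]
    if p1 < qL:                                               # general case, rule 1
        return [(p1, *[p + 1 if p > qL else p for p in P[1:]], qL + 1)]
    return [(p1 + 1, *[q + 1 if q > p1 else q for q in Q[:-1]], qL)]  # rule 2


def check_fusion(L):
    """Fuse every ordered pair of length-L patterns and check each result."""
    patterns = list(permutations(range(1, L + 1)))
    produced = []
    for P in patterns:
        for Q in patterns:
            for X in fuse(P, Q):
                # X must keep the order of P at the front and of Q at the back
                assert relative_order(X[:-1]) == P
                assert relative_order(X[1:]) == Q
                produced.append(X)
    return produced


for L in (2, 3, 4):
    produced = check_fusion(L)
    # Every permutation of 1..L+1 appears exactly once.
    assert sorted(produced) == sorted(permutations(range(1, L + 2)))
    print(L, len(produced))
```

The check also illustrates why the special case yields two candidates: when suffix(P) = prefix(Q), the first and last elements of the fused pattern are never compared by either parent, so both orders between them are possible.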
The specific procedure for generating cand_{L+1} from fre_L by the pattern fusion method is as follows:
When fre_L is not empty, first take the first frequent pattern P_a in fre_L and compute the relative order of its suffix. Then traverse fre_L from left to right and, for each frequent pattern P_b, judge whether P_a and P_b satisfy either of the two cases of the pattern fusion method; when one is satisfied, fuse them by the corresponding rule into candidate patterns of length L+1 and add these to cand_{L+1}. When all P_b have been traversed, the fusion processing of P_a ends; then take the next frequent pattern in fre_L as P_a and repeat the above steps until the last frequent pattern in fre_L has been processed, which completes the generation of cand_{L+1}.
Fourth step: obtaining the frequent pattern set fre_{L+1} of pattern length L+1:
Using the support calculation method of the second step, calculate in turn the support sup(P_d, S) of each candidate pattern P_d in the candidate pattern set cand_{L+1} of pattern length L+1. Whenever sup(P_d, S) ≥ minsup, add P_d to fre_{L+1}. Once the supports of all candidate patterns in cand_{L+1} have been calculated, fre_{L+1} is complete.
Fifth step: ending the order-preserving sequence pattern mining:
When fre_{L+1} is not empty, repeat the third and fourth steps (with L increased by one each time) until either the candidate pattern set cand_{L+1} of pattern length L+1 or the frequent pattern set fre_{L+1} of pattern length L+1 is empty, at which point the order-preserving sequence pattern mining ends.
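The second to fifth steps can be combined into one loop. The sketch below is an illustration under our own assumptions, not the patented implementation: it uses a naive scan of S' in place of a named classical matcher, assumes the values in S are distinct, stops on either termination condition of the fifth step, and all names (`mine`, `support`, `fuse`, …) are hypothetical.

```python
def to_binary(seq):
    # Formulas (1)/(2); equal neighbours map to 0 (assumption).
    return [1 if seq[k] < seq[k + 1] else 0 for k in range(len(seq) - 1)]


def support(pattern, series):
    """sup(P_d, S): binary prefilter, then rank-order verification."""
    m = len(pattern)
    p_bin, s_bin = to_binary(pattern), to_binary(series)
    index = sorted(range(m), key=lambda pos: pattern[pos])  # ranked positions
    count = 0
    # Naive scan of S' for P' (stands in for a classical matcher such as KMP).
    for start in range(len(s_bin) - len(p_bin) + 1):
        if s_bin[start:start + m - 1] != p_bin:
            continue
        if all(series[start + index[i]] < series[start + index[i + 1]]
               for i in range(m - 1)):
            count += 1
    return count


def relative_order(seq):
    ranked = sorted(seq)                     # assumes distinct values
    return tuple(ranked.index(v) + 1 for v in seq)


def fuse(P, Q):
    if relative_order(P[1:]) != relative_order(Q[:-1]):
        return []                            # neither case applies
    p1, qL = P[0], Q[-1]
    if P[1:] == Q[:-1]:                      # special case: T, then K
        mid = [p + 1 if p > p1 else p for p in P[1:]]
        return [(p1 + 1, *mid, p1), (p1, *mid, p1 + 1)]
    if p1 < qL:                              # general case, rule 1
        return [(p1, *[p + 1 if p > qL else p for p in P[1:]], qL + 1)]
    return [(p1 + 1, *[q + 1 if q > p1 else q for q in Q[:-1]], qL)]  # rule 2


def mine(series, minsup):
    """Return {L: fre_L} for every non-empty frequent pattern set found."""
    cand = [(1, 2), (2, 1)]                  # cand_2 (second step)
    result = {}
    while cand:                              # stop when cand_{L+1} is empty
        fre = [P for P in cand if support(P, series) >= minsup]  # fourth step
        if not fre:                          # stop when fre_{L+1} is empty
            break
        result[len(fre[0])] = fre
        cand = [c for Pa in fre for Pb in fre for c in fuse(Pa, Pb)]  # third step
    return result


S = (2, 1, 3, 4, 8, 9, 7, 12, 14, 13, 15, 17)   # Example 2
mined = mine(S, 3)
for length, patterns in mined.items():
    print(length, patterns)
```

With Example 2's data (minsup = 3), this sketch finds fre_2 = {(1,2),(2,1)}, fre_3 = {(1,2,3),(2,1,3)} and fre_4 = {(2,1,3,4)}; fusing (2,1,3,4) with itself fails because suffixorder (1,2,3) differs from prefixorder (2,1,3), so cand_5 is empty and the loop stops, consistent with the termination described in Example 2.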
CN202010544303.5A 2020-06-15 2020-06-15 Order-preserving sequence pattern mining method Withdrawn CN111581262A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010544303.5A CN111581262A (en) 2020-06-15 2020-06-15 Order-preserving sequence pattern mining method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010544303.5A CN111581262A (en) 2020-06-15 2020-06-15 Order-preserving sequence pattern mining method

Publications (1)

Publication Number Publication Date
CN111581262A (en) 2020-08-25

Family

ID=72114592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010544303.5A Withdrawn CN111581262A (en) 2020-06-15 2020-06-15 Order-preserving sequence pattern mining method

Country Status (1)

Country Link
CN (1) CN111581262A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112182497A (en) * 2020-09-25 2021-01-05 齐鲁工业大学 Biological sequence-based negative sequence pattern similarity analysis method, realization system and medium

Similar Documents

Publication Publication Date Title
US20150363549A1 (en) Data analysis device and method therefor
Tahir et al. EPMA: efficient pattern matching algorithm for DNA sequences
Li et al. Extracting statistical graph features for accurate and efficient time series classification
CN111475551A (en) High average utility sequence pattern mining method under non-overlapping condition
Gao et al. Efficient discovery of variable-length time series motifs with large length range in million scale time series
CN111581262A (en) Order-preserving sequence pattern mining method
Wu et al. COPP-Miner: Top-k contrast order-preserving pattern mining for time series classification
Wang et al. Knockoff-Guided Feature Selection via A Single Pre-trained Reinforced Agent
Feng et al. L-match: A lightweight and effective subsequence matching approach
Rabea et al. A fast algorithm for constructing suffix arrays for DNA alphabets
CN109828785B (en) Approximate code clone detection method accelerated by GPU
Kane Trend and value based time series representation for similarity search
CN112905689A (en) Order-preserving sequence rule mining method
Gurung et al. An analysis of the intelligent predictive string search algorithm: a probabilistic approach
Huang et al. An Approach of Suspected Code Plagiarism Detection Based on XGBoost Incremental Learning
Somayajulu Index based multiple pattern matching algorithm using DNA sequence and pattern count
Wu et al. Top-k contrast order-preserving pattern mining for time series classification
Durge et al. MRQPMS: Design of a Map Reduce Bioinspired Model for Solving Quorum Planted Motif Search for High-Speed Deployments.
CN118072817B (en) Base recognition operator acceleration method, system and device based on in-memory calculation
Markić et al. String pattern searching algorithm based on characters indices
CN115033636A (en) Approximate order-preserving sequence pattern mining method
CN117435246B (en) Code clone detection method based on Markov chain model
CN116304749B (en) Long text matching method based on graph convolution
Bhukya et al. Multiple Pattern Matching Algorithm Using Pair-Count
Munjal et al. Sequence similarity using composition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200825
