CN115563192B - Method for mining high-utility periodic frequent pattern applied to purchase pattern - Google Patents

Publication number
CN115563192B
Authority
CN
China
Prior art keywords
item set
utility
sequence
hupfps
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211463101.3A
Other languages
Chinese (zh)
Other versions
CN115563192A (en
Inventor
张振洲
陈建铭
吴明泰
吴祖扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Science and Technology
Original Assignee
Shandong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Science and Technology filed Critical Shandong University of Science and Technology
Priority to CN202211463101.3A priority Critical patent/CN115563192B/en
Publication of CN115563192A publication Critical patent/CN115563192A/en
Application granted granted Critical
Publication of CN115563192B publication Critical patent/CN115563192B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for mining high-utility periodic frequent patterns, applied to purchase patterns, which comprises the following steps: S1, inputting a database and five custom thresholds; S2, scanning the database to construct the HUPFPS-list of each 1-item set x, and judging whether x is a high-utility periodic frequent pattern (HUPFPS); S3, pruning the search space according to an upper-bound value, and adding the HUPFPS-lists that meet the condition to a set; S4, intersecting and combining the pruned 1-item sets into 2-item sets, and judging whether each 2-item set is a HUPFPS; S5, recursively generating n-item sets from the HUPFPS-lists of the (n-1)-item sets until no n-item set can be extended, and outputting all high-utility periodic frequent item sets. The technical scheme of the invention addresses the problems that most prior-art research on periodic patterns mines a single sequence and does not consider the internal utility and external utility of the patterns.

Description

High-utility periodic frequent pattern mining method applied to purchase pattern
Technical Field
The invention relates to the technical field of data mining, in particular to a high-utility periodic frequent pattern mining method applied to a purchasing pattern.
Background
In recent years, high-utility periodic pattern mining has gradually become one of the active directions in data mining, and many scholars have studied periodic pattern mining intensively. However, previous periodic pattern mining algorithms all mine a single time series, and they ignore the weight (value) and quantity information inherent in the data, so the mined patterns offer no advantage in profit or benefit. To meet the demand for benefit-oriented results, High-Utility Pattern Mining (HUPM) has become one of the research focuses of academia and industry in the data intelligence field. In utility pattern mining, an item can appear more than once in a given transaction or record, and each item can be assigned its own weight, which better suits real-world applications. As periodic patterns have been studied in greater depth, some variants of periodic pattern mining have taken the utility (profit) of a pattern into account. An algorithm named PHUSPM was then designed to mine high-utility periodic patterns in multiple symbol sequences; that algorithm treats the multiple sequences as one sequence and mines periodic patterns with the same periodicity measure used for a single sequence.
In recent years, sequential pattern mining (SPM) has become one of the most popular pattern mining tasks. It generalizes the frequent item set mining problem and aims to find frequent subsequences in a sequence. Although many SPM algorithms have been proposed and applied in practice, they have limitations: they do not consider the quantity of each item in the sequence or its unit profit, so they cannot be used to find high-utility patterns that appear often in the data. These factors matter in practice. For example, when a customer buys beer and fried chicken and then beef, the purchase pattern may generate a high profit, with beef accounting for a large share of the total profit; in practical applications it is more important to find the high-profit patterns that many customers buy periodically, for example every week. Conventional periodic frequent pattern mining (PFPM) can find items that customers purchase regularly, but it cannot tell which of those regularly purchased items yield higher profit, which greatly limits its usefulness in practical applications such as product bundle recommendation. Another example is the regular occurrence of certain DNA molecules in a gene sequence: each DNA molecule differs in importance, which directly affects the expression of some external traits, so it is most critical to find the DNA molecules that occur frequently and play a major role. Most studies on periodic patterns mine a single sequence and do not consider the internal and external utilities of the patterns. A method for high-utility periodic frequent pattern mining that can mine multiple sequences and consider internal and external utilities is therefore needed.
Disclosure of Invention
The main object of the invention is to provide a method for high-utility periodic frequent pattern mining applied to purchase patterns, so as to solve the problems that most prior-art research on periodic patterns mines a single sequence and does not consider the internal utility and external utility of the patterns.
To achieve the above object, the invention provides a method for mining high-utility periodic frequent patterns in purchase patterns, comprising the following steps:
Step 1, inputting a database of the goods and quantities purchased by customers within a period of time, together with five thresholds customized by the merchant, namely a minimum support rate threshold minSupRa, a maximum periodicity threshold maxPr, a maximum standard deviation threshold maxStd, a minimum high-utility threshold minHuRa and a minimum sequence periodicity threshold minSeqRa;
Step 2, scanning the database to construct the HUPFPS-list of each 1-item set x, namely a data list that records the purchase sequences (users) in which commodity x appears, the transactions in which it appears in chronological order, and the utility of the commodity in those transactions, and judging whether the 1-item set x is a high-utility periodic frequent pattern (HUPFPS), which specifically comprises the following steps:
Step 2.1, scanning each sequence in the database and calculating the support rate supRa({x}, S), the maximum periodicity maxPer({x}, S), the utility ratio utiRa({x}, S) and the period standard deviation stanDev({x}, S) of the 1-item set x;
for a commodity x appearing in the purchase sequence S, if the purchase frequency of x is not less than the minimum purchase frequency ratio, i.e., supRa({x}, S) ≥ minSupRa, the time interval between two consecutive purchases of x does not exceed the maximum period threshold, i.e., maxPer({x}, S) ≤ maxPr, the purchase period of x is stable within a certain range, i.e., stanDev({x}, S) < maxStd, and the sales ratio of x in the customer's purchase sequence is not less than the merchant-defined threshold, i.e., utiRa({x}, S) ≥ minHuRa, then the 1-item set x is a high-utility periodic frequent pattern in the purchase sequence S of that customer, and the algorithm stores the sequences in which the 1-item set x satisfies these conditions in the set huPrSeq(x).
Step 2.2, calculating huSeqRa(x) from the set huPrSeq(x), and if huSeqRa(x) ≥ minSeqRa, outputting the 1-item set x as a high-utility periodic frequent pattern (HUPFPS);
where the number of sequences in the database in which the 1-item set x satisfies the conditions is |huPrSeq(x)|, and the high-utility periodic sequence ratio of the 1-item set x in the database is defined as huSeqRa(x) = |huPrSeq(x)| / |D|, where |D| is the number of sequences in the sequence database.
Step 3, pruning the search space according to the upper-bound value upSeqRa: the HUPFPS-list of every 1-item set x satisfying upSeqRa(x) ≥ minSeqRa is added to the set boundHUPFPS, and item sets that do not satisfy the condition are not extended;
Step 4, using the set boundHUPFPS to intersect and merge the pruned 1-item sets into 2-item sets, i.e., combinations of the data of 2 commodities, constructing the HUPFPS-list of each 2-item set, storing the HUPFPS-list of every item set satisfying upSeqRa ≥ minSeqRa in boundHUPFPS for the next iteration, and judging whether each 2-item set is a HUPFPS;
Step 5, recursively generating n-item sets from the HUPFPS-lists of the (n-1)-item sets until no n-item set can be extended, and outputting all high-utility periodic frequent item sets.
Further, an item set containing one commodity is a 1-item set x, and an item set containing several commodities is an item set X. The set of all sequences in the database in which item set X satisfies supRa(X, S) ≥ minSupRa, maxPer(X, S) ≤ maxPr and utiRa(X, S) ≥ minHuRa is denoted huCand(X) = {S_1, ..., S_n}, the number of sequences in the set is denoted |huCand(X)|, and the upper bound of the high-utility periodic sequence ratio of item set X in the database is defined as upSeqRa(X) = |huCand(X)| / |D|.
Further, the support rate of the item set X in the sequence S is defined as supRa(X, S) = sup(X, S) / |S|, where |S| is the total number of transactions contained in the sequence S;
the number of transactions in the sequence S that contain item set X is defined as sup(X, S) = |TR(X, S)|.
Further, let u(X, S) be the total utility of the item set X in a purchase sequence S and su(S) be the total utility of the sequence S; their ratio is defined as utiRa(X, S) = u(X, S) / su(S), where utiRa(X, S) is referred to as the utility ratio.
The invention has the following advantages:
1. The method provided by the invention considers not only the frequency ratio of a pattern in each sequence, but also the periodicity of the pattern in each sequence and the utility ratio of the pattern in the sequence.
2. To ensure the frequency of a periodic pattern in each sequence, the invention defines a new metric, namely the ratio of the pattern's support count in a sequence to the length of that sequence (sequences may differ in length), so that the output of the algorithm is guaranteed to consist of high-utility periodic frequent patterns.
3. The invention provides a measure for mining high-utility periodic frequent patterns in multiple sequences, namely the high-utility periodic sequence ratio huSeqRa, which defines what a high-utility periodic frequent pattern is across multiple sequences.
4. The method improves on support counting by constraining the support ratio instead. On the basis of periodic pattern mining it considers the internal utility and external utility of each item and defines the utility ratio of a pattern in a sequence, so that a pattern can be identified as high-utility within a sequence; this ensures the accuracy of the high-utility frequent patterns and effectively meets the mining requirement.
5. To reduce the search space and speed up the high-utility periodic frequent pattern mining algorithm HUPFPS, the invention provides a pruning strategy, namely an upper bound upSeqRa on the high-utility periodic sequence ratio, from which two pruning properties follow:
(1) The algorithm calculates the upSeqRa value of each 1-item set x to prune the search space and stores the HUPFPS-list of every 1-item set with upSeqRa(x) ≥ minSeqRa in the set boundHUPFPS;
(2) The upper bound of the high-utility periodic sequence ratio of item set X in the database is defined as upSeqRa(X) = |huCand(X)| / |D|.
The result is an efficient algorithm, called the high-utility periodic frequent pattern mining algorithm HUPFPS. The algorithm constructs the HUPFPS-list structure through an intersection procedure, which avoids repeated scans of the database and improves running efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts. In the drawings:
FIG. 1 illustrates a flow chart of the method for high-utility periodic frequent pattern mining in purchase patterns according to the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The description first introduces the definitions of periodic patterns and utility patterns in a traditional single sequence, then extends them to multiple sequences, and finally proposes a pruning strategy for the search space and two new pruning properties. The definitions and theorems relating to the present invention are introduced below:
Definition 1: Let I = {X_1, X_2, ..., X_m} be the set of m different items in the database. An item set X is a subset of I, denoted X ⊆ I. An item set X having k different items {i_1, i_2, ..., i_k} is called a k-item set; an item set of one item is written as the 1-item set x, and an item set of several items is written as X. The database contains n sequences. A sequence S is an ordered list of transactions, denoted S = {T_1, T_2, ..., T_j}, where T_j represents a transaction in the sequence and j is the unique transaction identifier within the sequence.
Definition 2: Each item in the database has a unit profit or other measure of value, denoted pl(i_m), which represents how important the item is to the user. The unit profits of the items are kept in a dedicated profit list, denoted profit = {pl(i_1), pl(i_2), ..., pl(i_m)}. The utility of item i_j in any transaction T_q of a sequence S_n is expressed as u(i_j, T_q, S_n) = q(i_j, T_q, S_n) × pl(i_j), where q(i_j, T_q, S_n) is the quantity of item i_j in transaction T_q of sequence S_n.
Definition 3: Consider a sequence S_i and an item set X. The ordered list of transactions in S_i that contain X is defined as TR(X, S_i) = <T_g(1), T_g(2), ..., T_g(k)> ⊆ S_i. Let T_g(z) and T_g(z+1) be two consecutive transactions of sequence S_i in which item set X occurs. The period of two consecutive transactions containing item set X is per(T_g(z), T_g(z+1)) = g(z+1) - g(z). The periods of item set X in sequence S_i are pr(X, S_i) = {per_1, per_2, ..., per_(k+1)}, where per_z = g(z) - g(z-1), g(z) is the TID of the z-th transaction in which item set X appears, and g(0) = 0 and g(k+1) = |S_i| are specified, where |S_i| is the length of the sequence.
Definition 4: the standard deviation of the period of one set of terms X in the sequence S is denoted as stanDev (X, S).
Definition 5: the maximum periodicity of one item set X in the sequence S is defined as maxPer(X, S) = max(pr(X, S)).
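Definitions 3 to 5 can be illustrated with a minimal Python sketch. The function names are hypothetical, and because Definition 4 does not say which form of standard deviation is intended, the population standard deviation is assumed here.

```python
from statistics import pstdev

def periods(tids, seq_len):
    """Periods of an item set per Definition 3: gaps between consecutive
    occurrences, with the boundaries g(0) = 0 and g(k+1) = |S| added."""
    bounds = [0] + sorted(tids) + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def max_per(tids, seq_len):
    """Maximum periodicity per Definition 5: the largest period."""
    return max(periods(tids, seq_len))

def stan_dev(tids, seq_len):
    """Period standard deviation per Definition 4.
    Assumption: population standard deviation (not stated in the source)."""
    return pstdev(periods(tids, seq_len))

# An item set that occurs in transactions 1, 3, 4 and 5 of a 5-transaction sequence.
tids, seq_len = [1, 3, 4, 5], 5
print(periods(tids, seq_len))   # [1, 2, 1, 1, 0]
print(max_per(tids, seq_len))   # 2
print(stan_dev(tids, seq_len))
```

The trailing period of 0 comes from the convention g(k+1) = |S|: the item set occurs in the last transaction, so no gap remains after it.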
Definition 6: in a sequence S, one set of items X may appear in multiple transactions, and the number of transactions in the sequence S that contain the occurrence of X is defined as sup (X, S) = | TR (X, S) |.
Definition 7: the support rate of the item set X in the sequence S is defined as supRa (X, S) = sup (X, S)/| S |, where | S | is the total number of transactions contained in the sequence S.
Definition 8: let sequence S i The total utility of item set X in (1) is u (X, S) i ) Sequence S i Has a total effect of su (S) i ). The ratio is defined as utiRa (X, S) i ) = u(X, S i )/su(S i ) Wherein utiRa (X, S) i ) Referred to as utility ratio.
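As a hedged illustration of Definitions 6 to 8, the sketch below computes the support rate and utility ratio of the item set {a} in the first sequence of Table 1, using the unit profits of Table 5. The data layout (a list of item-to-quantity dictionaries) and the function names are assumptions made for the example.

```python
# Unit profits (external utilities) from Table 5.
PROFIT = {"a": 76, "b": 65, "c": 35, "d": 41, "e": 118}

# Sequence 1 of Table 1: one dict of item -> quantity per transaction.
S1 = [
    {"a": 6, "b": 10, "c": 10},
    {"b": 8, "c": 8, "d": 13},
    {"a": 5, "b": 6},
    {"a": 8, "b": 5, "e": 8},
    {"a": 4, "b": 7, "c": 6, "d": 10},
]

def sup_ra(itemset, seq):
    """supRa(X, S) = |TR(X, S)| / |S| (Definitions 6 and 7)."""
    hits = sum(1 for t in seq if set(itemset).issubset(t))
    return hits / len(seq)

def utility(itemset, seq, profit):
    """u(X, S): utility of X summed over the transactions that contain X."""
    return sum(
        t[i] * profit[i]
        for t in seq if set(itemset).issubset(t)
        for i in itemset
    )

def seq_utility(seq, profit):
    """su(S): total utility of every item in every transaction of S."""
    return sum(q * profit[i] for t in seq for i, q in t.items())

def uti_ra(itemset, seq, profit):
    """utiRa(X, S) = u(X, S) / su(S) (Definition 8)."""
    return utility(itemset, seq, profit) / seq_utility(seq, profit)

print(sup_ra({"a"}, S1))           # 0.8
print(utility({"a"}, S1, PROFIT))  # 1748
print(seq_utility(S1, PROFIT))     # 6815
print(uti_ra({"a"}, S1, PROFIT))
```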
Definition 9: Assume four user-defined thresholds, minSupRa, maxPr, maxStd and minHuRa. If an item set X satisfies the conditions supRa(X, S) ≥ minSupRa, maxPer(X, S) ≤ maxPr, stanDev(X, S) ≤ maxStd and utiRa(X, S) ≥ minHuRa in the sequence S, then X is defined to be a high-utility periodic frequent pattern in the sequence S. In the database, the set of all sequences in which item set X satisfies these conditions is represented as huPrSeq(X) = {S | supRa(X, S) ≥ minSupRa ∧ maxPer(X, S) ≤ maxPr ∧ stanDev(X, S) ≤ maxStd ∧ utiRa(X, S) ≥ minHuRa ∧ S ∈ D}.
Definition 10: If the number of sequences in the database belonging to the set huPrSeq(X) is |huPrSeq(X)|, then the high-utility periodic sequence ratio of item set X in the database is defined as huSeqRa(X) = |huPrSeq(X)| / |D|, where |D| is the number of sequences in the database.
Definition 11: In the database, if huSeqRa(X) ≥ minSeqRa, then item set X is a high-utility periodic frequent pattern (HUPFPS) in the database.
Definition 12: The set of all sequences in the database in which item set X satisfies supRa(X, S) ≥ minSupRa, maxPer(X, S) ≤ maxPr and utiRa(X, S) ≥ minHuRa is denoted huCand(X) = {S_1, ..., S_n}, and X is called a utility periodic frequent candidate pattern. The number of sequences in the set is denoted |huCand(X)|, and the upper bound of the high-utility periodic sequence ratio of item set X in the database is defined as upSeqRa(X) = |huCand(X)| / |D|.
Theorem 1: In the sequence database, the upSeqRa value of an item set X is not less than its huSeqRa value, i.e., upSeqRa(X) ≥ huSeqRa(X).
Theorem 2: In the database, for any two item sets X and XY, if X is a subset of XY, denoted X ⊆ XY, then upSeqRa(X) ≥ upSeqRa(XY).
Theorem 3: In a database, if upSeqRa(X) < minSeqRa for an item set X, then neither X nor any of its supersets is a HUPFPS.
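The relation between huPrSeq, huCand and the two ratios can be sketched as follows. This is only an illustration: the per-sequence metric values and the thresholds are invented for the example, and the function names are hypothetical. It shows why Theorem 1 holds (huPrSeq adds the stanDev condition to the huCand conditions, so it can only be a subset) and how the pruning rule of Theorem 3 would be applied.

```python
# sid -> (supRa, maxPer, stanDev, utiRa) of one item set X.
# Assumption: invented numbers, used only to exercise the definitions.
ROWS = {
    1: (0.2, 5, 2.50, 0.10),
    2: (0.6, 2, 0.43, 0.38),
    3: (0.6, 2, 0.83, 0.40),
    4: (0.6, 2, 0.43, 0.45),
}

def hu_cand(rows, min_sup_ra, max_pr, min_hu_ra):
    """huCand(X): sequences meeting the support, period and utility tests."""
    return {
        sid for sid, (sup, per, _std, uti) in rows.items()
        if sup >= min_sup_ra and per <= max_pr and uti >= min_hu_ra
    }

def hu_pr_seq(rows, min_sup_ra, max_pr, max_std, min_hu_ra):
    """huPrSeq(X): huCand(X) further restricted by stanDev <= maxStd."""
    cand = hu_cand(rows, min_sup_ra, max_pr, min_hu_ra)
    return {sid for sid in cand if rows[sid][2] <= max_std}

# Hypothetical thresholds.
MIN_SUP_RA, MAX_PR, MAX_STD, MIN_HU_RA, MIN_SEQ_RA = 0.5, 3, 0.5, 0.3, 0.6

n = len(ROWS)
up_seq_ra = len(hu_cand(ROWS, MIN_SUP_RA, MAX_PR, MIN_HU_RA)) / n
hu_seq_ra = len(hu_pr_seq(ROWS, MIN_SUP_RA, MAX_PR, MAX_STD, MIN_HU_RA)) / n

print(up_seq_ra, hu_seq_ra)     # 0.75 0.5
print(up_seq_ra >= hu_seq_ra)   # Theorem 1
print(hu_seq_ra >= MIN_SEQ_RA)  # False: X itself is not a HUPFPS here
print(up_seq_ra < MIN_SEQ_RA)   # False: X is not pruned and may still be extended
```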
The specific algorithm process of the present invention is described below with reference to FIG. 1:
As shown in FIG. 1, a method for high-utility periodic frequent pattern mining in purchase patterns includes step 1, inputting a database of the goods and quantities purchased by customers within a period of time, together with five thresholds defined by the merchant, namely a minimum support rate threshold minSupRa, a maximum periodicity threshold maxPr, a maximum standard deviation threshold maxStd, a minimum high-utility threshold minHuRa and a minimum sequence periodicity threshold minSeqRa;
the algorithm finds all HUPFPS by depth-first search, taking as input one multi-sequence database and the five custom thresholds.
In step 2, the database is scanned to construct the HUPFPS-list of each 1-item set x, namely a data list that records the purchase sequences (users) in which a commodity appears, the transactions in which the commodity appears in chronological order, and the utility of the commodity in those transactions, and whether the item set x is a high-utility periodic frequent pattern (HUPFPS) is judged.
Specifically, each sequence in the database is scanned and the support rate supRa({x}, S), the maximum periodicity maxPer({x}, S), the utility ratio utiRa({x}, S) and the period standard deviation stanDev({x}, S) of the 1-item set x are calculated;
for a commodity x appearing in the purchase sequence S, if the purchase frequency of x is not less than the minimum purchase frequency ratio, i.e., supRa({x}, S) ≥ minSupRa, the time interval between two consecutive purchases of x does not exceed the maximum period threshold, i.e., maxPer({x}, S) ≤ maxPr, the purchase period of x is stable within a certain range, i.e., stanDev({x}, S) < maxStd, and the sales ratio of x in the customer's purchase sequence is not less than the merchant-defined threshold, i.e., utiRa({x}, S) ≥ minHuRa, then the 1-item set x is a high-utility periodic frequent pattern in the purchase sequence S of that customer, and the algorithm stores the sequences in which the 1-item set x satisfies these conditions in the set huPrSeq(x).
The algorithm then divides the number of sequences in the set huPrSeq(x) by the total number of sequences |D| to obtain the high-utility periodic sequence ratio huSeqRa(x) of the 1-item set x; if this value is not less than minSeqRa, x is a high-utility periodic frequent item set.
In step 3, the search space is pruned according to the upper-bound value upSeqRa: the HUPFPS-list of every 1-item set x satisfying upSeqRa(x) ≥ minSeqRa is added to the set boundHUPFPS, and item sets that do not satisfy the condition are not extended.
Specifically, the algorithm computes the upSeqRa value of each 1-item set x to prune the search space and stores the HUPFPS-list of every 1-item set x with upSeqRa(x) ≥ minSeqRa in the set boundHUPFPS, with the HUPFPS-lists in the set sorted by upSeqRa value. The HUPFPS algorithm then calls the depth-first search procedure with boundHUPFPS, minSupRa, maxPr, maxStd, minSeqRa, minHuRa and the database to search recursively for 2-item sets and larger patterns. This procedure only explores item sets whose upSeqRa value is not less than minSeqRa.
In step 4, the set boundHUPFPS is used to intersect and merge the pruned 1-item sets into 2-item sets, i.e., combinations of the data of 2 commodities; the HUPFPS-list of each 2-item set is constructed, the HUPFPS-list of every item set satisfying upSeqRa ≥ minSeqRa is stored in boundHUPFPS for the next iteration, and whether each 2-item set is a HUPFPS is judged.
Specifically, the search procedure takes as input an item set P, the custom thresholds minSupRa, maxPr, maxStd, minSeqRa and minHuRa, and the set boundHUPFPS. An extension of item set P is an item set obtained by appending an item z to P, denoted Pz. When the algorithm first invokes the search procedure, P is the empty set and the extensions of P are the 1-item sets. The search procedure executes a loop that combines each pair of extensions Px and Py of P to build the HUPFPS-list of the item set Pxy.
The algorithm constructs the HUPFPS-list of the extension Pxy from the HUPFPS-lists of Px and Py through an intersection procedure, without rescanning the database. The algorithm then scans the HUPFPS-list of Pxy to calculate huCand(Pxy) and upSeqRa(Pxy). If upSeqRa(Pxy) ≥ minSeqRa, the item set Pxy and its supersets may be HUPFPS, and the HUPFPS-list of Pxy is added to the set boundHUPFPS, which stores the HUPFPS-lists of all extensions of Px whose upSeqRa value is not less than minSeqRa. The algorithm then calculates huSeqRa(Pxy), and if that value is not less than minSeqRa, outputs Pxy as a HUPFPS.
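The intersection step can be sketched as below. The source does not spell out the procedure, so this is an assumption-laden illustration: each HUPFPS-list is modelled as a nested dictionary (sequence id to transaction id to utility), a sequence with no common transaction is dropped, and, following the usual utility-list join convention, the utility of a shared prefix P is subtracted so that it is not counted twice. The function name and data layout are hypothetical; the numbers are taken from Tables 2 and 3.

```python
def intersect(px, py, prefix=None):
    """Join the lists of Px and Py into the list of Pxy.

    px, py, prefix: dict sid -> dict tid -> utility.
    Assumption: u(Pxy) = u(Px) + u(Py) - u(P) in every shared transaction;
    with an empty prefix (1-item sets) nothing is subtracted.
    """
    joined = {}
    for sid in sorted(px.keys() & py.keys()):
        common = sorted(px[sid].keys() & py[sid].keys())
        if not common:
            continue  # Pxy never occurs in this sequence
        joined[sid] = {
            tid: px[sid][tid] + py[sid][tid]
                 - (prefix[sid][tid] if prefix else 0)
            for tid in common
        }
    return joined

# HUPFPS-lists of {a} and {d} (Tran-list and Uti-list of Tables 2 and 3).
A = {1: {1: 456, 3: 380, 4: 608, 5: 304},
     2: {2: 380, 3: 456, 4: 684, 5: 760},
     3: {2: 608, 3: 380, 5: 684},
     4: {1: 456, 2: 456, 3: 684}}
D = {1: {2: 533, 5: 410},
     2: {1: 574, 2: 123, 3: 328, 4: 615},
     3: {1: 410, 2: 164, 3: 492, 4: 492, 5: 492},
     4: {1: 574, 2: 328, 3: 246, 4: 369, 5: 328}}

AD = intersect(A, D)
print(AD[1])  # {5: 714}
print(AD[2])  # {2: 503, 3: 784, 4: 1299}
```

The result reproduces the Tran-list and Uti-list of the item set {a, d} in Table 4 without touching the original database.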
In step 5, n-item sets are generated recursively from the HUPFPS-lists of the (n-1)-item sets until no n-item set can be extended, and all high-utility periodic frequent item sets are output.
Specifically, the algorithm calls the search procedure recursively to explore n-item sets; if the value of upSeqRa(Pxy) is less than minSeqRa, the item set Pxy and all its supersets are pruned.
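The control flow of the recursive search, with upper-bound pruning, can be sketched as follows. To keep the example short, the lists hold only transaction ids and the two measures are stand-ins, not the patent's definitions: `up_seq_ra` counts sequences where the item set occurs at all, and `hu_seq_ra` counts sequences where it occurs at least twice. Both stand-ins, the function names and the threshold value are assumptions; only the loop structure (join each pair of extensions, prune on the upper bound, output on the exact ratio, recurse) follows the description.

```python
def join(lx, ly):
    """Toy intersection: keep sequences and transactions common to both lists."""
    return {sid: lx[sid] & ly[sid]
            for sid in lx.keys() & ly.keys() if lx[sid] & ly[sid]}

def up_seq_ra(lst, n):
    # Stand-in upper bound: share of sequences containing the item set.
    return len(lst) / n

def hu_seq_ra(lst, n):
    # Stand-in exact measure: share of sequences with at least 2 occurrences.
    return sum(1 for tids in lst.values() if len(tids) >= 2) / n

def search(exts, n, min_seq_ra, out):
    """Depth-first extension of item sets that share a common prefix."""
    for i, (px, lx) in enumerate(exts):
        new_exts = []
        for py, ly in exts[i + 1:]:
            pxy = px + py[-1:]
            lxy = join(lx, ly)
            if up_seq_ra(lxy, n) < min_seq_ra:
                continue  # prune Pxy together with all of its supersets
            new_exts.append((pxy, lxy))
            if hu_seq_ra(lxy, n) >= min_seq_ra:
                out.append(pxy)
        if new_exts:
            search(new_exts, n, min_seq_ra, out)
    return out

# Transaction ids of items a, d and e in the four sequences of Table 1.
A = {1: {1, 3, 4, 5}, 2: {2, 3, 4, 5}, 3: {2, 3, 5}, 4: {1, 2, 3}}
D = {1: {2, 5}, 2: {1, 2, 3, 4}, 3: {1, 2, 3, 4, 5}, 4: {1, 2, 3, 4, 5}}
E = {1: {4}, 2: {5}, 3: {4}, 4: {5}}

found = search([(("a",), A), (("d",), D), (("e",), E)], 4, 0.75, [])
print(found)  # [('a', 'd')]
```

In this run {a, e} and {d, e} are cut by the upper bound, so no 3-item set is ever built from them.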
PREFERRED EMBODIMENTS
The sequence database sample in the preferred embodiment is shown in Table 1:
Table 1: Sequence database sample
SID   Sequence (item:quantity per transaction)
1     (a:6,b:10,c:10), (b:8,c:8,d:13), (a:5,b:6), (a:8,b:5,e:8), (a:4,b:7,c:6,d:10)
2     (d:14), (a:5,b:8,c:3,d:3), (a:6,c:15,d:8), (a:9,b:9,d:15), (a:10,b:6,c:14,e:13)
3     (b:7,d:10), (a:8,d:4), (a:5,c:15,d:12), (b:3,d:12,e:3), (a:9,b:11,d:12)
4     (a:6,b:12,d:14), (a:6,b:2,d:8), (a:9,c:6,d:6), (b:2,d:9), (b:5,d:8,e:6)
The HUPFPS-list structure is constructed as shown in Tables 2, 3 and 4:
Table 2: HUPFPS-list of item set {a}
i-set {a}
Sid-list {1,2,3,4}
Tran-list [{1,3,4,5},{2,3,4,5},{2,3,5},{1,2,3}]
Uti-list [{456,380,608,304},{380,456,684,760},{608,380,684},{456,456,684}]
Table 3: HUPFPS-list of item set { d }
i-set {d}
Sid-list {1,2,3,4}
Tran-list [{2,5},{1,2,3,4},{1,2,3,4,5},{1,2,3,4,5}]
Uti-list [{533,410},{574,123,328,615},{410,164,492,492,492},{574,328,246,369,328}]
Table 4: HUPFPS-list of item set { a, d }
i-set {a,d}
Sid-list {1,2,3,4}
Tran-list [{5},{2,3,4},{2,3,5},{1,2,3}]
Uti-list [{714},{503,784,1299},{772,872,1176},{1030,784,930}]
Table 5: External utility table
Item               a    b    c    d    e
External utility   76   65   35   41   118
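As a check on Tables 2 and 3, the following sketch rebuilds the HUPFPS-list of a single item from the sequence database of Table 1 and the external utilities of Table 5. The dictionary layout and the function name `build_list` are assumptions made for illustration; the field names mirror the tables.

```python
# External utilities from Table 5.
PROFIT = {"a": 76, "b": 65, "c": 35, "d": 41, "e": 118}

# Table 1: sid -> list of transactions (item -> quantity).
DB = {
    1: [{"a": 6, "b": 10, "c": 10}, {"b": 8, "c": 8, "d": 13}, {"a": 5, "b": 6},
        {"a": 8, "b": 5, "e": 8}, {"a": 4, "b": 7, "c": 6, "d": 10}],
    2: [{"d": 14}, {"a": 5, "b": 8, "c": 3, "d": 3}, {"a": 6, "c": 15, "d": 8},
        {"a": 9, "b": 9, "d": 15}, {"a": 10, "b": 6, "c": 14, "e": 13}],
    3: [{"b": 7, "d": 10}, {"a": 8, "d": 4}, {"a": 5, "c": 15, "d": 12},
        {"b": 3, "d": 12, "e": 3}, {"a": 9, "b": 11, "d": 12}],
    4: [{"a": 6, "b": 12, "d": 14}, {"a": 6, "b": 2, "d": 8},
        {"a": 9, "c": 6, "d": 6}, {"b": 2, "d": 9}, {"b": 5, "d": 8, "e": 6}],
}

def build_list(item, db, profit):
    """Build the Sid-list, Tran-list and Uti-list of a 1-item set in one scan."""
    sid_list, tran_list, uti_list = [], [], []
    for sid, seq in db.items():
        # Transaction ids are 1-based positions within the sequence.
        tids = [tid for tid, t in enumerate(seq, start=1) if item in t]
        if tids:
            sid_list.append(sid)
            tran_list.append(tids)
            uti_list.append([seq[tid - 1][item] * profit[item] for tid in tids])
    return {"i-set": {item}, "Sid-list": sid_list,
            "Tran-list": tran_list, "Uti-list": uti_list}

list_a = build_list("a", DB, PROFIT)
print(list_a["Tran-list"])  # [[1, 3, 4, 5], [2, 3, 4, 5], [2, 3, 5], [1, 2, 3]]
print(list_a["Uti-list"])   # [[456, 380, 608, 304], [380, 456, 684, 760], [608, 380, 684], [456, 456, 684]]
```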
First, from the parameter values of the 1-item sets, the algorithm finds that huSeqRa({a}) ≥ minSeqRa, upSeqRa({a}) ≥ minSeqRa, huSeqRa({d}) ≥ minSeqRa and upSeqRa({d}) ≥ minSeqRa. The item sets {a} and {d} are therefore HUPFPS and also high-utility periodic candidate patterns. The algorithm generates the HUPFPS-list of the 2-item set by intersecting and extending the fields Sid-list, Tran-list and Uti-list of the HUPFPS-lists of these 1-item sets, then calculates the parameter values of the 2-item set from its HUPFPS-list and judges whether the extended 2-item set is a HUPFPS, and so on until no larger item set can be generated.
Table 1 shows when and in what quantities four customers purchase the commodities a, b, c, d and e. Taking the purchase list of the first customer (SID 1) as an example, (a:6,b:10,c:10), (b:8,c:8,d:13) means that the first customer purchased 6 of commodity a, 10 of commodity b and 10 of commodity c in the first transaction, 8 of commodity b, 8 of commodity c and 13 of commodity d in the second transaction, and so on.
In Table 2, the Sid-list {1,2,3,4} indicates that the first, second, third and fourth customers all purchased commodity a. In the Tran-list [{1,3,4,5},{2,3,4,5},{2,3,5},{1,2,3}], the set {1,3,4,5} indicates that the first customer purchased commodity a in the first, third, fourth and fifth transactions, {2,3,4,5} indicates that the second customer purchased commodity a in the second, third, fourth and fifth transactions, and so on.
In the Uti-list [{456,380,608,304},{380,456,684,760},{608,380,684},{456,456,684}], the set {456,380,608,304} gives the utility of commodity a for the first customer: the 6 units purchased in the first transaction have a utility of 6 × 76 = 456, the 5 units purchased in the third transaction have a utility of 5 × 76 = 380, and so on.
In Table 4, the Uti-list [{714},{503,784,1299},{772,872,1176},{1030,784,930}] is obtained with the external utility value of each commodity in Table 5. The set {714} is the utility 4 × 76 + 10 × 41 = 714 of the first customer purchasing 4 of commodity a and 10 of commodity d together in the fifth transaction, and so on.
As can be seen from Table 1, in the HUPFPS-list of the pattern {a}, the Sid-list is {1,2,3,4}, the Tran-list is ({1,3,4,5}, {2,3,4,5}, {2,3,5}, {1,2,3}), and the Uti-list is ({456, 380, 608, 304}, {380, 456, 684, 760}, {608, 380, 684}, {456, 456, 684}). In the HUPFPS-list of the pattern {d}, the Sid-list is {1,2,3,4}, the Tran-list is ({2,5}, {1,2,3,4}, {1,2,3,4,5}, {1,2,3,4,5}), and the Uti-list is ({533, 410}, {574, 123, 328, 615}, {410, 164, 492, 492, 492}, {574, 328, 246, 369, 328}).
The algorithm expands the HUPFPS-list intersection of patterns { a } and { d } to obtain the Tran-list of patterns { a, d } where Sid-list is {1,2,3,4}, { a, d } is ({ 5}, {2,3,4}, {2,3,5}, {1,2,3 }), and { a, d } where Uti-list is ({ 714}, {503, 784, 1299}, {772, 872, 1176}, {1030, 784, 930 }). The algorithm calculates the parameter values from the field information in the HUPFPS-list of { a, D }, then compares with the custom threshold to obtain the set huCand ({ a, D }) = { S2, S3, S4}, and calculates from the set upseqRa ({ a, D }) = | huCand ({ a, D }) | \| D = |0.75 ≧ minSeqRa, so the patterns { a, D } and their superset may be HUPFPS, and adds the HUPFPS-list of { a, D } to the set bound HUPFPS in order to extend the 3-item set.
Finally, the sequence set huPrSeq({a, d}) = {S2, S3, S4} is calculated from the parameter values, giving huSeqRa({a, d}) = 3/4 = 0.75, and the algorithm outputs the 2-item set {a, d} as a HUPFPS. The algorithm then recursively calls its exploration procedure to explore larger n-item sets.
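The recursive exploration is not spelled out in this passage. A minimal sketch, assuming a depth-first search in which item sets sharing a prefix are joined pairwise, might look as follows. The function `explore`, its callback parameters and the toy data are all hypothetical; in the toy data each "list" is simply the set of sequence ids containing the item set, and both tests require at least two sequences.

```python
def explore(prefix_lists, join, may_extend, is_pattern, out):
    """Depth-first extension of item sets that share a common prefix.

    prefix_lists: (itemset, list) pairs that already passed the upper-bound test
    join:         builds the list of the union of two item sets
    may_extend:   upper-bound test (upSeqRa >= minSeqRa in the text)
    is_pattern:   final test (huSeqRa >= minSeqRa in the text)
    out:          list collecting the item sets that pass the final test
    """
    for i, (x, list_x) in enumerate(prefix_lists):
        extensions = []
        for y, list_y in prefix_lists[i + 1:]:
            list_xy = join(list_x, list_y)
            if may_extend(list_xy):          # otherwise prune this branch
                extensions.append((x | y, list_xy))
                if is_pattern(list_xy):
                    out.append(x | y)
        if extensions:
            explore(extensions, join, may_extend, is_pattern, out)


# Toy data (hypothetical): sequence ids containing each 1-item set.
one_item_sets = [
    (frozenset("a"), {1, 2, 3}),
    (frozenset("b"), {1, 2}),
    (frozenset("c"), {2, 3}),
]

found = []
explore(one_item_sets,
        join=lambda p, q: p & q,
        may_extend=lambda ids: len(ids) >= 2,
        is_pattern=lambda ids: len(ids) >= 2,
        out=found)
print([sorted(p) for p in found])  # [['a', 'b'], ['a', 'c']]
```

Because the upper-bound test is applied before the final test, an item set that fails it is neither reported nor extended, which is what allows whole branches of the search space to be skipped.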
It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.

Claims (4)

1. A method for mining high-utility periodic frequent patterns applied to purchase patterns, characterized by comprising the following steps:
step 1, inputting a database of the commodities and quantities purchased by customers within a period of time, the merchant defining five thresholds, namely a minimum support rate threshold minSupRa, a maximum periodicity threshold maxPr, a maximum standard deviation threshold maxStd, a minimum high utility threshold minHuRa and a minimum sequence periodicity threshold minSeqRa;
step 2, scanning the database to construct the HUPFPS-list of each 1-item set x, namely a data list recording the purchase sequences of the users in which commodity x appears, the transactions in which it appears in chronological order, and the utility of the commodity, and judging whether the 1-item set x is a high-utility periodic frequent pattern HUPFPS, which specifically comprises the following steps:
step 2.1, scanning each sequence in the database and calculating the support rate supRa({x}, S), the maximum periodicity maxPer({x}, S), the utility ratio utiRa({x}, S) and the period standard deviation stanDev({x}, S) of the 1-item set x;
for a commodity x appearing in a purchase sequence S, if the purchase frequency of commodity x is not less than the minimum purchase frequency, namely supRa({x}, S) ≥ minSupRa; the time interval between two consecutive purchases of commodity x does not exceed the maximum period threshold, namely maxPer({x}, S) ≤ maxPr; the purchase period of commodity x is stable within a certain range, namely stanDev({x}, S) < maxStd; and the sales ratio utiRa({x}, S) of commodity x in the purchase sequence of the customer is not less than the merchant-defined minimum high utility threshold, namely utiRa({x}, S) ≥ minHuRa; then the 1-item set x is a high-utility periodic frequent pattern in the purchase sequence S of that customer, and the algorithm stores the sequences in which the 1-item set x satisfies these conditions into the set huPrSeq(x);
step 2.2, calculating huSeqRa(x) from the set huPrSeq(x), and if huSeqRa(x) ≥ minSeqRa, outputting the 1-item set x as a high-utility periodic frequent pattern HUPFPS;
wherein the number of sequences in the database belonging to the set huPrSeq(x) is |huPrSeq(x)|, and the high-utility periodic sequence ratio of the 1-item set x in the database is defined as huSeqRa(x) = |huPrSeq(x)| / |D|, where |D| is the number of sequences in the database;
step 3, pruning the search space according to the upper bound upSeqRa: adding the HUPFPS-list of each 1-item set x satisfying upSeqRa(x) ≥ minSeqRa into the set boundHUPFPS, and not extending the 1-item sets that do not satisfy the condition;
step 4, using the set boundHUPFPS, intersecting and merging the pruned 1-item sets into 2-item sets, namely combinations of the data of 2 commodities, constructing the HUPFPS-lists of the 2-item sets, storing the HUPFPS-list of each item set satisfying upSeqRa ≥ minSeqRa into boundHUPFPS for the next iteration, and judging whether each 2-item set is a HUPFPS;
step 5, recursively generating n-item sets from the HUPFPS-lists of the (n-1)-item sets, and outputting all high-utility periodic frequent item sets when the n-item sets can no longer be extended.
2. The method as claimed in claim 1, wherein an item set of one commodity is a 1-item set x and an item set of multiple commodities is an item set X; the set of all sequences in the database in which the item set X satisfies the support rate supRa(X, S) ≥ minSupRa, the maximum periodicity maxPer(X, S) ≤ maxPr and the utility ratio utiRa(X, S) ≥ minHuRa is denoted huCand(X) = {S1, ..., Sn}; the number of sequences in the set is denoted |huCand(X)|; and the upper bound of the high-utility periodic sequence ratio of the item set X in the database is defined as upSeqRa(X) = |huCand(X)| / |D|.
3. The method of claim 2, wherein the support rate of item set X in sequence S is defined as supRa(X, S) = sup(X, S) / |S|, where |S| is the total number of transactions contained in sequence S;
the number of transactions in sequence S in which item set X occurs is defined as sup(X, S) = |TR(X, S)|.
4. The method of claim 2, wherein the total utility of the item set X in a purchase sequence S is u(X, S), the total utility of the sequence S is su(S), and their ratio is defined as utiRa(X, S) = u(X, S) / su(S), where utiRa(X, S) is called the utility ratio.
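As an informal illustration of the measures defined in the claims (not part of the claims themselves), the sketch below computes the candidate test of claim 2 and the full test of step 2.1 for the pattern {a}. Several points are assumptions: the periods are taken as the gaps between consecutive occurrences with 0 and |S| as boundaries, a common convention in periodic pattern mining that this section does not define; every sequence is assumed to contain 5 transactions; and the sequence utilities su(S) = 4000 and all threshold values are hypothetical. The function names are illustrative.

```python
from statistics import pstdev


def periods(trans, seq_len):
    """Gaps between consecutive occurrences, with 0 and seq_len as boundaries
    (an assumed convention, see the note above)."""
    points = [0, *trans, seq_len]
    return [b - a for a, b in zip(points, points[1:])]


def is_candidate(trans, utis, seq_len, seq_utility, th):
    """Conditions of claim 2: support rate, maximum periodicity, utility ratio."""
    sup_ra = len(trans) / seq_len        # supRa(X, S) = sup(X, S) / |S|
    uti_ra = sum(utis) / seq_utility     # utiRa(X, S) = u(X, S) / su(S)
    return (sup_ra >= th["minSupRa"]
            and max(periods(trans, seq_len)) <= th["maxPr"]
            and uti_ra >= th["minHuRa"])


def is_periodic_high_utility(trans, utis, seq_len, seq_utility, th):
    """Conditions of step 2.1: the candidate test plus the deviation bound."""
    return (is_candidate(trans, utis, seq_len, seq_utility, th)
            and pstdev(periods(trans, seq_len)) < th["maxStd"])


def sequence_ratio(test, occurrences, db_size, th):
    """Ids of the sequences passing `test`, and their share of the database."""
    passed = [sid for sid, occ in occurrences.items() if test(*occ, th)]
    return passed, len(passed) / db_size


# Hypothetical thresholds and sequence utilities; occurrences of {a} from the example.
thresholds = {"minSupRa": 0.6, "maxPr": 2, "maxStd": 0.7, "minHuRa": 0.3}
occurrences_a = {
    1: ([1, 3, 4, 5], [456, 380, 608, 304], 5, 4000),
    2: ([2, 3, 4, 5], [380, 456, 684, 760], 5, 4000),
    3: ([2, 3, 5], [608, 380, 684], 5, 4000),
    4: ([1, 2, 3], [456, 456, 684], 5, 4000),
}

hu_cand, up_seq_ra = sequence_ratio(is_candidate, occurrences_a, 4, thresholds)
hu_pr_seq, hu_seq_ra = sequence_ratio(is_periodic_high_utility, occurrences_a, 4, thresholds)
print(hu_cand, up_seq_ra)    # [1, 2, 3, 4] 1.0
print(hu_pr_seq, hu_seq_ra)  # [1, 2, 4] 0.75
```

Since the candidate test drops only the standard deviation condition, every sequence in huPrSeq is also in huCand, so upSeqRa can never be smaller than huSeqRa; this is why it can serve as an upper bound for pruning.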
CN202211463101.3A 2022-11-22 2022-11-22 Method for mining high-utility periodic frequent pattern applied to purchase pattern Active CN115563192B (en)

Publications (2)

Publication Number Publication Date
CN115563192A (en) 2023-01-03
CN115563192B (en) 2023-03-10

Family

ID=84769999

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant