CN114971794A - Time period-based high-utility sequence mode analysis method and system in group purchase - Google Patents

Time period-based high-utility sequence mode analysis method and system in group purchase

Info

Publication number
CN114971794A
Authority
CN
China
Prior art keywords
sequence
utility
item
database
stability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210590304.2A
Other languages
Chinese (zh)
Inventor
徐田田
解士永
赵龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202210590304.2A priority Critical patent/CN114971794A/en
Publication of CN114971794A publication Critical patent/CN114971794A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0633Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635Processing of requisition or of purchase orders
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a time period-based method and system for analyzing high-utility sequence patterns in group purchase. Group purchase order data of a customer are acquired and preprocessed to obtain a standardized time sequence database; the utility and the residual utility of each item in the sequences of the time sequence database are calculated, and a projection database is generated based on the utility of the item, the residual utility of the item, the position of the item in the sequence, and the position of the sequence containing the item in the time sequence database; the periodicity, stability value, and utility value of a given sequence are calculated using a stability period high-utility sequence algorithm; it is judged whether the utility value of the given sequence is not less than the minimum utility threshold and whether its stability value is not greater than the periodic stability threshold; if so, a search based on the projection database forms recommendation information for the products in the stability period sequence pattern corresponding to the given sequence, so that high-utility stability period patterns are mined for recommendation.

Description

Time period-based high-utility sequence mode analysis method and system in group purchase
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to a method and a system for analyzing a high-utility sequence pattern in group purchase based on a time period.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In the modern big data era, online shopping has become a routine way for residents to consume, and new consumption models are developing rapidly. The traditional online shopping model, however, cannot meet residents' demand for high-frequency purchases of fresh products, so the major e-commerce platforms have successively launched community group-buying services such as Taocaicai, Meituan Youxuan, and Duoduo Maicai. Briefly, in community group buying an e-commerce platform takes the lead and integrates upstream supply resources, recruits a group leader in each community, and lets nearby residents select a community pick-up point on the platform and place orders; the platform delivers the commodities in bulk to the group leader, who is responsible for delivery or customer pick-up. Community group buying enters the market through fresh goods that households consume at high frequency. It exploits the geographic concentration of buyers to deliver in bulk, which shortens delivery time and reduces delivery cost, so commodity prices on the platform can be kept low. Compared with traditional e-commerce, express and packaging costs can be reduced by more than 70%; compared with physical stores, prices are generally about 10%-20% lower. These consumption models have therefore grown rapidly in recent years and have become a new online shopping habit for residents. However, this marketing model does not take the individual customer into account: most recommendations only stimulate purchases through low prices, so the platform has no differentiated selling points, which leads to lower turnover and customer churn.
High Utility Sequence Pattern Mining (HUSPM) is a recently developed research area for mining high-utility sequence patterns in quantitative sequence databases. It has been widely explored and applied, for example in business decision-making, market basket analysis, DNA sequence analysis, and stock market analysis. Within HUSPM, researchers have proposed time-constrained variants such as Periodic High Utility Sequence Pattern Mining (PHUSPM), which not only constrains the minimum utility of a pattern but also uses the timestamps of its occurrences as a constraint, so that a mined pattern must satisfy both the minimum utility condition and the maximum period condition. The patterns mined in this way are therefore not only of high utility (high profit) but also appear regularly in the database. Such algorithms are widely applicable to data mining and pattern mining tasks such as customer behavior analysis, website clickstream analysis, stock market analysis, basket analysis, biomedical applications, and mobile computing.
The period of a pattern is the time interval, or the number of events, between two successive occurrences of the pattern. The periodicity of a pattern is conventionally defined as its maximum period, and a pattern is considered periodic if its periodicity is not greater than the user-defined threshold maxPer. This definition has an important drawback: it is too strict, because a pattern is discarded as soon as a single period exceeds maxPer. For example, suppose maxPer = 1 week and a customer buys <bread, cheese> every weekend. The pattern is periodic, but if the customer skips a single week it is no longer considered periodic. The conventional PHUSPM may therefore discard useful patterns. Conversely, when maxPer is set too large, some of the patterns mined by PHUSPM have periods that fluctuate widely between small and large values, and such patterns are clearly not desirable.
At present, most community group-buying platforms either recommend low-priced, economical commodities or make recommendations with a traditional support-based method. This meets the needs of some customers but not of most, and it does not maximize the platform's benefit. Mining recommended commodities with conventional HUSPM serves only the merchant's profit and cannot find the commodities that a customer needs to buy regularly, which are what retain the customer. Mining with PHUSPM discards useful patterns, such as patterns that are purchased in most periods but skipped in one, and when the threshold is set too large, some mined patterns have periods that are too long to be the patterns sought.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method and a system for analyzing high-utility sequence patterns in group purchase based on a time period.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
a method for analyzing a high utility sequence pattern in group purchase based on a time period comprises the following steps:
acquiring group purchase order data of a customer and preprocessing the data to obtain a standardized time sequence database;
calculating the utility of the item and the residual utility of the item in the sequence in the time sequence database, and generating a projection database based on the utility of the item, the residual utility of the item, the position of the item in the sequence and the position of the sequence containing the item in the time sequence database;
calculating the periodicity, stability value and utility value of the given sequence by using a stability period high utility sequence algorithm;
judging whether the utility value of the given sequence is not less than the minimum utility threshold value and whether the stability value is not greater than the periodic stability threshold value;
and if so, searching based on the projection database to form recommended information of the products in the stability period sequence mode corresponding to the given sequence.
Further, the pretreatment specifically comprises:
dividing the group purchase order data of the customers into different sequences according to different time periods;
dividing the items into different sets in a sequence according to the purchases of the customer at different times in the time period;
the items are divided into different items in an item set according to the type and quantity of products purchased by a customer at a point in time.
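The three preprocessing rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the row layout (period, time point, product, quantity), the function name build_q_sequences, and the choice to sum quantities of the same product bought at the same time point are all assumptions introduced here.

```python
from collections import defaultdict


def build_q_sequences(rows):
    """Group raw order rows into Q-sequences.

    rows: iterable of (period, time_point, product, quantity) tuples.
          This layout is hypothetical, e.g. period = week number and
          time_point = day index within the week.
    Returns {period: [item_set, ...]} where each item set is a list of
    (product, quantity) pairs in lexicographic order and the item sets
    are ordered by time point.
    """
    grouped = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for period, time_point, product, quantity in rows:
        # Assumption: repeated rows for one product at one time point are summed.
        grouped[period][time_point][product] += quantity

    database = {}
    for period in sorted(grouped):
        item_sets = []
        for time_point in sorted(grouped[period]):
            item_sets.append(sorted(grouped[period][time_point].items()))
        database[period] = item_sets
    return database


# Order rows that reproduce sequence s_1 of Table 1 (week 1, four purchases).
rows = [
    (1, 1, "a", 1), (1, 1, "b", 1), (1, 1, "e", 3),
    (1, 2, "c", 3), (1, 2, "d", 2), (1, 2, "g", 3),
    (1, 3, "b", 2), (1, 3, "e", 1),
    (1, 4, "d", 3),
]
db = build_q_sequences(rows)
print(db[1])
```

Each key of the returned dictionary plays the role of a SID, and each value is one Q-sequence.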
Further, generating the projection database specifically includes:
the projection database comprises a utility and position information table, an index table and a periodic table;
the utility and position information table stores item names, the utility of the items, the residual utility of the items and the next position of the items in the sequence;
the index table stores different entries and the location of the entry in the sequence at which it first appears;
the periodic table records sequence numbers for different sequences that occur in a standardized time series database.
Further, the utility of an item is the product of its external utility and its internal utility, where the internal utility is the purchased quantity of the product in the item and the external utility is the price or profit of the product; the residual utility of an item is the sum of the utilities of all items that follow it in the sequence.
Further, the periodicity of the given sequence is calculated as:
pes(t) = { α_{i+1} - α_i | i = 0, 1, ..., k }
maxper(t) = max(pes(t))
where α_1 < α_2 < ... < α_k are the sequence numbers of the sequences containing t, α_0 = 0, α_{k+1} = N, k = |s(t)| is the number of elements in the set s(t), and maxper(t) is the maximum period of the sequence t.
Further, the stability value for the given sequence is calculated as:
la(t,i)=max(0,la(t,i-1)+pes(t,i)-maxper)
maxla(t)=max(la(t))
where t is the given sequence, maxper is a user-defined value, i indexes the periods of t (one per successive sequence containing t), la(t,-1) = 0, and maxla(t) = max(la(t)) is the maximum instability of the sequence t, which is taken as its stability value.
A second aspect of the present invention provides a system for analyzing high-utility sequence patterns in group purchase based on a time period, comprising:
an acquisition module configured to: acquire and preprocess group purchase order data of a customer to obtain a standardized time sequence database;
a computing module configured to: calculate the periodicity, stability value, and utility value of the given sequence by using a high-utility sequence algorithm;
a projection database generation module configured to: calculate the utility of the items in the sequences of the time sequence database, and generate a projection database based on the utility of the item, the position of the item in the sequence, and the position of the sequence containing the item in the time sequence database;
a determination module configured to: judge whether the utility value of the given sequence is not less than the minimum utility threshold and whether the stability value is not greater than the periodic stability threshold;
a recommendation module configured to: if so, search based on the projection database and recommend to the customer the products in the stability period sequence pattern corresponding to the given sequence.
A third aspect of the invention provides a computer-readable storage medium.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the time period-based method for analyzing high-utility sequence patterns in group purchase described above.
A fourth aspect of the invention provides a computer apparatus.
A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the time period-based method for analyzing high-utility sequence patterns in group purchase described above.
The above one or more technical solutions have the following beneficial effects:
The invention applies time-constrained high-utility sequence patterns to online community group buying, mainly through a stability period high-utility sequence pattern mining (SPHUSPM) algorithm. The algorithm builds on HUSPM and uses a minimum utility threshold instead of the traditional minimum support threshold, so that it mines not the products with the largest sales volume but the commodities with the largest profit.
In addition to high profit, the invention takes into account that customers regularly need certain goods, and therefore adds the concept of stability to the traditional PHUSPM method, which makes the method more practical in this application. The patterns mined by SPHUSPM appear periodically in the database, their periods arise from customer demand, and their occurrence in the database is stable. Patterns with high utility (high profit) and a stable period can thus be mined and then recommended.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.
FIG. 1 is an overall flow chart of the present invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.
The embodiments and features of the embodiments of the invention may be combined with each other without conflict.
Example one
This embodiment discloses a method for analyzing high-utility sequence patterns in group purchase based on a time period, which comprises the following steps:
step 1: acquiring group purchase order data of a customer and preprocessing the data to obtain a standardized time sequence database;
step 2: calculating the utility of the item and the residual utility of the item in the sequence in the time sequence database, and generating a projection database based on the utility of the item, the residual utility of the item, the position of the item in the sequence and the position of the sequence containing the item in the time sequence database;
step 3: calculating the periodicity, the stability value and the utility value of the given sequence by using a stability period high-utility sequence algorithm;
step 4: judging whether the utility value of the given sequence is not less than the minimum utility threshold and whether the stability value is not greater than the periodic stability threshold;
step 5: if so, searching based on the projection database for the products in the stability period sequence pattern corresponding to the given sequence to form recommendation information.
In this embodiment, a group purchase order of a customer in a certain community is used as a mining database, and the database is preprocessed to obtain a quantitative sequence database (Q-sequence database).
In step 1, a quantitative sequence database may be represented as S = {s_1, s_2, …, s_N}, s_sid ∈ S (1 ≤ sid ≤ N), where sid is the unique label of each sequence. Each sequence s_sid contains several quantitative item sets (Q-item sets) and can be expressed as s = <X_1 X_2 ... X_M>, where an item set can be expressed as X = [(i_1,q_1)(i_2,q_2)...(i_m,q_m)]. Here (i_k,q_k) is a quantitative item (Q-item), i_k is the name of the item, and q_k is the internal utility of the item, i.e., the purchased quantity of the product. The item sets are ordered within the sequence, while the items within an item set are unordered and are conventionally listed in lexicographic order.
In this example, as shown in table 1:
SID    Q-sequence
s_1    <[(a,1)(b,1)(e,3)],[(c,3)(d,2)(g,3)],[(b,2)(e,1)],[(d,3)]>
s_2    <[(a,3)(b,1)(c,3)(f,2)],[(a,5)(c,2)(g,5)],[(b,3)(d,2)(e,2)]>
s_3    <[(b,1)(c,1)(e,2)(g,5)],[(a,3)(b,2)(e,4)(f,2)],[(b,2)(c,1)(e,2)]>
s_4    <[(b,2)(c,3)],[(a,5)(e,1)],[(b,4)(d,3)(e,5)]>
s_5    <[(a,4)(c,3)],[(a,2)(b,5)(c,2)(d,4)(e,3)]>
s_6    <[(f,4)],[(a,5)(b,3)],[(a,3)(d,4)]>
TABLE 1
Table 1 shows the group purchase order data of a customer, where each SID identifies the order data of one week and the Q-sequence lists the products the customer purchased in that week. Take s_1 = <[(a,1)(b,1)(e,3)],[(c,3)(d,2)(g,3)],[(b,2)(e,1)],[(d,3)]> as an example: <> denotes a sequence, [] denotes an item set, and () denotes an item. [(a,1)(b,1)(e,3)] gives the products and quantities of a single purchase, i.e., 1 unit of product a, 1 unit of product b, and 3 units of product e. The purchased quantity is referred to as the internal utility, and the price or profit of the product is referred to as the external utility.
Table 2 shows the prices of the items, each item corresponding to a number, which is the external utility of the item (item).
[Table 2 is an image in the original: the external utility (price) of each item.]
TABLE 2
Definition 1: Q-sequence containment
Given two Q-sequences s = <I_1 I_2 ... I_n> and s' = <I'_1 I'_2 ... I'_n'>, if there exist integers 1 ≤ j_1 ≤ j_2 ≤ … ≤ j_n ≤ n' such that I_k ⊆ I'_{j_k} for every 1 ≤ k ≤ n, we say that s' contains s, or that s is a q-subsequence of s', denoted s ⊑ s'.
Definition 2: matching of sequences
Given a q-sequence s = <(s_1,q_1)(s_2,q_2)...(s_n,q_n)> and a sequence t = <t_1 t_2 ... t_m>, if n = m and s_k = t_k for every 1 ≤ k ≤ n, then s matches t, denoted t ~ s.
Definition 3: Utility of a Q-sequence
The utility of a q-item (i,q) is defined as u(i,q) = p(i) × q(i), where p(i) is the external utility of the item and q(i) its internal utility. The utility of a q-item set X = [(i_1,q_1)(i_2,q_2)...(i_m,q_m)] is defined as
u(X) = Σ_{k=1..m} u(i_k, q_k)
The utility of a q-sequence s = <X_1 X_2 ... X_n> is defined as
u(s) = Σ_{k=1..n} u(X_k)
Definition 4: Utility of a Q-database
The total utility of a q-database is the sum of the utilities of all its sequences. The utility of the q-database S = {s_1, s_2, …, s_N} is expressed as
u(S) = Σ_{s ∈ S} u(s)
Definition 5: Utility of a sequence
Given a sequence t = <t_1, t_2, ..., t_m> and a q-sequence s = <X_1, X_2, ..., X_n>, the utility of the sequence t in the sequence s is the set of utilities of all q-subsequences of s that match t:
v(t, s) = { u(s') | t ~ s' and s' ⊑ s }
The utility in the database S is expressed as
v(t) = ⋃_{s ∈ S} v(t, s)
Definition 6: Maximum utility of a sequence
Given a sequence t = <t_1, t_2, ..., t_m> and a q-sequence s = <X_1, X_2, ..., X_n>, the maximum utility of the sequence t in the sequence s is denoted u_max(t, s) = max{ v(t, s) }, and the maximum utility in the database S is the sum over all sequences:
u_max(t) = Σ_{s ∈ S} u_max(t, s)
Definition 7: residual utility
Given a sequence s = <X_1, X_2, ..., X_n> and an item set X_k = [(i_1,q_1)(i_2,q_2)...(i_m,q_m)] contained in s, the residual utility (ru) of i_m is the sum of the utilities of all items that come after i_m in s.
In this example, taking the quantitative sequence database of Table 1, the utility of the q-item c in the sequence s_4 is u(c,3) = 4 × 3 = 12.
The utility of the q-item set [(b,2)(c,3)] is u([(b,2)(c,3)]) = u(b,2) + u(c,3) = 3 × 2 + 4 × 3 = 18.
The utility of s_4 is:
u(s_4) = u([(b,2)(c,3)]) + u([(a,5)(e,1)]) + u([(b,4)(d,3)(e,5)]) = 18 + 6 + 23 = 47.
Given a sequence t = <gb>, its utility in the sequence s_1 is v(t,s_1) = {u(<(g,3)(b,2)>)} = {12}. Its utility in the database is then expressed as:
v(t) = {v(t,s_1), v(t,s_2), v(t,s_3), v(t,s_4), v(t,s_5), v(t,s_6)} = {v(t,s_1), v(t,s_2), v(t,s_3)} = {12, 19, 16, 16}.
Given a sequence <ag>, its residual utility in the sequence s_1 is:
ru(<ag>) = u([(b,2)(e,1)]) + u([(d,3)]) = 7 + 6 = 13.
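The worked figures above can be reproduced with a short sketch. Table 2 is only available as an image, so the prices below are assumptions inferred from the worked examples (for instance, u(c,3) = 12 implies a price of 4 for c); no example fixes the price of item f, so it is left out. The function names are illustrative, and match_utilities handles only patterns made of single items in strictly later item sets, which is enough for <gb>.

```python
# External utilities (prices). Assumed values inferred from the worked
# examples; item f is omitted because no example determines its price.
PRICE = {"a": 1, "b": 3, "c": 4, "d": 2, "e": 1, "g": 2}

s1 = [[("a", 1), ("b", 1), ("e", 3)], [("c", 3), ("d", 2), ("g", 3)],
      [("b", 2), ("e", 1)], [("d", 3)]]
s4 = [[("b", 2), ("c", 3)], [("a", 5), ("e", 1)],
      [("b", 4), ("d", 3), ("e", 5)]]


def item_utility(item, qty):
    """u(i, q) = p(i) * q: external utility times internal utility."""
    return PRICE[item] * qty


def item_set_utility(item_set):
    """u(X): sum of the utilities of the items in one item set."""
    return sum(item_utility(i, q) for i, q in item_set)


def sequence_utility(seq):
    """u(s): sum of the utilities of all item sets of the sequence."""
    return sum(item_set_utility(x) for x in seq)


def match_utilities(pattern, seq):
    """v(t, s) for a pattern of single items, one per later item set."""
    found = []

    def extend(k, start, acc):
        if k == len(pattern):
            found.append(acc)
            return
        for pos in range(start, len(seq)):
            for item, qty in seq[pos]:
                if item == pattern[k]:
                    extend(k + 1, pos + 1, acc + item_utility(item, qty))

    extend(0, 0, 0)
    return found


def residual_utility(seq, set_pos, item):
    """Utility of everything after `item` of item set `set_pos` (0-based)."""
    names = [i for i, _ in seq[set_pos]]
    rest_of_set = seq[set_pos][names.index(item) + 1:]
    later_sets = seq[set_pos + 1:]
    return item_set_utility(rest_of_set) + sum(
        item_set_utility(x) for x in later_sets)


print(item_utility("c", 3))               # 12
print(item_set_utility(s4[0]))            # 18
print(sequence_utility(s4))               # 47
print(match_utilities(["g", "b"], s1))    # [12]
print(residual_utility(s1, 1, "g"))       # 13, i.e. ru(<ag>) in s_1
```

Taking max(match_utilities(t, s)) per sequence and summing over the database would give u_max(t) as in Definition 6.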
Definition 8: Period set of a sequence
Given a q-sequence database S = {s_1, s_2, …, s_n} and a sequence t, the set of q-sequences containing t is denoted s(t) = {s_α1, s_α2, ..., s_αk}, where 1 ≤ α_1 < α_2 < ... < α_k ≤ n.
Suppose two q-sequences s_α and s_β (α < β) both contain a sequence t, i.e., s_α ∈ s(t) and s_β ∈ s(t). If and only if there is no s_γ ∈ s(t) with α < γ < β, s_α and s_β are considered consecutive. The period of two consecutive q-sequences is defined as pe(s_α, s_β) = β - α.
Given s(t) = {s_α1, s_α2, ..., s_αk} for a sequence t, the period set of t is the list of its periods, which is represented as:
pes(t) = { α_{i+1} - α_i | i = 0, 1, ..., k }
where α_0 = 0, α_{k+1} = n, and k = |s(t)| is the number of elements in the set s(t).
Definition 9: maximum period of the sequence
The maximum period of a given sequence t is defined as maxper(t) = max(pes(t)).
Definition 10: instability of the sequence:
The instability of the sequence t is a list la(t) = <la(t,0), la(t,1), ..., la(t,|s(t)|)> containing |s(t)| + 1 values, i.e., |la(t)| = |s(t)| + 1 = |pes(t)|. Every instability value in la(t) is not less than zero.
Each value is computed as la(t,i) = max(0, la(t,i-1) + pes(t,i) - maxper), where i ∈ [0, |s(t)|], maxper is a user-defined value, and la(t,-1) = 0.
Definition 11: stability of
The maximum instability of the sequence t is defined as maxla(t) = max(la(t)), also known as the stability of the sequence t.
In the present embodiment, taking Table 1 as an example, the sequence <(ab)> appears in s_1, s_2, s_3, s_5 and s_6. Therefore pe(s_1,s_2) = 2-1 = 1, pe(s_2,s_3) = 3-2 = 1, pe(s_3,s_5) = 5-3 = 2, and pe(s_5,s_6) = 6-5 = 1. According to Definition 8, with α_0 = 0 and α_{k+1} = n = 6, the two boundary periods are 1-0 = 1 and 6-6 = 0.
The period set of the sequence is therefore pes(<(ab)>) = {1,1,1,2,1,0}.
Assuming that maxper is set to 1, the instability of the sequence is calculated as follows: la(t,0) = max(0, pes(t,0) - maxper) = max(0, 1-1) = 0, then la(t,1) = 0, la(t,2) = 0, la(t,3) = 1, la(t,4) = 1, and la(t,5) = 0. The instability list of the sequence <(ab)> is therefore la(<(ab)>) = {0,0,0,1,1,0}, and by the stability definition maxla(<(ab)>) = max(la(<(ab)>)) = 1, i.e., the stability value of the sequence <(ab)> is 1.
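The period set and the instability list of this example can be checked with a few lines of Python. This is a minimal sketch of Definitions 8 to 11; the function names periods and instability are illustrative and do not come from the source.

```python
def periods(occurrences, n):
    """pes(t): gaps between consecutive SIDs containing t.

    occurrences: SIDs of the sequences that contain t.
    n: number of sequences in the database (alpha_0 = 0, alpha_{k+1} = n).
    """
    bounds = [0] + sorted(occurrences) + [n]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def instability(pes, maxper):
    """la(t): la_i = max(0, la_{i-1} + pes_i - maxper), with la_{-1} = 0."""
    values, previous = [], 0
    for period in pes:
        previous = max(0, previous + period - maxper)
        values.append(previous)
    return values


# <(ab)> occurs in s_1, s_2, s_3, s_5 and s_6 of the six sequences in Table 1.
pes = periods([1, 2, 3, 5, 6], n=6)
la = instability(pes, maxper=1)

print(pes)        # [1, 1, 1, 2, 1, 0]
print(max(pes))   # maxper(t) = 2
print(la)         # [0, 0, 0, 1, 1, 0]
print(max(la))    # maxla(t) = 1
```

Because the recurrence carries the excess over maxper forward and lets it decay, a single skipped week raises the instability only temporarily instead of disqualifying the pattern outright.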
Definition 12: stability period high-utility sequence patterns (Stable Periodic High Utility Sequence Patterns, SPHUSPs)
Given a sequence database D and a sequence t, three user-defined thresholds are used: a minimum utility threshold (minUtil or ξ) > 0, a maximum period (maxPer) > 0, and a maximum stability threshold (maxLa) ≥ 0. The problem of mining SPHUSPs in database D is to enumerate every sequence t in D such that maxla(t) ≤ maxLa and u(t) ≥ ξ, where u(t) is the utility of the sequence t.
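The selection condition of Definition 12 can be sketched as a filter over candidate sequences. The candidate utilities below are illustrative placeholders (they assume prices inferred from the worked examples, since Table 2 is an image), and the names maxla and select_sphusps are hypothetical.

```python
def maxla(occurrences, n, maxper):
    """Maximum instability of a pattern from the SIDs in which it occurs."""
    bounds = [0] + sorted(occurrences) + [n]
    worst = current = 0
    for a, b in zip(bounds, bounds[1:]):
        current = max(0, current + (b - a) - maxper)
        worst = max(worst, current)
    return worst


def select_sphusps(candidates, n, maxper, max_la, min_util):
    """Keep patterns with utility >= min_util and maxla <= max_la."""
    return [
        name
        for name, (utility, occurrences) in candidates.items()
        if utility >= min_util and maxla(occurrences, n, maxper) <= max_la
    ]


# pattern -> (utility, SIDs containing it); utilities are illustrative only.
candidates = {
    "<(ab)>": (50, [1, 2, 3, 5, 6]),
    "<g>": (26, [1, 2, 3]),
}

selected = select_sphusps(candidates, n=6, maxper=1, max_la=1, min_util=20)
print(selected)  # ['<(ab)>']
```

Here <g> clears the utility threshold but is rejected on stability: it is absent from the last three sequences, so its instability rises to 2.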
In step 3, because mining would otherwise require scanning and traversing the original database (i.e., the quantitative sequence database) many times, a data structure is designed to store the necessary information. It is constructed after a single scan of the original data and forms the projection database (PD), which greatly reduces the running time and the memory usage.
In this embodiment, a width pruning strategy (SWU pruning strategy) and a depth pruning strategy (LAS, IPS pruning strategy) in the HUSP-ULL algorithm in the HUSPM are introduced.
The SWU strategy is used for global pruning. When a 1-sequence (i.e., a sequence with only one item) is generated, an upper bound on its utility is computed by adding up the utilities of all sequences that contain the 1-sequence. If this upper bound is smaller than the user-defined minimum utility threshold, the actual utility of the 1-sequence is also smaller than the threshold, and the 1-sequence can be deleted from the database. LAS and IPS are used for local pruning when sequences longer than a 1-sequence are generated. A new sequence is generated by adding a new item to an existing sequence, and the item is taken from a set C, also called the extension candidate set. Before the new sequence is generated, an item is taken from C and the pruning strategy computes an upper bound slightly larger than the utility of the sequence to be generated (namely, the sum, over the sequences containing it, of its utility and its residual utility). This upper bound is compared with the minimum utility threshold, and if it is smaller, the item is removed from C, which greatly reduces the time spent on combination and calculation.
The algorithm applies the two pruning strategies, so that the search space of the algorithm is reduced, and the operation efficiency is better.
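The global SWU pruning step can be sketched as follows. This covers only the SWU bound for 1-sequences, not LAS or IPS. It uses the three sequences of Table 1 that do not contain item f, with prices assumed from the worked examples because Table 2 is an image; the function names and the threshold of 50 are illustrative.

```python
# Assumed prices, inferred from the worked examples (Table 2 is an image).
PRICE = {"a": 1, "b": 3, "c": 4, "d": 2, "e": 1, "g": 2}

# Subset of Table 1: the sequences that do not contain item f.
database = {
    1: [[("a", 1), ("b", 1), ("e", 3)], [("c", 3), ("d", 2), ("g", 3)],
        [("b", 2), ("e", 1)], [("d", 3)]],
    4: [[("b", 2), ("c", 3)], [("a", 5), ("e", 1)],
        [("b", 4), ("d", 3), ("e", 5)]],
    5: [[("a", 4), ("c", 3)],
        [("a", 2), ("b", 5), ("c", 2), ("d", 4), ("e", 3)]],
}


def sequence_utility(seq):
    return sum(PRICE[i] * q for item_set in seq for i, q in item_set)


def swu(db):
    """Upper bound per item: total utility of the sequences containing it."""
    totals = {}
    for seq in db.values():
        utility = sequence_utility(seq)
        for item in {i for item_set in seq for i, _ in item_set}:
            totals[item] = totals.get(item, 0) + utility
    return totals


def prune_by_swu(db, min_util):
    """Drop every item whose SWU bound is below min_util."""
    bound = swu(db)
    pruned = {}
    for sid, seq in db.items():
        kept = [[(i, q) for i, q in item_set if bound[i] >= min_util]
                for item_set in seq]
        pruned[sid] = [item_set for item_set in kept if item_set]
    return pruned


bounds = swu(database)
pruned = prune_by_swu(database, min_util=50)
print(bounds["a"], bounds["g"])  # 141 42
print(pruned[1])                 # item g no longer appears
```

Item g occurs only in s_1, whose utility is 42, so no sequence containing g can reach a utility of 50 and g can be removed before any extension is attempted.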
In this embodiment, the projection database is constructed by introducing a periodic utility list (PUL-list) structure that stores the utility and period information of each sequence. The projection database comprises a utility and position information table, an index table, and a periodic table.
The utility and position information table stores the item name, the utility of the item, the residual utility of the item, and the next position of the item in the sequence;
the index table stores each distinct item and the position at which it first appears in the sequence;
the periodic table records the sequences composed of different items and the sequence numbers of the sequences in the standardized time sequence database in which they appear.
Table 3 gives an example of a periodic utility list.
Table 3. Periodic utility list of S1
Figure BDA0003667082770000121
Period table of <(ab)>
Sequence: <(ab)>; sequence numbers: <1,2,3,5,6>
In the utility and position information table of S1, for example, the first item a has utility 1 and remaining utility 41, and its next position is empty, which indicates that a does not appear again in the sequence.
The index table stores the set of distinct items and the position at which each item first appears in the converted sequence; for example, (a,1) indicates that item a first appears at the first position of the sequence.
The period table records the sequence numbers at which the sequence t = <(ab)> occurs in the original database (i.e. the quantitative sequence database); <1,2,3,5,6> indicates that <(ab)> occurs in S1, S2, S3, S5 and S6 of the quantitative sequence database.
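A minimal Python sketch of the three tables is given below for illustration. It assumes that utility is quantity multiplied by unit price, that positions are 1-based over the flattened sequence, and that remaining utility is summed over everything after the item in the sequence; the names `build_tables` and `period_table` and all data values are hypothetical and do not reproduce the patent's Table 3.

```python
def build_tables(qseq, price):
    """Build the utility/position rows and the index table of one q-sequence.

    qseq is a list of itemsets, each a list of (item, quantity) pairs.
    """
    flat = [(item, qty * price[item]) for itemset in qseq for item, qty in itemset]
    rows = []
    remaining = sum(u for _, u in flat)
    for pos, (item, util) in enumerate(flat, start=1):
        remaining -= util  # utility of everything after this position
        rows.append({"pos": pos, "item": item, "utility": util,
                     "remaining": remaining, "next": None})
    index, last_seen = {}, {}
    for row in rows:
        item = row["item"]
        index.setdefault(item, row["pos"])               # first occurrence only
        if item in last_seen:
            rows[last_seen[item] - 1]["next"] = row["pos"]  # link to next occurrence
        last_seen[item] = row["pos"]
    return rows, index

def period_table(db, pattern_items):
    """1-based numbers of the sequences in which one itemset holds all pattern items."""
    wanted = set(pattern_items)
    return [sid for sid, qseq in enumerate(db, start=1)
            if any(wanted <= {i for i, _ in itemset} for itemset in qseq)]

# Hypothetical unit prices and quantitative sequences.
price = {"a": 1, "b": 2, "c": 3}
s1 = [[("a", 1), ("b", 2)], [("c", 1), ("a", 3)]]
db = [s1, [[("c", 2)]], [[("a", 1), ("b", 1)]]]

rows, index = build_tables(s1, price)
occurrences = period_table(db, ("a", "b"))
```

Here the first row describes item a with utility 1, remaining utility 10 and next position 4, and a `None` next position plays the role of the empty entry described above.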
In step 4, the stability value maxla(<i_j>) of the sequence <i_j> is calculated. Whether maxla(<i_j>) is not greater than maxLa determines whether the pattern is a stable pattern; whether u(<i_j>) is not less than the minimum utility threshold, which is defined relative to u(D), determines whether the pattern is a high-utility pattern, where u(D) is the total utility of the database.
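The stability and utility check can be illustrated with the short Python sketch below. It follows the period and lability formulas given in the claims, la(t,i)=max(0,la(t,i-1)+pes(t,i)-maxper) with la(t,-1)=0, and assumes the period boundaries α_0 = 0 and α_{k+1} = n, where n is the number of sequences in the database. The function names, the value n = 6, and the thresholds are hypothetical choices for illustration only.

```python
def periods(occurrences, n):
    """Gaps between consecutive occurrences, with boundaries 0 and n added."""
    points = [0] + sorted(occurrences) + [n]
    return [b - a for a, b in zip(points, points[1:])]

def lability(pes, maxper):
    """la(t,i) = max(0, la(t,i-1) + pes(t,i) - maxper), starting from 0."""
    values, prev = [], 0
    for p in pes:
        prev = max(0, prev + p - maxper)
        values.append(prev)
    return values

def is_stable_high_utility(occurrences, n, utility, maxper, max_la, min_util):
    """True if maxla(t) <= max_la and the utility reaches min_util."""
    la = lability(periods(occurrences, n), maxper)
    return max(la) <= max_la and utility >= min_util

# Occurrence list <1,2,3,5,6> from the period table example; n = 6 is assumed.
pes = periods([1, 2, 3, 5, 6], n=6)
la = lability(pes, maxper=1)
ok = is_stable_high_utility([1, 2, 3, 5, 6], n=6, utility=10,
                            maxper=1, max_la=1, min_util=8)
```

With these assumed values the largest period is 2, the lability rises to 1 only around the missing sequence S4, and the pattern passes both checks.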
Pseudo code of SPHUSPM algorithm
(1) Algorithm 1: SPHUSPM
Figure BDA0003667082770000131
(2) Projection search algorithm
Input: prefix, PD(prefix), SPHUSPs, and the period list of the prefix
Figure BDA0003667082770000132
Figure BDA0003667082770000141
Example two
This embodiment aims to provide a time period-based system for analyzing high-utility sequence patterns in group purchase, comprising:
an acquisition module configured to: acquire and preprocess the group purchase order data of customers to obtain a standardized time sequence database;
a computing module configured to: calculate the periodicity, stability value and utility value of a given sequence by using a high-utility sequence algorithm;
a projection database generation module configured to: calculate the utility of each item in the sequences of the time sequence database, and generate a projection database based on the utility of the item, the position of the item in the sequence, and the position in the time sequence database of the sequence containing the item;
a determination module configured to: judge whether the utility value of the given sequence is not less than the minimum utility threshold and whether the stability value is not greater than the periodic stability threshold;
a recommendation module configured to: if so, search the projection database to form recommendation information for the products in the stable periodic sequence pattern corresponding to the given sequence.
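As an illustration of the acquisition module, the preprocessing can be sketched in Python as follows: orders are divided into sequences by time period, into item sets by purchase time, and into items by product and quantity. The order tuple layout, the function name `preprocess`, and the choice of one calendar day as the time period are assumptions made for this sketch.

```python
from datetime import datetime
from itertools import groupby

def preprocess(orders, period_key=lambda ts: ts.date()):
    """Turn (timestamp, product, quantity) orders into a sequence database.

    One sequence per time period, one item set per distinct purchase time,
    and one (product, quantity) item per product bought at that time.
    """
    ordered = sorted(orders, key=lambda o: o[0])
    db = []
    for _, in_period in groupby(ordered, key=lambda o: period_key(o[0])):
        sequence = []
        for _, at_time in groupby(in_period, key=lambda o: o[0]):
            itemset = {}
            for _, product, qty in at_time:
                itemset[product] = itemset.get(product, 0) + qty
            sequence.append(sorted(itemset.items()))
        db.append(sequence)
    return db

# Hypothetical orders spanning two days.
orders = [
    (datetime(2022, 5, 1, 9, 0), "a", 1),
    (datetime(2022, 5, 1, 9, 0), "b", 2),
    (datetime(2022, 5, 1, 18, 30), "c", 1),
    (datetime(2022, 5, 2, 10, 0), "a", 3),
]
db = preprocess(orders)
```

A different `period_key` (for example one returning the ISO week) would change the length of the time period without altering the rest of the sketch.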
Example three
It is an object of the present embodiment to provide a computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
Example four
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The steps involved in the apparatuses of the second, third and fourth embodiments above correspond to the method of the first embodiment, and their detailed description can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that causes the processor to perform any of the methods of the present invention.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented using general purpose computing apparatus, or alternatively, they may be implemented using program code executable by computing apparatus, whereby the modules or steps may be stored in a memory device and executed by computing apparatus, or separately fabricated into individual integrated circuit modules, or multiple modules or steps thereof may be fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, this description is not intended to limit the scope of the present invention, and those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solution of the present invention.

Claims (10)

1. A method for analyzing a high-utility sequence pattern in group purchase based on a time period is characterized by comprising the following steps:
acquiring group purchase order data of a customer and preprocessing the data to obtain a standardized time sequence database;
calculating the utility of the item and the residual utility of the item in the sequence in the time sequence database, and generating a projection database based on the utility of the item, the residual utility of the item, the position of the item in the sequence and the position of the sequence containing the item in the time sequence database;
calculating the periodicity, stability value and utility value of the given sequence by using a stability period high utility sequence algorithm;
judging whether the utility value of the given sequence is not less than the minimum utility threshold value or not and whether the stability value is not greater than a periodic stability threshold value or not;
and if so, searching the projection database to form recommendation information for the products in the stable periodic sequence pattern corresponding to the given sequence.
2. The method for analyzing the high-utility sequence pattern in the group purchase according to claim 1, wherein the preprocessing specifically comprises:
dividing the group purchase order data of the customers into different sequences according to different time periods;
dividing each sequence into different item sets according to the different times at which the customer made purchases within the time period;
dividing each item set into different items according to the type and quantity of the products purchased by the customer at a given point in time.
3. The method as claimed in claim 1, wherein generating the projection database comprises:
the projection database comprises a utility and position information table, an index table and a period table;
the utility and position information table stores the item names, the utility of the items, the remaining utility of the items and the next position of the items in the sequence;
the index table stores the different items and the position at which each item first appears in the sequence;
the period table records the sequence numbers of the different sequences that occur in the standardized time series database.
4. The method as claimed in claim 3, wherein the utility of an item is the product of its external utility and its internal utility, the internal utility being the quantity of the product in the group purchase and the external utility being the price or profit of the product; the remaining utility of an item is the sum of the utilities of all items after that item in the item set.
5. The method as claimed in claim 1, wherein the useless patterns are pruned in advance by using width pruning and depth pruning strategies when the projection database is generated.
6. The method for analyzing the high-utility sequence pattern in the group purchase based on the time period as claimed in claim 1, wherein the periodicity of the given sequence is calculated as:
Figure FDA0003667082760000021
maxper(t)=max(pes(t))
where α_0 = 0, α_{k+1} = n, |s(t)| is the number of elements in the set s(t), and maxper(t) is the maximum period of the sequence t.
7. The method for analyzing high-utility sequence patterns in group buying based on time period as claimed in claim 1, wherein the stability value of said given sequence is calculated as:
la(t,i)=max(0,la(t,i-1)+pes(t,i)-maxper)
maxla(t)=max(la(t))
where t is a given sequence, maxper is a user-defined value, i is the index of the period, la(t,-1) = 0, and maxla(t) = max(la(t)) is the stability value of the sequence t, that is, its maximum instability.
8. A time period-based system for analyzing high-utility sequence patterns in group purchase, characterized by comprising:
an acquisition module configured to: acquire and preprocess the group purchase order data of customers to obtain a standardized time sequence database;
a computing module configured to: calculate the periodicity, stability value and utility value of a given sequence by using a high-utility sequence algorithm;
a projection database generation module configured to: calculate the utility of each item in the sequences of the time sequence database, and generate a projection database based on the utility of the item, the position of the item in the sequence, and the position in the time sequence database of the sequence containing the item;
a determination module configured to: judge whether the utility value of the given sequence is not less than the minimum utility threshold and whether the stability value is not greater than the periodic stability threshold;
a recommendation module configured to: if so, search the projection database to form recommendation information for the products in the stable periodic sequence pattern corresponding to the given sequence.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of a method for time-period-based analysis of high-utility sequence patterns in group buying as claimed in any of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of a method for time period based analysis of high utility sequence patterns in group buying as claimed in any of claims 1 to 7.
CN202210590304.2A 2022-05-27 2022-05-27 Time period-based high-utility sequence mode analysis method and system in group purchase Pending CN114971794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210590304.2A CN114971794A (en) 2022-05-27 2022-05-27 Time period-based high-utility sequence mode analysis method and system in group purchase

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210590304.2A CN114971794A (en) 2022-05-27 2022-05-27 Time period-based high-utility sequence mode analysis method and system in group purchase

Publications (1)

Publication Number Publication Date
CN114971794A true CN114971794A (en) 2022-08-30

Family

ID=82957291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210590304.2A Pending CN114971794A (en) 2022-05-27 2022-05-27 Time period-based high-utility sequence mode analysis method and system in group purchase

Country Status (1)

Country Link
CN (1) CN114971794A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964415A (en) * 2023-03-16 2023-04-14 山东科技大学 Pre-HUSPM-based database sequence insertion processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination