CN110399406A - Method, apparatus and computer storage medium for mining global effective sequence patterns

Publication number
CN110399406A
Authority
CN
China
Prior art keywords
sequence
utility
value
pattern
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910692048.6A
Other languages
Chinese (zh)
Other versions
CN110399406B (en)
Inventor
林浚玮
李圆法
陈伟
王巨宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Tencent Technology Shenzhen Co Ltd
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd and Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201910692048.6A
Publication of CN110399406A
Application granted
Publication of CN110399406B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a method, an apparatus, and a computer-readable storage medium for mining global effective sequence patterns. The method comprises: determining first-category items in a sequence database, where a first-category item is an item whose global sequence weight utility value is higher than a first threshold; determining a utility value linked list for each sequence in the sequence database; mining at least one candidate global effective sequence pattern from the sequence database according to the determined first-category items, and determining a first set, where the first set includes the at least one candidate global effective sequence pattern, the identifiers of the sequences that include each candidate global effective sequence pattern, and the utility value of each candidate global effective sequence pattern in the corresponding sequence; and mining global effective sequence patterns from the at least one candidate global effective sequence pattern according to the utility value linked list of each sequence and the first set.

Description

Method, apparatus and computer storage medium for mining global effective sequence patterns
Technical field
The present disclosure relates to the field of data processing and, in particular, to a method, an apparatus, and a computer-readable storage medium for mining global effective sequence patterns.
Background
Sequential pattern mining is an important technique in the field of data mining and operates on sequence databases. A sequence database may include multiple sequences (also called transactions). Each sequence may include at least one itemset, each itemset includes at least one item, and the itemsets are ordered. Take supermarket purchase data as an example: a user buys commodity a and commodity b on the first day, commodity a and commodity c on the second day, and commodity b on the third day. The user's purchases over this period can be abstracted as one sequence, <[a b], [a c], [b]>, where a, b and c are items, the items inside [] form an itemset, and the itemsets arranged in order form the sequence. An effective sequence pattern mining algorithm mines combinations of commodities whose utility value is higher than a preset threshold, that is, sequence patterns. A sequence pattern is an ordered arrangement of different itemsets.
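The sequence structure described above can be illustrated with a short Python sketch. It represents the example sequence <[a b], [a c], [b]> as a list of sets and checks whether a pattern occurs in it in order. The function name `contains_pattern` and the subset-based matching rule are illustrative assumptions and are not taken from the disclosure.

```python
# A sequence is an ordered list of itemsets; each itemset is a set of items.
# This is the supermarket example <[a b], [a c], [b]>.
purchases = [{"a", "b"}, {"a", "c"}, {"b"}]


def contains_pattern(sequence, pattern):
    """Return True if every itemset of `pattern` is a subset of some itemset
    of `sequence`, with the matched itemsets appearing in the same order.

    Hypothetical helper: greedy leftmost matching, used only for illustration.
    """
    pos = 0
    for wanted in pattern:
        # Advance to the next itemset of the sequence that contains `wanted`.
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a later itemset
    return True


a_then_b = contains_pattern(purchases, [{"a"}, {"b"}])
c_then_a = contains_pattern(purchases, [{"c"}, {"a"}])
```

Here `a_then_b` is true because a is bought before b, while `c_then_a` is false because no purchase of a follows the purchase of c.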
In effective pattern mining, finding effective patterns requires computing total utility values over the entire database, which is computationally expensive, and this is even more so for effective sequence pattern mining. Effective sequence pattern mining is therefore more complex than traditional effective pattern mining and frequent sequential pattern mining. Current distributed and parallel pattern mining concentrates on effective pattern mining and frequent sequential pattern mining; for example, both can be carried out on the Hadoop platform. A distributed and parallel method for effective sequence pattern mining, however, has not yet been proposed.
Summary of the invention
To this end, the present disclosure provides a method, an apparatus, and a computer-readable storage medium for mining global effective sequence patterns.
According to one aspect of the present disclosure, a method for mining global effective sequence patterns is provided, comprising: determining first-category items in a sequence database, where a first-category item is an item whose global sequence weight utility value is higher than a first threshold; determining the utility value linked list of each sequence in the sequence database; mining at least one candidate global effective sequence pattern from the sequence database according to the determined first-category items, and determining a first set, where the first set includes the at least one candidate global effective sequence pattern, the identifiers of the sequences that include each candidate global effective sequence pattern, and the utility value of each candidate global effective sequence pattern in the corresponding sequence; and mining global effective sequence patterns from the at least one candidate global effective sequence pattern according to the utility value linked list of each sequence and the first set.
According to an example of the present disclosure, determining the first-category items in the sequence database comprises: determining the global sequence weight utility value of each item in the sequence database; and determining the items whose global sequence weight utility value is higher than the first threshold as first-category items.
According to an example of the present disclosure, determining the global sequence weight utility value of each item in the sequence database comprises: determining the local sequence weight utility value of the item in each partition of the sequence database; and determining the global sequence weight utility value of the item according to the determined local sequence weight utility values.
According to an example of the present disclosure, the local sequence weight utility value of the item in each partition of the sequence database is determined according to the utility values of the sequences in that partition that include the item.
According to an example of the present disclosure, determining the utility value linked list of each sequence in the sequence database comprises: determining the utility value linked list of the sequence according to the utility value of each item in the sequence and the position of each item in the sequence.
According to an example of the present disclosure, mining at least one candidate global effective sequence pattern from the sequence database according to the determined first-category items comprises: mining local effective sequence patterns from each partition of the sequence database according to the determined first-category items; and determining the at least one candidate global effective sequence pattern according to the mined local effective sequence patterns.
According to an example of the present disclosure, mining local effective sequence patterns from each partition of the sequence database according to the determined first-category items comprises: for an item that belongs to the first category in each sequence included in the partition, calculating the utility value and the remaining utility value of the item in each sequence, where the remaining utility value of the item in a sequence is the sum of the utility values of all items that come after the item in the sequence; constructing the utility list of the item in each sequence; determining the utility value chain of the item according to the utility lists of the item in the sequences; and mining local effective sequence patterns from the partition according to the utility value chain of each item in the partition.
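The utility value and remaining utility value of an item can be sketched as follows. The sketch uses sequence s1 and the unit profits from the example tables given later in the description. It assumes that the utility value of an item occurrence is its quantity multiplied by its unit profit, and that items inside an itemset are ordered as listed; the function name and the tuple layout of the result are hypothetical and do not reproduce the utility list of Fig. 3.

```python
# Unit profits (external utility values) and sequence s1 from the example tables.
PROFIT = {"a": 5, "b": 3, "c": 4, "d": 2, "e": 1, "f": 6}
S1 = [
    [("a", 2), ("c", 3)],
    [("a", 3), ("b", 1), ("c", 2)],
    [("a", 4), ("b", 5), ("d", 4)],
    [("e", 3)],
]


def utility_and_remaining(sequence, item, profit):
    """For each occurrence of `item`, return a tuple
    (itemset index, utility value, remaining utility value).

    Assumption: utility = quantity * unit profit; the remaining utility is
    the summed utility of every item occurrence after this one.
    """
    flat = [
        (idx, name, qty * profit[name])
        for idx, itemset in enumerate(sequence)
        for name, qty in itemset
    ]
    total = sum(u for _, _, u in flat)
    result, seen = [], 0
    for idx, name, u in flat:
        seen += u  # utility consumed up to and including this occurrence
        if name == item:
            result.append((idx, u, total - seen))
    return result


occurrences_of_a = utility_and_remaining(S1, "a", PROFIT)
```

Under these assumptions, item a occurs three times in s1, with utility values 10, 15 and 20 and remaining utility values 84, 57 and 26.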
According to an example of the present disclosure, mining global effective sequence patterns from the at least one candidate global effective sequence pattern according to the utility value linked list of each sequence and the first set comprises: determining the local utility value of each candidate global effective sequence pattern according to the utility value linked list of each sequence and the first set; determining the global utility value of each candidate global effective sequence pattern according to its local utility values; and determining the sequence patterns whose global utility value is greater than the first threshold as global effective sequence patterns.
According to an example of the present disclosure, the above method further comprises: dividing the sequences in the sequence database into multiple partitions according to a load-balancing algorithm.
According to another aspect of the present disclosure, an apparatus for mining global effective sequence patterns is provided, comprising: a first determination unit configured to determine first-category items in a sequence database, where a first-category item is an item whose global sequence weight utility value is higher than a first threshold; a second determination unit configured to determine the utility value linked list of each sequence in the sequence database; a first mining unit configured to mine at least one candidate global effective sequence pattern from the sequence database according to the determined first-category items and to determine a first set, where the first set includes the at least one candidate global effective sequence pattern, the identifiers of the sequences that include each candidate global effective sequence pattern, and the utility value of each candidate global effective sequence pattern in the corresponding sequence; and a second mining unit configured to mine global effective sequence patterns from the at least one candidate global effective sequence pattern according to the utility value linked list of each sequence and the first set.
According to an example of the present disclosure, the first determination unit is configured to determine the global sequence weight utility value of each item in the sequence database, and to determine the items whose global sequence weight utility value is higher than the first threshold as first-category items.
According to an example of the present disclosure, the second determination unit is configured to determine the local sequence weight utility value of each item in each partition of the sequence database, and to determine the global sequence weight utility value of the item according to the determined local sequence weight utility values.
According to an example of the present disclosure, the local sequence weight utility value of the item in each partition of the sequence database is determined according to the utility values of the sequences in that partition that include the item.
According to an example of the present disclosure, the second determination unit is configured to determine the utility value linked list of each sequence according to the utility value of each item in the sequence and the position of each item in the sequence.
According to an example of the present disclosure, the first mining unit is configured to mine local effective sequence patterns from each partition of the sequence database according to the determined first-category items, and to determine the at least one candidate global effective sequence pattern according to the mined local effective sequence patterns.
According to an example of the present disclosure, the first mining unit is configured to: for an item that belongs to the first category in each sequence included in each partition of the sequence database, calculate the utility value and the remaining utility value of the item in each sequence, where the remaining utility value of the item in a sequence is the sum of the utility values of all items that come after the item in the sequence; construct the utility list of the item in each sequence; determine the utility value chain of the item according to the utility lists of the item in the sequences; and mine local effective sequence patterns from the partition according to the utility value chain of each item in the partition.
According to an example of the present disclosure, the second mining unit is configured to: determine the local utility value of each candidate global effective sequence pattern according to the utility value linked list of each sequence and the first set; determine the global utility value of each candidate global effective sequence pattern according to its local utility values; and determine the sequence patterns whose global utility value is greater than the first threshold as global effective sequence patterns.
According to an example of the present disclosure, the above apparatus further includes a load distribution unit configured to divide the sequences in the sequence database into multiple partitions according to a load-balancing algorithm.
According to another aspect of the present disclosure, an apparatus for mining global effective sequence patterns is provided, comprising: a processor; and a memory storing a computer-executable program, where the above method is performed when the computer-executable program is executed by the processor.
According to another aspect of the present disclosure, a computer-readable storage medium is provided, on which instructions are stored. When executed by a processor, the instructions cause the processor to perform the above method.
With the method, apparatus, and computer-readable storage medium for mining global effective sequence patterns provided by the present disclosure, the utility value linked list of each sequence in the sequence database and the first set are determined, and global effective sequence patterns are mined according to these two data structures. This speeds up the calculation of global utility values in the sequence database, which accelerates mining and reduces time complexity.
Brief description of the drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following more detailed description of embodiments of the present disclosure with reference to the accompanying drawings. The drawings provide a further understanding of the embodiments, form a part of the specification, and serve together with the embodiments to explain the present disclosure; they do not limit the present disclosure. In the drawings, the same reference numerals generally denote the same components or steps.
Fig. 1 is a schematic diagram of a system architecture for mining global effective sequence patterns from a sequence database according to an embodiment of the present disclosure.
Fig. 2 is a flowchart of a method for mining global effective sequence patterns according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of the utility list of item a in sequence s1.
Fig. 4 is a schematic diagram of the utility value chain of item a.
Fig. 5 is a flowchart of a method for mining global effective sequence patterns from at least one candidate global effective sequence pattern according to an embodiment of the present disclosure.
Fig. 6 is a structural schematic diagram of an apparatus for mining global effective sequence patterns according to an embodiment of the present disclosure.
Fig. 7 shows the architecture of a computer device according to an embodiment of the present disclosure.
Specific embodiments
To make the objects, technical solutions and advantages of the present disclosure more apparent, example embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same elements throughout. It should be understood that the embodiments described here are merely illustrative and should not be construed as limiting the scope of the present disclosure.
In the present disclosure, a sequence pattern whose utility value is high, for example higher than a preset threshold, may be called an "effective sequence pattern". That is, an "effective sequence pattern" may be a sequence pattern whose utility value is higher than a preset threshold. The "preset threshold" may be fixed, or it may change with the application scenario of the mining algorithm.
The present disclosure proposes a technical solution for distributed and parallel effective sequence pattern mining. In the present disclosure, distributed and parallel effective sequence pattern mining is realized with a distributed computing framework based on the Hadoop platform. During mining, the utility value linked list of each sequence in the sequence database and the first set are used to store the information needed by the mining process, which accelerates mining and reduces time complexity. The "distributed computing framework" mentioned here may be the MapReduce framework, where Map maps one key-value pair to a new key-value pair, and Reduce merges the values of key-value pairs that share the same key and maps them to a new key-value pair. A module that performs the Map operation may be called a Mapper, and a module that performs the Reduce operation may be called a Reducer.
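The Map and Reduce roles described above can be illustrated with a minimal in-memory sketch in plain Python. It is not the Hadoop API: the driver `map_reduce` and the word-count mapper and reducer are hypothetical names used only to show how key-value pairs flow from a Mapper, through grouping by key, to a Reducer.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy single-process driver (an illustration, not Hadoop).

    `mapper(key, value)` returns a list of new (key, value) pairs;
    `reducer(key, values)` merges all values that share one key.
    """
    grouped = defaultdict(list)
    for key, value in records:
        for new_key, new_value in mapper(key, value):
            grouped[new_key].append(new_value)  # group values by key
    return {key: reducer(key, values) for key, values in grouped.items()}


def word_mapper(_line_number, line):
    # Map step: one (word, 1) pair per word in the line.
    return [(word, 1) for word in line.split()]


def count_reducer(_word, values):
    # Reduce step: merge the values of identical keys.
    return sum(values)


counts = map_reduce([(1, "a b a"), (2, "b c")], word_mapper, count_reducer)
```

In this toy run `counts` maps a to 2, b to 2 and c to 1. In a real deployment the grouping step is performed by the framework across machines rather than by a local dictionary.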
First, a system architecture for mining global effective sequence patterns (Global-High Utility Sequence Pattern, G-HUSP) from a sequence database according to an embodiment of the present disclosure is described with reference to Fig. 1. As shown in Fig. 1, the system architecture 100 may include three parts: an identification part 110, a local mining part 120, and an integration part 130.
The identification part 110 may include multiple Mappers and multiple Reducers, for example n Mappers and n Reducers, where n is a positive integer. The identification part 110 can be used to determine the first-category items in the sequence database, that is, the items whose global sequence weight utility value is higher than the first threshold. First-category items are items that may form effective sequence patterns and are therefore also called promising items.
The local mining part 120 may include multiple Mappers and multiple Reducers, for example n Mappers and n Reducers, where n is a positive integer. The local mining part 120 can be used to mine local effective sequence patterns (Local-High Utility Sequence Pattern, L-HUSP) from the sequence database according to the first-category items. Some of the local effective sequence patterns may be global effective sequence patterns while others may not be, so the local effective sequence patterns can serve as candidate global effective sequence patterns. The local mining part 120 can also be used to determine the first set (which can be denoted sidset). The first set may include the at least one candidate global effective sequence pattern, the identifiers of the sequences that include each candidate global effective sequence pattern, and the utility value of each candidate global effective sequence pattern in the corresponding sequence. In addition, the utility value linked list of each sequence in the sequence database can be determined.
The integration part 130 may include multiple Mappers and multiple Reducers, for example n Mappers and n Reducers, where n is a positive integer. The integration part 130 can be used to mine global effective sequence patterns from the at least one candidate global effective sequence pattern according to the utility value linked list of each sequence and the first set. With the system architecture shown in Fig. 1, the utility value linked list of each sequence in the sequence database and the first set store the information needed by the mining process, which accelerates the mining of effective sequence patterns and reduces time complexity.
It will be appreciated that although a three-stage MapReduce is shown in Fig. 1, this is only illustrative. Embodiments of the present disclosure may also use MapReduce with fewer or more stages. In addition, the number of Mappers and the number of Reducers within each stage may be the same or different, and the numbers of Mappers and/or Reducers in different stages may be the same or different.
Moreover, it should be understood that in the present disclosure "local" refers to one partition of the database, and "global" refers to the database as a whole. For example, a "local effective sequence pattern" may be an effective sequence pattern mined from one partition of the database, that is, a sequence pattern that is effective with respect to that partition; a "global effective sequence pattern" may be an effective sequence pattern mined from multiple local effective sequence patterns, that is, a sequence pattern that is effective with respect to the whole database. As another example, a "local sequence weight utility value" may be a utility value determined according to the data in one partition of the database, and a "global sequence weight utility value" may be a utility value determined according to all the data in the database.
The method for mining global effective sequence patterns based on the system architecture shown in Fig. 1 is described in detail below with reference to Fig. 2. Fig. 2 is a flowchart of a method 200 for mining global effective sequence patterns according to an embodiment of the present disclosure. As shown in Fig. 2, in step S201, first-category items in the sequence database are determined, where a first-category item is an item whose global sequence weight utility value (Global Sequence Weight Utility, GSWU) is higher than a first threshold.
In the present disclosure, a sequence database may include multiple sequences and identification information corresponding to each sequence. A sequence may be a quantitative sequence. The identification information corresponding to each sequence may be called a sequence identifier (sequence id, sid). The sequence identifier of the l-th sequence can be denoted s_l, where l is a positive integer. Each sequence may include one or more itemsets, and each itemset may include one or more items. Each item has an internal utility value and an external utility value. In a transaction-type database, the internal utility value may be the quantity of the item in the transaction. In databases for other scenarios, the form of the internal utility value can be adjusted accordingly. A table that records the external utility value of each item in the database may be called an external utility value table. In a transaction-type database, the external utility value table may be a profit table, that is, it may record the unit profit of each item in the database. In databases for other scenarios, the form of the external utility value table can be adjusted accordingly.
Table 1 below shows an example of a sequence database. As shown in Table 1, the sequence database is a transaction-type database that includes 5 sequences, s1 to s5. Each sequence consists of the purchase lists of the same customer at different times; each purchase list is an itemset, and each purchased commodity is an item. For example, sequence s1 indicates that the customer first bought 2 units of commodity a and 3 units of commodity c, then bought 3 units of commodity a, 1 unit of commodity b and 2 units of commodity c, then bought 4 units of commodity a, 5 units of commodity b and 4 units of commodity d, and finally bought 3 units of commodity e.
sid Sequence
s1 <[(a: 2) (c: 3)], [(a: 3) (b: 1) (c: 2)], [(a: 4) (b: 5) (d: 4)], [(e: 3)]>
s2 <[(a: 1) (e: 3)], [(a: 5) (b: 3) (d: 2)], [(b: 2) (c: 1) (d: 4) (e: 3)]>
s3 <[(e: 2)], [(c: 2) (d: 3)], [(a: 3) (e: 3)], [(b: 4) (d: 5)]>
s4 <[(b: 2) (c: 3)], [(a: 5) (e: 1)], [(b: 4) (d: 3) (e: 5)]>
s5 <[(a: 4) (c: 3)], [(a: 2) (b: 5) (c: 2) (d: 4) (e: 3)]>
Table 1: Example of a sequence database
Table 2 below shows an example of an external utility value table. As shown in Table 2, the unit profit of commodity a is 5, of commodity b is 3, of commodity c is 4, of commodity d is 2, of commodity e is 1, and of commodity f is 6.
Item    a  b  c  d  e  f
Profit  5  3  4  2  1  6
Table 2: Example of an external utility value table
In step S201, the global sequence weight utility value of each item in the sequence database can be determined, and the items whose global sequence weight utility value is higher than the first threshold are determined as first-category items. Step S201 can be performed by the identification part 110 described above (that is, the first-stage MapReduce).
The process of determining the global sequence weight utility value of each item in the sequence database is described below. According to an example of the present disclosure, for each item in the sequence database, the local sequence weight utility value (Local Sequence Weight Utility, LSWU) of the item in each partition of the sequence database can be determined first, and the global sequence weight utility value of the item can then be determined according to the determined local sequence weight utility values.
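The two-step computation described above can be sketched as follows. The sketch assumes that the global sequence weight utility value of an item is the sum of its local sequence weight utility values over all partitions; this passage only states that the global value is determined from the local values, so the summation is an assumption. The function name, the per-partition numbers and the threshold are hypothetical.

```python
from collections import Counter


def first_category(lswu_per_partition, threshold):
    """Aggregate per-partition LSWU dictionaries into GSWU values and keep
    the items whose GSWU is higher than `threshold`.

    Assumption: GSWU(item) = sum of LSWU(item) over all partitions.
    """
    gswu = Counter()
    for lswu in lswu_per_partition:
        gswu.update(lswu)  # Counter.update adds values key by key
    promising = {item for item, value in gswu.items() if value > threshold}
    return dict(gswu), promising


# Hypothetical LSWU values for two partitions and a hypothetical threshold.
gswu, promising_items = first_category(
    [{"a": 161, "f": 10}, {"a": 199, "f": 5}], threshold=100
)
```

With these made-up inputs, item a has a global value of 360 and is kept, while item f has a global value of 15 and is discarded.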
For example, the sequence database can first be divided into multiple partitions, and the partitions can be assigned to the multiple Mappers of the first-stage MapReduce. For example, the sequence database can be divided into n partitions, with the 1st partition assigned to Mapper 1 of the first-stage MapReduce, ..., the k-th partition assigned to Mapper k of the first-stage MapReduce, ..., and the n-th partition assigned to Mapper n of the first-stage MapReduce, where k is a positive integer and 1 ≤ k ≤ n.
Then, for each sequence in the k-th partition, Mapper k can determine the utility value of the sequence, for example with a conventional method for calculating the utility value of a sequence. For example, the utility value of a sequence can be the sum of the utility values, in that sequence, of the itemsets that make up the sequence. In the present disclosure, the utility value of sequence s_l can be denoted u(s_l).
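The utility value of a sequence can be worked through on the data of Table 1 and Table 2. The sketch below assumes the conventional definition in which the utility value of an item occurrence is its quantity (internal utility value) multiplied by its unit profit (external utility value); the names `PROFIT`, `DATABASE` and `sequence_utility` are illustrative.

```python
# Table 2: unit profit (external utility value) of each item.
PROFIT = {"a": 5, "b": 3, "c": 4, "d": 2, "e": 1, "f": 6}

# Table 1: each sequence is a list of itemsets of (item, quantity) pairs.
DATABASE = {
    "s1": [[("a", 2), ("c", 3)], [("a", 3), ("b", 1), ("c", 2)],
           [("a", 4), ("b", 5), ("d", 4)], [("e", 3)]],
    "s2": [[("a", 1), ("e", 3)], [("a", 5), ("b", 3), ("d", 2)],
           [("b", 2), ("c", 1), ("d", 4), ("e", 3)]],
    "s3": [[("e", 2)], [("c", 2), ("d", 3)], [("a", 3), ("e", 3)],
           [("b", 4), ("d", 5)]],
    "s4": [[("b", 2), ("c", 3)], [("a", 5), ("e", 1)],
           [("b", 4), ("d", 3), ("e", 5)]],
    "s5": [[("a", 4), ("c", 3)],
           [("a", 2), ("b", 5), ("c", 2), ("d", 4), ("e", 3)]],
}


def sequence_utility(sequence, profit):
    """u(s): sum over all itemsets of quantity * unit profit (assumed)."""
    return sum(qty * profit[item] for itemset in sequence for item, qty in itemset)


utilities = {sid: sequence_utility(seq, PROFIT) for sid, seq in DATABASE.items()}
```

Under this assumption, u(s1) = 22 + 26 + 43 + 3 = 94, and the remaining sequences s2 to s5 have utility values 67, 56, 67 and 76.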
Then, for each item in the sequence, Mapper k can generate a key-value pair consisting of the item and the utility value of the sequence. For example, for item i in sequence s_l, Mapper k can generate the key-value pair (i, u(s_l)). In other words, the sequence identifier of a sequence and the content of the sequence can be input to Mapper k as one key-value pair, and Mapper k then outputs one or more new key-value pairs.
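The Map step just described can be sketched in plain Python. The function takes a sequence identifier and the sequence content and emits one (item, u(s)) pair per distinct item, mirroring the pairs listed in the example for sequence s1. The function name `swu_mapper`, the sorted output order, and the quantity-times-profit utility are assumptions for illustration, not the Hadoop Mapper interface.

```python
PROFIT = {"a": 5, "b": 3, "c": 4, "d": 2, "e": 1, "f": 6}
S1 = [[("a", 2), ("c", 3)], [("a", 3), ("b", 1), ("c", 2)],
      [("a", 4), ("b", 5), ("d", 4)], [("e", 3)]]


def swu_mapper(sid, sequence, profit):
    """Hypothetical Map function: input (sid, sequence), output a list of
    (item, u(s)) pairs, one per distinct item in the sequence."""
    # Utility value of the whole sequence (assumed: quantity * unit profit).
    u = sum(qty * profit[item] for itemset in sequence for item, qty in itemset)
    distinct_items = sorted({item for itemset in sequence for item, _ in itemset})
    return [(item, u) for item in distinct_items]


pairs_s1 = swu_mapper("s1", S1, PROFIT)
```

For s1 this yields the pairs (a, 94), (b, 94), (c, 94), (d, 94) and (e, 94), since u(s1) is 94 under the stated assumption.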
Furthermore, because different sequences in a partition may contain the same item, a Mapper may generate multiple key-value pairs for that item. In this case, a combiner module (which may be called a combiner) may be configured for each Mapper to determine the local sequence-weighted utility value of each item in each partition. Specifically, the local sequence-weighted utility value of an item in a partition of the sequence database may be determined from the utility values of the sequences in that partition that contain the item; for example, it may be the sum of those utility values. This reduces the workload of the Reducer described below and thereby lowers the communication cost and transfer time. For example, the local sequence-weighted utility value of item i in the k-th partition of the sequence database may be determined by the following formula (1):

$$\mathrm{LSWU}(i, D_k) = \sum_{s \in D_k \,\wedge\, i \in s} u(s) \tag{1}$$

where i denotes an item, D_k denotes the k-th partition of the sequence database, s denotes a sequence that contains the item, and u(s) denotes the utility value of sequence s.
The process of determining the local sequence-weighted utility value of an item in one partition of the sequence database is described below with a specific example. Suppose the k-th partition of the sequence database contains sequences s_1 and s_2. Mapper k may determine the utility values of s_1 and s_2 to be u(s_1) and u(s_2), respectively. Then, for each item in s_1, namely items a, b, c, d and e, Mapper k may generate the key-value pairs (a, u(s_1)), (b, u(s_1)), (c, u(s_1)), (d, u(s_1)) and (e, u(s_1)). For each item in s_2, namely items a, b, c, d and e, Mapper k may generate the key-value pairs (a, u(s_2)), (b, u(s_2)), (c, u(s_2)), (d, u(s_2)) and (e, u(s_2)). Item a therefore has two key-value pairs, (a, u(s_1)) and (a, u(s_2)), which may also be written as (a, l_u), where l_u is the set containing u(s_1) and u(s_2). The combiner module then sums the elements of l_u, i.e. u(s_1) + u(s_2), to obtain the local sequence-weighted utility value of item a in the k-th partition, LSWU_{a-k} = u(s_1) + u(s_2). The local sequence-weighted utility values of items b, c, d and e in the k-th partition are obtained in the same way.
Thus, the key-value pairs output by Mapper k serve as the input of the combiner module corresponding to Mapper k, which then generates new key-value pairs. Each new key-value pair may consist of an item and the local sequence-weighted utility value of that item in the k-th partition. For example, for item i, the combiner module corresponding to Mapper k may generate the key-value pair (i, LSWU_{i-k}); when item i is item a, it outputs the key-value pair (a, LSWU_{a-k}).
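The Mapper and combiner steps described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the disclosure: the function names map_sequence and combine, the in-memory representation of a sequence as a list of itemsets of (item, utility) pairs, and the sample partition are hypothetical, and each distinct item of a sequence is assumed to emit u(s) exactly once.

```python
from collections import defaultdict

def map_sequence(sequence):
    """Mapper sketch: emit (item, u(s)) once per distinct item of one sequence.

    `sequence` is a list of itemsets; each itemset is a list of
    (item, utility) pairs (an assumed in-memory layout).
    """
    u_s = sum(u for itemset in sequence for _, u in itemset)
    items = {item for itemset in sequence for item, _ in itemset}
    return [(item, u_s) for item in sorted(items)]

def combine(pairs):
    """Combiner sketch: sum the values per key to get the LSWU of each item."""
    lswu = defaultdict(int)
    for item, u_s in pairs:
        lswu[item] += u_s
    return dict(lswu)

# Hypothetical k-th partition with two sequences.
partition_k = [
    [[("a", 2), ("b", 3)], [("c", 5)]],  # u(s) = 10
    [[("a", 4)], [("d", 1)]],            # u(s) = 5
]
pairs = [kv for s in partition_k for kv in map_sequence(s)]
lswu_k = combine(pairs)
print(lswu_k)
```

In this toy partition item a occurs in both sequences, so its LSWU is the sum of both sequence utilities, while the other items receive the utility of the single sequence that contains them.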
In this way, the local sequence-weighted utility value of each item in each partition of the sequence database can be determined. The global sequence-weighted utility value of each item can then be determined from these local values. For example, the sum of an item's local sequence-weighted utility values over all partitions of the sequence database may be taken as the global sequence-weighted utility value of that item.
Specifically, the key-value pairs output by the multiple combiner modules that share the same key may be input to one Reducer in the first-stage MapReduce. That is, the multiple key-value pairs output by the combiner modules for the same item, for example the multiple key-value pairs (i, LSWU_{i-k}) for item i, are input to one Reducer. The Reducer may take the sum of the local sequence-weighted utility values in these key-value pairs as the global sequence-weighted utility value GSWU_i of item i. For example, the global sequence-weighted utility value of item i may be determined by the following formula (2):

$$\mathrm{GSWU}(i, D) = \sum_{D_k \in D} \mathrm{LSWU}(i, D_k) \tag{2}$$

where GSWU(i, D) denotes the global sequence-weighted utility value of item i in the sequence database D, D_k denotes the k-th partition of the sequence database, and LSWU(i, D_k) denotes the local sequence-weighted utility value of item i in the k-th partition of the sequence database.
This completes the description of determining the global sequence-weighted utility value of each item in the sequence database. Once these values have been determined, each Reducer in the first-stage MapReduce may determine the items whose global sequence-weighted utility value is greater than or equal to a first threshold to be first-type items, and discard the items whose global sequence-weighted utility value is less than the first threshold. Each Reducer may output one or more new key-value pairs, each consisting of a first-type item and its global sequence-weighted utility value. For example, when item i is a first-type item, a Reducer may output the key-value pair (i, GSWU_i).
The "first threshold" described herein may be determined from the total utility value of the database and a threshold factor. For example, the first threshold may be the product of the total utility value of the database and the threshold factor. The total utility value of the database may be determined by a conventional method; for example, it may be the sum of the utility values of all transactions in the database. The total utility value of the database is denoted u(D). The threshold factor may be preset and is denoted δ. The first threshold can therefore be expressed as δ × u(D).
Step S201 thus identifies the items that are promising for forming high-utility sequential patterns. Items that are not identified can be discarded and need no further consideration. As a result, the space to be searched for high-utility sequential patterns is much smaller than the original search space, which increases search speed and accelerates mining.
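The first-stage Reducer and the threshold test of step S201 can be illustrated in the same style. The sketch below is an assumption-laden illustration: the function name reduce_gswu, the dictionaries standing in for combiner outputs, and the values chosen for δ and u(D) are hypothetical.

```python
from collections import defaultdict

def reduce_gswu(combiner_outputs, total_utility, delta):
    """Reducer sketch: sum LSWU values per item into GSWU, then keep the
    items whose GSWU is >= the first threshold delta * u(D)."""
    gswu = defaultdict(int)
    for lswu in combiner_outputs:          # one dict per partition
        for item, value in lswu.items():
            gswu[item] += value
    threshold = delta * total_utility
    return {item: v for item, v in gswu.items() if v >= threshold}

# Hypothetical combiner outputs for two partitions.
outputs = [
    {"a": 15, "b": 10, "d": 5},
    {"a": 20, "c": 8, "d": 4},
]
# Hypothetical u(D) = 40 and delta = 0.25, so the first threshold is 10.
first_type = reduce_gswu(outputs, total_utility=40, delta=0.25)
print(first_type)
```

Items c and d fall below the threshold and are dropped, which is the pruning that shrinks the search space for the later stages.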
Returning to Fig. 2, in step S202, the utility-value linked list of each sequence in the sequence database is determined. Step S202 may be performed before or after step S201, or concurrently with step S201.
According to an example of the present disclosure, for a sequence in the sequence database, the utility-value linked list of the sequence may be determined from the utility value of each item in the sequence and the position of each item in the sequence. The utility value of an item may be the product of its internal utility value and its external utility value. The position of each item in the sequence may include an initial position and a next position, where the initial position of an item is the position at which the item first appears in the sequence, and the next position is the position at which the item next appears in the sequence. The utility-value linked list of a sequence may include two rows: the first row holds the utility value and next position of each item (referred to as Utility Position Information, or UP Information), and the second row holds the initial positions of the distinct items in the sequence (referred to as the Header Table). The second row may include each distinct item and its initial position.
Table 3 below shows the utility-value linked list of sequence s_1 in Table 1. As shown in Table 3, the utility-value linked list of s_1 has two rows: the first row gives the utility value and next position of each occurrence of items a, b, c, d and e in s_1, and the second row gives the initial position of each of items a, b, c, d and e in s_1. Specifically, in the first-row element (a, 10, 3), "a" is the 1st item of s_1, "10" indicates that the utility value of this item a in s_1 is 10, and "3" is the position at which item a next appears in s_1. In the first-row element (c, 8, -), "c" is the 5th item of s_1, "8" indicates that the utility value of this item c in s_1 is 8, and "-" indicates that item c has no next position in s_1. In the second-row element (a, 1), "a" is an item of s_1 and "1" is the initial position of item a in s_1.
Table 3. Example of the utility-value linked list of sequence s_1
It can be understood that the utility-value linked list of a sequence is formed by transforming and extending the sequence in the original database; it records the information of the original database together with the common information needed for computation. The utility-value linked list speeds up the computation of the utility values of sequential patterns. This is because a target sequential pattern may have multiple occurrences in a single transaction, so computing the utility value of the pattern in that transaction requires finding all occurrences and taking the maximum utility value. Because the utility-value linked list records the next position of each item in the transaction, the transaction need not be scanned repeatedly; the maximum utility value of the pattern in the transaction can be computed simply by following the next positions of the items.
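The two rows described above (UP Information and Header Table) can be built in one backward pass over a sequence. The following is a minimal sketch under stated assumptions: the function name build_utility_linked_list is hypothetical, itemset boundaries are ignored for brevity, and the sample sequence is a toy reconstruction in which positions 1 to 7 follow the values quoted in this description while the utilities of items d and e are invented for illustration.

```python
def build_utility_linked_list(sequence):
    """Return (up_information, header_table) for one sequence.

    Positions are 1-based. Each UP Information entry is
    (item, utility, next_position), with None when the item does not
    occur again. The header table maps each distinct item to the
    position of its first occurrence.
    """
    flat = [(item, u) for itemset in sequence for item, u in itemset]
    up_info = [None] * len(flat)
    next_pos = {}   # item -> nearest later position seen so far
    header = {}     # item -> earliest position seen so far
    for pos in range(len(flat), 0, -1):      # scan from the end
        item, u = flat[pos - 1]
        up_info[pos - 1] = (item, u, next_pos.get(item))
        next_pos[item] = pos
        header[item] = pos
    return up_info, header

# Toy reconstruction of s_1 (utilities of d and e are assumed values).
s1 = [
    [("a", 10), ("c", 12)],
    [("a", 15), ("b", 3), ("c", 8)],
    [("a", 20), ("b", 15), ("d", 6), ("e", 5)],
]
up_info, header = build_utility_linked_list(s1)
print(up_info[0], up_info[4], header["a"])
```

With this layout, every occurrence of an item can be visited by starting at its header position and following the next positions, without rescanning the sequence.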
Returning to Fig. 2, in step S203, at least one candidate global high-utility sequential pattern is mined from the sequence database according to the determined first-type items, and a first set is determined, where the first set includes the at least one candidate global high-utility sequential pattern, the identifiers of the sequences that contain each candidate global high-utility sequential pattern, and the utility value of each candidate global high-utility sequential pattern in the corresponding sequence. Step S203 may be performed by the mining part 120 described above (i.e. the second-stage MapReduce).
According to an example of the present disclosure, before step S203 is performed, the sequences in the sequence database may be assigned to multiple tasks. The number of tasks is denoted m, where m is a positive integer. For example, m may be a multiple of the number of Mappers in the second-stage MapReduce. In the following examples, the disclosure is described with m equal to the number of Mappers in the second-stage MapReduce.
In this example, the sequences in the sequence database may be divided into multiple partitions according to a load-balancing algorithm; for example, the sequences may be assigned to multiple tasks according to the load-balancing algorithm. Specifically, for a sequence in the sequence database, the number (Num) of first-type items contained in the sequence may be determined. Then the task p with the smallest workload is selected from the multiple tasks, the sequence is assigned to task p, and the workload of task p is updated according to the number of first-type items contained in the sequence. For example, the workload of the p-th task is denoted WL_p; after a sequence is assigned to this task, its workload is updated from WL_p to (WL_p + Num).
In addition, in this example, the algorithm may initialize the workload of every task to 0. Therefore, in the first iteration of the algorithm, since the workload of every task is 0, a task may be selected at random from the multiple tasks for a sequence in the sequence database, and the sequence assigned to that task. For example, the 1st task may be selected and the sequence assigned to it.
In addition, a "task" as described herein may also be referred to as a task file. Hereinafter, the terms task and task file are used interchangeably.
The load-balancing algorithm above prevents the partitioning of the database from producing an unbalanced workload across nodes, which would degrade the mining algorithm. It keeps the workload of the nodes balanced and thereby effectively increases the speed of mining and computation.
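The greedy assignment described above can be sketched as follows. This is an illustration only: the function name assign_sequences and the sample data are hypothetical, Num is assumed to count the distinct qualifying items of a sequence, and ties (including the all-zero first iteration) are broken by taking the lowest-numbered task rather than a random one.

```python
def assign_sequences(sequences, first_type_items, m):
    """Greedy load-balancing sketch: give each sequence to the task that
    currently has the smallest workload, then add Num to that workload."""
    workload = [0] * m                   # WL_p initialised to 0
    tasks = [[] for _ in range(m)]       # sequence ids per task
    for sid, sequence in sequences.items():
        present = {item for itemset in sequence for item in itemset}
        num = len(present & first_type_items)
        p = min(range(m), key=workload.__getitem__)  # least-loaded task
        tasks[p].append(sid)
        workload[p] += num               # WL_p becomes WL_p + Num
    return tasks, workload

# Hypothetical sequences (utilities omitted) and qualifying items.
sequences = {
    "s1": [["a", "c"], ["a", "b"]],
    "s2": [["b"], ["d"]],
    "s3": [["a"], ["e"]],
    "s4": [["a", "b"]],
}
tasks, workload = assign_sequences(sequences, {"a", "b", "c"}, m=2)
print(tasks, workload)
```

Using the count of qualifying items as the cost estimate is the proxy described in the text; a different cost estimate could be substituted without changing the greedy structure.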
Step S203 may include three sub-steps S2031 to S2033. In step S2031, local high-utility sequential patterns may be mined from each partition of the sequence database according to the determined first-type items. Then, in step S2032, at least one candidate global high-utility sequential pattern may be determined from the mined local high-utility sequential patterns. Then, in step S2033, the first set may be determined. Step S2033 may also be performed concurrently with step S2032.
In the present disclosure, local high-utility sequential patterns may be mined from each task according to the determined first-type items. Some of these local high-utility sequential patterns may be global high-utility sequential patterns, while others may not be. The latter may be treated as candidate global high-utility sequential patterns.
The process of mining local high-utility sequential patterns from each partition of the sequence database in step S2031 is described below. Specifically, for each first-type item in the sequences of a partition, the utility value and remaining utility value of the item in each sequence are computed, the utility list of the item in each sequence is constructed, and the utility chain of the item is determined from its utility lists in the sequences. Local high-utility sequential patterns are then mined from the partition according to the utility chains of the items in the partition.
In the present disclosure, the remaining utility value of an item in a sequence may be the sum of the utility values of all items that follow it in the sequence. The utility list of an item in a sequence may include the identifier of the sequence (denoted sid), the identifier of each itemset in which the item occurs (denoted tid), the utility value (denoted acu) and remaining utility value (denoted ru) of the item in each such itemset of the sequence, and indication information (for example, a pointer) that points from one itemset to the next (denoted next). The utility chain of an item may include the utility lists of the item in the sequences.
An example of the utility list of an item in a sequence is given below. Suppose a partition contains the sequences s_1 to s_5 shown in Table 1 and item a is a first-type item. For sequence s_1, the identifier of the sequence is determined to be 1. Item a appears in the 1st itemset of s_1, so the utility value and remaining utility value in s_1 of item a in the 1st itemset are determined; they are 10 and 84, respectively. Item a also appears in the 2nd itemset of s_1, so the utility value and remaining utility value in s_1 of item a in the 2nd itemset are determined; they are 15 and 57, respectively. Item a also appears in the 3rd itemset of s_1, so the utility value and remaining utility value in s_1 of item a in the 3rd itemset are determined; they are 20 and 26, respectively. The utility list of item a in s_1 can thus be constructed. Fig. 3 is a schematic diagram of the utility list of item a in s_1. As shown in Fig. 3, in the first tuple (1, 1, 10, 84), the first "1" denotes sequence s_1, the second "1" denotes the 1st itemset of s_1, "10" is the utility value in s_1 of item a in the 1st itemset, and "84" is the remaining utility value in s_1 of item a in the 1st itemset. In the second tuple (1, 2, 15, 57), "1" denotes sequence s_1, "2" denotes the 2nd itemset of s_1, "15" is the utility value in s_1 of item a in the 2nd itemset, and "57" is the remaining utility value in s_1 of item a in the 2nd itemset. In the third tuple (1, 3, 20, 26), "1" denotes sequence s_1, "3" denotes the 3rd itemset of s_1, "20" is the utility value in s_1 of item a in the 3rd itemset, and "26" is the remaining utility value in s_1 of item a in the 3rd itemset. The black arrows in Fig. 3 represent the pointers from one itemset to the next.
An example of the utility chain of an item is given below. In the example above, the utility lists of item a in sequences s_2 to s_5 can be determined in the same way. The utility chain of item a can then be determined from its utility lists in s_1 to s_5. Fig. 4 is a schematic diagram of the utility chain of item a. As shown in Fig. 4, the utility chain of item a includes the utility lists of item a in sequences s_1, s_2, s_3, s_4 and s_5.
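The (sid, tid, acu, ru) tuples and the per-item chain described above can be reproduced with a short sketch. The function names utility_list and utility_chain are hypothetical, the order of entries in a Python list stands in for the "next" pointers, and the data is a toy reconstruction: it is chosen so that item a in the first sequence yields the tuples quoted in the text, while the utilities of d and e and the whole second sequence are invented for illustration.

```python
def utility_list(sid, sequence, item):
    """Return [(sid, tid, acu, ru), ...] for every itemset containing `item`.

    ru is the sum of the utilities of all items that come after this
    occurrence of `item` in the sequence.
    """
    total = sum(u for itemset in sequence for _, u in itemset)
    entries, seen = [], 0
    for tid, itemset in enumerate(sequence, start=1):
        for it, u in itemset:
            seen += u                    # utility consumed so far
            if it == item:
                entries.append((sid, tid, u, total - seen))
    return entries

def utility_chain(partition, item):
    """Collect the non-empty utility lists of `item` over a partition."""
    chain = {}
    for sid, sequence in partition.items():
        entries = utility_list(sid, sequence, item)
        if entries:
            chain[sid] = entries
    return chain

partition = {
    1: [[("a", 10), ("c", 12)],
        [("a", 15), ("b", 3), ("c", 8)],
        [("a", 20), ("b", 15), ("d", 6), ("e", 5)]],   # d, e assumed
    2: [[("b", 4)], [("a", 6), ("c", 2)]],              # hypothetical
}
chain_a = utility_chain(partition, "a")
print(chain_a[1])
```

The three tuples printed for sequence 1 match the ones walked through for Fig. 3, which is a quick consistency check on the definition of the remaining utility value.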
Similarly, the utility chain of each first-type item in the sequences of each partition can be determined. Local high-utility sequential patterns can then be mined from the partition according to the utility chains of the items in the partition. For example, the items in the partition and their utility chains may be used as the input of a conventional high-utility sequential pattern mining algorithm (for example, the HUS-Span algorithm), which outputs one or more local high-utility sequential patterns for the partition. The algorithm may also output the utility value of each local high-utility sequential pattern in the corresponding sequence and the identifier of that sequence. The output of the algorithm can be expressed as a key-value pair (pattern, {sid, utility}), where pattern denotes a local high-utility sequential pattern, sid denotes the identifier of a sequence that contains the pattern, and utility denotes the utility value of the pattern in that sequence.
The operations of step S2031 described above may be performed by the Mappers in the second-stage MapReduce. For example, the multiple partitions of the sequence database may be processed by multiple Mappers in the second-stage MapReduce, respectively, so that each Mapper mines local high-utility sequential patterns from its corresponding partition. In this case, the output of the algorithm described above is the output of the Mapper. That is, for a partition of the sequence database, the output of the corresponding Mapper is one or more key-value pairs (pattern, {sid, utility}), where the one or more patterns are the local high-utility sequential patterns mined from that partition.
After step S2031, in step S2032, at least one candidate global high-utility sequential pattern may be determined from the mined local high-utility sequential patterns. For example, the key-value pairs output by the multiple Mappers that share the same key may be input to one Reducer in the second-stage MapReduce. That is, the multiple key-value pairs output by the Mappers for the same pattern, for example the multiple key-value pairs (pattern x, {sid, utility}) for pattern x, are input to one Reducer. The Reducer may determine the sum of the utility values corresponding to pattern x and determine, from the sum and the first threshold, whether pattern x is a global high-utility sequential pattern. If the sum is greater than or equal to the first threshold, pattern x is determined to be a global high-utility sequential pattern. If the sum is less than the first threshold, pattern x is determined not to be a global high-utility sequential pattern but a candidate global high-utility sequential pattern.
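The second-stage Reducer logic can be sketched as below. The function name reduce_patterns, the string encoding of patterns, and the sample utilities and threshold are assumptions for illustration; the re-keyed pairs at the end use the (sid, (pattern, utility)) form mentioned in this description.

```python
from collections import defaultdict

def reduce_patterns(mapper_outputs, threshold):
    """Group (pattern, (sid, utility)) pairs by pattern, sum the utilities,
    and split the patterns into global ones and candidates."""
    grouped = defaultdict(list)
    for pattern, (sid, utility) in mapper_outputs:
        grouped[pattern].append((sid, utility))
    global_patterns, candidates = {}, {}
    for pattern, occurrences in grouped.items():
        total = sum(u for _, u in occurrences)
        if total >= threshold:
            global_patterns[pattern] = total    # already globally high
        else:
            candidates[pattern] = occurrences   # needs the third stage
    return global_patterns, candidates

# Hypothetical Mapper outputs from several partitions.
mapper_outputs = [
    ("<a, b>", ("s1", 30)),
    ("<a, b>", ("s3", 25)),
    ("<[a, c], b>", ("s1", 38)),
    ("<[a, c], b>", ("s2", 5)),
]
global_patterns, candidates = reduce_patterns(mapper_outputs, threshold=50)

# Re-key the candidates by sequence identifier: (sid, (pattern, utility)).
reducer_output = [(sid, (p, u)) for p, occ in candidates.items() for sid, u in occ]
print(global_patterns, reducer_output)
```

A pattern that misses the threshold here is not discarded, because its utility in sequences where it was not locally mined has not yet been counted.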
In addition, each Reducer may output one or more new key-value pairs, each consisting of a candidate global high-utility sequential pattern, the identifier of a sequence corresponding to that candidate pattern, and the utility value of the candidate pattern in that sequence. For example, the new key-value pair may be expressed as (sid, (pattern, utility)); that is, the form of the key-value pair output by the Mapper is changed.
The "at least one candidate global high-utility sequential pattern" of step S2032 may be determined from the output of the multiple Reducers in the second-stage MapReduce, for example from the sequential patterns in the key-value pairs they output. For example, if the output of the Reducers is (s_1, (pattern 1, utility 1)), (s_2, (pattern 1, utility 1)), (s_3, (pattern 2, utility 2)), (s_3, (pattern 1, utility 1)) and (s_4, (pattern 2, utility 2)), then the "at least one candidate global high-utility sequential pattern" of step S2032 is pattern 1 and pattern 2.
In addition, after step S2032, the first set may be determined in step S2033, for example from the output of the multiple Reducers in the second-stage MapReduce. The first set may include the at least one candidate global high-utility sequential pattern, the identifiers of the sequences that contain each candidate pattern, and the utility value of each candidate pattern in the corresponding sequence. For example, the first set may include multiple subsets, each of which includes the identifier of a sequence, the candidate global high-utility sequential patterns contained in that sequence, and the utility values of those candidate patterns in the sequence. For example, if the output of the multiple Reducers in the second-stage MapReduce is (s_1, (pattern 1, utility 1)), (s_2, (pattern 1, utility 1)), (s_3, (pattern 2, utility 2)), (s_3, (pattern 1, utility 1)) and (s_4, (pattern 2, utility 2)), then the first set may include four subsets: the 1st subset is (s_1, (pattern 1, utility 1)), the 2nd subset is (s_2, (pattern 1, utility 1)), the 3rd subset is (s_3, (pattern 2, utility 2), (pattern 1, utility 1)), and the 4th subset is (s_4, (pattern 2, utility 2)).
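Grouping the Reducer output by sequence identifier gives one way to hold the first set in memory. The sketch below assumes a dictionary of dictionaries; the function name build_first_set and the numeric utilities are hypothetical placeholders for the "utility 1" and "utility 2" labels used in the example.

```python
from collections import defaultdict

def build_first_set(reducer_outputs):
    """Group (sid, (pattern, utility)) pairs into {sid: {pattern: utility}}."""
    first_set = defaultdict(dict)
    for sid, (pattern, utility) in reducer_outputs:
        first_set[sid][pattern] = utility
    return dict(first_set)

# Same shape as the five Reducer outputs in the example; values assumed.
reducer_outputs = [
    ("s1", ("pattern 1", 7)),
    ("s2", ("pattern 1", 4)),
    ("s3", ("pattern 2", 9)),
    ("s3", ("pattern 1", 6)),
    ("s4", ("pattern 2", 3)),
]
first_set = build_first_set(reducer_outputs)
candidate_patterns = {p for subset in first_set.values() for p in subset}
print(len(first_set), sorted(candidate_patterns))
```

The four keys correspond to the four subsets of the example, and a lookup such as first_set["s3"].get("pattern 1") returns a stored utility without rescanning the sequence.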
It can be understood that the data structure of the first set speeds up the computation of the utility values of candidate global high-utility sequential patterns. Specifically, if a sequence contains a candidate global high-utility sequential pattern, the utility value of that candidate pattern can be obtained directly from the first set without being computed again, because repeated computation is time-consuming.
In the example above, no corresponding combiner module is configured for the Mappers in the second-stage MapReduce. However, the present disclosure is not limited to this; a corresponding combiner module may also be configured for the Mappers in the second-stage MapReduce.
Returning to Fig. 2, in step S204, global high-utility sequential patterns are mined from the at least one candidate global high-utility sequential pattern according to the utility-value linked list of each sequence and the first set. Step S204 may be performed by the integration part 130 described above (i.e. the third-stage MapReduce).
Step S204 is described in detail below with reference to Fig. 5. Fig. 5 is a flowchart of a method 500 for mining global high-utility sequential patterns from at least one candidate global high-utility sequential pattern according to an embodiment of the present disclosure. As shown in Fig. 5, in step S501, the local utility value of each candidate global high-utility sequential pattern may be determined according to the utility-value linked list of each sequence and the first set.
Specifically, the at least one candidate global high-utility sequential pattern and the first set may be used as the input of the multiple Mappers in the third-stage MapReduce. For example, the at least one candidate global high-utility sequential pattern may be divided into multiple groups, which are then input to the multiple Mappers, respectively. In addition, the first set may be input to every Mapper.
Then, each Mapper may determine the utility value of each of its corresponding candidate global high-utility sequential patterns. For example, for one of the candidate patterns corresponding to a Mapper, the Mapper may judge whether the first set includes that candidate pattern. When the first set includes the candidate pattern, its utility value may be determined from the first set. When the first set does not include the candidate pattern, its utility value may be determined from the utility-value linked list of the sequence.
This is because, when the utility value of a candidate global high-utility sequential pattern has already been computed, it can be obtained directly by querying the sidset of the sequences that contain the candidate pattern. When the utility value has not been computed, however, it is necessary to check whether the candidate pattern occurs in a particular sequence. If it does, the utility value of the candidate pattern must be computed from that sequence. Note that this operation is time-consuming, because the sequence must be scanned and the candidate pattern may have multiple matches in it. The sequence therefore has to be scanned repeatedly to find the match with the maximum utility, which is taken as the utility value of the candidate pattern in that sequence. Consequently, completing the mining task would require scanning the entire sequence database many times. The utility-value linked list of a sequence proposed in the present disclosure is a compact data structure that is suited to big-data problems.
An example of determining the utility value of a candidate global high-utility sequential pattern from the utility-value linked list of a sequence is described below. For example, the utility value of the candidate global high-utility sequential pattern <[a, c], b> may be determined from the utility-value linked list of sequence s_1 shown in Table 3 above. Specifically, since items a and c are in the same itemset, all positions at which a and c occur together can be found from the positions at which item c occurs: the first position (1, 2), with utility value 22, and the second position (3, 5), with utility value 23. For the first position (1, 2), all positions of item b that satisfy the pattern can be found, namely 4 and 7, so the combined utility values of items a, c and b are 22 + 3 = 25 and 22 + 15 = 37. For the second position (3, 5), the only position of item b that satisfies the pattern is 7, so the combined utility value of items a, c and b is 23 + 15 = 38. The utility value of the sequential pattern <[a, c], b> is therefore max{25, 37, 38} = 38.
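The worked example can be checked with a small exhaustive search. Note that this sketch is a naive enumeration over itemsets, swapped in for readability; it does not follow next-position pointers as the linked-list method does. The function name match_utilities is hypothetical, and the sequence is a toy reconstruction consistent with the utilities quoted above (the values for d and e are invented).

```python
def match_utilities(sequence, pattern, p_idx=0, start=0, acc=0):
    """Return the utility of every occurrence of `pattern` in `sequence`.

    `sequence` is a list of {item: utility} dicts (one per itemset) and
    `pattern` is a list of item tuples (one per pattern itemset).
    """
    if p_idx == len(pattern):
        return [acc]                       # one complete occurrence
    found = []
    for t in range(start, len(sequence)):
        itemset = sequence[t]
        if all(item in itemset for item in pattern[p_idx]):
            gained = sum(itemset[item] for item in pattern[p_idx])
            # Later pattern itemsets must match strictly later itemsets.
            found += match_utilities(sequence, pattern, p_idx + 1, t + 1, acc + gained)
    return found

s1 = [
    {"a": 10, "c": 12},
    {"a": 15, "b": 3, "c": 8},
    {"a": 20, "b": 15, "d": 6, "e": 5},    # d, e are assumed values
]
utilities = match_utilities(s1, [("a", "c"), ("b",)])
print(utilities, max(utilities))
```

The three occurrences found are exactly the ones enumerated in the example, and the maximum is the utility value of the pattern in this sequence.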
In the present disclosure, each Mapper in the third-stage MapReduce may output one or more new key-value pairs, each consisting of a candidate global high-utility sequential pattern and its utility value. For example, the new key-value pair may be expressed as (pattern, utility).
In addition, the same Mapper may output multiple key-value pairs (pattern, utility) for the same candidate global high-utility sequential pattern. For example, for the candidate pattern y, the same Mapper may output two key-value pairs, (pattern y, utility 1) and (pattern y, utility 2). These two key-value pairs may also be written as (pattern y, G_u), where G_u is the set containing utility 1 and utility 2.
In this case, a combiner module (which may be called a combiner) may be configured for each Mapper to determine the local utility value of each candidate global high-utility sequential pattern. Specifically, the local utility value of a candidate pattern may be determined from the utility values in the multiple key-value pairs that correspond to that candidate pattern; for example, it may be the sum of those utility values. For example, if the same Mapper outputs the two key-value pairs (pattern y, utility 1) and (pattern y, utility 2) for the candidate pattern y, then the local utility value (local-utility) of pattern y is (utility 1 + utility 2).
In the present disclosure, the combiner module may also output one or more new key-value pairs, each consisting of a candidate global high-utility sequential pattern and its local utility value. For example, the new key-value pair may be expressed as (pattern, local-utility). When the candidate pattern is pattern y, the combiner module may output the key-value pair (pattern y, utility 1 + utility 2).
Returning to Fig. 5, in step S502, the global utility value of each candidate global high-utility sequential pattern may be determined from its local utility values. For example, for each candidate pattern, the global utility value may be determined from the multiple local utility values of that candidate pattern; for instance, the sum of those local utility values may be taken as the global utility value of the candidate pattern.
Specifically, the key-value pairs output by the multiple combiner modules that share the same key may be input to one Reducer in the third-stage MapReduce. That is, the multiple key-value pairs output by the combiner modules for the same candidate global high-utility sequential pattern, for example the multiple key-value pairs for the candidate pattern y, are input to one Reducer. The Reducer may take the sum of the local utility values in these key-value pairs as the global utility value (global-utility) of the candidate pattern.
Then, in step S503, the sequential patterns whose global utility value is greater than the first threshold may be determined to be global high-utility sequential patterns. For example, each Reducer in the third-stage MapReduce may determine the sequential patterns whose global utility value is greater than or equal to the first threshold to be global high-utility sequential patterns. Each Reducer may output one or more new key-value pairs, each consisting of a global high-utility sequential pattern and its global utility value. For example, when pattern y is a global high-utility sequential pattern, a Reducer may output the key-value pair (pattern y, global-utility). The sequential patterns in the key-value pairs output by the Reducers in the third-stage MapReduce are therefore the global high-utility sequential patterns.
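Steps S501 to S503 reduce to two rounds of summing by key followed by a threshold test, which can be sketched as follows. The function names sum_by_key and third_stage, the sample Mapper outputs, and the threshold value are hypothetical; the sketch uses the greater-than-or-equal comparison given for the Reducers.

```python
from collections import defaultdict

def sum_by_key(pairs):
    """Sum the values of (key, value) pairs that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def third_stage(mapper_outputs, threshold):
    """Combiner per Mapper gives local utilities; the Reducer sums them into
    global utilities and keeps the patterns that reach the threshold."""
    local = [sum_by_key(pairs) for pairs in mapper_outputs]       # combiners
    global_utility = sum_by_key(
        kv for local_utils in local for kv in local_utils.items()  # reducer
    )
    return {p: u for p, u in global_utility.items() if u >= threshold}

# Hypothetical (pattern, utility) outputs of two Mappers.
mapper_outputs = [
    [("pattern y", 12), ("pattern y", 9), ("pattern z", 4)],
    [("pattern y", 10), ("pattern z", 6)],
]
result = third_stage(mapper_outputs, threshold=30)
print(result)
```

Summing inside each combiner first means that only one pair per pattern and Mapper crosses the network, which is the reason given in this description for configuring combiner modules.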
With the method for mining global high-utility sequential patterns provided by this embodiment, the utility-value linked list of each sequence in the sequence database and the first set are determined, and global high-utility sequential patterns are mined according to these two data structures. This saves a great deal of time, speeds up the computation of global utility values in the sequence database, accelerates mining, and reduces time complexity.
Hereinafter, a device according to an embodiment of the present disclosure that corresponds to the method shown in Fig. 2 is described with reference to Fig. 6. Fig. 6 shows a schematic structural diagram of a device 600 for mining global effective sequence patterns according to an embodiment of the present disclosure. Because the functions of the device 600 are the same as the details of the method described above with reference to Fig. 2, a detailed description of the same content is omitted here for brevity. As shown in Fig. 6, the device 600 includes: a first determination unit 610, configured to determine first-category items in a sequence database, where a first-category item is an item whose global sequence weight utility value is higher than a first threshold; a second determination unit 620, configured to determine the utility-value linked list of each sequence in the sequence database; a first mining unit 630, configured to mine at least one candidate global effective sequence pattern from the sequence database according to the determined first-category items and to determine a first set, where the first set includes the at least one candidate global effective sequence pattern, the identifiers of the sequences that include each candidate global effective sequence pattern, and the utility value of each candidate global effective sequence pattern in the corresponding sequence; and a second mining unit 640, configured to mine global effective sequence patterns from the at least one candidate global effective sequence pattern according to the utility-value linked list of each sequence and the first set. In addition to these four units, the device 600 may include other components; however, because these components are unrelated to the content of the embodiments of the present disclosure, their illustration and description are omitted here.
The first determination unit 610 can determine the global sequence weight utility value of each item in the sequence database and determine the items whose global sequence weight utility value is higher than the first threshold to be first-category items. The first determination unit 610 can be the identification part 110 described above (i.e., the first-stage MapReduce).
The process by which the first determination unit 610 determines the global sequence weight utility value of each item in the sequence database is described below. According to an example of the present disclosure, for each item in the sequence database, the first determination unit 610 can first determine the local sequence weight utility value (Local Sequence Weight Utility, LSWU) of the item in each partition of the sequence database, and then determine the global sequence weight utility value of the item according to the determined local sequence weight utility values.
For example, the first determination unit 610 can first divide the sequence database into multiple partitions and assign the partitions to multiple Mappers in the first-stage MapReduce. For example, the sequence database can be divided into n partitions, with the 1st partition assigned to Mapper 1 in the first-stage MapReduce, ..., the k-th partition assigned to Mapper k in the first-stage MapReduce, ..., and the n-th partition assigned to Mapper n in the first-stage MapReduce, where 1≤k≤n and k is a positive integer.
Then, for each sequence in the k-th partition, Mapper k can determine the utility value of the sequence. For example, Mapper k can determine the utility value of the sequence using a conventional method for calculating the utility value of a sequence. For example, the utility value of a sequence can be the sum of the utility values, in that sequence, of the itemsets that make up the sequence. In the present disclosure, the utility value of a sequence sl can be expressed as u(sl).
Then, for each item in the sequence, Mapper k can generate a key-value pair consisting of the item and the utility value of the sequence. For example, for an item i in the sequence sl, Mapper k can generate the key-value pair (i, u(sl)). It can be seen that the identifier of a sequence and the content of that sequence can be input to Mapper k as one key-value pair, and Mapper k then outputs one or more new key-value pairs.
Further, because different sequences in a partition may include the same item, a Mapper can generate multiple key-value pairs for the same item across these different sequences. In this case, a combiner module (which may be called a combiner) can be configured for each Mapper to determine the local sequence weight utility value of the same item in each partition. Specifically, the local sequence weight utility value of an item in a partition of the sequence database can be determined according to the utility values of the sequences in that partition that include the item. For example, the local sequence weight utility value of an item in a partition of the sequence database can be the sum of the utility values of the sequences in that partition that include the item.
It can be seen that the key-value pairs output by Mapper k can serve as the input of the combiner module corresponding to Mapper k, and the combiner module then generates new key-value pairs. A new key-value pair can consist of an item and the local sequence weight utility value of that item in the k-th partition. For example, for an item i, the combiner module corresponding to Mapper k can generate the key-value pair (i, LSWUi-k). In the example where item i is item a, the combiner module corresponding to Mapper k can output the key-value pair (a, LSWUa-k).
In the above manner, the first determination unit 610 can determine the local sequence weight utility value of each item in each partition of the sequence database. After that, the first determination unit 610 can determine the global sequence weight utility value of each item according to the determined local sequence weight utility values. For example, the sum of the local sequence weight utility values of an item over all partitions of the sequence database can be taken as the global sequence weight utility value of the item.
Specifically, key-value pairs that have the same key in the outputs of the multiple combiner modules can be input to one Reducer in the first-stage MapReduce. That is, the multiple key-value pairs in the combiner outputs that correspond to the same item, for example the multiple key-value pairs (i, LSWUi-k) corresponding to item i, are input to one Reducer. The Reducer can take the sum of the local sequence weight utility values in these key-value pairs as the global sequence weight utility value GSWUi of item i.
The process of determining the global sequence weight utility value of each item in the sequence database has now been described. After the global sequence weight utility value of each item in the sequence database has been determined, each Reducer in the first-stage MapReduce can determine the items whose global sequence weight utility value is greater than or equal to the first threshold to be first-category items. Each Reducer can output one or more new key-value pairs, where each new key-value pair can consist of a first-category item and the global sequence weight utility value of that item. For example, when item i is a first-category item, a Reducer can output the key-value pair (i, GSWUi).
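The first-stage flow described above (Mapper, combiner module, Reducer) can be illustrated with a minimal single-process sketch. It is not the patented implementation: the function names, the data layout (a sequence as a list of itemsets, each itemset a list of `(item, utility)` pairs), and the choice to emit one key-value pair per distinct item per sequence are all assumptions made for illustration.

```python
from collections import defaultdict

def sequence_utility(sequence):
    """u(s): sum of the utility values of all items in all itemsets of s."""
    return sum(u for itemset in sequence for _, u in itemset)

def map_partition(partition):
    """Mapper k: emit (item, u(s)) for each distinct item of each sequence s."""
    for sequence in partition:
        u_s = sequence_utility(sequence)
        distinct_items = {item for itemset in sequence for item, _ in itemset}
        for item in distinct_items:
            yield item, u_s

def combine(pairs):
    """Combiner: sum the emitted values per item, giving LSWU for one partition."""
    lswu = defaultdict(int)
    for item, u_s in pairs:
        lswu[item] += u_s
    return dict(lswu)

def reduce_first_stage(lswu_per_partition, threshold):
    """Reducer: sum LSWU over partitions (GSWU) and keep items at or above the threshold."""
    gswu = defaultdict(int)
    for lswu in lswu_per_partition:
        for item, value in lswu.items():
            gswu[item] += value
    return {item: value for item, value in gswu.items() if value >= threshold}

# Two hypothetical partitions of a tiny sequence database.
partition_1 = [[[("a", 2), ("b", 3)], [("a", 1)]], [[("b", 4)]]]
partition_2 = [[[("a", 5), ("c", 1)]]]
local = [combine(map_partition(p)) for p in (partition_1, partition_2)]
first_category = reduce_first_stage(local, threshold=10)
```

With these inputs, item c falls below the threshold and is filtered out, while a and b are kept as first-category items.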
According to an example of the present disclosure, for a sequence in the sequence database, the second determination unit 620 can determine the utility-value linked list of the sequence according to the utility value of each item in the sequence and the position of each item in the sequence. The utility value of an item can be the product of its internal utility value and its external utility value. The position of each item in the sequence may include the item's initial position and next position, where the initial position of an item can be the position at which the item first occurs in the sequence, and the next position can be the position at which the item next occurs in the sequence. In addition, the utility-value linked list of a sequence may include two rows. The first row can hold the information about the utility value and next position of each item (which may be called Utility Position information, or UP information), and the second row can hold the information about the initial positions of the distinct items in the sequence (which may be called the Header Table). The second row may include the distinct items and the initial position of each distinct item.
It can be understood that the utility-value linked list of a sequence is formed by transforming and extending the sequence in the original database; it records the information about the original database together with the common information needed in the calculations. The utility-value linked list of a sequence can speed up the calculation for sequence patterns. This is because a target sequence pattern may have multiple occurrences within a single transaction, so calculating the utility value of a sequence pattern in a transaction requires finding all occurrences and then taking the maximum utility value. Because the utility-value linked list records the next position of each item in the transaction, the transaction does not need to be scanned multiple times; the maximum utility value of the sequence pattern in the transaction can be calculated simply by repeatedly following the next positions of the items.
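The two-row structure can be sketched as follows. This is a simplified illustration under stated assumptions, not the structure as actually laid out in the disclosure: the sequence is flattened to a list of `(item, utility)` pairs with 0-based positions, itemset boundaries are ignored, and the names `build_utility_linked_list` and `occurrences` are hypothetical.

```python
def build_utility_linked_list(sequence):
    """Build the UP information (row 1) and header table (row 2) for one sequence.

    sequence: flat list of (item, utility) pairs in order of occurrence (assumed layout).
    """
    up_info = []        # row 1: [item, utility, next position of the same item or None]
    header_table = {}   # row 2: distinct item -> initial (first) position
    last_seen = {}      # helper: item -> most recent position
    for pos, (item, utility) in enumerate(sequence):
        up_info.append([item, utility, None])
        if item in last_seen:
            up_info[last_seen[item]][2] = pos   # link previous occurrence to this one
        else:
            header_table[item] = pos
        last_seen[item] = pos
    return up_info, header_table

def occurrences(up_info, header_table, item):
    """Visit every position of an item by following next positions, without rescanning."""
    pos = header_table.get(item)
    while pos is not None:
        yield pos
        pos = up_info[pos][2]

sequence = [("a", 2), ("b", 3), ("a", 1), ("c", 4), ("a", 5)]
up_info, header_table = build_utility_linked_list(sequence)
# Maximum utility of the single-item pattern "a" over all of its occurrences.
best_a = max(up_info[p][1] for p in occurrences(up_info, header_table, "a"))
```

The point of the sketch is the access pattern: once the header table gives the first position, each further occurrence is reached in one step, which is what allows the maximum over all occurrences to be taken without scanning the sequence again.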
In the present disclosure, the first mining unit 630 can be the local mining part 120 described above (i.e., the second-stage MapReduce).
According to an example of the present disclosure, the device 600 may also include a load allocation unit (not shown), configured to assign the sequences in the sequence database to multiple tasks. The number of tasks can be expressed as m, where m is a positive integer. For example, m can be a multiple of the number of Mappers in the second-stage MapReduce. In the following example, the present disclosure is described for the case where m equals the number of Mappers in the second-stage MapReduce.
In this example, the load allocation unit can divide the sequences in the sequence database into multiple partitions according to a load-balancing algorithm. For example, the sequences in the sequence database can be assigned to multiple tasks according to the load-balancing algorithm. Specifically, for a sequence in the sequence database, the number (Num) of first-category items that the sequence includes can be determined. Then, the task p with the smallest workload is selected from the multiple tasks, the sequence is assigned to task p, and the workload of task p is updated according to the number of first-category items that the sequence includes. For example, the workload of the p-th task can be expressed as WLp; after a sequence is assigned to the task, the workload of the task is updated from WLp to (WLp+Num).
In addition, in this example, the algorithm can initialize the workload of each task to 0. Therefore, in the first iteration of the algorithm, because the workload of every task is 0, a task can be selected at random from the multiple tasks for a sequence in the sequence database, and the sequence is assigned to that task. For example, the 1st task can be selected from the multiple tasks and the sequence assigned to it.
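The greedy assignment described above can be sketched with a min-heap keyed on workload. This is an illustrative sketch only: the function name, the input layout, counting every occurrence of a first-category item toward Num, and breaking ties by the lowest task index (instead of a random choice) are assumptions.

```python
import heapq

def assign_sequences(sequences, first_category_items, num_tasks):
    """Assign each sequence to the task that currently has the smallest workload.

    sequences: list of (sid, items) pairs, where items lists the items of the sequence.
    Returns a list with one list of sequence identifiers per task.
    """
    heap = [(0, p) for p in range(num_tasks)]   # (workload WLp, task index p), all start at 0
    heapq.heapify(heap)
    tasks = [[] for _ in range(num_tasks)]
    for sid, items in sequences:
        num = sum(1 for item in items if item in first_category_items)  # Num
        workload, p = heapq.heappop(heap)       # task with the minimum workload
        tasks[p].append(sid)
        heapq.heappush(heap, (workload + num, p))  # WLp becomes WLp + Num
    return tasks

sequences = [("s1", ["a", "b", "c"]), ("s2", ["a"]), ("s3", ["b", "b"]), ("s4", ["d"])]
tasks = assign_sequences(sequences, first_category_items={"a", "b"}, num_tasks=2)
```

A heap keeps each assignment at logarithmic cost in the number of tasks; a plain linear scan for the minimum would behave the same for a small number of tasks.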
In addition, a "task" as described herein may also be called a task file. Hereinafter, "task" and "task file" are used interchangeably.
In the present disclosure, the first mining unit 630 can mine local sequence patterns from each partition of the sequence database according to the determined first-category items. Then, the first mining unit 630 can determine at least one candidate global effective sequence pattern according to the mined local sequence patterns. Then, the first mining unit 630 can determine the first set.
In the present disclosure, local sequence patterns can be mined from each task according to the determined first-category items. Some of these local sequence patterns may be global effective sequence patterns, while others may not be. The latter can be taken as candidate global effective sequence patterns.
The process by which the first mining unit 630 mines local sequence patterns from each partition of the sequence database is described below. Specifically, for each item that belongs to the first category in the sequences included in a partition, the utility value and remaining utility value of the item in each sequence are calculated, a utility list of the item in each sequence is constructed, and the utility chain of the item is determined according to the utility lists of the item in the sequences. Local sequence patterns are then mined from the partition according to the utility chains of the items in the partition.
In the present disclosure, the remaining utility value of an item in a sequence can be the sum of the utility values of all items in the sequence that come after that item. In addition, the utility list of an item in a sequence may include the identifier of the sequence (which may be denoted sid), the identifier of each itemset in which the item appears (which may be denoted tid), the utility value of the item in each itemset of the sequence (which may be denoted acu) and its remaining utility value (which may be denoted ru), and indication information that points from one itemset to the next (for example, a pointer, which may be denoted next). In addition, the utility chain of an item may include the utility lists of the item in the individual sequences.
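The remaining utility value can be computed in one pass over a sequence, as in the following minimal sketch. The function name and the data layout (a list of itemsets, each a list of `(item, utility)` pairs, with items inside an itemset ordered as listed) are assumptions made for illustration, and the `next` pointers of the utility list are left out.

```python
def utility_entries(sequence):
    """Return (tid, item, acu, ru) for every item occurrence in one sequence.

    tid: index of the itemset containing the item
    acu: utility value of the item in that itemset
    ru:  remaining utility, the sum of the utilities of all items after this one
    """
    flat = [(tid, item, u)
            for tid, itemset in enumerate(sequence)
            for item, u in itemset]
    total = sum(u for _, _, u in flat)
    entries = []
    consumed = 0
    for tid, item, u in flat:
        consumed += u                      # utility of this item and everything before it
        entries.append((tid, item, u, total - consumed))
    return entries

sequence = [[("a", 2), ("b", 3)], [("a", 1), ("c", 4)]]
entries = utility_entries(sequence)
```

Grouping the resulting entries by item would give the per-sequence utility list of each item; collecting those lists over all sequences of a partition would give the utility chain of the item.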
Similarly, the utility chain of each item that belongs to the first category in the sequences included in each partition can be determined. Local sequence patterns can then be mined from a partition according to the utility chains of the items in the partition. For example, the items in the partition and their utility chains can be used as the input of a conventional effective sequence pattern mining algorithm, and the algorithm outputs one or more local sequence patterns corresponding to the partition. In addition, the algorithm can also output the utility value of each local sequence pattern in the corresponding sequence and the identifier of that sequence. The output of the algorithm can be expressed as the key-value pair (pattern, {sid, utility}), where pattern denotes a local sequence pattern, sid denotes the identifier of a sequence that contains the local sequence pattern, and utility denotes the utility value of the local sequence pattern in that sequence.
The above operations can be carried out by the Mappers in the second-stage MapReduce. For example, the multiple partitions of the sequence database can be processed by multiple Mappers in the second-stage MapReduce respectively, so that each Mapper mines local sequence patterns from its corresponding partition. In this case, the output of the algorithm described above can be the output of the Mapper. That is, for a partition of the sequence database, the output of the Mapper corresponding to that partition is one or more key-value pairs (pattern, {sid, utility}), where the one or more patterns are the one or more local sequence patterns mined from the partition.
Then, the first mining unit 630 can determine at least one candidate global effective sequence pattern according to the mined local sequence patterns. For example, key-value pairs that have the same key in the outputs of the multiple Mappers can be input to one Reducer in the second-stage MapReduce. That is, the multiple key-value pairs in the Mapper outputs that correspond to the same pattern, for example the multiple key-value pairs (pattern x, {sid, utility}) corresponding to pattern x, are input to one Reducer. The Reducer can determine the sum of the multiple utility values corresponding to pattern x and, according to this sum and the first threshold, determine whether pattern x is a global effective sequence pattern. If the sum is greater than or equal to the first threshold, pattern x is determined to be a global effective sequence pattern. If the sum is less than the first threshold, pattern x is determined not to be a global effective sequence pattern but a candidate global effective sequence pattern.
In addition, each Reducer can output one or more new key-value pairs, each of which can consist of a candidate global effective sequence pattern, the identifier of a sequence corresponding to that candidate pattern, and the utility value of that candidate pattern in the sequence. For example, the new key-value pair can be expressed as (sid, (pattern, utility)); that is, the form of the key-value pairs output by the Mappers is changed.
The "at least one candidate global effective sequence pattern" can be determined according to the outputs of the multiple Reducers in the second-stage MapReduce. For example, the "at least one candidate global effective sequence pattern" in step S2032 can be determined according to the sequence patterns in the key-value pairs output by the multiple Reducers in the second-stage MapReduce. For example, if the outputs of the multiple Reducers are (s1, (pattern 1, utility 1)), (s2, (pattern 1, utility 1)), (s3, (pattern 2, utility 2)), (s3, (pattern 1, utility 1)), and (s4, (pattern 2, utility 2)), then the "at least one candidate global effective sequence pattern" in step S2032 can be pattern 1 and pattern 2.
In addition, the first mining unit 630 can determine the first set. For example, the first set can be determined according to the outputs of the multiple Reducers in the second-stage MapReduce. The first set may include the at least one candidate global effective sequence pattern, the identifiers of the sequences that include each candidate global effective sequence pattern, and the utility value of each candidate global effective sequence pattern in the corresponding sequence. For example, the first set may include multiple subsets, where each subset includes the identifier of a sequence, the candidate global effective sequence patterns that the sequence includes, and the utility values of those candidate patterns in the sequence. For example, if the outputs of the multiple Reducers in the second-stage MapReduce are (s1, (pattern 1, utility 1)), (s2, (pattern 1, utility 1)), (s3, (pattern 2, utility 2)), (s3, (pattern 1, utility 1)), and (s4, (pattern 2, utility 2)), then the first set can include four subsets: the 1st subset is (s1, (pattern 1, utility 1)), the 2nd subset is (s2, (pattern 1, utility 1)), the 3rd subset is (s3, (pattern 2, utility 2), (pattern 1, utility 1)), and the 4th subset is (s4, (pattern 2, utility 2)).
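The second-stage Reducer logic and the construction of the first set can be sketched as follows. This is a simplified single-process illustration; the function names and the flat `(pattern, sid, utility)` layout of the Mapper output are assumptions, and the threshold test follows the "greater than or equal to" wording used for the Reducer above.

```python
from collections import defaultdict

def second_stage_reduce(mapper_outputs, threshold):
    """Group Mapper outputs by pattern and split them by summed utility.

    mapper_outputs: iterable of (pattern, sid, utility) triples.
    Returns (global_patterns, candidates): patterns whose summed utility reaches the
    threshold, and the remaining patterns with their (sid, utility) entries.
    """
    by_pattern = defaultdict(list)
    for pattern, sid, utility in mapper_outputs:
        by_pattern[pattern].append((sid, utility))
    global_patterns, candidates = {}, {}
    for pattern, entries in by_pattern.items():
        total = sum(utility for _, utility in entries)
        if total >= threshold:
            global_patterns[pattern] = total
        else:
            candidates[pattern] = entries
    return global_patterns, candidates

def build_first_set(candidates):
    """Re-key candidate entries by sequence identifier: sid -> [(pattern, utility), ...]."""
    first_set = defaultdict(list)
    for pattern, entries in candidates.items():
        for sid, utility in entries:
            first_set[sid].append((pattern, utility))
    return dict(first_set)

outputs = [("p1", "s1", 3), ("p1", "s2", 4), ("p2", "s3", 2),
           ("p1", "s3", 5), ("p2", "s4", 1)]
global_patterns, candidates = second_stage_reduce(outputs, threshold=10)
first_set = build_first_set(candidates)
```

Keying the first set by sequence identifier mirrors the (sid, (pattern, utility)) form of the Reducer output, so that a later stage can look up the already known utility of a candidate in a given sequence directly.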
In the above example, no combiner module is configured for the Mappers in the second-stage MapReduce. However, the present disclosure is not limited to this. For example, a corresponding combiner module can also be configured for the Mappers in the second-stage MapReduce.
In addition, in the present disclosure, the second mining unit 640 can be the integration part 130 described above (i.e., the third-stage MapReduce).
The second mining unit 640 can determine the local utility value of each candidate global effective sequence pattern according to the utility-value linked list of each sequence and the first set.
Specifically, the at least one candidate global effective sequence pattern and the first set can be used as the input of the multiple Mappers in the third-stage MapReduce. For example, the at least one candidate global effective sequence pattern can be divided into multiple groups, and the groups are then input to the multiple Mappers respectively. In addition, the first set can be input to every Mapper.
Then, each Mapper can determine the utility value of each of its corresponding candidate global effective sequence patterns. For example, for one of the candidate global effective sequence patterns corresponding to a Mapper, the Mapper can judge whether the first set includes that candidate pattern. When the first set includes the candidate global effective sequence pattern, the utility value of the candidate pattern can be determined according to the first set. When the first set does not include the candidate global effective sequence pattern, the utility value of the candidate pattern can be determined according to the utility-value linked list of the sequence.
In the present disclosure, each Mapper in the third-stage MapReduce can output one or more new key-value pairs, where each new key-value pair can consist of a candidate global effective sequence pattern and its utility value. For example, the new key-value pair can be expressed as (pattern, utility).
In addition, the same Mapper may output multiple key-value pairs (pattern, utility) corresponding to the same candidate global effective sequence pattern. For example, for the candidate global effective sequence pattern pattern y, the same Mapper may output two key-value pairs, (pattern y, utility 1) and (pattern y, utility 2). These two key-value pairs can also be expressed as (pattern y, Gu), where Gu is the set that includes utility 1 and utility 2.
In this case, a combiner module (which may be called a combiner) can be configured for each Mapper to determine the local utility value of the same candidate global effective sequence pattern. Specifically, the local utility value of a candidate global effective sequence pattern can be determined according to the utility values in the multiple key-value pairs corresponding to that candidate pattern. For example, the local utility value of a candidate global effective sequence pattern can be the sum of the utility values in the multiple key-value pairs corresponding to that candidate pattern. For example, if, for the candidate global effective sequence pattern pattern y, the same Mapper outputs two key-value pairs, (pattern y, utility 1) and (pattern y, utility 2), then the local utility value local-utility of pattern y is (utility 1+utility 2).
In the present disclosure, the combiner module can also output one or more new key-value pairs, where each new key-value pair can consist of a candidate global effective sequence pattern and its local utility value. For example, the new key-value pair can be expressed as (pattern, local-utility). In the example where the candidate global effective sequence pattern is pattern y, the combiner module can output the key-value pair (pattern y, utility 1+utility 2).
Then, the second mining unit 640 can determine the global utility value of each candidate global effective sequence pattern according to the local utility values of that candidate pattern. For example, for each candidate global effective sequence pattern, the global utility value of the candidate pattern can be determined according to its multiple local utility values. For example, the sum of the multiple local utility values of a candidate global effective sequence pattern can be taken as the global utility value of that candidate pattern.
Specifically, key-value pairs that have the same key in the outputs of the multiple combiner modules can be input to one Reducer in the third-stage MapReduce. That is, the multiple key-value pairs in the combiner outputs that correspond to the same candidate global effective sequence pattern, for example the multiple key-value pairs corresponding to the candidate global effective sequence pattern pattern y, are input to one Reducer. The Reducer can take the sum of the local utility values in these key-value pairs as the global utility value (global-utility) of the candidate global effective sequence pattern.
Then, the second mining unit 640 can determine a sequence pattern whose global utility value is greater than the first threshold to be a global effective sequence pattern. For example, each Reducer in the third-stage MapReduce can determine the sequence patterns whose global utility value is greater than or equal to the first threshold to be global effective sequence patterns. Each Reducer can output one or more new key-value pairs, where each new key-value pair can consist of a global effective sequence pattern and the global utility value of that pattern. For example, when pattern y is a global effective sequence pattern, a Reducer can output the key-value pair (pattern y, global-utility). Therefore, the sequence patterns in the key-value pairs output by the Reducers in the third-stage MapReduce are global effective sequence patterns.
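The third stage as a whole (Mapper lookup in the first set with a fallback, combiner sum, Reducer sum and threshold test) can be sketched as follows. This is an illustrative sketch under assumptions, not the disclosed implementation: the function names are hypothetical, the first set is assumed to be keyed by sequence identifier, each Mapper is assumed to iterate over all sequence identifiers for its group of candidates, and `utility_from_linked_list` stands in for the computation over the utility-value linked list, which is not reproduced here.

```python
from collections import defaultdict

def third_stage_map(candidates, sids, first_set, utility_from_linked_list):
    """Mapper: emit (pattern, utility) for each candidate pattern in each sequence.

    Uses the utility recorded in the first set when available; otherwise calls the
    supplied fallback (a placeholder for the linked-list computation).
    """
    for pattern in candidates:
        for sid in sids:
            known = dict(first_set.get(sid, []))
            if pattern in known:
                yield pattern, known[pattern]
            else:
                yield pattern, utility_from_linked_list(sid, pattern)

def combine_local(pairs):
    """Combiner: sum utilities per pattern within one Mapper (local-utility)."""
    local = defaultdict(int)
    for pattern, utility in pairs:
        local[pattern] += utility
    return dict(local)

def third_stage_reduce(local_outputs, threshold):
    """Reducer: sum local utilities per pattern and keep those at or above the threshold."""
    total = defaultdict(int)
    for local in local_outputs:
        for pattern, utility in local.items():
            total[pattern] += utility
    return {p: u for p, u in total.items() if u >= threshold}

# Hypothetical data: utilities not present in the first set come from a lookup table
# that plays the role of the linked-list computation.
first_set = {"s1": [("p1", 3)], "s3": [("p2", 2)], "s4": [("p2", 1)]}
fallback = {("s2", "p1"): 5, ("s1", "p2"): 4}
lookup = lambda sid, pattern: fallback.get((sid, pattern), 0)
sids = ["s1", "s2", "s3", "s4"]
local_a = combine_local(third_stage_map(["p1"], sids, first_set, lookup))
local_b = combine_local(third_stage_map(["p2"], sids, first_set, lookup))
result = third_stage_reduce([local_a, local_b], threshold=8)
```

In this toy run the first Mapper handles the group containing p1 and the second the group containing p2; only p1 reaches the threshold.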
With the device for mining global effective sequence patterns provided by this embodiment, the utility-value linked list of each sequence in the sequence database and the first set are determined, and global effective sequence patterns are mined according to these two data structures. This saves a large amount of time, speeds up the calculation of global utility values in the sequence database, speeds up mining, and reduces time complexity.
In addition, the device according to the embodiments of the present disclosure can also be implemented by means of the architecture of the computing device shown in Fig. 7. Fig. 7 shows the architecture of the computing device. As shown in Fig. 7, the computing device 700 may include a bus 710, one or more CPUs 720, a read-only memory (ROM) 730, a random access memory (RAM) 740, a communication port 750 connected to a network, an input/output component 760, a hard disk 770, and so on. A storage device in the computing device 700, such as the ROM 730 or the hard disk 770, can store various data or files used in computer processing and/or communication, as well as the program instructions executed by the CPU. The computing device 700 may also include a user interface 780. Of course, the architecture shown in Fig. 7 is only exemplary; when implementing different devices, one or more components of the computing device shown in Fig. 7 may be omitted as actually needed.
Embodiments of the present disclosure may also be implemented as a computer-readable storage medium. Computer-readable instructions are stored on the computer-readable storage medium according to the embodiments of the present disclosure. When the computer-readable instructions are run by a processor, the method according to the embodiments of the present disclosure described with reference to the above figures can be executed. The computer-readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and so on.
Those skilled in the art will appreciate that the content disclosed herein is open to many variations and improvements. For example, the various devices or components described above can be implemented in hardware, or in software, firmware, or a combination of some or all of the three.
In addition, as used in the present disclosure and the claims, unless the context clearly indicates otherwise, words such as "a", "an", "one", and/or "the" do not specifically denote the singular and may also include the plural. The words "first", "second", and similar words used in the present disclosure do not indicate any order, quantity, or importance, and are used only to distinguish different components. Likewise, words such as "include" or "comprise" mean that the element or object preceding the word covers the elements or objects listed after the word and their equivalents, without excluding other elements or objects. Words such as "connect" or "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
In addition, flowcharts are used in the present disclosure to illustrate the operations performed by the system according to the embodiments of the present disclosure. It should be understood that the preceding or following operations are not necessarily performed exactly in order. Instead, the various steps can be processed in reverse order or simultaneously. Other operations can also be added to these processes, or one or more steps can be removed from them.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present disclosure belongs. It should also be understood that terms such as those defined in common dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The present disclosure has been described in detail above, but it will be apparent to those skilled in the art that the present disclosure is not limited to the embodiments described in this specification. The present disclosure can be implemented in modified and altered forms without departing from the spirit and scope of the present disclosure as defined by the claims. Therefore, the description in this specification is for the purpose of illustration and does not have any restrictive meaning for the present disclosure.

Claims (15)

1. A method for mining global effective sequence patterns, comprising:
determining first-category items in a sequence database, wherein a first-category item is an item whose global sequence weight utility value is higher than a first threshold;
determining a utility-value linked list of each sequence in the sequence database;
mining, according to the determined first-category items, at least one candidate global effective sequence pattern from the sequence database and determining a first set, wherein the first set includes the at least one candidate global effective sequence pattern, identifiers of the sequences that include each candidate global effective sequence pattern, and the utility value of each candidate global effective sequence pattern in the corresponding sequence; and
mining, according to the utility-value linked list of each sequence and the first set, global effective sequence patterns from the at least one candidate global effective sequence pattern.
2. The method according to claim 1, wherein determining the first-category items in the sequence database comprises:
determining the global sequence weight utility value of each item in the sequence database; and
determining the items whose global sequence weight utility value is higher than the first threshold to be first-category items.
3. The method of claim 2, wherein determining the global sequence-weighted utility of each item in the sequence database comprises:
determining the local sequence-weighted utility of the item in each partition of the sequence database; and
determining the global sequence-weighted utility of the item according to the determined local sequence-weighted utilities.
4. The method of claim 3, wherein the local sequence-weighted utility of the item in each partition of the sequence database is determined according to the utilities of the sequences in that partition that include the item.
5. The method of any one of claims 1 to 4, wherein determining the utility linked list of each sequence in the sequence database comprises:
determining the utility linked list of a sequence according to the utility of each item in the sequence and the position of each item in the sequence.
6. The method of any one of claims 1 to 4, wherein mining at least one candidate global high-utility sequence pattern from the sequence database according to the determined first-category items comprises:
mining local high-utility sequence patterns from each partition of the sequence database according to the determined first-category items; and
determining the at least one candidate global high-utility sequence pattern according to the mined local high-utility sequence patterns.
7. The method of claim 6, wherein mining local high-utility sequence patterns from each partition of the sequence database according to the determined first-category items comprises:
for an item that belongs to the first-category items in each sequence included in the partition,
calculating the utility and the remaining utility of the item in each sequence, wherein the remaining utility of the item in a sequence is the sum of the utilities of all items that come after the item in that sequence;
constructing a utility list of the item in each sequence;
determining a utility chain of the item according to the utility list of the item in each sequence; and
mining local high-utility sequence patterns from the partition according to the utility chain of each item in the partition.
8. The method of any one of claims 1 to 4, wherein mining global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set comprises:
determining the local utility of each candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set;
determining the global utility of each candidate global high-utility sequence pattern according to the local utility of that candidate global high-utility sequence pattern; and
determining the sequence patterns whose global utility is greater than the first threshold as global high-utility sequence patterns.
9. The method of claim 6, further comprising:
dividing the sequences in the sequence database into a plurality of partitions according to a load-balancing algorithm.
10. A device for mining global high-utility sequence patterns, comprising:
a first determination unit configured to determine first-category items in a sequence database, wherein the first-category items are items whose global sequence-weighted utility is higher than a first threshold;
a second determination unit configured to determine a utility linked list of each sequence in the sequence database;
a first mining unit configured to mine, according to the determined first-category items, at least one candidate global high-utility sequence pattern from the sequence database and to determine a first set, wherein the first set comprises the at least one candidate global high-utility sequence pattern, identifiers of the sequences that include each candidate global high-utility sequence pattern, and the utility of each candidate global high-utility sequence pattern in the corresponding sequence; and
a second mining unit configured to mine global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set.
11. The device of claim 10, wherein the first determination unit is configured to determine the global sequence-weighted utility of each item in the sequence database, and to determine the items whose global sequence-weighted utility is higher than the first threshold as the first-category items.
12. The device of claim 10 or 11, wherein the second determination unit is configured to determine the utility linked list of a sequence according to the utility of each item in the sequence and the position of each item in the sequence.
13. The device of claim 10 or 11, wherein the second mining unit is configured to: determine the local utility of each candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set; determine the global utility of each candidate global high-utility sequence pattern according to the local utility of that candidate global high-utility sequence pattern; and determine the sequence patterns whose global utility is greater than the first threshold as global high-utility sequence patterns.
14. A device for mining global high-utility sequence patterns, comprising:
a processor; and
a memory, wherein the memory stores a computer-executable program which, when executed by the processor, performs the method of any one of claims 1 to 9.
15. A computer-readable storage medium having instructions stored thereon which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 9.
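
For illustration only, the quantities recited in claims 2 to 4, 7 and 8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: each sequence is flattened to an ordered list of (item, utility) pairs, the global sequence-weighted utility is assumed to be the sum of the per-partition (local) values, the global utility of a candidate is assumed to be the sum of its per-partition utilities, and all function names are hypothetical.

```python
from collections import defaultdict

# A sequence is modelled as an ordered list of (item, utility) pairs.
# Flattening itemsets to single items is a simplifying assumption.
Sequence = list[tuple[str, int]]


def local_swu(partition: list[Sequence]) -> dict[str, int]:
    """Local sequence-weighted utility of each item in one partition:
    the summed utility of the sequences that contain the item."""
    swu: dict[str, int] = defaultdict(int)
    for seq in partition:
        seq_utility = sum(u for _, u in seq)
        for item in {i for i, _ in seq}:  # count each sequence once per item
            swu[item] += seq_utility
    return dict(swu)


def global_swu(partitions: list[list[Sequence]]) -> dict[str, int]:
    """Global sequence-weighted utility, assumed here to be the sum of
    the local values over all partitions."""
    total: dict[str, int] = defaultdict(int)
    for partition in partitions:
        for item, value in local_swu(partition).items():
            total[item] += value
    return dict(total)


def first_category_items(partitions: list[list[Sequence]], threshold: int) -> set[str]:
    """Items whose global sequence-weighted utility exceeds the threshold."""
    return {i for i, v in global_swu(partitions).items() if v > threshold}


def utility_and_remaining(seq: Sequence, item: str) -> list[tuple[int, int]]:
    """(utility, remaining utility) for every occurrence of `item` in `seq`;
    the remaining utility sums the utilities of all items after it."""
    result = []
    for pos, (i, u) in enumerate(seq):
        if i == item:
            result.append((u, sum(v for _, v in seq[pos + 1:])))
    return result


def global_high_utility(local_utilities: list[dict[tuple, int]], threshold: int) -> dict[tuple, int]:
    """Sum each candidate's per-partition utilities (an assumed aggregation)
    and keep the candidates whose global utility exceeds the threshold."""
    total: dict[tuple, int] = defaultdict(int)
    for per_partition in local_utilities:
        for pattern, u in per_partition.items():
            total[pattern] += u
    return {p: u for p, u in total.items() if u > threshold}


# Hypothetical two-partition database used only for this example.
partitions = [
    [[("a", 5), ("b", 3), ("c", 2)], [("b", 4), ("d", 1)]],
    [[("a", 2), ("c", 6)], [("d", 1)]],
]
print(global_swu(partitions))
print(first_category_items(partitions, threshold=10))
print(utility_and_remaining(partitions[0][0], "b"))
```

In this toy database, item d appears only in low-utility sequences, so its global sequence-weighted utility stays below a threshold of 10 and it is excluded before any pattern is mined. The utility of a multi-item pattern within a sequence is not computed here; `global_high_utility` takes those per-partition values as input.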
CN201910692048.6A 2019-07-26 2019-07-26 Method, device and computer storage medium for mining global high utility sequence pattern Active CN110399406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910692048.6A CN110399406B (en) 2019-07-26 2019-07-26 Method, device and computer storage medium for mining global high utility sequence pattern

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910692048.6A CN110399406B (en) 2019-07-26 2019-07-26 Method, device and computer storage medium for mining global high utility sequence pattern

Publications (2)

Publication Number Publication Date
CN110399406A (en) 2019-11-01
CN110399406B (en) 2024-06-04

Family

ID=68326602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910692048.6A Active CN110399406B (en) 2019-07-26 2019-07-26 Method, device and computer storage medium for mining global high utility sequence pattern

Country Status (1)

Country Link
CN (1) CN110399406B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120130964A1 (en) * 2010-11-18 2012-05-24 Yen Show-Jane Fast algorithm for mining high utility itemsets
KR20140064077A (en) * 2012-11-19 2014-05-28 충북대학교 산학협력단 Method of mining high utility patterns
CN109446235A (en) * 2018-10-18 2019-03-08 哈尔滨工业大学(深圳) Multidimensional high-utility sequence pattern processing method, device and computer equipment
CN109460424A (en) * 2018-10-18 2019-03-12 哈尔滨工业大学(深圳) High-utility sequence pattern processing method, device and computer equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JERRY CHUN-WEI LIN ET AL: "High-Utility Sequential Pattern Mining with Multiple Minimum Utility Thresholds", APWEB-WAIM 2017, PART I, 31 December 2017 (2017-12-31), pages 215 - 229 *
JUNQIANG LIU ET AL: "Mining High Utility Patterns in One Phase without Generating Candidates", IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, vol. 28, 17 December 2015 (2015-12-17), pages 1245 - 1257, XP011604910, DOI: 10.1109/TKDE.2015.2510012 *
MORTEZA ZIHAYAT ET AL: "Distributed and Parallel High Utility Sequential Pattern Mining", 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 6 February 2017 (2017-02-06), pages 853 - 862 *

Also Published As

Publication number Publication date
CN110399406B (en) 2024-06-04

Similar Documents

Publication Publication Date Title
Yang et al. MapReduce as a programming model for association rules algorithm on Hadoop
Raj et al. EAFIM: efficient apriori-based frequent itemset mining algorithm on Spark for big transactional data
Xin et al. ELM∗: distributed extreme learning machine with MapReduce
CN101739281A (en) Infrastructure for parallel programming of clusters of machines
Ngu et al. B+-tree construction on massive data with Hadoop
CN112287015A (en) Image generation system, image generation method, electronic device, and storage medium
Chen et al. Highly scalable sequential pattern mining based on mapreduce model on the cloud
EP3494487A1 (en) Learned data filtering
CN106326475A (en) High-efficiency static hash table implement method and system
CN112052404A (en) Group discovery method, system, device and medium for multi-source heterogeneous relation network
CN107102999A (en) Association analysis method and device
CN104731925A (en) MapReduce-based FP-Growth load balance parallel computing method
Huynh et al. An efficient method for mining frequent sequential patterns using multi-core processors
CN104077438A (en) Power grid large-scale topological structure construction method and system
CN104834709B (en) Parallel cosine pattern mining method based on load balancing
CN103577455A (en) Data processing method and system for database aggregating operation
CN106445645A (en) Method and device for executing distributed computation tasks
CN111915011A (en) Single-amplitude quantum computation simulation method
Tar et al. Parallel search paths for the simplex algorithm
Engström et al. PageRank for networks, graphs, and Markov chains
Lin et al. Mining high-utility sequential patterns from big datasets
Guan An incremental updating algorithm of attribute reduction set in decision tables
CN110399406A (en) Method, device and computer storage medium for mining global high utility sequence pattern
JP5464017B2 (en) Distributed memory database system, database server, data processing method and program thereof
CN109857832A (en) Payment data preprocessing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TG01 Patent term adjustment