CN110399406A - Method, apparatus and computer storage medium for mining global high-utility sequence patterns
- Publication number
- CN110399406A (application number CN201910692048.6A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- utility
- value
- pattern
- global
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
- G06F16/2474—Sequence data queries, e.g. querying versioned data
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present disclosure provides a method, an apparatus, and a computer-readable storage medium for mining global high-utility sequence patterns. The method comprises: determining first-type items in a sequence database, where a first-type item is an item whose global sequence weight utility is higher than a first threshold; determining a utility linked list for each sequence in the sequence database; mining, according to the determined first-type items, at least one candidate global high-utility sequence pattern from the sequence database and determining a first set, where the first set includes the at least one candidate global high-utility sequence pattern, the identifiers of the sequences that include each candidate, and the utility of each candidate in the corresponding sequence; and mining global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set.
Description
Technical field
The present disclosure relates to the field of data processing, and in particular to a method, an apparatus, and a computer-readable storage medium for mining global high-utility sequence patterns.
Background
Sequential pattern mining is an important technique in the field of data mining. It operates on a sequence database. A sequence database may contain multiple sequences (also called transactions), where each sequence contains at least one itemset, each itemset contains at least one item, and the itemsets are ordered. Taking a supermarket's purchase data as an example, suppose a user buys goods a and b on the first day, goods a and c on the second day, and goods b on the third day. The user's purchases over this period can be abstracted as the sequence <[a b], [a c], [b]>, where a, b and c are items, the items inside each [] form an itemset, and the itemsets arranged in order form the sequence. A high-utility sequential pattern mining algorithm mines the combinations of goods whose utility is higher than a preset threshold, that is, sequence patterns. A sequence pattern is an ordered arrangement of itemsets.
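The supermarket example can be written down directly as a data structure. The following is a minimal Python sketch, assuming a sequence is stored as a list of itemsets and using the usual notion of a pattern being contained in a sequence (each pattern itemset is a subset of some sequence itemset, in order). The function name `contains_pattern` and the representation are illustrative assumptions, not taken from the disclosure.

```python
# Hypothetical representation: a sequence is a list of itemsets,
# and each itemset is a tuple of items. This encodes <[a b], [a c], [b]>.
sequence = [("a", "b"), ("a", "c"), ("b",)]


def contains_pattern(sequence, pattern):
    """Return True if every itemset of `pattern` is a subset of some itemset
    of `sequence`, with the matched itemsets appearing in the same order."""
    pos = 0
    for wanted in pattern:
        # Skip sequence itemsets that do not contain the wanted itemset.
        while pos < len(sequence) and not set(wanted) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a later itemset
    return True


# <[a], [b]> occurs (a on day 1, b on day 3); <[c], [a]> does not.
print(contains_pattern(sequence, [("a",), ("b",)]))
print(contains_pattern(sequence, [("c",), ("a",)]))
```

The greedy leftmost match is sufficient for a containment check; computing the utility of a pattern additionally requires the purchase quantities and unit profits, which this sketch leaves out.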
In high-utility pattern mining, finding high-utility patterns requires computing utility over the entire database, which takes considerable computation, and this is even more so for high-utility sequence patterns. High-utility sequential pattern mining is therefore more complex than traditional high-utility pattern mining and frequent sequential pattern mining. Current distributed and parallel pattern mining concentrates on high-utility pattern mining and frequent sequential pattern mining; for example, both can be carried out on the Hadoop platform. A distributed and parallel method for high-utility sequential pattern mining is therefore not yet available.
Summary of the invention
To this end, the present disclosure provides a method, an apparatus, and a computer-readable storage medium for mining global high-utility sequence patterns.
According to one aspect of the present disclosure, a method for mining global high-utility sequence patterns is provided, comprising: determining first-type items in a sequence database, where a first-type item is an item whose global sequence weight utility is higher than a first threshold; determining a utility linked list for each sequence in the sequence database; mining, according to the determined first-type items, at least one candidate global high-utility sequence pattern from the sequence database and determining a first set, where the first set includes the at least one candidate global high-utility sequence pattern, the identifiers of the sequences that include each candidate, and the utility of each candidate in the corresponding sequence; and mining global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set.
According to an example of the present disclosure, determining the first-type items in the sequence database comprises: determining the global sequence weight utility of each item in the sequence database; and determining the items whose global sequence weight utility is higher than the first threshold as the first-type items.
According to an example of the present disclosure, determining the global sequence weight utility of each item in the sequence database comprises: determining the local sequence weight utility of the item in each partition of the sequence database; and determining the global sequence weight utility of the item according to the determined local sequence weight utilities.
According to an example of the present disclosure, the local sequence weight utility of the item in each partition of the sequence database is determined from the utilities of the sequences in that partition that include the item.
According to an example of the present disclosure, determining the utility linked list of each sequence in the sequence database comprises: determining the utility linked list of the sequence according to the utility of each item in the sequence and the position of each item in the sequence.
According to an example of the present disclosure, mining at least one candidate global high-utility sequence pattern from the sequence database according to the determined first-type items comprises: mining local high-utility sequence patterns from each partition of the sequence database according to the determined first-type items; and determining the at least one candidate global high-utility sequence pattern according to the mined local high-utility sequence patterns.
According to an example of the present disclosure, mining local high-utility sequence patterns from each partition of the sequence database according to the determined first-type items comprises: for an item that belongs to the first-type items in the sequences included in the partition, calculating the utility and the remaining utility of the item in each sequence, where the remaining utility of the item in a sequence is the sum of the utilities of all items that come after the item in that sequence; constructing the utility list of the item in each sequence; determining the utility chain of the item according to the utility lists of the item in the sequences; and mining local high-utility sequence patterns from the partition according to the utility chains of the items in the partition.
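To make the remaining-utility definition concrete, here is a minimal Python sketch that computes, for every occurrence of an item in one sequence, its utility and its remaining utility. It rests on assumptions that this summary does not state: the utility of an occurrence is taken to be quantity × unit profit, items are ordered as written within each itemset, and the data are sequence s1 and the profit values from Table 1 and Table 2 of the detailed description. The function name `utility_list` and the tuple layout are hypothetical and are not claimed to match the structure shown in Fig. 3.

```python
# Unit profits (Table 2) and sequence s1 (Table 1); each itemset is a list
# of (item, quantity) pairs.
PROFIT = {"a": 5, "b": 3, "c": 4, "d": 2, "e": 1, "f": 6}
S1 = [
    [("a", 2), ("c", 3)],
    [("a", 3), ("b", 1), ("c", 2)],
    [("a", 4), ("b", 5), ("d", 4)],
    [("e", 3)],
]


def utility_list(sequence, item, profit):
    """Return (itemset index, utility, remaining utility) for each occurrence
    of `item`. Assumption: utility = quantity * unit profit."""
    # Flatten the sequence while remembering the 1-based itemset index.
    flat = [
        (idx, name, qty * profit[name])
        for idx, itemset in enumerate(sequence, start=1)
        for name, qty in itemset
    ]
    entries = []
    for pos, (idx, name, util) in enumerate(flat):
        if name == item:
            # Remaining utility: sum of utilities of everything after this item.
            remaining = sum(u for _, _, u in flat[pos + 1:])
            entries.append((idx, util, remaining))
    return entries


print(utility_list(S1, "a", PROFIT))
# [(1, 10, 84), (2, 15, 57), (3, 20, 26)]
```

Under these assumptions the first occurrence of a has utility 10 and remaining utility 84, which together equal the total utility of s1 (94).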
According to an example of the present disclosure, mining global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set comprises: determining the local utility of each candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set; determining the global utility of each candidate according to its local utilities; and determining the sequence patterns whose global utility is greater than the first threshold as global high-utility sequence patterns.
According to an example of the present disclosure, the above method further comprises: dividing the sequences in the sequence database into multiple partitions according to a load-balancing algorithm.
According to another aspect of the present disclosure, an apparatus for mining global high-utility sequence patterns is provided, comprising: a first determination unit configured to determine first-type items in a sequence database, where a first-type item is an item whose global sequence weight utility is higher than a first threshold; a second determination unit configured to determine a utility linked list for each sequence in the sequence database; a first mining unit configured to mine, according to the determined first-type items, at least one candidate global high-utility sequence pattern from the sequence database and to determine a first set, where the first set includes the at least one candidate global high-utility sequence pattern, the identifiers of the sequences that include each candidate, and the utility of each candidate in the corresponding sequence; and a second mining unit configured to mine global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set.
According to an example of the present disclosure, the first determination unit is configured to determine the global sequence weight utility of each item in the sequence database, and to determine the items whose global sequence weight utility is higher than the first threshold as the first-type items.
According to an example of the present disclosure, the second determination unit is configured to determine the local sequence weight utility of each item in each partition of the sequence database, and to determine the global sequence weight utility of the item according to the determined local sequence weight utilities.
According to an example of the present disclosure, the local sequence weight utility of the item in each partition of the sequence database is determined from the utilities of the sequences in that partition that include the item.
According to an example of the present disclosure, the second determination unit is configured to determine the utility linked list of each sequence according to the utility of each item in the sequence and the position of each item in the sequence.
According to an example of the present disclosure, the first mining unit is configured to mine local high-utility sequence patterns from each partition of the sequence database according to the determined first-type items, and to determine the at least one candidate global high-utility sequence pattern according to the mined local high-utility sequence patterns.
According to an example of the present disclosure, the first mining unit is configured to: for an item that belongs to the first-type items in the sequences included in each partition of the sequence database, calculate the utility and the remaining utility of the item in each sequence, where the remaining utility of the item in a sequence is the sum of the utilities of all items that come after the item in that sequence; construct the utility list of the item in each sequence; determine the utility chain of the item according to the utility lists of the item in the sequences; and mine local high-utility sequence patterns from the partition according to the utility chains of the items in the partition.
According to an example of the present disclosure, the second mining unit is configured to: determine the local utility of each candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set; determine the global utility of each candidate according to its local utilities; and determine the sequence patterns whose global utility is greater than the first threshold as global high-utility sequence patterns.
According to an example of the present disclosure, the above apparatus further comprises a load distribution unit configured to divide the sequences in the sequence database into multiple partitions according to a load-balancing algorithm.
According to another aspect of the present disclosure, an apparatus for mining global high-utility sequence patterns is provided, comprising: a processor; and a memory, where the memory stores a computer-executable program that, when executed by the processor, performs the above method.
According to another aspect of the present disclosure, a computer-readable storage medium is provided that stores instructions which, when executed by a processor, cause the processor to perform the above method.
With the method, apparatus, and computer-readable storage medium for mining global high-utility sequence patterns provided by the present disclosure, the utility linked list of each sequence in the sequence database and the first set are determined, and global high-utility sequence patterns are mined on the basis of these two data structures. This saves a large amount of time, speeds up the computation of global utility in the sequence database, accelerates mining, and reduces time complexity.
Brief description of the drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the more detailed description of its embodiments in conjunction with the accompanying drawings. The drawings provide a further understanding of the embodiments, form part of the specification, and serve together with the embodiments to explain the disclosure without limiting it. In the drawings, the same reference numerals generally denote the same components or steps.
Fig. 1 is a schematic diagram of a system architecture for mining global high-utility sequence patterns from a sequence database according to an embodiment of the present disclosure.
Fig. 2 is a flowchart of a method for mining global high-utility sequence patterns according to an embodiment of the present disclosure.
Fig. 3 shows a schematic diagram of the utility list of item a in sequence s1.
Fig. 4 shows a schematic diagram of the utility chain of item a.
Fig. 5 is a flowchart of a method for mining global high-utility sequence patterns from at least one candidate global high-utility sequence pattern according to an embodiment of the present disclosure.
Fig. 6 is a schematic structural diagram of an apparatus for mining global high-utility sequence patterns according to an embodiment of the present disclosure.
Fig. 7 shows the architecture of a computer device according to an embodiment of the present disclosure.
Detailed description of embodiments
To make the objects, technical solutions and advantages of the present disclosure clearer, example embodiments of the disclosure are described in detail below with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same elements throughout. It should be understood that the embodiments described here are merely illustrative and should not be construed as limiting the scope of the disclosure.
In the present disclosure, a sequence pattern whose utility is high, for example higher than a preset threshold, may be called a "high-utility sequence pattern". That is, a "high-utility sequence pattern" may be a sequence pattern whose utility is higher than a preset threshold. The "preset threshold" here may be fixed, or may change with the application scenario of the mining algorithm.
The present disclosure proposes a technical solution for distributed and parallel high-utility sequential pattern mining. In the present disclosure, distributed and parallel high-utility sequential pattern mining is implemented with a distributed computing framework based on the Hadoop platform. During mining, the utility linked list of each sequence in the sequence database and the first set are used to store the information needed in the mining process, which accelerates mining and reduces time complexity. The "distributed computing framework" mentioned here may be the MapReduce framework, in which Map maps a key-value pair to a new key-value pair, and Reduce aggregates the values of the key-value pairs that share the same key and maps them to a new key-value pair. The module that performs the Map operation may be called a Mapper, and the module that performs the Reduce operation may be called a Reducer.
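The Map and Reduce roles described above can be imitated in a few lines of single-process Python. This is only a sketch of the programming model, assuming an in-memory shuffle step; it does not use Hadoop or any Hadoop API, and the names `run_mapreduce`, `count_mapper` and `sum_reducer` are illustrative.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Apply `mapper` to every (key, value) record, group the emitted pairs
    by key, then apply `reducer` to each (key, list of values) group."""
    groups = defaultdict(list)
    for key, value in records:
        for new_key, new_value in mapper(key, value):
            groups[new_key].append(new_value)
    result = {}
    for key, values in groups.items():
        out_key, out_value = reducer(key, values)
        result[out_key] = out_value
    return result


def count_mapper(sid, items):
    # Emit one (item, 1) pair per item of the sequence.
    return [(item, 1) for item in items]


def sum_reducer(item, counts):
    # Integrate all values that share the same key.
    return item, sum(counts)


records = [("s1", ["a", "b"]), ("s2", ["a"])]
print(run_mapreduce(records, count_mapper, sum_reducer))
# {'a': 2, 'b': 1}
```

In a real cluster the grouping step is performed by the framework across machines, and several Mappers and Reducers run in parallel on different partitions.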
First, a system architecture for mining global high-utility sequence patterns (Global-High Utility Sequence Pattern, G-HUSP) from a sequence database according to an embodiment of the present disclosure is described with reference to Fig. 1. Fig. 1 is a schematic diagram of this system architecture. As shown in Fig. 1, the system architecture 100 may include three parts: an identification part 110, a local mining part 120, and an integration part 130.
The identification part 110 may include multiple Mappers and multiple Reducers, for example n Mappers and n Reducers, where n is a positive integer. The identification part 110 may be used to determine the first-type items in the sequence database, a first-type item being an item whose global sequence weight utility is higher than a first threshold. First-type items are the items that may form high-utility sequence patterns, and are therefore also called promising items.
The local mining part 120 may include multiple Mappers and multiple Reducers, for example n Mappers and n Reducers, where n is a positive integer. The local mining part 120 may be used to mine local high-utility sequence patterns (Local-High Utility Sequence Pattern, L-HUSP) from the sequence database according to the first-type items. Some of the local high-utility sequence patterns may not be global high-utility sequence patterns while others may be, so the latter can serve as candidate global high-utility sequence patterns. The local mining part 120 may also be used to determine a first set (which may be denoted sidset). The first set may include at least one candidate global high-utility sequence pattern, the identifiers of the sequences that include each candidate, and the utility of each candidate in the corresponding sequence. In addition, the utility linked list of each sequence in the sequence database may be determined.
The integration part 130 may include multiple Mappers and multiple Reducers, for example n Mappers and n Reducers, where n is a positive integer. The integration part 130 may be used to mine global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set. With the system architecture shown in Fig. 1, the utility linked list of each sequence in the sequence database and the first set can be used to store the information needed during mining, which accelerates the mining of high-utility sequence patterns and reduces time complexity.
It should be understood that although Fig. 1 shows a three-stage MapReduce, this is only schematic. According to embodiments of the present disclosure, there may be fewer or more MapReduce stages. In addition, the number of Mappers and the number of Reducers within each MapReduce stage may be the same or different, and the numbers of Mappers and/or Reducers in different MapReduce stages may also be the same or different.
Moreover, it should be understood that in the present disclosure "local" refers to one partition of the database and "global" refers to the database as a whole. For example, a "local high-utility sequence pattern" in this disclosure may be a high-utility sequence pattern mined from one partition of the database, that is, a sequence pattern that is high-utility with respect to that partition; a "global high-utility sequence pattern" may be a high-utility sequence pattern mined from multiple local high-utility sequence patterns, that is, a sequence pattern that is high-utility with respect to the database as a whole. As another example, the "local sequence weight utility" in this disclosure may be a utility determined from the data in one partition of the database, while the "global sequence weight utility" may be a utility determined from all the data in the database.
The method for mining global high-utility sequence patterns with the system architecture shown in Fig. 1 is described in detail below with reference to Fig. 2. Fig. 2 is a flowchart of a method 200 for mining global high-utility sequence patterns according to an embodiment of the present disclosure. As shown in Fig. 2, in step S201 the first-type items in the sequence database are determined, where a first-type item is an item whose global sequence weight utility (Global Sequence Weight Utility, GSWU) is higher than a first threshold.
In the present disclosure, a sequence database may include multiple sequences and identification information corresponding to each sequence. A sequence may be a quantitative sequence. The identification information corresponding to each sequence may be called a sequence identifier (sequence id, sid). The sequence identifier of the l-th sequence may be denoted s_l, where l is a positive integer. Each sequence may include one or more itemsets, and each itemset may include one or more items. Each item has an internal utility and an external utility. In a transaction-type database, the internal utility may be the quantity of the item purchased in a transaction. In databases for other scenarios, the form of the internal utility can be adjusted accordingly. The table that records the external utility of each item in the database may be called an external utility table. In a transaction-type database, the external utility table may be a profit table, that is, it may record the unit profit of each item in the database. In databases for other scenarios, the form of the external utility table can be adjusted accordingly.
Table 1 below shows an example of a sequence database. As shown in Table 1, the sequence database is a transaction-type database containing 5 sequences, s1 to s5. Each sequence consists of the purchase lists of one customer at different times; each purchase list is an itemset, and each purchased good is an item. For example, sequence s1 indicates that the customer first bought 2 units of a and 3 units of c, then 3 units of a, 1 unit of b and 2 units of c, then 4 units of a, 5 units of b and 4 units of d, and finally 3 units of e.
sid | Sequence |
s1 | <[(a:2) (c:3)], [(a:3) (b:1) (c:2)], [(a:4) (b:5) (d:4)], [(e:3)]> |
s2 | <[(a:1) (e:3)], [(a:5) (b:3) (d:2)], [(b:2) (c:1) (d:4) (e:3)]> |
s3 | <[(e:2)], [(c:2) (d:3)], [(a:3) (e:3)], [(b:4) (d:5)]> |
s4 | <[(b:2) (c:3)], [(a:5) (e:1)], [(b:4) (d:3) (e:5)]> |
s5 | <[(a:4) (c:3)], [(a:2) (b:5) (c:2) (d:4) (e:3)]> |
Table 1: Example of a sequence database
Table 2 below shows an example of an external utility table. As shown in Table 2, the unit profit of a is 5, of b is 3, of c is 4, of d is 2, of e is 1, and of f is 6.
Item | a | b | c | d | e | f |
Profit | 5 | 3 | 4 | 2 | 1 | 6 |
Table 2: Example of an external utility table
In step S201, the global sequence weight utility of each item in the sequence database may be determined, and the items whose global sequence weight utility is higher than the first threshold are determined as the first-type items. Step S201 may be performed by the identification part 110 described above (that is, the first-stage MapReduce).
The process of determining the global sequence weight utility of each item in the sequence database is described below. According to an example of the present disclosure, for each item in the sequence database, the local sequence weight utility (Local Sequence Weight Utility, LSWU) of the item in each partition of the sequence database may first be determined, and the global sequence weight utility of the item may then be determined according to the determined local sequence weight utilities.
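The two-step determination can be sketched as follows. The passage only says that the GSWU is determined "according to" the LSWUs; this sketch assumes the GSWU of an item is the sum of its LSWUs over all partitions. The function name `first_type_items`, the LSWU values and the threshold are made-up illustrative inputs, not data from the disclosure.

```python
from collections import Counter


def first_type_items(lswu_per_partition, threshold):
    """Sum each item's LSWU over all partitions (assumed aggregation) and
    keep the items whose resulting GSWU is higher than `threshold`."""
    gswu = Counter()
    for lswu in lswu_per_partition:
        gswu.update(lswu)  # Counter.update adds the values per key
    return {item: value for item, value in gswu.items() if value > threshold}


# Made-up LSWU values for two partitions, for illustration only.
partition_1 = {"a": 161, "b": 161, "f": 12}
partition_2 = {"a": 199, "b": 56}
print(first_type_items([partition_1, partition_2], threshold=100))
# {'a': 360, 'b': 217}
```

Item f is dropped here because its assumed GSWU (12) does not exceed the threshold, so it would not be treated as a promising item.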
For example, the sequence database may first be divided into multiple partitions, and the partitions may be assigned to the Mappers in the first-stage MapReduce. For example, the sequence database may be divided into n partitions, with the 1st partition assigned to Mapper 1 in the first-stage MapReduce, ..., the k-th partition assigned to Mapper k, ..., and the n-th partition assigned to Mapper n, where k is a positive integer with 1 ≤ k ≤ n.
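As a rough illustration of assigning n partitions to n Mappers, the sketch below splits sequence identifiers round-robin. Round-robin is an assumption chosen for simplicity; this passage does not specify the partitioning rule, and the load-balancing algorithm mentioned elsewhere in the disclosure may differ. The function name `partition_round_robin` is hypothetical.

```python
def partition_round_robin(sids, n):
    """Split sequence identifiers into n partitions; the partition at list
    index k - 1 would be handed to Mapper k."""
    partitions = [[] for _ in range(n)]
    for index, sid in enumerate(sids):
        partitions[index % n].append(sid)
    return partitions


print(partition_round_robin(["s1", "s2", "s3", "s4", "s5"], 2))
# [['s1', 's3', 's5'], ['s2', 's4']]
```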
Then, for each sequence in the k-th partition, Mapper k may determine the utility of the sequence, for example with a conventional method for computing sequence utility. For example, the utility of a sequence may be the sum of the utilities, in that sequence, of the itemsets that make up the sequence. In the present disclosure, the utility of sequence s_l may be denoted u(s_l).
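As a worked illustration, the sketch below computes u(s) for the five sequences of Table 1 with the unit profits of Table 2. It assumes the common convention that the utility of an item occurrence is its quantity (internal utility) multiplied by its unit profit (external utility), which this passage does not state explicitly; the names `DATABASE`, `PROFIT` and `sequence_utility` are illustrative.

```python
# Unit profits from Table 2.
PROFIT = {"a": 5, "b": 3, "c": 4, "d": 2, "e": 1, "f": 6}

# Sequences from Table 1; each itemset is a list of (item, quantity) pairs.
DATABASE = {
    "s1": [[("a", 2), ("c", 3)], [("a", 3), ("b", 1), ("c", 2)],
           [("a", 4), ("b", 5), ("d", 4)], [("e", 3)]],
    "s2": [[("a", 1), ("e", 3)], [("a", 5), ("b", 3), ("d", 2)],
           [("b", 2), ("c", 1), ("d", 4), ("e", 3)]],
    "s3": [[("e", 2)], [("c", 2), ("d", 3)], [("a", 3), ("e", 3)],
           [("b", 4), ("d", 5)]],
    "s4": [[("b", 2), ("c", 3)], [("a", 5), ("e", 1)],
           [("b", 4), ("d", 3), ("e", 5)]],
    "s5": [[("a", 4), ("c", 3)],
           [("a", 2), ("b", 5), ("c", 2), ("d", 4), ("e", 3)]],
}


def sequence_utility(sequence, profit):
    """u(s): sum over all itemsets of quantity * unit profit (assumed)."""
    return sum(qty * profit[item] for itemset in sequence for item, qty in itemset)


utilities = {sid: sequence_utility(seq, PROFIT) for sid, seq in DATABASE.items()}
print(utilities)
# {'s1': 94, 's2': 67, 's3': 56, 's4': 67, 's5': 76}
```

For s1, for instance, the four itemsets contribute 22, 26, 43 and 3, giving u(s1) = 94 under this convention.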
Then, for each item in the sequence, Mapper k may generate a key-value pair consisting of the item and the utility of the sequence. For example, for item i in sequence s_l, Mapper k may generate the key-value pair (i, u(s_l)). In other words, the sequence identifier and the content of the sequence are input to Mapper k as one key-value pair, and Mapper k then outputs one or more new key-value pairs.
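The Mapper step can be sketched as a plain Python function that takes one (sid, sequence) record and returns the (item, u(s)) pairs. The sketch assumes utility = quantity × unit profit and emits one pair per distinct item of the sequence; whether the disclosed Mapper emits one pair per distinct item or per occurrence is not specified here, so that choice is an assumption. The name `map_sequence` is hypothetical.

```python
# Unit profits (Table 2) and sequence s1 (Table 1).
PROFIT = {"a": 5, "b": 3, "c": 4, "d": 2, "e": 1, "f": 6}
S1 = [
    [("a", 2), ("c", 3)],
    [("a", 3), ("b", 1), ("c", 2)],
    [("a", 4), ("b", 5), ("d", 4)],
    [("e", 3)],
]


def map_sequence(sid, sequence, profit):
    """Input: one (sid, sequence) record. Output: (item, u(s)) pairs,
    one per distinct item of the sequence (assumed)."""
    # sid is part of the input record but is not needed for the output pairs.
    u_s = sum(qty * profit[item] for itemset in sequence for item, qty in itemset)
    distinct_items = sorted({item for itemset in sequence for item, _ in itemset})
    return [(item, u_s) for item in distinct_items]


print(map_sequence("s1", S1, PROFIT))
# [('a', 94), ('b', 94), ('c', 94), ('d', 94), ('e', 94)]
```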
Furthermore, since different sequences in a subregion may include the same item, a Mapper can generate multiple key-value pairs for that item. In this case, a composite module (which may be called a combiner) can be configured for each Mapper to determine the local sequence weights value of utility of the item in each subregion. Specifically, the local sequence weights value of utility of an item in a subregion of the sequence database can be determined from the values of utility of the sequences in that subregion that include the item. For example, it can be the sum of the values of utility of those sequences. This reduces the workload of the Reducer described below and thereby lowers the communication cost and transmission time. For example, the local sequence weights value of utility of item i in the k-th subregion of the sequence database can be determined by the following formula (1):
$$LSWU(i, D_k) = \sum_{s \in D_k \wedge i \in s} u(s) \qquad (1)$$
where i denotes the item, D_k denotes the k-th subregion of the sequence database, s denotes a sequence that includes the item, and u(s) denotes the value of utility of that sequence.
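The Mapper and composite-module steps that produce the local sequence weights value of utility can be sketched as follows. This is only an illustrative Python sketch, not the implementation of the disclosure: the function names `map_sequence` and `combine_local_swu`, the sample subregion, and the choice to emit one pair per distinct item of a sequence are all assumptions made for the example.

```python
from collections import defaultdict

def map_sequence(items, seq_utility):
    """Mapper step (sketch): emit one (item, u(s)) pair per distinct item."""
    # Assumption: an item repeated inside a sequence contributes u(s) only once.
    return [(item, seq_utility) for item in sorted(set(items))]

def combine_local_swu(pairs):
    """Combiner step (sketch): sum u(s) per item, giving LSWU(i, D_k)."""
    lswu = defaultdict(int)
    for item, utility in pairs:
        lswu[item] += utility
    return dict(lswu)

# Hypothetical subregion D_k: (items of the sequence, u(s)) for two sequences.
partition_k = [(["a", "b", "c"], 30), (["a", "c", "d"], 20)]
pairs = [p for items, u in partition_k for p in map_sequence(items, u)]
local_swu = combine_local_swu(pairs)
print(local_swu)
```

Summing inside the combiner means each Mapper forwards at most one pair per item, which is the reduction in Reducer workload that the text describes.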
A specific example of determining the local sequence weights value of utility of an item in a subregion of the sequence database is described below. Suppose the k-th subregion of the sequence database includes sequence s_1 and sequence s_2. Mapper k can determine that the values of utility of s_1 and s_2 are u(s_1) and u(s_2), respectively. Then, for each item in s_1, namely items a, b, c, d and e, Mapper k can generate the key-value pairs (a, u(s_1)), (b, u(s_1)), (c, u(s_1)), (d, u(s_1)) and (e, u(s_1)). For each item in s_2, namely items a, b, c, d and e, Mapper k can generate the key-value pairs (a, u(s_2)), (b, u(s_2)), (c, u(s_2)), (d, u(s_2)) and (e, u(s_2)). Item a therefore has two key-value pairs, (a, u(s_1)) and (a, u(s_2)), which can also be written as (a, l_u), where l_u is the set containing u(s_1) and u(s_2). The composite module then sums the elements of l_u to obtain the local sequence weights value of utility of item a in the k-th subregion, LSWU_{a-k} = u(s_1) + u(s_2). The local sequence weights values of utility of items b, c, d and e in the k-th subregion can be obtained in the same way.
It can be seen that the key-value pairs output by Mapper k serve as the input of the composite module corresponding to Mapper k, and the composite module then generates new key-value pairs. Each new key-value pair consists of an item and the local sequence weights value of utility of that item in the k-th subregion. For example, for item i, the composite module corresponding to Mapper k can generate the key-value pair (i, LSWU_{i-k}). In the example where item i is item a, the composite module corresponding to Mapper k can output the key-value pair (a, LSWU_{a-k}).
In this way, the local sequence weights value of utility of each item in each subregion of the sequence database can be determined. After that, the global sequence weights value of utility of each item can be determined from its local sequence weights values of utility. For example, the sum of the local sequence weights values of utility of an item over all subregions of the sequence database can be taken as the global sequence weights value of utility of that item.
Specifically, the key-value pairs that have the same key in the outputs of the multiple composite modules can be input to one Reducer in the first-stage MapReduce. That is, the multiple key-value pairs that correspond to the same item in the outputs of the multiple composite modules, for example the multiple key-value pairs (i, LSWU_{i-k}) corresponding to item i, are input to one Reducer. The Reducer can take the sum of the local sequence weights values of utility in these key-value pairs as the global sequence weights value of utility GSWU_i of item i. For example, the global sequence weights value of utility of item i can be determined by the following formula (2):
$$GSWU(i, D) = \sum_{k=1}^{n} LSWU(i, D_k) \qquad (2)$$
where GSWU(i, D) denotes the global sequence weights value of utility of item i in the sequence database D, D_k denotes the k-th subregion of the sequence database, and LSWU(i, D_k) denotes the local sequence weights value of utility of item i in the k-th subregion of the sequence database.
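The summation over subregions can be illustrated with a short Python sketch. The name `reduce_global_swu` and the sample combiner outputs are hypothetical; in a real MapReduce job each Reducer would receive only the pairs for its own keys, whereas the sketch aggregates all items in one function for brevity.

```python
from collections import Counter

def reduce_global_swu(local_maps):
    """Reducer step (sketch): GSWU(i, D) is the sum of LSWU(i, D_k) over k."""
    total = Counter()
    for lswu in local_maps:
        total.update(lswu)  # Counter.update adds values key by key
    return dict(total)

# Hypothetical combiner outputs for two subregions.
combiner_outputs = [{"a": 50, "b": 30}, {"a": 15, "c": 40}]
gswu = reduce_global_swu(combiner_outputs)
print(gswu)
```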
So far, the process of determining the global sequence weights value of utility of each item in the sequence database has been described. After these values have been determined, each Reducer in the first-stage MapReduce can determine the items whose global sequence weights value of utility is greater than or equal to a first threshold as the first category, and discard the items whose global sequence weights value of utility is less than the first threshold. Each Reducer can output one or more new key-value pairs, each consisting of an item of the first category and the global sequence weights value of utility of that item. For example, when item i belongs to the first category, a Reducer can output the key-value pair (i, GSWU_i).
The "first threshold" described herein can be determined from the total utility value of the database and a threshold factor. For example, the first threshold can be the product of the total utility value of the database and the threshold factor. The total utility value of the database can be determined by a conventional method; for example, it can be the sum of the values of utility of the individual transactions in the database. The total utility value of the database can be denoted u(D). The threshold factor can be set in advance and can be denoted δ. The first threshold can therefore be expressed as δ × u(D).
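The filtering against the first threshold δ × u(D) can be sketched in a few lines of Python. The function name `first_category_items` and the numbers are assumptions chosen for illustration only.

```python
def first_category_items(gswu, total_utility, delta):
    """Keep the items whose GSWU is at least delta * u(D) (sketch)."""
    first_threshold = delta * total_utility
    return {item: value for item, value in gswu.items() if value >= first_threshold}

# Hypothetical values: u(D) = 80 and delta = 0.5 give a first threshold of 40.
kept = first_category_items({"a": 65, "b": 30, "c": 40}, total_utility=80, delta=0.5)
print(kept)
```

Item c is kept because the comparison is "greater than or equal to", as in the text above; item b falls below the threshold and is discarded.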
Through step S201, the items that are promising for forming effective sequence patterns can be identified. Items that are not identified can be discarded and need not be considered further. As a result, the space to be searched for effective sequence patterns is much smaller than the original search space, which improves the search speed and accelerates mining.
Returning to Fig. 2, in step S202, the value of utility chained list of each sequence in the sequence database is determined. Step S202 can be performed before or after step S201, or simultaneously with step S201.
According to an example of the disclosure, for a sequence in the sequence database, the value of utility chained list of the sequence can be determined from the value of utility of each item in the sequence and the position of each item in the sequence. The value of utility of an item can be the product of its internal value of utility and its external value of utility. The position of each item in the sequence can include its initial position and its adjacent position, where the initial position is the position at which the item first occurs in the sequence and the adjacent position is the position at which the item next occurs in the sequence. In addition, the value of utility chained list of a sequence can include two rows. The first row holds the value of utility and adjacent position of each item (which may be called Utility Position Information, or UP Information). The second row holds the initial positions of the non-duplicate items in the sequence (which may be called the Header Table), that is, each non-duplicate item together with its initial position.
Table 3 below shows the value of utility chained list of sequence s_1 in Table 1. As shown in Table 3, the value of utility chained list of s_1 includes two rows: the first row shows the value of utility and adjacent position of each of the items a, b, c, d and e in s_1, and the second row shows the initial position of each of these items in s_1. Specifically, in the element (a, 10, 3) of the first row, "a" denotes the 1st item of s_1, "10" indicates that the value of utility of this item a in s_1 is 10, and "3" is the position at which item a next occurs in s_1. In the element (c, 8, -) of the first row, "c" denotes the 5th item of s_1, "8" indicates that the value of utility of this item c in s_1 is 8, and "-" indicates that item c does not occur again in s_1. In the element (a, 1) of the second row, "a" denotes an item of s_1 and "1" is the initial position of item a in s_1.
Table 3: Example of the value of utility chained list of sequence s_1
It can be appreciated that the value of utility chained list of a sequence is formed by converting and extending the sequence in the original database; it records the information of the original database together with shared information that would otherwise have to be recomputed. The value of utility chained list speeds up the calculation of sequence patterns. This is because a target sequence pattern may have multiple occurrences in a single transaction, so calculating the value of utility of the sequence pattern in that transaction requires finding all occurrences and then taking the maximum value of utility. Since the value of utility chained list records the next position of each item in the transaction, the transaction does not need to be scanned multiple times; the maximum value of utility of the sequence pattern in the transaction can be calculated simply by repeatedly following the next positions of the items.
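The two-row structure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `build_utility_chained_list` is hypothetical, and the sample sequence reuses the values quoted in the text for items a, b and c of s_1, while the values of utility of items d and e (5 and 6) are invented because the original table is not reproduced here.

```python
def build_utility_chained_list(itemsets):
    """Return (up_info, header) for one sequence (sketch).

    up_info[p - 1] = (item, utility, next position or None), positions are 1-based.
    header[item]   = position of the first occurrence of the item.
    """
    flat = [(item, u) for itemset in itemsets for item, u in itemset]
    up_info = [None] * len(flat)
    next_seen = {}
    # Walk backwards so the next occurrence of each item is already known.
    for pos in range(len(flat), 0, -1):
        item, u = flat[pos - 1]
        up_info[pos - 1] = (item, u, next_seen.get(item))
        next_seen[item] = pos
    # After the backward walk, next_seen holds each item's first position.
    header = dict(sorted(next_seen.items(), key=lambda kv: kv[1]))
    return up_info, header

# Illustrative sequence: three item collections of (item, utility) pairs.
S1 = [
    [("a", 10), ("c", 12)],
    [("a", 15), ("b", 3), ("c", 8)],
    [("a", 20), ("b", 15), ("d", 5), ("e", 6)],  # d and e values are assumed
]
up_info, header = build_utility_chained_list(S1)
print(up_info[0], up_info[4], header)
```

With this data the first element is (a, 10, 3) and the fifth is (c, 8, None), which corresponds to the elements (a, 10, 3) and (c, 8, -) discussed in the text.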
Returning to Fig. 2, in step S203, according to the determined first category, at least one candidate global effective sequence pattern is mined from the sequence database and a first set is determined, wherein the first set includes the at least one candidate global effective sequence pattern, the identifier of each sequence that includes a candidate global effective sequence pattern, and the value of utility of each candidate global effective sequence pattern in the corresponding sequence. Step S203 can be performed by the local mining part 120 described above (i.e., the second-stage MapReduce).
According to an example of the disclosure, before step S203 is performed, the sequences in the sequence database can be assigned to multiple tasks. The number of tasks can be denoted m, where m is a positive integer. For example, m can be a multiple of the number of Mappers in the second-stage MapReduce. In the following example, the disclosure is described for the case where m is equal to the number of Mappers in the second-stage MapReduce.
In this example, the sequences in the sequence database can be divided into multiple subregions according to a load-balancing algorithm. For example, the sequences in the sequence database can be assigned to the multiple tasks according to the load-balancing algorithm. Specifically, for a sequence in the sequence database, the number (Num) of items of the first category that the sequence includes can be determined. Then, the task p with the minimum workload is selected from the multiple tasks, the sequence is assigned to task p, and the workload of task p is updated according to the number of items of the first category that the sequence includes. For example, the workload of the p-th task can be denoted WL_p; after a sequence is assigned to this task, its workload is updated from WL_p to (WL_p + Num).
In addition, in this example, the algorithm can initialize the workload of each task to 0. In the first iteration of the algorithm, since the workload of every task is 0, a task can be selected at random from the multiple tasks for a sequence in the sequence database, and the sequence is assigned to that task. For example, the 1st task can be selected from the multiple tasks and the sequence assigned to it.
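The load-balancing assignment described above amounts to a greedy "least-loaded task first" scheme, which can be sketched in Python with a min-heap. The function name `assign_sequences` and the sample data are hypothetical; the sketch also assumes that Num counts distinct first-category items and that ties are broken by the lowest task index (one of the options the text allows for the first iteration).

```python
import heapq

def assign_sequences(sequences, first_category, m):
    """Greedy assignment of sequences to m tasks by current workload (sketch)."""
    heap = [(0, p) for p in range(m)]      # (workload WL_p, task index p)
    tasks = [[] for _ in range(m)]
    for sid, items in sequences.items():
        num = len(set(items) & first_category)   # assumed meaning of Num
        workload, p = heapq.heappop(heap)        # task with minimum workload
        tasks[p].append(sid)
        heapq.heappush(heap, (workload + num, p))  # WL_p becomes WL_p + Num
    return tasks

# Hypothetical sequences, given as lists of items.
sequences = {"s1": ["a", "b", "c"], "s2": ["a"], "s3": ["b", "c"], "s4": ["a", "d"]}
tasks = assign_sequences(sequences, first_category={"a", "b", "c"}, m=2)
print(tasks)
```

In this run both tasks end with comparable workloads (4 and 3), even though the sequences differ in how many first-category items they contain.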
In addition, a "task" as described herein can also be called a task file. Hereinafter, "task" and "task file" are used interchangeably.
The above load-balancing algorithm avoids the imbalance of workload between nodes that partitioning the database could otherwise cause, and which would affect the mining algorithm. The workload of the individual nodes is thus balanced, which effectively improves the speed of mining and calculation.
Step S203 can include three sub-steps S2031 to S2033. In step S2031, local sequence patterns can be mined from each subregion of the sequence database according to the determined first category. Then, in step S2032, at least one candidate global effective sequence pattern can be determined from the mined local sequence patterns. Then, in step S2033, the first set can be determined. Step S2033 can also be performed simultaneously with step S2032.
In the disclosure, local sequence patterns can be mined from each task according to the determined first category. Some of these local sequence patterns may be global effective sequence patterns, while the others may not be. The latter can be taken as candidate global effective sequence patterns.
The process of mining local sequence patterns from each subregion of the sequence database in step S2031 is described below. Specifically, for each item that belongs to the first category in the sequences of a subregion, the value of utility and the surplus utility value of the item in each sequence are calculated, and the effectiveness list (utility list) of the item in each sequence is constructed; the value of utility chain (utility chain) of the item is then determined from its effectiveness lists in the individual sequences; and local sequence patterns are mined from the subregion according to the value of utility chains of the items in that subregion.
In the disclosure, the surplus utility value of an item in a sequence can be the sum of the values of utility of all items that come after that item in the sequence. The effectiveness list of an item in a sequence can include the identification information of the sequence (denoted sid), the identification information of each item collection in which the item occurs (denoted tid), the value of utility (denoted acu) and the surplus utility value (denoted ru) of the item in each such item collection, and indication information, for example a pointer, that points from one item collection to the next (denoted next). The value of utility chain of an item can include the effectiveness lists of the item in the individual sequences.
An example of the effectiveness list of an item in a sequence is given below. Assume that a subregion includes the sequences s_1 to s_5 shown in Table 1 and that item a belongs to the first category. For sequence s_1, the identification information of the sequence is determined to be 1. Item a occurs in the 1st item collection of s_1, so the value of utility and the surplus utility value in s_1 of item a in the 1st item collection are determined, namely 10 and 84. Item a also occurs in the 2nd item collection of s_1, where its value of utility and surplus utility value in s_1 are 15 and 57, and in the 3rd item collection of s_1, where they are 20 and 26. The effectiveness list of item a in s_1 can thus be constructed. Fig. 3 shows a schematic diagram of the effectiveness list of item a in s_1. As shown in Fig. 3, in the first group of data (1, 1, 10, 84), the 1st "1" denotes sequence s_1, the 2nd "1" denotes the 1st item collection of s_1, "10" is the value of utility in s_1 of item a in the 1st item collection, and "84" is the corresponding surplus utility value. In the second group of data (1, 2, 15, 57), "1" denotes sequence s_1, "2" denotes the 2nd item collection of s_1, "15" is the value of utility in s_1 of item a in the 2nd item collection, and "57" is the corresponding surplus utility value. In the third group of data (1, 3, 20, 26), "1" denotes sequence s_1, "3" denotes the 3rd item collection of s_1, "20" is the value of utility in s_1 of item a in the 3rd item collection, and "26" is the corresponding surplus utility value. The black arrows in Fig. 3 represent the pointers from one item collection to the next.
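The construction of an effectiveness list can be sketched in Python as below. The function name `build_utility_list` is hypothetical, and the sample sequence is an assumption: it is chosen so that the entries for item a reproduce the numbers quoted above, but the values of utility of items d and e (5 and 6) are invented. The "next" pointer of the text is represented simply by the order of the returned list.

```python
def build_utility_list(sid, itemsets, item):
    """Return the (sid, tid, acu, ru) entries of `item` in one sequence (sketch)."""
    flat = [(tid, it, u)
            for tid, itemset in enumerate(itemsets, start=1)
            for it, u in itemset]
    total = sum(u for _, _, u in flat)
    entries, seen = [], 0
    for tid, it, u in flat:
        seen += u                      # utility of all items up to and including this one
        if it == item:
            entries.append((sid, tid, u, total - seen))  # ru = utility of later items
    return entries

# Illustrative sequence: three item collections of (item, utility) pairs.
S1 = [
    [("a", 10), ("c", 12)],
    [("a", 15), ("b", 3), ("c", 8)],
    [("a", 20), ("b", 15), ("d", 5), ("e", 6)],  # d and e values are assumed
]
utility_list_a = build_utility_list(1, S1, "a")
print(utility_list_a)
```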
An example of the value of utility chain of an item is given below. In the above example, the effectiveness lists of item a in sequences s_2 to s_5 can be determined in the same way. The value of utility chain of item a can then be determined from the effectiveness lists of item a in sequences s_1 to s_5. Fig. 4 shows a schematic diagram of the value of utility chain of item a. As shown in Fig. 4, the value of utility chain of item a includes the effectiveness lists of item a in s_1, s_2, s_3, s_4 and s_5.
Similarly, the value of utility chain of each item that belongs to the first category in the sequences of each subregion can be determined. Local sequence patterns can then be mined from a subregion according to the value of utility chains of the items in that subregion. For example, the items in the subregion and their value of utility chains can be used as the input of a conventional effective sequence pattern algorithm (for example, the HUS-Span algorithm), and the algorithm outputs one or more local sequence patterns corresponding to the subregion. The algorithm can also output the value of utility of each local sequence pattern in the corresponding sequence and the identification information of that sequence. The output of the algorithm can be expressed as a key-value pair (pattern, {sid, utility}), where pattern denotes a local sequence pattern, sid denotes the identifier of a sequence that contains the local sequence pattern, and utility denotes the value of utility of the local sequence pattern in that sequence.
The above operations of step S2031 can be performed by the Mappers in the second-stage MapReduce. For example, the multiple subregions of the sequence database can be processed by multiple Mappers in the second-stage MapReduce, so that each Mapper mines local sequence patterns from its corresponding subregion. In this case, the output of the algorithm described above can be the output of the Mapper. That is, for a subregion of the sequence database, the output of the corresponding Mapper is one or more key-value pairs (pattern, {sid, utility}), where the one or more patterns are the local sequence patterns mined from that subregion.
After step S2031, in step S2032, at least one candidate global effective sequence pattern can be determined from the mined local sequence patterns. For example, the key-value pairs that have the same key in the outputs of the multiple Mappers can be input to one Reducer in the second-stage MapReduce. That is, the multiple key-value pairs that correspond to the same pattern in the outputs of the multiple Mappers, for example the multiple key-value pairs (pattern x, {sid, utility}) corresponding to pattern x, are input to one Reducer. The Reducer can determine the sum of the multiple values of utility corresponding to pattern x and, from this sum and the first threshold, determine whether pattern x is a global effective sequence pattern. If the sum is greater than or equal to the first threshold, pattern x is determined to be a global effective sequence pattern. If the sum is less than the first threshold, pattern x is determined not to be a global effective sequence pattern but a candidate global effective sequence pattern.
In addition, each Reducer can output one or more new key-value pairs, each consisting of a candidate global effective sequence pattern, the identifier of a sequence corresponding to that candidate, and the value of utility of the candidate in that sequence. For example, the new key-value pair can be expressed as (sid, (pattern, utility)); that is, the form of the key-value pair output by the Mapper is changed.
The "at least one candidate global effective sequence pattern" in step S2032 can be determined from the outputs of the multiple Reducers in the second-stage MapReduce, for example from the sequence patterns in the key-value pairs that these Reducers output. For example, if the outputs of the multiple Reducers are (s_1, (pattern 1, utility 1)), (s_2, (pattern 1, utility 1)), (s_3, (pattern 2, utility 2)), (s_3, (pattern 1, utility 1)) and (s_4, (pattern 2, utility 2)), then the "at least one candidate global effective sequence pattern" in step S2032 is pattern 1 and pattern 2.
In addition, after step S2032, the first set can be determined in step S2033. For example, the first set can be determined from the outputs of the multiple Reducers in the second-stage MapReduce. The first set can include the at least one candidate global effective sequence pattern, the identifier of each sequence that includes a candidate global effective sequence pattern, and the value of utility of each candidate global effective sequence pattern in the corresponding sequence. For example, the first set can include multiple subsets, each of which includes the identifier of a sequence, the candidate global effective sequence patterns that the sequence includes, and the values of utility of those candidates in the sequence. For example, if the outputs of the multiple Reducers in the second-stage MapReduce are (s_1, (pattern 1, utility 1)), (s_2, (pattern 1, utility 1)), (s_3, (pattern 2, utility 2)), (s_3, (pattern 1, utility 1)) and (s_4, (pattern 2, utility 2)), then the first set can include four subsets: the 1st subset is (s_1, (pattern 1, utility 1)), the 2nd subset is (s_2, (pattern 1, utility 1)), the 3rd subset is (s_3, (pattern 2, utility 2), (pattern 1, utility 1)), and the 4th subset is (s_4, (pattern 2, utility 2)).
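Grouping the Reducer outputs by sequence identifier, as in the four-subset example above, can be sketched in Python. The function name `build_first_set`, the dictionary representation and the numeric values of utility are assumptions for illustration; the text itself only uses the symbolic values "utility 1" and "utility 2".

```python
from collections import defaultdict

def build_first_set(reducer_outputs):
    """Group (sid, (pattern, utility)) pairs by sid (sketch of the first set)."""
    first_set = defaultdict(dict)
    for sid, (pattern, utility) in reducer_outputs:
        first_set[sid][pattern] = utility
    return dict(first_set)

# Hypothetical Reducer outputs; the numeric utilities are invented.
outputs = [
    ("s1", ("pattern 1", 11)),
    ("s2", ("pattern 1", 9)),
    ("s3", ("pattern 2", 7)),
    ("s3", ("pattern 1", 4)),
    ("s4", ("pattern 2", 5)),
]
first_set = build_first_set(outputs)
print(first_set)
```

Keying the structure by sid makes the later lookup "what is the value of utility of this candidate in this sequence" a constant-time dictionary access.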
It can be appreciated that the data structure of the first set speeds up the calculation of the values of utility of the candidate global effective sequence patterns. Specifically, if a sequence includes a candidate global effective sequence pattern, the value of utility of that candidate can be obtained directly from the first set without being calculated again, since repeated calculation would take a great deal of time.
In the above example, no composite module is configured for the Mappers in the second-stage MapReduce. However, the disclosure is not limited to this; a corresponding composite module can also be configured for each Mapper in the second-stage MapReduce.
Returning to Fig. 2, in step S204, global effective sequence patterns are mined from the at least one candidate global effective sequence pattern according to the value of utility chained list of each sequence and the first set. Step S204 can be performed by the integration part 130 described above (i.e., the third-stage MapReduce).
Step S204 is described in detail below with reference to Fig. 5. Fig. 5 is a flow chart of a method 500 for mining global effective sequence patterns from at least one candidate global effective sequence pattern according to an embodiment of the disclosure. As shown in Fig. 5, in step S501, the local value of utility of each candidate global effective sequence pattern can be determined according to the value of utility chained list of each sequence and the first set.
Specifically, the at least one candidate global effective sequence pattern and the first set can be used as the input of the multiple Mappers in the third-stage MapReduce. For example, the at least one candidate global effective sequence pattern can be divided into multiple groups, and the groups are then input to the multiple Mappers respectively. In addition, the first set can be input to every Mapper.
Then, each Mapper can determine the value of utility of each of its corresponding candidate global effective sequence patterns. For example, for one of the candidate global effective sequence patterns corresponding to a Mapper, the Mapper can judge whether the first set includes that candidate. When the first set includes the candidate, the value of utility of the candidate can be determined from the first set. When the first set does not include the candidate, the value of utility of the candidate can be determined from the value of utility chained list of the sequence.
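The lookup-then-fallback logic of the Mapper can be sketched in Python as follows. Everything named here is hypothetical: `utility_in_sequence` is an illustrative helper, the first set is assumed to be a dictionary keyed by sequence identifier, and `compute_from_chained_list` stands in for whatever routine derives the value of utility from the value of utility chained list.

```python
def utility_in_sequence(first_set, sid, pattern, compute_from_chained_list):
    """Use the stored value if present, otherwise fall back to computing it (sketch)."""
    stored = first_set.get(sid, {}).get(pattern)
    if stored is not None:
        return stored                       # already calculated in the second stage
    return compute_from_chained_list(sid, pattern)

# Hypothetical first set and a stand-in fallback that records when it is called.
first_set = {"s1": {"pattern 1": 11}}
calls = []

def fallback(sid, pattern):
    calls.append((sid, pattern))
    return 7  # placeholder result of a chained-list computation

hit = utility_in_sequence(first_set, "s1", "pattern 1", fallback)
miss = utility_in_sequence(first_set, "s2", "pattern 1", fallback)
print(hit, miss, calls)
```

The fallback is invoked only for the pair that is missing from the first set, which is the saving the text attributes to this data structure.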
This is because, when the value of utility of a candidate global effective sequence pattern has already been calculated, it can be obtained directly by querying the sidset of the sequences that include the candidate. When it has not been calculated, however, it is necessary to check whether the candidate occurs in a particular sequence and, if so, to calculate the value of utility of the candidate from that sequence. This operation is time-consuming, because the particular sequence has to be scanned and the candidate may have multiple matches in it. The sequence therefore has to be scanned multiple times to find the match with the maximum value of utility, which is taken as the value of utility of the candidate in that sequence. Consequently, completing the mining task would require multiple scans of the entire sequence database. The value of utility chained list of a sequence proposed in the disclosure is a compact data structure that is suitable for handling big-data problems.
An example of determining the value of utility of a candidate global effective sequence pattern from the value of utility chained list of a sequence is described below. For example, the value of utility of the candidate global effective sequence pattern <[a, c], b> can be determined from the value of utility chained list of sequence s_1 shown in Table 2 above. Specifically, since item a and item c are in the same item collection, all positions at which a and c occur together can be found from the positions at which item c occurs: the first match is at positions (1, 2) with a value of utility of 22, and the second match is at positions (3, 5) with a value of utility of 23. For the first match (1, 2), all positions at which item b matches can be found, namely 4 and 7, so the combined values of utility of items a, c and b are 22+3=25 and 22+15=37. For the second match (3, 5), the only position at which item b matches is 7, so the combined value of utility of items a, c and b is 23+15=38. Therefore, the value of utility of the sequence pattern <[a, c], b> is max{25, 37, 38} = 38.
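The "take the maximum over all matches" rule of this example can be checked with a small Python sketch. Note that this is not the chained-list traversal of the disclosure: it is a plain recursive scan over item collections, used only to show how the maximum is formed. The function name `max_pattern_utility` is hypothetical, and the sample sequence is an assumption that agrees with the values quoted in the text for items a, b and c, with invented values for d and e.

```python
def max_pattern_utility(itemsets, pattern):
    """Maximum utility of `pattern` over all its matches in one sequence (sketch).

    itemsets: list of dicts mapping item -> utility, one dict per item collection.
    pattern:  list of item lists, e.g. [["a", "c"], ["b"]] for <[a, c], b>.
    Returns None if the pattern does not occur.
    """
    def best(start, k):
        if k == len(pattern):
            return 0
        result = None
        for t in range(start, len(itemsets)):
            if all(x in itemsets[t] for x in pattern[k]):
                rest = best(t + 1, k + 1)   # later elements must match later collections
                if rest is not None:
                    gain = sum(itemsets[t][x] for x in pattern[k]) + rest
                    result = gain if result is None else max(result, gain)
        return result
    return best(0, 0)

# Illustrative sequence as three item collections; d and e values are assumed.
S1 = [
    {"a": 10, "c": 12},
    {"a": 15, "b": 3, "c": 8},
    {"a": 20, "b": 15, "d": 5, "e": 6},
]
print(max_pattern_utility(S1, [["a", "c"], ["b"]]))
```

For this data the result is 38, matching max{25, 37, 38} above. The naive recursion revisits item collections repeatedly, which is the kind of rescanning that the value of utility chained list is meant to avoid.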
In the disclosure, each Mapper in the third-stage MapReduce can output one or more new key-value pairs, each consisting of a candidate global effective sequence pattern and its value of utility. For example, the new key-value pair can be expressed as (pattern, utility).
In addition, the same Mapper may output multiple key-value pairs (pattern, utility) corresponding to the same candidate global effective sequence pattern. For example, for a candidate global effective sequence pattern pattern y, the same Mapper may output two key-value pairs, (pattern y, utility 1) and (pattern y, utility 2). These two key-value pairs can also be expressed as (pattern y, G_u), where G_u is the set containing utility 1 and utility 2.
In this case, a composite module (which may be called a combiner) can be configured for each Mapper to determine the local value of utility of the same candidate global effective sequence pattern. Specifically, the local value of utility of a candidate global effective sequence pattern can be determined from the values of utility in the multiple key-value pairs corresponding to that candidate. For example, it can be the sum of the values of utility in those key-value pairs. For example, if for the candidate global effective sequence pattern pattern y the same Mapper outputs two key-value pairs, (pattern y, utility 1) and (pattern y, utility 2), then the local value of utility (local-utility) of pattern y is (utility 1 + utility 2).
In the disclosure, the composite module can also output one or more new key-value pairs, each consisting of a candidate global effective sequence pattern and its local value of utility. For example, the new key-value pair can be expressed as (pattern, local-utility). In the example where the candidate global effective sequence pattern is pattern y, the composite module can output the key-value pair (pattern y, utility 1 + utility 2).
Returning to Fig. 5, in step S502, the global value of utility of each candidate global effective sequence pattern can be determined from its local values of utility. For example, for each candidate global effective sequence pattern, the global value of utility of the candidate can be determined from its multiple local values of utility; for example, the sum of the multiple local values of utility of the candidate can be taken as its global value of utility.
Specifically, the key-value pairs that have the same key in the outputs of the multiple composite modules can be input to one Reducer in the third-stage MapReduce. That is, the multiple key-value pairs that correspond to the same candidate global effective sequence pattern in the outputs of the multiple composite modules, for example the multiple key-value pairs corresponding to the candidate pattern y, are input to one Reducer. The Reducer can take the sum of the local values of utility in these key-value pairs as the global value of utility (global-utility) of the candidate global effective sequence pattern.
Then, in step S503, the sequence patterns whose global value of utility is greater than the first threshold can be determined as global effective sequence patterns. For example, each Reducer in the third-stage MapReduce can determine the sequence patterns whose global value of utility is greater than or equal to the first threshold as global effective sequence patterns. Each Reducer can output one or more new key-value pairs, each consisting of a global effective sequence pattern and its global value of utility. For example, when pattern y is a global effective sequence pattern, a Reducer can output the key-value pair (pattern y, global-utility). The sequence patterns in the key-value pairs output by the Reducers in the third-stage MapReduce are therefore the global effective sequence patterns.
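The composite-module and Reducer steps of the third stage can be sketched together in Python. The function names `combine_local_utility` and `reduce_global_utility`, the pattern labels and the numbers are all assumptions for illustration; the sketch uses the "greater than or equal to" comparison given in the example for the Reducer.

```python
from collections import defaultdict

def combine_local_utility(pairs):
    """Combiner (sketch): sum the (pattern, utility) pairs of one Mapper."""
    local = defaultdict(int)
    for pattern, utility in pairs:
        local[pattern] += utility
    return dict(local)

def reduce_global_utility(combiner_outputs, first_threshold):
    """Reducer (sketch): sum local utilities and keep patterns at or above the threshold."""
    global_utility = defaultdict(int)
    for local in combiner_outputs:
        for pattern, local_utility in local.items():
            global_utility[pattern] += local_utility
    return {p: g for p, g in global_utility.items() if g >= first_threshold}

# Hypothetical outputs of two Mappers.
mapper_1 = [("pattern y", 10), ("pattern y", 12), ("pattern z", 4)]
mapper_2 = [("pattern y", 8), ("pattern z", 5)]
local_outputs = [combine_local_utility(mapper_1), combine_local_utility(mapper_2)]
result = reduce_global_utility(local_outputs, first_threshold=25)
print(result)
```

Here pattern y reaches a global value of utility of 30 and is kept, while pattern z reaches only 9 and is dropped.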
With the method for mining global effective sequence patterns provided by this embodiment, the value of utility chained list of each sequence in the sequence database and the first set are determined, and global effective sequence patterns are mined according to these two data structures. This saves a great deal of time, speeds up the calculation of global values of utility in the sequence database, accelerates mining, and reduces the time complexity.
Hereinafter, a device according to an embodiment of the present disclosure, corresponding to the method shown in Fig. 2, is described with reference to Fig. 6. Fig. 6 shows a schematic structural diagram of a device 600 for mining global high-utility sequence patterns according to an embodiment of the present disclosure. Since the functions of the device 600 are the same as the details of the method described above with reference to Fig. 2, a detailed description of the same content is omitted here for brevity. As shown in Fig. 6, the device 600 includes: a first determination unit 610, configured to determine first-category items in a sequence database, where a first-category item is an item whose global sequence weight utility is higher than a first threshold; a second determination unit 620, configured to determine a utility linked list of each sequence in the sequence database; a first mining unit 630, configured to mine at least one candidate global high-utility sequence pattern from the sequence database according to the determined first-category items and to determine a first set, where the first set includes the at least one candidate global high-utility sequence pattern, the identifiers of the sequences that contain each candidate global high-utility sequence pattern, and the utility value of each candidate global high-utility sequence pattern in the corresponding sequence; and a second mining unit 640, configured to mine global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set. Besides these four units, the device 600 may include other components; however, since these components are unrelated to the content of the embodiments of the present disclosure, their illustration and description are omitted here.
The first determination unit 610 can determine the global sequence weight utility of each item in the sequence database, and determine the items whose global sequence weight utility is higher than the first threshold to be first-category items. The first determination unit 610 can be the identification part 110 described above (i.e., the first-stage MapReduce).
The process by which the first determination unit 610 determines the global sequence weight utility of each item in the sequence database is described below. According to an example of the present disclosure, for each item in the sequence database, the first determination unit 610 can first determine the local sequence weight utility (Local Sequence Weight Utility, LSWU) of the item in each partition of the sequence database, and then determine the global sequence weight utility of the item from the determined local sequence weight utilities.
For example, the first determination unit 610 can first divide the sequence database into multiple partitions and assign the partitions to multiple Mappers in the first-stage MapReduce. For example, the sequence database can be divided into n partitions, with the 1st partition assigned to Mapper 1 in the first-stage MapReduce, ..., the k-th partition assigned to Mapper k, ..., and the n-th partition assigned to Mapper n, where k and n are positive integers and 1 ≤ k ≤ n.
Then, for each sequence in the k-th partition, Mapper k can determine the utility value of the sequence. For example, Mapper k can determine it using a conventional method for calculating the utility value of a sequence: the utility value of a sequence can be the sum of the utility values, within the sequence, of the itemsets that make up the sequence. In the present disclosure, the utility value of a sequence s_l is denoted u(s_l).
Then, for each item in the sequence, Mapper k can generate a key-value pair consisting of the item and the utility value of the sequence. For example, for an item i in sequence s_l, Mapper k can generate the key-value pair (i, u(s_l)). Thus, the sequence identifier and the content of a sequence are input to Mapper k as one key-value pair, and Mapper k then outputs one or more new key-value pairs.
In addition, since different sequences in a partition may contain the same item, a Mapper can generate multiple key-value pairs for the same item across these sequences. In this case, a combiner module (which may be called a combiner) can be configured for each Mapper to determine the local sequence weight utility of the same item in each partition. Specifically, the local sequence weight utility of an item in a partition of the sequence database can be determined from the utility values of the sequences in that partition that contain the item. For example, it can be the sum of the utility values of the sequences in the partition that contain the item.
Thus, the key-value pairs output by Mapper k serve as the input of the combiner module corresponding to Mapper k, and the combiner module then generates new key-value pairs. Each new key-value pair consists of an item and the local sequence weight utility of the item in the k-th partition. For example, for an item i, the combiner module corresponding to Mapper k can generate the key-value pair (i, LSWU_{i-k}). In the example where item i is item a, the combiner module corresponding to Mapper k can output the key-value pair (a, LSWU_{a-k}).
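The first-stage map and combine steps can be illustrated with a small Python sketch. The data model is an assumption made for the example: a sequence is a list of itemsets, and each itemset is a dict from item to its utility. The function names are hypothetical, and the sketch emits one pair per distinct item of a sequence so that each sequence contributes its utility to an item's LSWU once.

```python
from collections import defaultdict

def sequence_utility(sequence):
    """u(s): sum of the utilities of all itemsets that make up the sequence."""
    return sum(sum(itemset.values()) for itemset in sequence)

def map_sequence(sid, sequence):
    """Mapper k: emit one (item, u(s)) pair per distinct item of the sequence."""
    u = sequence_utility(sequence)
    items = {item for itemset in sequence for item in itemset}
    return [(item, u) for item in sorted(items)]

def combine_partition(partition):
    """Combiner of Mapper k: LSWU of each item within one partition."""
    lswu = defaultdict(int)
    for sid, sequence in partition.items():
        for item, u in map_sequence(sid, sequence):
            lswu[item] += u
    return dict(lswu)

# Hypothetical partition k with two sequences.
partition_k = {
    "s1": [{"a": 2, "b": 3}, {"c": 4}],   # u(s1) = 9
    "s2": [{"a": 5}, {"a": 1, "d": 2}],   # u(s2) = 8
}
lswu_k = combine_partition(partition_k)
print(lswu_k)
```

Item a occurs in both sequences, so its LSWU in this partition is u(s1) + u(s2); items b, c and d each occur in only one sequence.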
In this way, the first determination unit 610 can determine the local sequence weight utility of each item in each partition of the sequence database. After that, the first determination unit 610 can determine the global sequence weight utility of each item from the determined local sequence weight utilities. For example, the sum of an item's local sequence weight utilities over all partitions of the sequence database can be taken as the global sequence weight utility of the item.
Specifically, key-value pairs with the same key in the outputs of the multiple combiner modules can be input to one Reducer in the first-stage MapReduce. That is, the multiple key-value pairs in the combiner outputs that correspond to the same item, for example the multiple key-value pairs (i, LSWU_{i-k}) corresponding to item i, are input to one Reducer. The Reducer can take the sum of the local sequence weight utilities in these key-value pairs as the global sequence weight utility GSWU_i of item i.
So far, the process of determining the global sequence weight utility of each item in the sequence database has been described. After that, each Reducer in the first-stage MapReduce can determine the items whose global sequence weight utility is greater than or equal to the first threshold to be first-category items. Each Reducer can output one or more new key-value pairs, where each new key-value pair consists of a first-category item and its global sequence weight utility. For example, when item i is a first-category item, a Reducer can output the key-value pair (i, GSWU_i).
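As a rough illustration of the first-stage reduce step, the following Python sketch sums per-partition LSWU values into a GSWU per item and keeps the items that reach the threshold. The function name, the dict-per-combiner input format and the numbers are assumptions made for the example.

```python
from collections import defaultdict

def reduce_gswu(combiner_outputs, first_threshold):
    """Sum LSWU values per item and keep the first-category items."""
    gswu = defaultdict(int)
    for lswu in combiner_outputs:           # one {item: LSWU} dict per partition
        for item, value in lswu.items():
            gswu[item] += value
    # Emit (item, GSWU) only when GSWU >= first threshold.
    return {item: v for item, v in gswu.items() if v >= first_threshold}

# Hypothetical combiner outputs for two partitions.
combiner_outputs = [
    {"a": 17, "b": 9},
    {"a": 4, "b": 2, "e": 6},
]
first_category = reduce_gswu(combiner_outputs, first_threshold=12)
print(first_category)
```

Here item a reaches a GSWU of 21 and is kept, while b (11) and e (6) fall below the threshold and are dropped.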
According to an example of the present disclosure, for a sequence in the sequence database, the second determination unit 620 can determine the utility linked list of the sequence from the utility value of each item in the sequence and the position of each item in the sequence. The utility value of an item can be the product of its internal utility value and its external utility value. The position of an item in the sequence can include its initial position and its adjacent position, where the initial position is the position at which the item first occurs in the sequence and the adjacent position is the position at which the item next occurs in the sequence. The utility linked list of a sequence can include two rows: the first row holds the utility value and adjacent position of each item (which may be called Utility Position information, or UP information), and the second row holds the initial positions of the non-repeated items in the sequence (which may be called the Header Table). The second row can include the non-repeated items and the initial position of each non-repeated item.
It can be understood that the utility linked list of a sequence is formed by transforming and extending the sequence in the original database; it records information about the original database together with shared information needed in the calculation. The utility linked list speeds up the calculation for sequence patterns. This is because a target sequence pattern may occur multiple times in a single transaction, so calculating the utility value of the pattern in the transaction requires finding all occurrences and then taking the maximum utility value. Because the utility linked list records the next position of each item in the transaction, the transaction does not need to be scanned repeatedly: following the next-position links of the items is enough to calculate the maximum utility value of the sequence pattern in the transaction.
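The two-row structure can be sketched as follows. This is a simplified illustration under stated assumptions, not the structure as actually laid out in the patent: positions are 1-based indices into the flattened sequence (itemset boundaries are ignored), utilities are already the product of internal and external utility, and the names `build_utility_linked_list` and `occurrences` are hypothetical.

```python
def build_utility_linked_list(sequence):
    """sequence: list of itemsets, each an ordered list of (item, utility) pairs."""
    flat = [(item, u) for itemset in sequence for item, u in itemset]
    up_info = []      # first row: utility and next position of every element
    header = {}       # second row: first position of every distinct item
    last_seen = {}
    for pos, (item, u) in enumerate(flat, start=1):
        up_info.append({"item": item, "utility": u, "next": None})
        if item in last_seen:
            up_info[last_seen[item] - 1]["next"] = pos   # link previous occurrence
        else:
            header[item] = pos                           # first occurrence
        last_seen[item] = pos
    return up_info, header

def occurrences(up_info, header, item):
    """Follow the next-position links instead of rescanning the sequence."""
    pos = header.get(item)
    while pos is not None:
        yield pos, up_info[pos - 1]["utility"]
        pos = up_info[pos - 1]["next"]

sequence = [[("a", 2), ("b", 3)], [("a", 4)], [("c", 1), ("a", 6)]]
up_info, header = build_utility_linked_list(sequence)
occ_a = list(occurrences(up_info, header, "a"))
print(header, occ_a, max(u for _, u in occ_a))
```

For the single-item pattern a, the links visit positions 1, 3 and 5 directly, and the maximum utility over these occurrences is taken without scanning the other elements.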
In the present disclosure, the first mining unit 630 can be the local mining part 120 described above (i.e., the second-stage MapReduce).
According to an example of the present disclosure, the device 600 can also include a load distribution unit (not shown), configured to assign the sequences in the sequence database to multiple tasks. The number of tasks can be denoted m, where m is a positive integer. For example, m can be a multiple of the number of Mappers in the second-stage MapReduce. In the following example, the present disclosure is described with m equal to the number of Mappers in the second-stage MapReduce.
In this example, the load distribution unit can divide the sequences in the sequence database into multiple partitions according to a load-balancing algorithm. For example, the sequences in the sequence database can be assigned to multiple tasks according to the load-balancing algorithm. Specifically, for a sequence in the sequence database, the number (Num) of first-category items that the sequence contains can be determined. Then, the task p with the smallest workload is selected from the multiple tasks, the sequence is assigned to task p, and the workload of task p is updated according to the number of first-category items the sequence contains. For example, if the workload of the p-th task is denoted WL_p, then after a sequence is assigned to the task its workload is updated from WL_p to (WL_p + Num).
In addition, in this example, the algorithm can initialize the workload of each task to 0. Therefore, in the first iteration of the algorithm, since the workload of every task is 0, a task can be selected at random from the multiple tasks for a sequence in the sequence database, and the sequence is assigned to that task. For example, the 1st task can be selected from the multiple tasks and the sequence assigned to it.
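The greedy assignment described above might look like the following Python sketch. Two points are assumptions rather than statements from the text: Num is taken to count the distinct first-category items of a sequence, and ties between equally loaded tasks are broken by lowest index instead of at random. The function name and the sample data are hypothetical.

```python
def assign_sequences(database, first_category_items, m):
    """Assign each sequence to the task that currently has the smallest workload."""
    workloads = [0] * m                  # WL_p, initialised to 0 for every task
    tasks = [[] for _ in range(m)]
    for sid, sequence in database.items():
        items = {item for itemset in sequence for item in itemset}
        num = len(items & first_category_items)            # Num for this sequence
        p = min(range(m), key=workloads.__getitem__)       # least-loaded task
        tasks[p].append(sid)
        workloads[p] += num                                # WL_p <- WL_p + Num
    return tasks, workloads

# Hypothetical database: each sequence is a list of itemsets (sets of items).
database = {
    "s1": [{"a", "b"}, {"c"}],
    "s2": [{"a"}],
    "s3": [{"b", "c"}, {"d"}],
    "s4": [{"a", "c"}],
}
tasks, workloads = assign_sequences(database, {"a", "b", "c"}, m=2)
print(tasks, workloads)
```

With two tasks, s1 (Num = 3) goes to the first task, s2 and s3 fill the second task up to the same workload, and s4 then lands on the first task again.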
In addition, a "task" as described herein may also be called a task file. Hereinafter, the terms task and task file are used interchangeably.
In the present disclosure, the first mining unit 630 can mine local high-utility sequence patterns from each partition of the sequence database according to the determined first-category items. Then, the first mining unit 630 can determine at least one candidate global high-utility sequence pattern from the mined local high-utility sequence patterns. Then, the first mining unit 630 can determine the first set.
In the present disclosure, local high-utility sequence patterns can be mined from each task according to the determined first-category items. Some of these local high-utility sequence patterns may be global high-utility sequence patterns, while others may not be. The latter can be taken as candidate global high-utility sequence patterns.
The process by which the first mining unit 630 mines local high-utility sequence patterns from each partition of the sequence database is described below. Specifically, for an item that belongs to the first-category items in each sequence included in a partition: the utility value and remaining utility value of the item in each sequence are calculated; the utility list of the item in each sequence is constructed; and the utility chain of the item is determined from its utility lists in the sequences. Local high-utility sequence patterns are then mined from the partition according to the utility chain of each item in the partition.
In the present disclosure, the remaining utility value of an item in a sequence can be the sum of the utility values of all items in the sequence that come after the item. The utility list of an item in a sequence can include the identifier of the sequence (denoted sid), the identifier of each itemset in which the item occurs (denoted tid), the utility value (denoted acu) and remaining utility value (denoted ru) of the item in each such itemset of the sequence, and indication information (for example, a pointer) that points from one itemset to the next (denoted next). In addition, the utility chain of an item can include the utility lists of the item in each sequence.
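A utility list with the fields sid, tid, acu, ru and next can be sketched as below. The sketch assumes that itemsets are ordered lists of (item, utility) pairs, that tid is a 1-based itemset index, that items later in the same itemset count towards the remaining utility, and that next stores the tid of the next itemset containing the item. The function name is hypothetical.

```python
def build_utility_list(sid, sequence, item):
    """Utility list of `item` in one sequence: one entry per itemset containing it."""
    total = sum(u for itemset in sequence for _, u in itemset)
    entries = []
    seen = 0   # utility accumulated up to and including the current element
    for tid, itemset in enumerate(sequence, start=1):
        for name, u in itemset:
            seen += u
            if name == item:
                entries.append(
                    {"sid": sid, "tid": tid, "acu": u, "ru": total - seen, "next": None}
                )
    # Link each entry to the next itemset in which the item occurs.
    for cur, nxt in zip(entries, entries[1:]):
        cur["next"] = nxt["tid"]
    return entries

s1 = [[("a", 2), ("b", 3)], [("a", 4)], [("c", 1), ("a", 6)]]
ulist_a = build_utility_list("s1", s1, "a")
for entry in ulist_a:
    print(entry)
```

The utility chain of item a would then collect such lists over all sequences of a partition, for example as a dict from sid to utility list.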
Similarly, the utility chain of each item belonging to the first-category items in the sequences of each partition can be determined. Local high-utility sequence patterns can then be mined from the partition according to the utility chain of each item in the partition. For example, the items in the partition and their utility chains can be used as the input of a conventional high-utility sequence pattern mining algorithm, which outputs one or more local high-utility sequence patterns corresponding to the partition. The algorithm can also output the utility value of each local high-utility sequence pattern in the corresponding sequence and the identifier of that sequence. The output of the algorithm can be expressed as a key-value pair (pattern, {sid, utility}), where pattern denotes a local high-utility sequence pattern, sid denotes the identifier of a sequence containing the pattern, and utility denotes the utility value of the pattern in that sequence.
The above operations can be performed by the Mappers in the second-stage MapReduce. For example, the multiple partitions of the sequence database can be processed by multiple Mappers in the second-stage MapReduce respectively, so that each Mapper mines local high-utility sequence patterns from its corresponding partition. In this case, the output of the algorithm described above is the output of the Mapper. That is, for a partition of the sequence database, the output of the corresponding Mapper is one or more key-value pairs (pattern, {sid, utility}), where the one or more patterns are the local high-utility sequence patterns mined from the partition.
Then, the first mining unit 630 can determine at least one candidate global high-utility sequence pattern from the mined local high-utility sequence patterns. For example, key-value pairs with the same key in the outputs of the multiple Mappers can be input to one Reducer in the second-stage MapReduce. That is, the multiple key-value pairs in the Mapper outputs that correspond to the same pattern, for example the multiple key-value pairs (pattern x, {sid, utility}) corresponding to pattern x, are input to one Reducer. The Reducer can determine the sum of the multiple utility values corresponding to pattern x and, from this sum and the first threshold, determine whether pattern x is a global high-utility sequence pattern. If the sum is greater than or equal to the first threshold, pattern x is determined to be a global high-utility sequence pattern. If the sum is less than the first threshold, pattern x is determined not to be a global high-utility sequence pattern but a candidate global high-utility sequence pattern.
In addition, each Reducer can output one or more new key-value pairs, where each new key-value pair consists of a candidate global high-utility sequence pattern, the identifier of a sequence corresponding to that candidate, and the utility value of the candidate in that sequence. For example, the new key-value pair can be expressed as (sid, (pattern, utility)); that is, the form of the key-value pair output by the Mapper is changed.
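The second-stage Reducer logic can be sketched as follows. The function name, the return format and the numbers are assumptions for illustration. In particular, the text does not say what a Reducer emits for a pattern that already reaches the threshold, so the "global" branch here is only a placeholder.

```python
def reduce_stage_two(pattern, values, first_threshold):
    """values: list of (sid, utility) pairs emitted by the Mappers for one pattern."""
    total = sum(utility for _, utility in values)
    if total >= first_threshold:
        # Already a global high-utility pattern (output format assumed).
        return "global", [(pattern, total)]
    # Otherwise a candidate: re-key each pair by sequence identifier.
    return "candidate", [(sid, (pattern, utility)) for sid, utility in values]

kind_x, out_x = reduce_stage_two("pattern x", [("s1", 4), ("s3", 3)], first_threshold=10)
kind_w, out_w = reduce_stage_two("pattern w", [("s2", 8), ("s4", 5)], first_threshold=10)
print(kind_x, out_x)
print(kind_w, out_w)
```

Pattern x sums to 7 and is emitted as candidate pairs keyed by sid, while pattern w sums to 13 and is treated as a global high-utility pattern.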
The "at least one candidate global high-utility sequence pattern" can be determined from the outputs of the multiple Reducers in the second-stage MapReduce. For example, the "at least one candidate global high-utility sequence pattern" in step S2032 can be determined from the sequence patterns in the key-value pairs output by these Reducers. For example, if the outputs of the Reducers are (s1, (pattern 1, utility 1)), (s2, (pattern 1, utility 1)), (s3, (pattern 2, utility 2)), (s3, (pattern 1, utility 1)) and (s4, (pattern 2, utility 2)), then the "at least one candidate global high-utility sequence pattern" in step S2032 can be pattern 1 and pattern 2.
In addition, the first mining unit 630 can determine the first set, for example from the outputs of the multiple Reducers in the second-stage MapReduce. The first set can include the at least one candidate global high-utility sequence pattern, the identifiers of the sequences that contain each candidate, and the utility value of each candidate in the corresponding sequence. For example, the first set can include multiple subsets, each of which includes the identifier of a sequence, the candidate global high-utility sequence patterns that the sequence contains, and the utility values of those candidates in the sequence. For example, if the outputs of the Reducers in the second-stage MapReduce are (s1, (pattern 1, utility 1)), (s2, (pattern 1, utility 1)), (s3, (pattern 2, utility 2)), (s3, (pattern 1, utility 1)) and (s4, (pattern 2, utility 2)), then the first set can include four subsets: the 1st subset is (s1, (pattern 1, utility 1)), the 2nd subset is (s2, (pattern 1, utility 1)), the 3rd subset is (s3, (pattern 2, utility 2), (pattern 1, utility 1)), and the 4th subset is (s4, (pattern 2, utility 2)).
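The grouping of Reducer outputs into the first set can be illustrated with a short Python sketch. The function name is hypothetical and the numeric utilities are made-up placeholders standing in for "utility 1" and "utility 2".

```python
from collections import defaultdict

def build_first_set(reducer_outputs):
    """Group (sid, (pattern, utility)) pairs into one subset per sequence."""
    first_set = defaultdict(list)
    for sid, (pattern, utility) in reducer_outputs:
        first_set[sid].append((pattern, utility))
    return dict(first_set)

# Same shape as the five Reducer outputs in the example above.
reducer_outputs = [
    ("s1", ("pattern 1", 5)),
    ("s2", ("pattern 1", 6)),
    ("s3", ("pattern 2", 4)),
    ("s3", ("pattern 1", 3)),
    ("s4", ("pattern 2", 7)),
]
first_set = build_first_set(reducer_outputs)
candidates = sorted({p for pairs in first_set.values() for p, _ in pairs})
print(first_set)
print(candidates)
```

The five pairs collapse into four subsets because s3 contains both candidates, and the candidate list is simply the set of distinct patterns seen.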
In the above example, no combiner module is configured for the Mappers in the second-stage MapReduce. However, the present disclosure is not limited to this; a corresponding combiner module can also be configured for the Mappers in the second-stage MapReduce.
In addition, in the present disclosure, the second mining unit 640 can be the integration part 130 described above (i.e., the third-stage MapReduce).
The second mining unit 640 can determine the local utility value of each candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set.
Specifically, the at least one candidate global high-utility sequence pattern and the first set can be used as the input of the multiple Mappers in the third-stage MapReduce. For example, the candidate global high-utility sequence patterns can be divided into multiple groups, and the groups are input to the multiple Mappers respectively. The first set can additionally be input to every Mapper.
Then, each Mapper can determine the utility value of each of its corresponding candidate global high-utility sequence patterns. For example, for one of the candidates corresponding to a Mapper, the Mapper can judge whether the first set includes the candidate. When the first set includes the candidate, its utility value can be determined from the first set. When the first set does not include the candidate, its utility value can be determined from the utility linked lists of the sequences.
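One possible reading of this lookup is sketched below: for a given sequence, the Mapper reuses the utility stored in the first set when the candidate is recorded there, and otherwise falls back to a computation over the utility linked list. The per-sequence interpretation, the function names and the stand-in fallback are all assumptions; the real fallback would traverse the utility linked list of the sequence.

```python
def candidate_utility(pattern, sid, first_set, utility_from_linked_list):
    """Utility of a candidate pattern in sequence `sid`."""
    for stored_pattern, utility in first_set.get(sid, []):
        if stored_pattern == pattern:
            return utility                         # already recorded in the first set
    return utility_from_linked_list(sid, pattern)  # fall back to the linked list

# Hypothetical first set and a stand-in for the linked-list computation.
first_set = {"s3": [("pattern 2", 4), ("pattern 1", 3)]}
fallback_calls = []

def fake_linked_list_utility(sid, pattern):
    fallback_calls.append((sid, pattern))
    return 0   # placeholder value for the sketch

u_s3 = candidate_utility("pattern 1", "s3", first_set, fake_linked_list_utility)
u_s4 = candidate_utility("pattern 1", "s4", first_set, fake_linked_list_utility)
print(u_s3, u_s4, fallback_calls)
```

The point of the design is that the more expensive linked-list computation only runs for sequences in which the candidate was not already found to be locally high-utility.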
In the present disclosure, each Mapper in the third-stage MapReduce can output one or more new key-value pairs, where each new key-value pair consists of a candidate global high-utility sequence pattern and its utility value. For example, the new key-value pair can be expressed as (pattern, utility).
In addition, the same Mapper may output multiple key-value pairs (pattern, utility) corresponding to the same candidate global high-utility sequence pattern. For example, for the candidate pattern y, the same Mapper may output two key-value pairs, (pattern y, utility 1) and (pattern y, utility 2). These two key-value pairs can also be expressed as (pattern y, G_u), where G_u is the set containing utility 1 and utility 2.
In this case, a combiner module (which may be called a combiner) can be configured for each Mapper to determine the local utility value of the same candidate global high-utility sequence pattern. Specifically, the local utility value of a candidate can be determined from the utility values in the multiple key-value pairs corresponding to the candidate; for example, it can be the sum of those utility values. For example, if the same Mapper outputs two key-value pairs for the candidate pattern y, (pattern y, utility 1) and (pattern y, utility 2), then the local utility value local-utility of pattern y is (utility 1 + utility 2).
In the present disclosure, the combiner module can also output one or more new key-value pairs, where each new key-value pair consists of a candidate global high-utility sequence pattern and its local utility value. For example, the new key-value pair can be expressed as (pattern, local-utility). In the example where the candidate is pattern y, the combiner module can output the key-value pair (pattern y, utility 1 + utility 2).
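The third-stage combiner amounts to a per-pattern sum over the pairs emitted by one Mapper. The following Python sketch shows this under assumed names and made-up utilities; sorting the output is only for readability.

```python
from collections import defaultdict

def combine_local_utility(mapper_pairs):
    """Sum the utilities that one Mapper emitted for each candidate pattern."""
    local = defaultdict(int)
    for pattern, utility in mapper_pairs:
        local[pattern] += utility
    return sorted(local.items())     # list of (pattern, local-utility)

# Hypothetical output of a single Mapper: pattern y appears twice.
pairs = [("pattern y", 4), ("pattern x", 1), ("pattern y", 5)]
combined = combine_local_utility(pairs)
print(combined)
```

Combining on the Mapper side reduces the number of pairs sent to the Reducers, which only need to add up one local-utility value per Mapper and pattern.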
Then, the second mining unit 640 can determine the global utility value of each candidate global high-utility sequence pattern from the local utility values of that candidate. For example, for each candidate global high-utility sequence pattern, its global utility value can be determined from its multiple local utility values; for instance, the sum of the multiple local utility values of the candidate can be taken as its global utility value.
Specifically, key-value pairs with the same key in the outputs of the multiple combiner modules can be input to one Reducer in the third-stage MapReduce. That is, the multiple key-value pairs in the combiner outputs that correspond to the same candidate global high-utility sequence pattern, for example the multiple key-value pairs corresponding to the candidate pattern y, are input to one Reducer. The Reducer can take the sum of the local utility values in these key-value pairs as the global utility value (global-utility) of the candidate global high-utility sequence pattern.
Then, the second mining unit 640 can determine sequence patterns whose global utility value is greater than the first threshold to be global high-utility sequence patterns. For example, each Reducer in the third-stage MapReduce can determine sequence patterns whose global utility value is greater than or equal to the first threshold to be global high-utility sequence patterns. Each Reducer can output one or more new key-value pairs, where each new key-value pair consists of a global high-utility sequence pattern and its global utility value. For example, when pattern y is a global high-utility sequence pattern, a Reducer can output the key-value pair (pattern y, global-utility). Therefore, the sequence patterns in the key-value pairs output by the Reducers in the third-stage MapReduce are the global high-utility sequence patterns.
With the device for mining global high-utility sequence patterns provided by this embodiment, the utility linked list of each sequence in the sequence database and the first set are determined, and global high-utility sequence patterns are mined using these two data structures. This saves a large amount of time, speeds up the calculation of global utility values over the sequence database, speeds up mining, and reduces time complexity.
In addition, the device according to the embodiments of the present disclosure can also be implemented by means of the architecture of the computing device shown in Fig. 7. Fig. 7 shows the architecture of the computing device. As shown in Fig. 7, the computing device 700 can include a bus 710, one or more CPUs 720, a read-only memory (ROM) 730, a random access memory (RAM) 740, a communication port 750 connected to a network, an input/output component 760, a hard disk 770, and so on. A storage device in the computing device 700, such as the ROM 730 or the hard disk 770, can store various data or files used in computer processing and/or communication, as well as the program instructions executed by the CPU. The computing device 700 can also include a user interface 780. Of course, the architecture shown in Fig. 7 is only exemplary; when implementing different devices, one or more components of the computing device shown in Fig. 7 can be omitted as needed.
Embodiments of the present disclosure can also be implemented as a computer-readable storage medium. Computer-readable instructions are stored on the computer-readable storage medium according to the embodiments of the present disclosure. When the computer-readable instructions are run by a processor, the method according to the embodiments of the present disclosure described with reference to the above figures can be performed. The computer-readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. The volatile memory can include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory can include, for example, read-only memory (ROM), a hard disk, flash memory, and the like.
Those skilled in the art will understand that the content disclosed herein admits many variations and improvements. For example, the various devices or components described above can be implemented in hardware, or in software, firmware, or a combination of some or all of the three.
In addition, as used in the present disclosure and the claims, unless the context clearly indicates otherwise, words such as "a", "an" and/or "the" do not specifically denote the singular and may also include the plural. "First", "second" and similar words used in the present disclosure do not denote any order, quantity or importance, and are only used to distinguish different components. Likewise, words such as "include" or "comprise" mean that the element or object preceding the word covers the elements or objects listed after the word and their equivalents, without excluding other elements or objects. Words such as "connect" or "connected" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect.
In addition, flowcharts are used in the present disclosure to illustrate the operations performed by a system according to embodiments of the present disclosure. It should be understood that the preceding or following operations are not necessarily performed exactly in order. On the contrary, various steps can be processed in reverse order or simultaneously. Other operations can also be added to these processes, or one or more steps can be removed from them.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in common dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The present disclosure has been described in detail above, but it will be apparent to those skilled in the art that the present disclosure is not limited to the embodiments described in this specification. The present disclosure can be implemented with modifications and changes without departing from the spirit and scope of the present disclosure as defined by the claims. Therefore, the description in this specification is for illustrative purposes and has no restrictive meaning for the present disclosure.
Claims (15)
1. A method for mining global high-utility sequence patterns, comprising:
determining first-category items in a sequence database, wherein a first-category item is an item whose global sequence weight utility is higher than a first threshold;
determining a utility linked list of each sequence in the sequence database;
mining, according to the determined first-category items, at least one candidate global high-utility sequence pattern from the sequence database and determining a first set, wherein the first set includes the at least one candidate global high-utility sequence pattern, identifiers of the sequences that contain each candidate global high-utility sequence pattern, and the utility value of each candidate global high-utility sequence pattern in the corresponding sequence; and
mining global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set.
2. The method of claim 1, wherein determining the first-type items in the sequence database comprises:
determining a global sequence-weighted utility of each item in the sequence database; and
determining the items whose global sequence-weighted utility is higher than the first threshold as the first-type items.
3. The method of claim 2, wherein determining the global sequence-weighted utility of each item in the sequence database comprises:
determining a local sequence-weighted utility of the item in each partition of the sequence database; and
determining the global sequence-weighted utility of the item according to the determined local sequence-weighted utilities.
4. The method of claim 3, wherein the local sequence-weighted utility of the item in each partition of the sequence database is determined according to the utilities of the sequences in that partition that include the item.
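The computation recited in claims 2 to 4 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a sequence is a flat list of (item, utility) pairs, that the utility of a sequence is the sum of its item utilities, and that the global value of an item is the sum of its local values over all partitions. The names `local_swu` and `first_type_items` are hypothetical.

```python
from collections import defaultdict

# Hypothetical representation: a sequence is a list of (item, utility) pairs,
# and a partition is a list of such sequences.
Sequence = list[tuple[str, int]]


def local_swu(partition: list[Sequence]) -> dict[str, int]:
    """Local sequence-weighted utility of every item in one partition."""
    swu: dict[str, int] = defaultdict(int)
    for seq in partition:
        seq_utility = sum(u for _, u in seq)
        # Each item is credited once per sequence that contains it.
        for item in {i for i, _ in seq}:
            swu[item] += seq_utility
    return dict(swu)


def first_type_items(partitions: list[list[Sequence]], threshold: int) -> set[str]:
    """Items whose global sequence-weighted utility exceeds the threshold."""
    global_swu: dict[str, int] = defaultdict(int)
    for partition in partitions:
        for item, value in local_swu(partition).items():
            # Assumption: the global value is the sum of the local values.
            global_swu[item] += value
    return {item for item, value in global_swu.items() if value > threshold}


partitions = [
    [[("a", 2), ("b", 3)], [("b", 1), ("c", 4)]],
    [[("a", 5), ("c", 1)]],
]
promising = first_type_items(partitions, threshold=10)
print(sorted(promising))  # ['a', 'c']
```

In this toy database item b reaches a global value of exactly 10, so it is excluded because the claim requires the value to be higher than the threshold.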
5. The method of any one of claims 1 to 4, wherein determining the utility linked list of each sequence in the sequence database comprises:
determining the utility linked list of the sequence according to the utility of each item in the sequence and the position of each item in the sequence.
6. The method of any one of claims 1 to 4, wherein mining, according to the determined first-type items, at least one candidate global high-utility sequence pattern from the sequence database comprises:
mining local high-utility sequence patterns from each partition of the sequence database according to the determined first-type items; and
determining the at least one candidate global high-utility sequence pattern according to the mined local high-utility sequence patterns.
7. The method of claim 6, wherein mining local high-utility sequence patterns from each partition of the sequence database according to the determined first-type items comprises:
for an item that belongs to the first-type items in each sequence included in the partition,
calculating the utility and the remaining utility of the item in each sequence, wherein the remaining utility of the item in a sequence is the sum of the utilities of all items that follow the item in the sequence;
constructing a utility list of the item in each sequence;
determining a utility chain of the item according to the utility lists of the item in each sequence; and
mining the local high-utility sequence patterns from the partition according to the utility chain of each item in the partition.
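The remaining utility defined in claim 7 can be illustrated with a short sketch. The claim does not fix the layout of a utility list, so the (position, utility, remaining utility) tuple and the one-entry-per-occurrence convention below are assumptions, as are the flat (item, utility) sequence representation and the name `utility_list`.

```python
def utility_list(seq: list[tuple[str, int]], item: str) -> list[tuple[int, int, int]]:
    """Return (position, utility, remaining utility) for each occurrence of item.

    The remaining utility is the sum of the utilities of all items that
    come after the occurrence in the sequence.
    """
    entries = []
    remaining = sum(u for _, u in seq)
    for pos, (current, utility) in enumerate(seq):
        remaining -= utility  # now holds the utilities strictly after pos
        if current == item:
            entries.append((pos, utility, remaining))
    return entries


seq = [("a", 2), ("b", 3), ("a", 4), ("c", 1)]
entries = utility_list(seq, "a")
print(entries)  # [(0, 2, 8), (2, 4, 1)]
```

Keeping the remaining utility next to each occurrence is what typically lets a miner bound the utility of any extension of the item without rescanning the sequence.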
8. The method of any one of claims 1 to 4, wherein mining global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set comprises:
determining a local utility of each candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set;
determining a global utility of each candidate global high-utility sequence pattern according to the local utility of that candidate; and
determining the sequence patterns whose global utility is greater than the first threshold as the global high-utility sequence patterns.
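The last two steps of claim 8 can be sketched as an aggregation followed by a filter. The sketch assumes that the local utility of every candidate has already been computed for each partition and that the global utility is the sum of the local utilities; the claim itself only says the global value is determined "according to" the local ones. Patterns are represented as tuples of items, and `global_high_utility` is a hypothetical name.

```python
from collections import Counter


def global_high_utility(
    local_utilities: list[dict[tuple[str, ...], int]], threshold: int
) -> dict[tuple[str, ...], int]:
    """Sum per-partition utilities and keep patterns above the threshold."""
    total: Counter = Counter()
    for per_partition in local_utilities:
        # Counter.update adds the values of a mapping to the running totals.
        total.update(per_partition)
    return {pattern: u for pattern, u in total.items() if u > threshold}


local_utilities = [
    {("a", "b"): 7, ("a", "c"): 3},  # partition 1
    {("a", "b"): 5, ("a", "c"): 2},  # partition 2
]
result = global_high_utility(local_utilities, threshold=10)
print(result)  # {('a', 'b'): 12}
```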
9. The method of claim 6, further comprising:
dividing the sequences in the sequence database into a plurality of partitions according to a load-balancing algorithm.
10. An apparatus for mining global high-utility sequence patterns, comprising:
a first determination unit configured to determine first-type items in a sequence database, wherein the first-type items are items whose global sequence-weighted utility is higher than a first threshold;
a second determination unit configured to determine a utility linked list of each sequence in the sequence database;
a first mining unit configured to mine, according to the determined first-type items, at least one candidate global high-utility sequence pattern from the sequence database and to determine a first set, wherein the first set includes the at least one candidate global high-utility sequence pattern, an identifier of each sequence that includes each candidate global high-utility sequence pattern, and the utility of each candidate global high-utility sequence pattern in the corresponding sequence; and
a second mining unit configured to mine global high-utility sequence patterns from the at least one candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set.
11. The apparatus of claim 10, wherein the first determination unit is configured to determine a global sequence-weighted utility of each item in the sequence database, and to determine the items whose global sequence-weighted utility is higher than the first threshold as the first-type items.
12. The apparatus of claim 10 or 11, wherein the second determination unit is configured to determine the utility linked list of each sequence according to the utility of each item in the sequence and the position of each item in the sequence.
13. The apparatus of claim 10 or 11, wherein the second mining unit is configured to: determine a local utility of each candidate global high-utility sequence pattern according to the utility linked list of each sequence and the first set; determine a global utility of each candidate global high-utility sequence pattern according to the local utility of that candidate; and determine the sequence patterns whose global utility is greater than the first threshold as the global high-utility sequence patterns.
14. An apparatus for mining global high-utility sequence patterns, comprising:
a processor; and
a memory storing a computer-executable program which, when executed by the processor, performs the method of any one of claims 1 to 9.
15. A computer-readable storage medium having instructions stored thereon which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910692048.6A CN110399406B (en) | 2019-07-26 | 2019-07-26 | Method, device and computer storage medium for mining global high utility sequence pattern |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110399406A (en) | 2019-11-01 |
CN110399406B (en) | 2024-06-04 |
Family ID: 68326602
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201910692048.6A | Active | CN110399406B (en) | 2019-07-26 | 2019-07-26 | Method, device and computer storage medium for mining global high utility sequence pattern |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110399406B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120130964A1 (en) * | 2010-11-18 | 2012-05-24 | Yen Show-Jane | Fast algorithm for mining high utility itemsets |
KR20140064077A (en) * | 2012-11-19 | 2014-05-28 | 충북대학교 산학협력단 | Method of mining high utility patterns |
CN109446235A (en) * | 2018-10-18 | 2019-03-08 | 哈尔滨工业大学(深圳) | Multidimensional effective sequence pattern processing method, device and computer equipment |
CN109460424A (en) * | 2018-10-18 | 2019-03-12 | 哈尔滨工业大学(深圳) | Effective sequence pattern processing method, device and computer equipment |
Non-Patent Citations (3)
Title |
---|
JERRY CHUN-WEI LIN ET AL: "High-Utility Sequential Pattern Mining with Multiple Minimum Utility Thresholds", APWEB-WAIM 2017, PART I, 31 December 2017 (2017-12-31), pages 215 - 229 * |
JUNQIANG LIU ET AL: "Mining High Utility Patterns in One Phase without Generating Candidates", IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, vol. 28, 17 December 2015 (2015-12-17), pages 1245 - 1257, XP011604910, DOI: 10.1109/TKDE.2015.2510012 * |
MORTEZA ZIHAYAT ET AL: "Distributed and Parallel High Utility Sequential Pattern Mining", 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 6 February 2017 (2017-02-06), pages 853 - 862 * |
Also Published As
Publication number | Publication date |
---|---|
CN110399406B (en) | 2024-06-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TG01 | Patent term adjustment | ||