LU102289B1

LU102289B1 - Method for mining large-scale high utility patterns

Info

Publication number: LU102289B1
Application number: LU102289A
Authority: LU
Inventors: Chen Chien-Ming; Teng Qian; PAN Jeng-Shyang; Ming-Tai WU Jimmy; Wu Tsu-Yang
Original assignee: Univ Shandong Science & Tech
Priority date: 2020-12-16
Filing date: 2020-12-16
Publication date: 2021-06-18

Abstract

A method of mining large-scale high utility patterns includes acquiring an ordered location list, generating a searching graph for the mining process according to the ordered location list, determining pruning strategies based on the searching graph, acquiring travel record information, mining high-utility patterns by using an efficient high utility itemset mining algorithm based on the pruning strategies, and determining a list of points of interest suggestions with a different number of locations according to the high-utility patterns.

Description

METHOD FOR MINING LARGE-SCALE HIGH UTILITY PATTERNS

FIELD

[0001] The present disclosure relates to the technical field of data mining, specifically to a method for mining large-scale high utility patterns.

BACKGROUND

[0002] One well-known type of Mobile Ad-hoc Network (MANET) is the Vehicular Ad Hoc Network (VANET). The functions of such a network are integrated into a new generation of wireless networks for vehicles, which establishes a strong self-organizing network between roadside units and mobile vehicles. Within a VANET, every vehicle can be considered an intelligent mobile node that can provide various information and communicate with other nodes in the network.

[0003] A new issue is how to establish a service for providing POI (Points of Interest) information. Finding places worth visiting is usually an onerous task for visitors who arrive at a tourist attraction, especially when travel time is limited. The traditional HUI (High-utility Itemset) mining algorithm cannot handle the data from VANET.

SUMMARY

[0004] The present disclosure provides a method for mining large-scale high utility patterns in a VANET environment, and an electronic device, to mine more popular Points of Interest (POIs). The problem of excessive data volume in VANET can be addressed with a more reasonable pruning strategy. Applying the method for mining large-scale high utility patterns to the MapReduce architecture improves its feasibility in practical applications. Experimental results show that the method performs well when mining POI patterns in a big-data dataset and performs well on a Hadoop computing cluster.

[0005] The system collects the user's daily travel history and the ranking of each point in a current region through VANET. According to this information, the proposed framework can reveal high-utility patterns (collections of locations with high rankings) to provide users with valuable tourist suggestions. It is well known that the size of data from VANET is enormous.

[0006] The system collects the ratings of the tourist attractions (locations) and the users' traffic history in a pre-defined period. After performing the proposed Efficient High Utility Itemset Mining (EHUM) algorithm, the user can obtain a list of POI suggestions with a different number of locations (HUPs) in a specific region. Therefore, users can focus on their tours and do not need to spend much time arranging their schedules.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIG. 1 shows a flowchart of a method for mining large-scale high utility patterns according to an embodiment of the present disclosure.

[0008] FIG.2(a) shows a schematic diagram of a travel history database according to an embodiment of the present disclosure.

[0009] FIG.2(b) shows a schematic diagram of a ranking table according to an embodiment of the present disclosure.

[00010] FIG.2(c) shows a schematic diagram of a record-weighted utility table according to an embodiment of the present disclosure.

[00011] FIG.3 shows a schematic diagram of the whole Hadoop framework for providing POI suggestions according to an embodiment of the present disclosure.

[00012] FIG.4 shows pseudo code for constructing the searching graph (Algorithm 1) according to an embodiment of the present disclosure.

[00013] FIG.5 shows pseudo code for building child nodes (Algorithm 2) according to an embodiment of the present disclosure.

[00014] FIG.6 shows a schematic diagram of a searching graph according to an embodiment of the present disclosure.

[00015] FIG.7 shows a schematic diagram of a proposed framework of EHUM according to an embodiment of the present disclosure.

[00016] FIG.8 shows pseudo code of the Mapper of MapReduce 1 (Algorithm 3) according to an embodiment of the present disclosure.

[00017] FIG.9 shows pseudo code of the Combiner of MapReduce 1 (Algorithm 4) according to an embodiment of the present disclosure.

[00018] FIG.10 shows pseudo code of the Reducer of MapReduce 1 (Algorithm 5) according to an embodiment of the present disclosure.

[00019] FIG.11 shows pseudo code for generating the task file for candidate k-HUPs (Algorithm 6) according to an embodiment of the present disclosure.

[00020] FIG.12 shows a schematic diagram of an example of generating a task file according to an embodiment of the present disclosure.

[00021] FIG.13 shows pseudo code of the Mapper of MapReduce 2 (Algorithm 7) according to an embodiment of the present disclosure.

[00022] FIG.14 shows pseudo code of the Combiner of MapReduce 2 (Algorithm 8) according to an embodiment of the present disclosure.

[00023] FIG.15 shows pseudo code of the Reducer of MapReduce 2 (Algorithm 9) according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

[00024] The drawings are to be combined with the detailed description to illustrate the embodiments of the present disclosure hereinafter. It is noted that embodiments of the present disclosure and features of the embodiments can be combined when there is no conflict.

[00025] Various details are described in the following descriptions for a better understanding of the present disclosure; however, the present disclosure may also be implemented in ways other than those described herein. The scope of the present disclosure is not to be limited by the specific embodiments disclosed below.

[00026] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present disclosure belongs. The terms used herein in the present disclosure are only for the purpose of describing specific embodiments, and are not intended to limit the present disclosure.

[00027] Optionally, the method of the present disclosure is applied to one or more blockchain node devices. The blockchain node device includes hardware such as a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, etc., but is not limited thereto.

[00028] The blockchain node device may be a device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device can interact with users through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.

[00029] FIG. 1 is a flowchart of a method of mining large-scale high utility patterns according to an embodiment of the present disclosure. According to different needs, the order of the steps in the flowchart can be changed and some can be omitted.

[00030] In this section, formal definitions of a tourist history dataset and a location ranking table are provided. The proposed framework maps these two definitions onto the transaction dataset and profit table of the traditional HUI (High-utility Itemset) algorithms. The section also includes the related definitions of utility, high-utility patterns, and so on. In one embodiment, the POI-version definitions of high-utility pattern mining are provided below.

[00031] Let $I = \{i_1, i_2, \ldots, i_m\}$ be a finite set of $m$ distinct locations. A quantitative database is a set of travel histories $D = \{T_1, T_2, \ldots, T_n\}$, where each travel history $T_q \in D$ ($1 \le q \le n$) is a subset of $I$ and has a unique identifier $q$, called its TID. A location ranking table $ptable = \{pr(i_1), pr(i_2), \ldots, pr(i_m)\}$ indicates the ranking value of each location $i_j$. A set of $k$ distinct items $X = \{i_1, i_2, \ldots, i_k\}$ such that $X \subseteq I$ is said to be a $k$-itemset (set of locations), where $k$ is the length of the itemset. An itemset $X$ is said to be contained in a history record $T_q$ if $X \subseteq T_q$. A minimum ranking threshold $\delta$ (called the minimum utility threshold in traditional utility itemset mining) is set according to users' preference. FIG.2(a) shows a travel history database, FIG.2(b) shows a ranking table, and FIG.2(c) shows a record-weighted utility table.

[00032] In one embodiment, the ranking utility of a location $i_j$ in the record $T_q$ is denoted as $u(i_j, T_q)$ and is defined as:

[00033] $u(i_j, T_q) = q(i_j, T_q) \times pr(i_j)$

[00034] For example, the ranking utilities of the locations (b), (d), (e) and (f) in the record $T_2$ are respectively calculated as:

[00035] $u(b, T_2) = q(b, T_2) \times pr(b) = 9$, $u(d, T_2) = q(d, T_2) \times pr(d) = 5$, $u(e, T_2) = q(e, T_2) \times pr(e) = 6$, $u(f, T_2) = q(f, T_2) \times pr(f) = 1$.

[00036] In one embodiment, the ranking utility of a location set $X$ in the record $T_q$ is denoted as $u(X, T_q)$ and is defined as:

[00037] $u(X, T_q) = \sum_{i_j \in X \wedge X \subseteq T_q} u(i_j, T_q)$

[00038] For example, the ranking utilities of the location sets {b, d, e} and {d, e, f} in the record $T_2$ are respectively calculated as:

[00039] $u(bde, T_2) = u(b, T_2) + u(d, T_2) + u(e, T_2) = 9 + 5 + 6 = 20$ and $u(def, T_2) = u(d, T_2) + u(e, T_2) + u(f, T_2) = 5 + 6 + 1 = 12$.
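The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the names `PTABLE`, `location_utility` and `set_utility` are hypothetical, a record is encoded as a hash table from location to quantity, and the ranking values are the ones implied by the worked example under the assumption that every visited location has quantity 1.

```python
# Ranking values implied by the worked example (assumption: quantity 1 per visit).
PTABLE = {"b": 9, "d": 5, "e": 6, "f": 1}


def location_utility(loc, record, ptable):
    """u(i_j, T_q) = q(i_j, T_q) * pr(i_j); 0 if the location is absent."""
    return record.get(loc, 0) * ptable[loc]


def set_utility(locs, record, ptable):
    """u(X, T_q): sum of member utilities when X is contained in T_q, else 0."""
    if not set(locs).issubset(record):
        return 0
    return sum(location_utility(loc, record, ptable) for loc in locs)


# Record T2 as a hash table: location -> quantity (hypothetical encoding).
T2 = {"b": 1, "d": 1, "e": 1, "f": 1}

print(set_utility({"b", "d", "e"}, T2, PTABLE))  # 20
print(set_utility({"d", "e", "f"}, T2, PTABLE))  # 12
```

A location set that is not fully contained in the record contributes 0, which is what restricts the later database-level sums to records that contain the whole set.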

[00040] In one embodiment, the ranking utility of a location set $X$ in a database $D$ is denoted as $u(X)$ and is defined as:

[00041] $u(X) = \sum_{X \subseteq T_q \wedge T_q \in D} u(X, T_q)$

[00042] For example, the utilities of the location sets {b, d, e} and {d, e, f} in D are respectively calculated as:

[00043] $u(bde) = u(bde, T_2) + u(bde, T_7) = 20 + 20 = 40$ and $u(def) = u(def, T_2) = 12$.
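As a hedged sketch of the database-level utility, the function below sums the per-record utilities over the records that contain the whole location set. Only records T2 and T7 can be recovered from the worked examples, so the toy database `DB` holds just those two; the per-location values of T7 are assumed equal to those of T2, and the name `utility_in_database` is hypothetical.

```python
def utility_in_database(locs, database):
    """u(X): sum of u(X, T_q) over every record T_q that contains X.

    `database` maps a TID to {location: ranking utility in that record}.
    """
    locs = set(locs)
    return sum(
        sum(record[loc] for loc in locs)
        for record in database.values()
        if locs.issubset(record)
    )


# Only T2 and T7 are recoverable from the worked examples; other records omitted.
DB = {
    "T2": {"b": 9, "d": 5, "e": 6, "f": 1},
    "T7": {"b": 9, "d": 5, "e": 6},
}

print(utility_in_database("bde", DB))  # 40
print(utility_in_database("def", DB))  # 12
```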

[00044] The record utility (transaction utility in traditional utility mining) of a record $T_q$ is denoted as $ru(T_q)$ and defined as:

[00045] $ru(T_q) = \sum_{i_j \in T_q} u(i_j, T_q)$

[00046] For example, $ru(T_1) = u(a, T_1) + \ldots + u(e, T_1) = 3 + 1 + 6 = 10$. The record utilities of the remaining records $T_2$ to $T_{10}$ are respectively calculated as $ru(T_2) = 21$, $ru(T_3) = 7$, $ru(T_4) = 11$, $ru(T_5) = 7$, $ru(T_6) = 10$, $ru(T_7) = 20$, $ru(T_8) = 15$, $ru(T_9) = 13$ and $ru(T_{10}) = 16$.

[00047] The record-weighted utilization (rwu, the same as twu in traditional utility mining) of a location set $X$ is denoted as $rwu(X)$ and defined as:

[00048] $rwu(X) = \sum_{T_q \in g(X)} ru(T_q)$

[00049] Here, $g(X)$ is the set of records that include the itemset $X$. For example, $rwu(bde) = ru(T_2) + ru(T_7) = 21 + 20 = 41$.

[00050] The total utility of a database $D$ is denoted as $TU$ and defined as:

[00051] $TU = \sum_{T_q \in D} ru(T_q)$

[00052] For example, the total utility in a database D is calculated as TU = 10 + 21 + 7 + 11 + 7 + 10 + 20 + 15 + 13 + 16 = 130.
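The record utility, record-weighted utilization and total utility can be checked with the short sketch below. The table `RU` holds the ten record utilities listed in the example; the function names are hypothetical, and `rwu` takes the TIDs of g(X) directly because the full travel history database of FIG.2(a) is not reproduced in the text.

```python
# Record utilities ru(T_1)..ru(T_10) as listed in the worked example.
RU = {1: 10, 2: 21, 3: 7, 4: 11, 5: 7, 6: 10, 7: 20, 8: 15, 9: 13, 10: 16}


def record_utility(record):
    """ru(T_q): sum of the ranking utilities of all locations in the record."""
    return sum(record.values())


def rwu(tids_containing_x, ru=RU):
    """rwu(X): sum of ru(T_q) over g(X), the TIDs of records containing X."""
    return sum(ru[tid] for tid in tids_containing_x)


def total_utility(ru=RU):
    """TU: sum of ru(T_q) over the whole database."""
    return sum(ru.values())


print(record_utility({"b": 9, "d": 5, "e": 6, "f": 1}))  # 21, i.e. ru(T_2)
print(rwu({2, 7}))      # 41, rwu(bde) with g(bde) = {T_2, T_7}
print(total_utility())  # 130
```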

[00053] A location set $X$ in a database $D$ is a high-utility pattern (HUP) if its utility is no less than the minimum utility count:

[00054] $HUP \leftarrow \{X \mid u(X) \ge TU \times \delta\}$

[00055] For example, the utilities of the location sets {b, d, e} and {d, e, f} are respectively calculated as u(bde) = 40 and u(def) = 12. Thus, the location set {b, d, e} is a HUP since u(bde) = 40 > 130 × 0.3 = 39. The location set {d, e, f} is not a HUP since u(def) = 12 < 39.

[00056] A location set $X$ in a database $D$ is a high record-weighted utilization pattern (HRWUP) if its record-weighted utilization is no less than the minimum utility count:

[00057] $HRWUP \leftarrow \{X \mid rwu(X) \ge TU \times \delta\}$

[00058] For example, the record-weighted utilizations of the location sets {b, d, e} and {d, e, f} are respectively calculated as:

[00059] $rwu(bde) = ru(T_2) + ru(T_7) = 21 + 20 = 41 > 39$ and $rwu(def) = ru(T_2) = 21 < 39$.

[00060] The itemset {b, d, e} is a HRWUP and the location set {d, e, f} is not a HRWUP. Generally, if a HRWUP contains k items, it is denoted as a k-HRWUP (high record-weighted utilization k-pattern). Therefore, the location set {b, d, e} is a 3-HRWUP.
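The HUP and HRWUP tests share the same threshold TU × δ, so one predicate covers both. The sketch below uses the values from the worked example (TU = 130, δ = 0.3); the function names are hypothetical. Because rwu(X) is never smaller than u(X), a location set that fails the HRWUP test cannot be a HUP, which is what makes rwu usable as a first filter.

```python
def min_utility_count(total_utility, delta):
    """Minimum utility count TU x delta."""
    return total_utility * delta


def is_high(value, total_utility, delta):
    """True if a utility (for HUP) or an rwu (for HRWUP) reaches TU x delta."""
    return value >= min_utility_count(total_utility, delta)


TU, DELTA = 130, 0.3  # values from the worked example
for name, u, rwu_value in [("bde", 40, 41), ("def", 12, 21)]:
    print(name, "HUP:", is_high(u, TU, DELTA), "HRWUP:", is_high(rwu_value, TU, DELTA))
# bde HUP: True HRWUP: True
# def HUP: False HRWUP: False
```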

[00061] At block S1, an ordered location list is acquired.

[00062] In one embodiment, the whole Hadoop framework for providing POI suggestions is shown in FIG. 3. In the proposed framework, the system collects the ratings of the tourist attractions (locations) and the users' traffic history in a pre-defined period. After performing the proposed EHUM algorithm, the user can obtain a list of POI suggestions with a different number of locations (HUPs) in a specific region. Therefore, users can focus on their tours and do not need to spend much time arranging their schedules.

[00063] In one embodiment, the ordered location list is used to construct a searching graph.

[00064] At block S2, according to the ordered location list, a searching graph for mining process is generated.

[00065] In one embodiment, before the proposed EHUM framework is performed, a searching graph is built first; the pruning strategies and the proposed algorithm can then be performed on this graph. EHUM finds all of the HUPs in a dataset once the EHUM process has estimated or pruned every node in this graph (each node indicates a specific candidate itemset). The searching graph does not need to contain any location that does not belong to the 1-HRWUPs. Thus, EHUM constructs the searching graph from the 1-HRWUPs rather than from all of the locations in a record database. After obtaining the 1-HRWUPs, EHUM sorts the locations in the 1-HRWUPs by their record-weighted utilizations in descending order. For example, assume there are three locations a, b and c in the 1-HRWUPs and their record-weighted utilizations are rwu(a) = 80, rwu(b) = 90 and rwu(c) = 110; then EHUM will output the ordered item list c, b, a. Afterward, the searching graph is generated from this ordered list by a first algorithm (Algorithm 1) and a second algorithm (Algorithm 2). FIG.4 shows the pseudo code of Algorithm 1 for constructing the searching graph, and FIG.5 shows the pseudo code of Algorithm 2 for building child nodes.

[00066] In the given database, rwu(a) = 38 < 39; therefore, a is not a 1-HRWUP, while b, c, d, e and f are all 1-HRWUPs. The 1-HRWUP list sorted by descending record-weighted utilization is e, b, d, c, f. In this example, the searching graph is shown in FIG.6. There are thirty-one nodes in this graph (not including the starting node S), and each node represents a possible high-utility pattern.

[00067] According to the above process, the searching graph is generated before EHUM starts its searching process. Each node in the searching graph indicates a specific location set that can be evaluated to decide whether it is a high-utility pattern. The expression of a node is the traveling log from the starting node to this node. For example, the expression of node 6 in FIG. 6 is {e, b, d, f}, and node 7 is {e, b, d}. In this way, all of the possible high-utility location sets are strictly encoded into the searching graph.
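The ordering step and the shape of the searching graph can be illustrated as follows. This is a sketch under stated assumptions, not Algorithm 1 or Algorithm 2 from FIG.4 and FIG.5 (their pseudo code is not reproduced in the text): it enumerates the graph as a set-enumeration tree in which each child appends one later location from the ordered list, and it makes no claim about the node numbering used in FIG.6. All function names are hypothetical.

```python
def ordered_location_list(rwu, threshold):
    """Keep the 1-HRWUPs and sort them by descending rwu."""
    kept = [loc for loc, value in rwu.items() if value >= threshold]
    return sorted(kept, key=lambda loc: rwu[loc], reverse=True)


def build_children(prefix, following):
    """Yield every node under `prefix`; a child appends one later location."""
    for idx, loc in enumerate(following):
        node = prefix + [loc]
        yield node
        yield from build_children(node, following[idx + 1:])


def build_searching_graph(ordered):
    """Return all nodes (location sets) below the starting node S."""
    return list(build_children([], ordered))


print(ordered_location_list({"a": 80, "b": 90, "c": 110}, 0))  # ['c', 'b', 'a']
graph = build_searching_graph(["e", "b", "d", "c", "f"])
print(len(graph))  # 31
```

With five 1-HRWUPs the tree has 2^5 - 1 = 31 nodes, one per non-empty location set, which agrees with the node count stated for FIG.6.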

[00068] At block S3, pruning strategies are determined based on the searching graph.

[00069] In one embodiment, the pruning strategies use two upper bounds for high-utility patterns, named sub-tree utility and local utility, to determine whether the process goes on to estimate the nodes that follow a current node. Let the record-weighted utilization descending order be $I = \{i_1, i_2, \ldots, i_k\}$, let the last location in a location set $X$ sorted by the order $I$ be $i_m$, and let the set $I'$ be the sub-list $i_{m+1}, i_{m+2}, \ldots, i_k$ of $I$. Let $z = i_l$ be a location in $I'$, and let $g(X \cup \{z\})$ be the record set in which every record includes the location set $X \cup \{z\}$. The sub-tree utility of $z$ with respect to $X$ is:

[00070] $su(X, z) = \sum_{T_q \in g(X \cup \{z\})} \left[ u(X, T_q) + u(z, T_q) + \sum_{i_j \in T_q \wedge i_j \in \{i_{l+1}, \ldots, i_k\}} u(i_j, T_q) \right]$

[00071] The local utility of $z$ with respect to $X$ is:

[00072] $lu(X, z) = \sum_{T_q \in g(X \cup \{z\})} \left[ u(X, T_q) + \sum_{i_j \in T_q \wedge i_j \in I'} u(i_j, T_q) \right]$
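A minimal sketch of the two upper bounds follows. It assumes the usual EFIM-style reading: the sub-tree utility adds the utilities of the locations that come after z in the order, and the local utility adds the utilities of all locations in the sub-list that follows the last location of X. The function name is hypothetical, and the two-record database is a toy (only T2 and T7 are recoverable from the text), so the printed values are not the values of FIG.12.

```python
def sub_tree_and_local_utility(x, z, order, database):
    """Return (su(X, z), lu(X, z)) for location set X and extension z.

    order: locations by descending rwu; database: TID -> {location: utility}.
    """
    last = max((order.index(loc) for loc in x), default=-1)
    following = order[last + 1:]          # I': locations after X's last one
    after_z = order[order.index(z) + 1:]  # locations after z in the order
    su = lu = 0
    for record in database.values():
        if not (set(x) | {z}).issubset(record):
            continue                      # record is not in g(X ∪ {z})
        u_x = sum(record[loc] for loc in x)
        su += u_x + record[z] + sum(record[i] for i in after_z if i in record)
        lu += u_x + sum(record[i] for i in following if i in record)
    return su, lu


ORDER = ["e", "b", "d", "c", "f"]
DB = {"T2": {"b": 9, "d": 5, "e": 6, "f": 1}, "T7": {"b": 9, "d": 5, "e": 6}}

print(sub_tree_and_local_utility(["b"], "d", ORDER, DB))  # (29, 29)
print(sub_tree_and_local_utility(["b"], "f", ORDER, DB))  # (10, 15)
```

The local utility is never smaller than the sub-tree utility for the same pair, so a location that passes the sub-tree test always passes the local test as well.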

[00073] At block S4, travel record information is acquired.

[00074] In one embodiment, the travel record information is stored in HDFS with the binary structure; for any specific record in the travel record information, the efficient high utility itemset mining algorithm uses a hash table to store information of the specific record and uses Apache Avro to serialize the information of the specific record into the binary structure.

[00075] At block S5, based on the travel record information, high-utility patterns are mined by using an efficient high utility itemset mining algorithm based on the pruning strategies.

[00076] In one embodiment, the proposed framework of EHUM is shown in FIG.7. The proposed framework contains three different MapReduce architectures, each responsible for a different goal. The first one reveals the 1-HRWUPs, the second one sorts the 1-HRWUPs, and the last one mines the HUPs. An independent process between a second MapReduce (MapReduce 2) and a third MapReduce (MapReduce 3) generates a task file that directs MapReduce 3 in finding HUPs.

[00077] Generally, a Hadoop program reads the input data from HDFS in plain text format. In EHUM, due to performance considerations, the travel record information is stored in HDFS in a binary structure. EHUM tries to produce as little temporary data as possible; therefore, random access to the original input dataset is very important. For any specific record, EHUM uses a hash table to store the record information and uses Apache Avro to serialize it into the binary structure. Thus, EHUM can efficiently get specific information for a certain location in a record. In this algorithm, all of the complex data structures are serialized and deserialized by Apache Avro; the following sections do not emphasize this again.

[00078] In one embodiment, the high-record-weighted utilization 1-patterns (1-HRWUPs) are determined by using a first MapReduce. The 1-HRWUPs are then sorted in descending order of their record-weighted utilizations. The candidate high-utility patterns are determined, and a task file for storing the candidate high-utility patterns is generated according to the sorted 1-HRWUPs. The high-utility patterns are determined according to the candidate high-utility patterns from the task file and the travel record information by using a second MapReduce.

[00079] In one embodiment, a first MapReduce (MapReduce 1) is at the beginning of the proposed framework and reveals all of the 1-HRWUPs in the dataset. Besides the Mapper and Reducer, EHUM also sets up a Combiner in MapReduce 1 to increase performance. FIG.8 shows the pseudo code of the Mapper of MapReduce 1 (Algorithm 3), and FIG.9 shows the pseudo code of the Combiner of MapReduce 1 (Algorithm 4). In Algorithm 3, each Mapper obtains a part of the dataset. Then, for each record, a key-value pair consisting of a location ID and the record utility of the record that contains this location is output to the Combiners. In Algorithm 4, the Mapper nodes accumulate the values of the same location ID before they output the key-value pair list to the Reducers. The output value of Algorithm 4 is thus the partial RWU for a location, which reduces the required communication bandwidth and the transfer time. Finally, the key-value pairs of the same location ID are assigned to the same Reducer. The Reducers in MapReduce 1 then calculate the sum of the partial RWUs for each location and output the 1-HRWUPs to MapReduce 2 for the sorting process.
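The data flow of MapReduce 1 described above can be simulated in plain Python, without Hadoop. The sketch below is illustrative only: `mapper`, `combiner` and `reducer` are hypothetical stand-ins for Algorithms 3 to 5, whose pseudo code appears only in the figures, and the two input splits are made-up records whose values are ranking utilities.

```python
from collections import defaultdict


def mapper(records):
    """Emit (location, ru(T_q)) for every location in every record of a split."""
    for record in records:
        ru = sum(record.values())
        for loc in record:
            yield loc, ru


def combiner(pairs):
    """Sum the values per location on the mapper node: the partial RWU."""
    partial = defaultdict(int)
    for loc, ru in pairs:
        partial[loc] += ru
    return dict(partial)


def reducer(partials, threshold):
    """Sum the partial RWUs per location and keep the 1-HRWUPs."""
    total = defaultdict(int)
    for partial in partials:
        for loc, value in partial.items():
            total[loc] += value
    return {loc: v for loc, v in total.items() if v >= threshold}


# Two hypothetical input splits; record values are ranking utilities.
split_1 = [{"b": 9, "d": 5, "e": 6, "f": 1}]
split_2 = [{"b": 9, "d": 5, "e": 6}, {"f": 1}]
partials = [combiner(mapper(s)) for s in (split_1, split_2)]
print(reducer(partials, threshold=39))  # {'b': 41, 'd': 41, 'e': 41}
```

The combiner sends one pair per location per split instead of one pair per location per record, which is where the bandwidth saving comes from.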

[00080] In one embodiment, when searching the dataset in MapReduce 2, the proposed framework reveals, in each round, all HUPs that contain the same number of locations. Therefore, a task file needs to be generated to indicate the candidate location sets used by MapReduce 2; that is, at the k-th execution of MapReduce 2, all of the k-HUPs must be found in that round. The task file format can be defined as a structured list of key-value pairs. A key is a set of locations that was estimated in previous rounds, and the value is a list of locations by which this set of locations can be directly extended in the searching graph. Hence, the keys and values can be combined into the new candidate location sets in this iteration. The first task file therefore contains just one record, whose key is NULL and whose value is the list of 1-HRWUPs.

For example, it is {(NULL, {e, b, d, c, f})} for the searching graph in FIG. 6. FIG.10 shows the pseudo code of the Reducer of MapReduce 1 (Algorithm 5). FIG.11 shows the pseudo code for generating the task file for candidate k-HUPs (Algorithm 6).

[00081] Algorithm 6 combines the concept of Algorithm 1 with the pruning strategies of EFIM. An example that uses node 17 ({b}) in FIG. 6 as (n, l) in line 1 is described below. In this case, n is {b} and l is {d, c, f}. The state of the sub-tree utility and local utility for this branch (calculated by the previous MapReduce 2 process) is shown in FIG.12. Thus, nextNodes = {d}' and wholeFollowingList = {d, c}. Due to the sub-tree utility pruning strategy, the branches that follow nodes 22 and 24 are pruned. The branch between nodes 18 and 21 is also pruned, because lu({b}, f) < threshold. Finally, {{b, d}', {c}} is written into the list o in this loop iteration.
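The pruning step in this example can be sketched as below. This is one plausible reading of the example, not the pseudo code of FIG.11: locations whose sub-tree utility reaches the threshold become next nodes, locations whose local utility reaches the threshold stay in the following list, and each next node is paired with the surviving locations that come after it. The function name and the numeric values in `SU` and `LU` are hypothetical, since FIG.12 is not reproduced in the text; they are chosen only so that the outcome matches the one described above.

```python
def generate_task_entries(task_file, su, lu, threshold):
    """Extend every (n, l) entry by one location, pruning with su and lu."""
    new_entries = []
    for n, l in task_file:
        next_nodes = [z for z in l if su[(n, z)] >= threshold]
        whole_following = [z for z in l if lu[(n, z)] >= threshold]
        for z in next_nodes:
            pos = whole_following.index(z)  # su <= lu, so z is always present
            new_entries.append((n + (z,), whole_following[pos + 1:]))
    return new_entries


# Hypothetical bound values for the branch under node {b}; threshold is 39.
SU = {(("b",), "d"): 45, (("b",), "c"): 30, (("b",), "f"): 10}
LU = {(("b",), "d"): 60, (("b",), "c"): 50, (("b",), "f"): 20}

task_file = [(("b",), ["d", "c", "f"])]
print(generate_task_entries(task_file, SU, LU, 39))  # [(('b', 'd'), ['c'])]
```

In this encoding the NULL key of the first task file would be the empty tuple, for example `[((), ["e", "b", "d", "c", "f"])]`.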

[00082] In one embodiment, EHUM loads the candidate HUPs from the task file and the record information from the HBase database. Then, a second MapReduce (MapReduce 2) is applied to calculate the utility, sub-tree utility and local utility for all candidate location sets, and further reveal the HUPs in this dataset. In MapReduce 2, EHUM also maintains a related record ID list (this list records only the ID numbers and does not include any record information) as the input file for the MapReduce system; therefore, EHUM does not need to scan the whole dataset.

[00083] In one embodiment, FIG.13 shows the pseudo code of the Mapper of MapReduce 2 (Algorithm 7), FIG.14 shows the pseudo code of the Combiner of MapReduce 2 (Algorithm 8), and FIG.15 shows the pseudo code of the Reducer of MapReduce 2 (Algorithm 9). In Algorithm 7, a Mapper of MapReduce 2 calculates the utility, sub-tree utility and local utility of each location set in the task file for each record that is assigned to the Mapper. If a record contains a candidate location set, it is kept in the new record ID list; otherwise, it is removed. For the same reason as in Algorithm 4, MapReduce 2 also uses a Combiner to reduce the communication cost. In Algorithm 8, EHUM accumulates the values of utility, sUtility and lUtility to get the partial utility information for each candidate location set that is assigned to this Mapper. Then, EHUM updates the list of key-value pairs and outputs it to the Reducers. In Algorithm 9, EHUM calculates the values of utility, sub-tree utility and local utility of each candidate location set for the whole travel record dataset and outputs them to the HBase database to be used in the next iteration. If the utility of a location set is larger than the pre-defined threshold, this location set is stored in the HBase database. Finally, if Algorithm 6 cannot generate more candidate HUPs, EHUM has already revealed all of the HUPs in this travel record dataset, so it stops the process and outputs the results.
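The round described above can be simulated without Hadoop or HBase as follows. This is a sketch under assumptions, not Algorithms 7 to 9: each task entry (n, l) yields the candidates n ∪ {z} for z in l, the sub-tree and local utilities are computed over the locations of l only, the Combiner and Reducer are modelled by one shared summation, and the final comparison uses "no less than" as in the HUP definition. All names, the split layout and record T9 are hypothetical.

```python
from collections import defaultdict


def mapper(task_file, records):
    """Emit (candidate, (utility, sUtility, lUtility)) for each record that
    contains the candidate, and collect the IDs of those records."""
    pairs, kept_ids = [], set()
    for tid, record in records.items():
        for n, l in task_file:
            if not set(n).issubset(record):
                continue
            u_n = sum(record[loc] for loc in n)
            present = [z for z in l if z in record]
            for pos, z in enumerate(present):
                utility = u_n + record[z]
                s_utility = utility + sum(record[i] for i in present[pos + 1:])
                l_utility = u_n + sum(record[i] for i in present)
                pairs.append((n + (z,), (utility, s_utility, l_utility)))
                kept_ids.add(tid)
    return pairs, kept_ids


def accumulate(pairs):
    """Shared by Combiner and Reducer: add the triples per candidate."""
    totals = defaultdict(lambda: [0, 0, 0])
    for candidate, triple in pairs:
        for k, value in enumerate(triple):
            totals[candidate][k] += value
    return {c: tuple(t) for c, t in totals.items()}


def reducer(partials, threshold):
    """Merge the partial triples and keep candidates whose utility qualifies."""
    merged = accumulate(p for partial in partials for p in partial.items())
    hups = {c: t[0] for c, t in merged.items() if t[0] >= threshold}
    return merged, hups


TASK_FILE = [(("e", "b"), ["d", "c", "f"])]
SPLIT_A = {"T2": {"b": 9, "d": 5, "e": 6, "f": 1}}
SPLIT_B = {"T7": {"b": 9, "d": 5, "e": 6}, "T9": {"f": 1}}  # T9 is made up

partials, kept = [], set()
for split in (SPLIT_A, SPLIT_B):
    pairs, ids = mapper(TASK_FILE, split)
    partials.append(accumulate(pairs))  # Combiner step on each mapper node
    kept |= ids
merged, hups = reducer(partials, threshold=39)
print(hups)          # {('e', 'b', 'd'): 40}
print(sorted(kept))  # ['T2', 'T7']
```

The record that contains no candidate (T9) drops out of the record ID list, so a later round would not need to read it again.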

[00084] At block S6, a list of points of interest suggestions with a different number of locations is determined according to the high-utility patterns.

[00085] The present application collects the ratings of the tourist attractions (locations) and the users' traffic history in a pre-defined period. After performing the proposed EHUM algorithm, the user can obtain a list of POI suggestions with a different number of locations (HUPs) in a specific region. Therefore, users can focus on their tours and do not need to spend much time arranging their schedules.
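One simple way to turn the mined HUPs into suggestion lists with a different number of locations is to group them by size. The sketch below is an assumption about presentation only: the text does not say how suggestions are ordered, so sorting each group by descending utility is a choice made here, the function name is hypothetical, and the utilities of the single-location sets are made up.

```python
from collections import defaultdict


def poi_suggestions(hups):
    """Group HUPs by number of locations, highest utility first in each group."""
    by_size = defaultdict(list)
    for locations, utility in hups.items():
        by_size[len(locations)].append((utility, sorted(locations)))
    return {
        size: [locs for _, locs in sorted(group, key=lambda g: -g[0])]
        for size, group in sorted(by_size.items())
    }


# u(bde) = 40 is from the worked example; the 1-location utilities are made up.
hups = {("b", "d", "e"): 40, ("b",): 45, ("e",): 50}
print(poi_suggestions(hups))  # {1: [['e'], ['b']], 3: [['b', 'd', 'e']]}
```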

[00086] In the several embodiments provided in the present application, it should be understood that the disclosed blockchain node devices and method can be implemented in other ways. For example, the embodiments of the devices described above are merely illustrative. For example, divisions of the units are only divisions according to logical functions, and there can be other manners of division in actual implementation.

[00087] In addition, each functional unit in each embodiment of the present disclosure can be integrated into one processing unit, or can be physically present separately in each unit, or two or more units can be integrated into one unit. The above integrated unit can be implemented in a form of hardware or in a form of a software functional unit.

[00088] The present disclosure is not limited to the details of the above-described exemplary embodiments, and the present disclosure can be embodied in other specific forms without departing from the spirit or essential characteristics of the present disclosure. Therefore, the present embodiments are to be considered as illustrative and not restrictive, and the scope of the present disclosure is to be defined by the appended claims. All changes and variations in the meaning and scope of equivalent elements are included in the present disclosure. Any reference sign in the claims should not be construed as limiting the claim. Furthermore, the word "comprising" does not exclude other units nor does the singular exclude the plural. A plurality of units or devices stated in the system claims may also be implemented by one unit or device through software or hardware. Words such as “first” and “second” are used to indicate names, but not in any particular order.

[00089] Finally, the above embodiments are only used to illustrate technical solutions of the present disclosure, and are not to be taken as restrictions on the technical solutions. Although the present disclosure has been described in detail with reference to the above embodiments, those skilled in the art should understand that the technical solutions described in the embodiments can be modified, or some of the technical features can be equivalently substituted, and that these modifications or substitutions are not to detract from the essence of the technical solutions or from the scope of the technical solutions of the embodiments of the present disclosure.

Claims

CLAIMS

We claim:

1. A method of mining large-scale high utility patterns, executed by a blockchain management node device, the method comprising: acquiring an ordered location list; according to the ordered location list, generating a searching graph for a mining process; based on the searching graph, determining pruning strategies; acquiring travel record information; based on the travel record information, mining high-utility patterns by using an efficient high utility itemset mining algorithm based on the pruning strategies; according to the high-utility patterns, determining a list of points of interest suggestions with a different number of locations.

2. The method of mining large-scale high utility patterns of claim 1, wherein each node in the searching graph indicates a specific location set which can be estimated whether it is a high-utility pattern or not, and the expression of a node is the traveling log from the starting node to this node.

3. The method of mining large-scale high utility patterns of claim 1, wherein in the pruning strategies, there are two upper-bounds for high-utility patterns, named sub-tree utility and local utility, to determine whether a process estimates following nodes of a current node from the current node or not, a record-weighted utilization descending order is $I = \{i_1, i_2, \ldots, i_k\}$, a last location in a location set $X$ sorted by the order $I$ is $i_m$, a set $I'$ is a sub-list of $I$ as $i_{m+1}, i_{m+2}, \ldots, i_k$, $z = i_l$ is a location $\in I'$ and $g(X \cup \{z\})$ is a record set in which records include the location set $X \cup \{z\}$, the sub-tree utility of $z$ with respect to $X$ is: $su(X, z) = \sum_{T_q \in g(X \cup \{z\})} \left[ u(X, T_q) + u(z, T_q) + \sum_{i_j \in T_q \wedge i_j \in \{i_{l+1}, \ldots, i_k\}} u(i_j, T_q) \right]$; the local utility of $z$ with respect to $X$ is: $lu(X, z) = \sum_{T_q \in g(X \cup \{z\})} \left[ u(X, T_q) + \sum_{i_j \in T_q \wedge i_j \in I'} u(i_j, T_q) \right]$.

4. The method of mining large-scale high utility patterns of claim 1, wherein the travel record information is stored in HDFS with the binary structure; for any specific record in the travel record information, the efficient high utility itemset mining algorithm uses a hash table to store information of the specific record and uses Apache Avro to serialize the information of the specific record into the binary structure.

5. The method of mining large-scale high utility patterns of claim 1, wherein based on the travel record information, by using the efficient high utility itemset mining algorithm based on the pruning strategies, the high-utility patterns are mined by: determining high-record-weighted utilization 1-patterns by using a first MapReduce, wherein, if a high-record-weighted utilization pattern contains k items, it is denoted as a high-record-weighted utilization k-pattern, k-HRWUP; sorting the 1-HRWUPs by the descending order of their record-weighted utilities; determining candidate high-utility patterns and generating a task file for storing the candidate high-utility patterns according to the sorted 1-HRWUPs; determining the high-utility patterns according to the candidate high-utility patterns from the task file and the travel record information by using a second MapReduce.

6. The method of mining large-scale high utility patterns of claim 5, wherein there are Combiners in the first MapReduce, and the high-record-weighted utilization 1-patterns are determined by using the first MapReduce by: obtaining a part of the dataset through each Mapper of the first MapReduce; outputting a key-value pair for a location ID and a record utility of a certain record which contains the location to the Combiners; accumulating a value of the same location ID through Mapper nodes before a key-value pair list is output to Reducers of the first MapReduce; assigning the key-value pairs of the same location ID to the same Reducer; calculating the sum of the partial record-weighted utilization for each location through the Reducers and outputting the 1-HRWUPs.

7. The method of mining large-scale high utility patterns of claim 5, wherein the task file format is defined as a structured list of key-value pairs, a key is defined as a set of locations which is estimated from previous rounds, a value is defined as a list of locations that may be extended directly from this set of locations in the searching graph, and keys and values are combined into new candidate location sets in the iteration.

8. The method of mining large-scale high utility patterns of claim 5, wherein according to the candidate high-utility patterns from the task file and the travel record information, the high-utility patterns are determined by: loading the candidate high-utility patterns from the task file and the record information from a database; calculating the utility, sub-tree utility and local utility for all candidate location sets by applying the second MapReduce; revealing the high-utility patterns in the dataset.

9. The method of mining large-scale high utility patterns of claim 8, wherein a Mapper of the second MapReduce calculates the utility, sub-tree utility and local utility of each location set in the task file for each record which is assigned to the Mapper; a Combiner of the second MapReduce is designed to reduce communication cost; the value of utility, sUtility and lUtility is accumulated to get the partial utility information for each candidate location set which is assigned to the Mapper, the list of key-value pairs is updated and is outputted to the Reducers; the value of utility, sub-tree utility and local utility of each candidate location set for the whole travel record dataset are calculated and outputted to the database in order to be used in the next iteration, if the utility of a location set is larger than the pre-defined threshold, the location set will be stored in the database, and finally, when more candidate HUPs cannot be generated, all of the high-utility patterns are revealed in the dataset and the process is stopped.

10. The method of mining large-scale high utility patterns of claim 9, wherein if a record contains a candidate location set, the record is kept in the new record ID list; otherwise, the record is removed.
