CN104464344A

CN104464344A - Vehicle driving path prediction method and system

Info

Publication number: CN104464344A
Application number: CN201410628190.1A
Authority: CN
Inventors: 马传香; 王时绘; 余啸; 曾诚; 陈昊; 张; 吕顺营; 宋建华; 吴思尧
Original assignee: Hubei University
Current assignee: Hubei University
Priority date: 2014-11-07
Filing date: 2014-11-07
Publication date: 2015-03-25
Anticipated expiration: 2034-11-07
Also published as: CN104464344B

Abstract

The invention provides a vehicle driving path prediction method and system. Running on a Hadoop platform, the method determines the minimum memory among the nodes, scans the original path sequence database to find the storage size of the longest path, and evenly divides the database into n disjoint sub-path sequence databases. The original path sequence database and the n sub-path sequence databases are uploaded to HDFS. The master node dispatches the n sub-path sequence databases to different Map nodes. Each Map node runs an improved GSP algorithm, scanning the sub-path sequence database held in its memory against a preset minimum support ξ to obtain local path sequence patterns, and the Reduce nodes merge these into global candidate sequence patterns. The original path sequence database is then scanned again to obtain the global path sequence patterns, from which path association rules are generated and their confidences calculated to yield the vehicle driving path prediction.

Description

Vehicle driving path prediction method and system

Technical field

The invention belongs to the technical field of intelligent transportation systems, and relates in particular to a vehicle driving path prediction method and system.

Background art

(1) Intelligent transportation systems

With the development and maturing of geographic positioning technology and the rise of mobile computing, applications based on paths and geographic locations have become a common focus of academia, industry, and even government. As important attributes of moving objects, path and location information can support the improvement of many services and application systems, and using them as system inputs has given rise to numerous emerging applications, of which the intelligent transportation system (ITS) is a well-known example. The predecessor of ITS was the intelligent vehicle-highway system. ITS integrates advanced information technology, data communication and transmission technology, electronic sensing technology, electronic control technology, and computer processing technology into the entire traffic management system, creating a large-scale, all-round, real-time, accurate, and efficient integrated transportation and management system. ITS is a complex, integrated system that, from the standpoint of its composition, can be divided into the following subsystems:

1) Advanced traveler information system (ATIS)

ATIS is built on a comprehensive information network. Through sensors and transmission equipment installed on roads, on vehicles, at transfer stations, in parking lots, and at forecast centers, traffic participants supply real-time traffic information from many locations to the traffic information center. After obtaining and processing this information, ATIS provides traffic participants in real time with road traffic information, public transit information, transfer information, traffic weather information, parking information, and other travel-related information, from which travelers choose their mode of travel and route. Furthermore, when a vehicle is equipped with automatic positioning and navigation, the system can help the driver select a driving route automatically.

2) Advanced traffic management system (ATMS)

ATMS shares part of its information collection, processing, and transmission systems with ATIS, but it is used mainly by traffic managers to monitor, control, and manage road traffic, and it provides communication between roads, vehicles, and drivers. It monitors traffic conditions, traffic accidents, weather, and the traffic environment of the road network in real time, relies on advanced vehicle detection and computer information processing technology to obtain traffic information, and controls traffic on that basis, for example through signal lights, published guidance information, road control, and accident handling and rescue.

3) Advanced public transportation system (APTS)

The main purpose of APTS is to use various intelligent technologies to advance the public transportation industry, so that public transit becomes safe, convenient, economical, and high-capacity. For example, personal computers and closed-circuit television can advise the public on travel modes, events, routes, and service selection, and displays at bus stops can give waiting passengers real-time vehicle information. At the public transit management center, departures and returns to the depot can be scheduled according to the real-time status of vehicles, improving work efficiency and service quality.

4) Advanced vehicle control system (AVCS)

The goal of AVCS is to develop technologies that help drivers control their vehicles, making driving safer and more efficient. AVCS includes warnings and assistance to the driver, as well as automated driving technologies such as obstacle avoidance.

5) Transportation management system

This refers to an intelligent logistics management system that is based on the expressway network and information management systems and applies logistics theory to management. It combines satellite positioning, geographic information systems, logistics information, and network technology to organize freight transport effectively and improve shipping efficiency.

6) Electronic toll collection system (ETC)

ETC is the most advanced road and bridge toll collection method in the world. Dedicated short-range microwave communication takes place between an on-board unit mounted on the vehicle windshield and a microwave antenna in the ETC lane of the toll station, while computer networking and the banks handle back-end settlement. Vehicles can therefore pay road and bridge tolls without stopping at the toll station, and after back-end processing the collected fees are sorted and distributed to the relevant revenue owners. Installing non-stop electronic toll collection in existing lanes can raise lane capacity by a factor of 3 to 5.

7) Emergency rescue system (EMS)

EMS is a special-purpose system built on ATIS, ATMS, and related rescue facilities. Through ATIS and ATMS it unites traffic surveillance and control centers and professional rescue agencies into an organic whole, providing road users with services such as on-site emergency handling of vehicle breakdowns, towing, on-site rescue, and removal of accident vehicles.

(2) Trajectory prediction techniques

Trajectory prediction methods fall mainly into the following two classes:

1) Trajectory prediction based on Markov models. Document [1] (Simmons R, Browning B, Zhang Y, et al. Learning to predict driver route and destination intent [C]. Proceedings of the Intelligent Transportation Systems Conference, 2006: 127-132) observes that even when a better route exists, people habitually choose familiar routes they have traveled before. On this premise, it observes a driver's historical driving paths, builds a Markov probability model, and generates a Markov probability tree, from which the vehicle's route at the next moment can be predicted from its current state. Document [2] (Research on ETC toll data mining based on a mixed Markov model [J]. Journal of Transportation Systems Engineering and Information Technology, 2012, 12(4)) builds a path sequence transaction database from historical ETC data and proposes a method that predicts the paths of vehicles on expressways with a mixed Markov trajectory prediction model, predicting the future state of expressway ETC vehicles from their current state. However, this kind of method predicts only a short distance ahead: it can only predict the road segment the vehicle will reach at the next moment.

2) Trajectory prediction based on sequential pattern mining. Document [3] (Yang J, Hu M. TrajPattern: mining sequential patterns from imprecise trajectories of mobile objects [C]. Proceedings of the International Conference on Extending Database Technology, 2006: 664-681) addresses the prediction of the positions of moving targets in a mobile computing environment and proposes mining the movement rules of targets from historical trajectory data: the movement region is first divided into grid cells of equal area, each target trajectory is converted into the ordered sequence of grid cells it passes through, and the standard GSP algorithm is then used to mine the frequent sequential patterns and generate inference rules. Document [4] (Giannotti F, Nanni M, Pedreschi D. Trajectory pattern mining [C]. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007: 330-339) proposes a frequent sequential pattern mining algorithm for data supplied by GPS devices, extending the algorithm of Document [3] with the residence time in each grid cell as an additional parameter. However, the computing capability of these methods falls far short of requirements when processing massive data. It is therefore necessary to take full advantage of the latest advances in computer hardware and software to improve computational efficiency.

At present, intelligent transportation systems employ large numbers of advanced sensing devices, network technologies, camera systems, and high-speed computer systems, and can monitor and collect large volumes of traffic data in real time. If the intersections equipped with electronic eyes (traffic cameras) are taken as the nodes of a transportation network, a vehicle driving path sequence (hereinafter "path sequence") can be represented as an ordered arrangement of nodes. Let $I=\{i_k \mid k=1,2,\ldots,n\}$ be the item set, where item $i_k$ denotes an intersection or road node equipped with an electronic eye and n is the number of such intersections. A path sequence is an ordered arrangement of distinct items and can be written $S=\langle s_1, s_2, \ldots, s_j, \ldots, s_l\rangle$, where each $s_j$ is an item of I and l is the length of S. Any sequence formed by consecutive items of a path sequence is called a subpath sequence of that path sequence. If path sequence α is a subpath sequence of path sequence β, β is said to contain α. The support count of S in a path sequence database is the number of path sequences in the database that contain S, and the support of S, written $\mathrm{Support}(S)$, is the percentage of path sequences in the database that contain S. Given a minimum support ξ, S is called a path sequence pattern if its support in the database is not less than ξ. Path sequences have the following property (hereinafter Property 1): every two adjacent items in a path sequence are adjacent nodes of the road network.
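The definitions above can be made concrete with a short sketch. It is only an illustration, assuming that path sequences are Python tuples of crossing names and that ξ is given as a fraction; the function names are hypothetical. Note that containment means the pattern appears as a run of consecutive items, not merely in the same order.

```python
from typing import Sequence, Tuple

Path = Tuple[str, ...]

def contains(path: Path, sub: Path) -> bool:
    """True if sub occurs in path as consecutive items (sub is a subpath sequence of path)."""
    k = len(sub)
    return any(path[i:i + k] == sub for i in range(len(path) - k + 1))

def support_count(db: Sequence[Path], s: Path) -> int:
    """Number of path sequences in db that contain s."""
    return sum(1 for path in db if contains(path, s))

def support(db: Sequence[Path], s: Path) -> float:
    """Support(S): fraction of path sequences in db that contain s."""
    return support_count(db, s) / len(db)

def is_path_pattern(db: Sequence[Path], s: Path, xi: float) -> bool:
    """s is a path sequence pattern if Support(s) >= xi."""
    return support(db, s) >= xi

db = [tuple("ABDFHK"), tuple("ACEFGIL"), tuple("ABDFHKMN"), tuple("CEFGILN")]
print(support_count(db, ("F", "H")))         # 2
print(support(db, ("F", "H")))               # 0.5
print(contains(db[0], ("A", "D")))           # False: A and D are not consecutive in <A B D F H K>
print(is_path_pattern(db, ("F", "H"), 0.5))  # True
```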

(3) Map-Reduce programming framework

Map-Reduce is a programming framework built on the concepts of "Map" and "Reduce" for parallel computation over large-scale data sets (larger than 1 TB). It was proposed in Document [5]: Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters [J]. Communications of the ACM, 2008, 51(1): 107-113. Users need only write two functions, Map and Reduce; the system manages the parallel execution of Map and Reduce tasks and the coordination between them, handles the failure of individual tasks, and at the same time ensures tolerance of hardware faults.

The Map-Reduce computation proceeds as follows:

1) The Map-Reduce library in the user program first splits the input files into M data splits, each typically 16 to 64 MB (users can control the split size through an optional parameter), and then starts many copies of the program on a cluster of machines.

2) One of these copies is special: the master program. The other copies are workers that are assigned tasks by the master. There are M Map tasks and R Reduce tasks to assign, and the master assigns each Map or Reduce task to an idle worker.

3) A worker assigned a Map task reads the corresponding input split, parses key-value (key, value) pairs out of it, and passes each pair to the user-defined Map function. The intermediate key-value pairs produced by the Map function are buffered in memory.

4) The buffered key-value pairs are divided into R regions by the partition function and periodically written to local disk. The locations of these pairs on local disk are passed back to the master, which forwards them to the workers assigned Reduce tasks.

5) When a worker assigned a Reduce task receives these locations from the master, it uses remote procedure calls to read the buffered data from the local disks of the machines running the Map workers. After reading all the intermediate data, it sorts them by key so that all data with the same key are grouped together. Sorting is necessary because many different keys map to the same Reduce task. If the intermediate data are too large to sort in memory, an external sort is used.

6) The Reduce worker iterates over the sorted intermediate data and, for each unique intermediate key, passes the key and the corresponding set of intermediate values to the user-defined Reduce function. The output of the Reduce function is appended to the output file of the corresponding partition.

7) When all Map and Reduce tasks have completed, the master wakes up the user program, and the Map-Reduce call in the user program returns.
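To make steps 1) to 7) concrete, the following single-process sketch imitates the data flow: splits go to Map calls, intermediate pairs are partitioned into R regions, and each region is sorted and reduced. It is not Hadoop's API; `run_mapreduce`, `map_crossings`, and `reduce_sum` are hypothetical names, and `zlib.crc32` stands in for a hash partitioner so the result is reproducible. The example counts, for each crossing, how many path sequences pass through it.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

def run_mapreduce(splits: List[List[str]],
                  map_fn: Callable[[str], Iterable[Tuple[str, int]]],
                  reduce_fn: Callable[[str, List[int]], int],
                  R: int) -> Dict[str, int]:
    # Steps 3-4: each Map task calls map_fn on every record of its split, and a
    # partition function assigns each intermediate key to one of R regions.
    regions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(R)]
    for split in splits:                          # one Map task per split
        for record in split:
            for key, value in map_fn(record):
                r = zlib.crc32(key.encode()) % R  # deterministic stand-in for hash(key) mod R
                regions[r][key].append(value)
    # Steps 5-6: each Reduce task sorts its region by key and calls reduce_fn
    # once per unique key with all values grouped under that key.
    output: Dict[str, int] = {}
    for region in regions:                        # one Reduce task per region
        for key in sorted(region):
            output[key] = reduce_fn(key, region[key])
    return output

def map_crossings(record: str) -> Iterable[Tuple[str, int]]:
    """Emit (crossing, 1) once per distinct crossing in a record such as '<A B D>'."""
    for item in set(record.strip("<>").split()):
        yield item, 1

def reduce_sum(key: str, values: List[int]) -> int:
    return sum(values)

splits = [["<A B D F H K>", "<A C E F G I L>"], ["<A B D G I L N>"]]  # M = 2 splits
counts = run_mapreduce(splits, map_crossings, reduce_sum, R=2)
print(counts["A"], counts["F"], counts["N"])  # 3 2 1
```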

(4) Hadoop cloud computing platform

Hadoop is an open-source software project developed by the Apache Software Foundation for reliable, scalable, distributed computing. Users can develop distributed programs without understanding the low-level details of distribution, making full use of the power of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and designed to be deployed on low-cost hardware; it provides high-throughput access to application data and suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as streams.

The core of the Hadoop architecture is HDFS and Map-Reduce: HDFS provides storage for massive data, and Map-Reduce provides computation over it.

However, for a specific technical problem, one still has to work out how to design a technical scheme that can be implemented in parallel with Map-Reduce. No technical scheme with satisfactory results yet exists in this field.

Summary of the invention

Existing trajectory prediction methods based on Markov models predict only a short distance ahead, namely the road segment the vehicle will reach at the next moment, and existing methods based on sequential pattern mining compute inefficiently on massive and high-dimensional data. To address these problems, and exploiting Property 1 of vehicle driving path sequences, the invention improves how the original GSP algorithm generates candidate sequence patterns, raising its computational performance; parallelizes the improved GSP algorithm with the Map-Reduce programming framework; and designs a sequence database decomposition strategy that meets the requirements of parallel computation and reduces I/O overhead. On this basis, the large-scale parallel computing capability of the Hadoop cloud computing platform is fully used to make sequential pattern mining over massive data more efficient and to shorten processing time.

The technical solution provided by the invention is a vehicle driving path prediction method that performs the following steps on a Hadoop platform:

Step 1: From the memory of each computer in the Hadoop platform, determine the minimum memory across all nodes, denoted Q (unit: GB);

Step 2: Scan the original path sequence database that stores the vehicle driving path sequences. Denote the number of path sequences in it by m (each path sequence contains at least one intersection) and the actual storage size of the longest path sequence by P (unit: B);

Step 3: Divide the original path sequence database horizontally and evenly into n disjoint sub-path sequence databases, where $P \times (m/n) \le Q \times 10^9$;

Step 4: Upload the original path sequence database to a designated folder in HDFS;

Step 5: Upload the n sub-path sequence databases to another designated folder in HDFS;

Step 6: The master node of the Hadoop platform dispatches the n sub-path sequence databases uploaded in Step 5 to different Map nodes. Each Map node runs the improved GSP algorithm: using the preset minimum support ξ, it scans the sub-path sequence database held in its memory, computes the local path sequence patterns, and passes them to the Reduce nodes as <key, value> pairs, where key is a local path sequence pattern and value is its support count;

Each Map node runs the improved GSP algorithm as follows:

Operation a: For the sub-path sequence database assigned to this Map node, scan it to obtain the 1-path sequence patterns $L_1$, and set k = 1;

Operation b: Generate the candidate (k+1)-path sequences $C_{k+1}$ from the k-path sequence patterns $L_k$, scan the sequence database again to compute the support of each candidate, and obtain the (k+1)-path sequence patterns $L_{k+1}$. The candidates $C_{k+1}$ are generated in one of two cases:

(1) When generating candidate 2-path sequence patterns from 1-path sequence patterns, scan the adjacency list that stores the road network, look up the adjacent nodes of each path sequence pattern $s_1$ in $L_1$, and append each adjacent node item to $s_1$;

(2) When generating candidate (k+1)-path sequence patterns from k-path sequence patterns with k > 1:

First, for any two k-path sequence patterns $s_1$ and $s_2$, if the sequence obtained by removing the first item of $s_1$ is identical to the sequence obtained by removing the last item of $s_2$, join $s_1$ with $s_2$ (that is, append the last item of $s_2$ to $s_1$). Then prune: if some subpath sequence of a candidate path sequence pattern is not a path sequence pattern, delete that candidate;

Operation c: Set k = k+1 and repeat Operation b until no new candidate path sequence is generated;

Step 7: The Reduce nodes merge the <key, value> pairs passed from the Map nodes to obtain the global candidate sequence patterns;

Step 8: Scan the original path sequence database stored in HDFS in Step 4 again to count the global candidate sequence patterns, and keep those whose support is not less than the preset minimum support ξ as the global path sequence patterns;

Step 9: Generate path association rules from the global path sequence patterns obtained in Step 8 and compute their confidences to obtain the vehicle driving path prediction.
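Steps 6 to 8 can be sketched in one process as follows. This is an illustration under stated assumptions, not the patented implementation: the Map side here enumerates contiguous subpaths directly instead of running the improved GSP (both yield the same locally frequent subpaths), the 12 sequences and the three-way split are taken from the embodiment, and ξ = 0.5 is a hypothetical value. The union of local patterns is a safe global candidate set because a pattern whose global support is at least ξ must reach support ξ in at least one partition. The local counts cannot simply be added, since partitions in which a pattern is infrequent do not report it, which is why Step 8 rescans the original database.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]

def subpaths(path: Path) -> Set[Path]:
    """All distinct runs of consecutive items in one path sequence."""
    return {path[i:j] for i in range(len(path)) for j in range(i + 1, len(path) + 1)}

def map_task(sub_db: List[Path], xi: float) -> List[Tuple[Path, int]]:
    """Step 6 stand-in: emit <pattern, local support count> for each locally frequent subpath.
    For brevity this enumerates subpaths directly instead of running the improved GSP."""
    counts = Counter(s for path in sub_db for s in subpaths(path))
    return [(s, c) for s, c in counts.items() if c >= xi * len(sub_db)]

def reduce_task(pairs: List[Tuple[Path, int]]) -> Set[Path]:
    """Step 7: merge the pairs from all Map nodes; the distinct keys are the global candidates."""
    return {key for key, _ in pairs}

def rescan(db: List[Path], candidates: Set[Path], xi: float) -> Dict[Path, int]:
    """Step 8: count each candidate over the whole original database; keep support >= xi."""
    per_path = [subpaths(p) for p in db]
    result = {}
    for s in candidates:
        c = sum(1 for subs in per_path if s in subs)
        if c >= xi * len(db):
            result[s] = c
    return result

DB = [tuple(s) for s in ["ABDFHK", "ACEFGIL", "ABDFHKMN", "CEFGILN", "ABDFHK", "CEFGILN",
                         "ABDGILN", "ABDFHKM", "ABDFHK", "EFGILN", "ABDGILN", "ABDFHKMN"]]
sub_dbs = [DB[0:4], DB[4:8], DB[8:12]]  # horizontal split, n = 3
xi = 0.5                                # hypothetical minimum support
pairs = [pair for sub in sub_dbs for pair in map_task(sub, xi)]
patterns = rescan(DB, reduce_task(pairs), xi)
print(len(patterns), patterns[tuple("ABDFHK")])  # 28 6
```

For instance, <E F> is locally frequent in the first sub-database (2 of 4) and so becomes a global candidate, but the rescan finds it in only 4 of 12 sequences and discards it.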

Correspondingly, the invention also provides a vehicle driving path prediction system comprising the following modules built on a Hadoop platform:

A memory determination module, for determining, from the memory of each computer in the Hadoop platform, the memory of the node with the least memory, denoted Q (unit: GB);

A longest path sequence determination module, for scanning the original path sequence database that stores the vehicle driving path sequences, denoting the number of path sequences in it by m (each path sequence contains at least one intersection) and the actual storage size of the longest path sequence by P (unit: B);

A sub-path sequence database division module, for dividing the original path sequence database horizontally and evenly into n disjoint sub-path sequence databases, where $P \times (m/n) \le Q \times 10^9$;

An original database upload module, for uploading the original path sequence database to a designated folder in HDFS;

A sub-database upload module, for uploading the n sub-path sequence databases to another designated folder in HDFS;

A local path sequence pattern module, for causing the master node of the Hadoop platform to dispatch the n sub-path sequence databases uploaded by the sub-database upload module to different Map nodes, where each Map node runs the improved GSP algorithm: using the preset minimum support ξ, it scans the sub-path sequence database held in its memory, computes the local path sequence patterns, and passes them to the Reduce nodes as <key, value> pairs, where key is a local path sequence pattern and value is its support count;

Each Map node runs the improved GSP algorithm in the same way as in the method (Operations a to c);

A global candidate sequence pattern module, for causing the Reduce nodes to merge the <key, value> pairs passed from the Map nodes to obtain the global candidate sequence patterns;

A global path sequence pattern module, for scanning again the original path sequence database uploaded to HDFS by the original database upload module to count the global candidate sequence patterns, and keeping those whose support is not less than the preset minimum support ξ as the global path sequence patterns;

A prediction result module, for generating path association rules from the global path sequence patterns produced by the global path sequence pattern module and computing their confidences to obtain the vehicle driving path prediction.

Compared with existing vehicle driving path prediction methods in China and abroad, the invention redesigns, according to the basic requirements of the Map-Reduce programming framework, the process of mining sequential patterns from vehicle driving path sequences and generating path association rules. It also uses Property 1 of vehicle driving path sequences to improve how the original GSP algorithm generates candidate sequence patterns, and designs a reasonable sequence database decomposition strategy. Together these parallelize the improved GSP algorithm, reduce I/O overhead, make full use of the processing power of a cluster of computers with shared storage, and improve efficiency. The technical solution is simple and fast, and can substantially improve the efficiency of sequential pattern mining and path association rule generation on vehicle driving path sequences.

Description of the drawings

Fig. 1 is a flowchart of the embodiment of the invention;

Fig. 2 is a schematic diagram of the simulated road network of the embodiment;

Fig. 3 is the adjacency list storing the simulated road network of the embodiment;

Fig. 4 is a schematic diagram of the division of the original path sequence database in the embodiment;

Fig. 5 is a schematic diagram of the Map task executed on sub-path sequence database 1 in the embodiment;

Fig. 6 is a schematic diagram of the Map task executed on sub-path sequence database 2 in the embodiment;

Fig. 7 is a schematic diagram of the Map task executed on sub-path sequence database 3 in the embodiment.

Embodiment

The technical solution of the invention is described in detail below with reference to the drawings and an embodiment.

In the embodiment, the simulated road network shown in Fig. 2 has electronic eyes collecting data at all 14 intersections A to N. Because the invention uses road network information, the road network is stored as an adjacency list, shown in Fig. 3: A is adjacent to B and C; B to A and D; C to A and E; D to B, G, and F; E to C, F, and H; F to D, G, J, H, and E; G to D, I, and F; H to F, K, and E; I to G and L; J to F and N; K to H and M; L to I and N; M to K and N; N to J, L, and M. The routes of vehicles captured by the electronic eyes are recorded as path sequences in the vehicle driving path sequence database. Each path sequence contains at least one intersection, as in the following table.

No.  Path sequence
1    <A B D F H K>
2    <A C E F G I L>
3    <A B D F H K M N>
4    <C E F G I L N>
5    <A B D F H K>
6    <C E F G I L N>
7    <A B D G I L N>
8    <A B D F H K M>
9    <A B D F H K>
10   <E F G I L N>
11   <A B D G I L N>
12   <A B D F H K M N>
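Property 1 and the adjacency list can be checked mechanically. The sketch below transcribes the adjacency list of the embodiment into a Python dict (an assumed representation in which one-letter strings stand for the crossings; the helper names are hypothetical) and verifies that the network is undirected and that every path sequence in the table satisfies Property 1.

```python
from typing import Dict, List

# Adjacency list of the simulated road network, transcribed from the embodiment text (Fig. 3).
ADJ: Dict[str, List[str]] = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"], "D": ["B", "G", "F"],
    "E": ["C", "F", "H"], "F": ["D", "G", "J", "H", "E"], "G": ["D", "I", "F"],
    "H": ["F", "K", "E"], "I": ["G", "L"], "J": ["F", "N"], "K": ["H", "M"],
    "L": ["I", "N"], "M": ["K", "N"], "N": ["J", "L", "M"],
}

PATHS = ["ABDFHK", "ACEFGIL", "ABDFHKMN", "CEFGILN", "ABDFHK", "CEFGILN",
         "ABDGILN", "ABDFHKM", "ABDFHK", "EFGILN", "ABDGILN", "ABDFHKMN"]

def is_undirected(adj: Dict[str, List[str]]) -> bool:
    """Every road segment must be listed from both of its end nodes."""
    return all(u in adj[v] for u, nbrs in adj.items() for v in nbrs)

def satisfies_property_1(path: str, adj: Dict[str, List[str]]) -> bool:
    """Property 1: every two consecutive items of a path sequence are adjacent road nodes."""
    return all(b in adj[a] for a, b in zip(path, path[1:]))

print(is_undirected(ADJ))                                # True
print(all(satisfies_property_1(p, ADJ) for p in PATHS))  # True
print(satisfies_property_1("ABF", ADJ))                  # False: B and F are not adjacent
```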

Path sequence patterns reflect regularities in how vehicles select routes. They produce directed path association rules, whose antecedent is the path sequence the vehicle has already traveled and whose consequent is the path sequence it will travel next. For example, the confidence of the rule <A B D> → <F H K>, written conf(<A B D> → <F H K>), is defined as the ratio of the number of path sequences in the path sequence database that contain <A B D F H K> to the number that contain <A B D>. It is the probability that a vehicle that has passed through nodes A, B, and D will next pass through nodes F, H, and K.
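The confidence formula can be evaluated on the 12 path sequences of the table above. In this minimal sketch (helper names are hypothetical), <A B D F H K> occurs in 6 sequences and <A B D> in 8, so the rule's confidence is 6/8 = 0.75. The consequent is taken to follow the antecedent immediately, matching the definition, which counts the concatenated path.

```python
from typing import List, Tuple

Path = Tuple[str, ...]

DB: List[Path] = [tuple(s.split()) for s in [
    "A B D F H K", "A C E F G I L", "A B D F H K M N", "C E F G I L N",
    "A B D F H K", "C E F G I L N", "A B D G I L N", "A B D F H K M",
    "A B D F H K", "E F G I L N", "A B D G I L N", "A B D F H K M N",
]]

def count(db: List[Path], s: Path) -> int:
    """Support count: number of path sequences containing s as consecutive items."""
    k = len(s)
    return sum(1 for p in db if any(p[i:i + k] == s for i in range(len(p) - k + 1)))

def confidence(db: List[Path], antecedent: Path, consequent: Path) -> float:
    """conf(X -> Y) = count(<X Y>) / count(<X>), with Y immediately following X."""
    return count(db, antecedent + consequent) / count(db, antecedent)

print(confidence(DB, ("A", "B", "D"), ("F", "H", "K")))  # 0.75 (= 6 / 8)
```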

Based on the original path sequence database generated in advance as above, the Map-Reduce-based vehicle driving path prediction flow designed by the invention is shown in Fig. 1; those skilled in the art can implement all of its steps in software so that the flow runs automatically. The embodiment is implemented as follows:

Step 1: From the memory of each computer in the Hadoop platform, determine the memory of the node with the least memory, denoted Q (unit: GB). In the embodiment, Q = 2 GB.

Because Step 3 divides the original path sequence database evenly into n disjoint sub-path sequence databases that are loaded into node memory, it is recommended that all computers in the Hadoop platform have the same memory and computing performance in a concrete implementation, so that no computer with less memory becomes the computational bottleneck.

Step 2: Scan the original path sequence database once (it can be stored as a text document, which makes it easy to import into HDFS) to obtain the number of path sequences m and the actual storage size P (unit: B) of the longest path sequence. In the embodiment, the database contains 12 path sequences; since each character occupies 1 B, the longest sequence, <A B D F H K M N>, occupies 17 B (including spaces and angle brackets). Hence m = 12 and P = 17 B.
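Step 2 amounts to one pass over the text file. The sketch assumes, hypothetically, that the database file holds one path sequence per line in the notation of the table, with one byte per character, and reproduces m = 12 and P = 17 B.

```python
# Hypothetical file content: one path sequence per line, in the notation of the table.
DB_TEXT = """\
<A B D F H K>
<A C E F G I L>
<A B D F H K M N>
<C E F G I L N>
<A B D F H K>
<C E F G I L N>
<A B D G I L N>
<A B D F H K M>
<A B D F H K>
<E F G I L N>
<A B D G I L N>
<A B D F H K M N>
"""

lines = [ln for ln in DB_TEXT.splitlines() if ln.strip()]
m = len(lines)                                    # number of path sequences
P = max(len(ln.encode("ascii")) for ln in lines)  # bytes of the longest line (1 B per character)
print(m, P)  # 12 17
```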

Step 3: Divide the original path sequence database horizontally and evenly into n disjoint sub-path sequence databases (which can also be stored as text documents). Generally m is divisible by n, so that each sub-database contains m/n path sequences: the first sub-database contains path sequences 1 to m/n of the original database, the k-th sub-database (1 < k < n) contains path sequences (k-1)×(m/n)+1 to k×(m/n), and the n-th sub-database contains path sequences (n-1)×(m/n)+1 to m. So that the original database in external storage need not be scanned when counting candidate path sequence patterns, which reduces I/O overhead, each sub-database should fit in memory; that is, $P \times (m/n) \le Q \times 10^9$ must hold. If P and Q are expressed in other units, the corresponding condition should be satisfied, and this also falls within the scope of protection of the invention.

As shown in Fig. 4, the embodiment divides the original path sequence database into n = 3 sub-path sequence databases. Since $17 \times (12/3) = 68 < 2 \times 10^9$, the requirement that each sub-database fit in memory is met.
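The horizontal split and the memory condition can be sketched as below. `split_horizontally` and `fits_in_memory` are hypothetical helpers; when m is not divisible by n, this sketch uses m // n records per part and lets the last part take the remainder, which is one reading of the index ranges given in Step 3.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def split_horizontally(db: Sequence[T], n: int) -> List[List[T]]:
    """Cut db into n contiguous, disjoint parts of m // n records; the last part takes any remainder."""
    size = len(db) // n
    parts = [list(db[i * size:(i + 1) * size]) for i in range(n - 1)]
    parts.append(list(db[(n - 1) * size:]))
    return parts

def fits_in_memory(P_bytes: int, m: int, n: int, Q_gb: float) -> bool:
    """Condition P * (m / n) <= Q * 10^9 from Step 3."""
    return P_bytes * (m / n) <= Q_gb * 10**9

seq_ids = list(range(1, 13))          # path sequences 1..12 of the original database
parts = split_horizontally(seq_ids, 3)
print(parts)                          # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(fits_in_memory(17, 12, 3, 2))   # True: 68 <= 2e9
```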

Dividing the original path sequence database yields sub-path sequence databases 1, 2, and 3:

Path sequences of sub-path sequence database 1

Path sequence
<A B D F H K>
<A C E F G I L>
<A B D F H K M N>
<C E F G I L N>

Path sequences of sub-path sequence database 2

Path sequence
<A B D F H K>
<C E F G I L N>
<A B D G I L N>
<A B D F H K M>

Path sequences of sub-path sequence database 3

Path sequence
<A B D F H K>
<E F G I L N>
<A B D G I L N>
<A B D F H K M N>

Each path sequence consists of some of the items in the item set {A, B, C, D, E, F, G, H, I, J, K, L, M, N}. Sub-path sequence database 1 contains path sequences 1 to 4 of the original database, sub-database 2 contains path sequences 5 to 8, and sub-database 3 contains path sequences 9 to 12.

If the Hadoop platform has q Map nodes, it is recommended that the number of sub-path sequence databases equal the number of Map nodes, i.e. n = q. If n < q, then, barring task failures, q - n Map nodes sit unused while the method runs, so resource utilization is low. If n > q, then, barring task failures, the remaining n - q sub-databases can be processed only after the q Map nodes have finished the first q sub-databases, so processing efficiency is low. Setting n = q therefore achieves both good resource utilization and good processing efficiency.
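The trade-off between n and q is simple arithmetic. Under the simplifying assumptions of one Map task per sub-database, equal task durations, and no failures, the following hypothetical helper returns the number of waves of Map tasks and the number of Map nodes left idle.

```python
import math
from typing import Tuple

def map_rounds(n: int, q: int) -> Tuple[int, int]:
    """Return (waves of Map tasks, idle Map nodes in the first wave), assuming one Map task
    per sub-database, equal task durations, and no task failures."""
    waves = math.ceil(n / q)
    idle = max(q - n, 0)
    return waves, idle

print(map_rounds(3, 3))  # (1, 0): n = q, every node busy, one wave
print(map_rounds(3, 5))  # (1, 2): n < q, two nodes idle
print(map_rounds(7, 3))  # (3, 0): n > q, three waves of Map tasks
```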

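The trade-off behind n = q is simple arithmetic. The hypothetical helper below reports how many Map nodes sit idle and how many rounds ("waves") of Map tasks are needed. It assumes one Map task per sub-path sequence database, equal task durations and no task failures.

```python
import math


def scheduling_cost(n, q):
    """Return (idle_map_nodes, waves) for n sub-databases on q Map nodes.

    Assumes one Map task per sub-database, equal task durations and no
    task failures; a wave is one round in which up to q tasks run at once.
    """
    if n <= 0 or q <= 0:
        raise ValueError("n and q must be positive")
    idle = max(q - n, 0)       # n < q: some Map nodes are never used
    waves = math.ceil(n / q)   # n > q: later tasks wait for a free node
    return idle, waves


for n in (2, 3, 5):
    print(n, scheduling_cost(n, 3))
# 2 (1, 1)
# 3 (0, 1)
# 5 (0, 2)
```

With q = 3, only n = 3 gives both zero idle nodes and a single wave.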
Step 4: upload the original path sequence database to a designated folder in HDFS. Step 8 scans the path sequence database stored in this folder.

Step 5: upload the n sub-path sequence databases to another designated folder in HDFS. The n sub-path sequence databases in this folder are the input files processed in Step 6.

Step 6: the master node (the computer node running the master control program) assigns the n sub-path sequence databases uploaded in Step 5 to different Map nodes (the computer nodes executing Map tasks). Each Map node executes the improved GSP algorithm. Using the preset minimum support ξ, it scans the sub-path sequence database held in its memory and computes the local path sequence patterns. It then passes them to the Reduce node (the computer node executing the Reduce task) as <key, value> pairs, where key is a local path sequence pattern and value is its support count.

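Step 6 fixes only the shape of the Map output: one <key, value> pair per local path sequence pattern. The sketch below shows that emission step. It assumes a Hadoop Streaming-style mapper, where by default the text before the first tab of an output line is taken as the key. `emit_pairs` and the pattern-to-text encoding are hypothetical. The local patterns themselves would come from the improved GSP algorithm.

```python
import sys


def emit_pairs(local_patterns, out=sys.stdout):
    """Write one "key<TAB>value" line per local path sequence pattern.

    local_patterns maps a pattern (tuple of intersection labels) to its
    support count in this Map node's sub-database. Lines are ordered by
    pattern length, then alphabetically.
    """
    for pattern, count in sorted(local_patterns.items(),
                                 key=lambda kv: (len(kv[0]), kv[0])):
        out.write("<%s>\t%d\n" % (" ".join(pattern), count))


# Writes the tab-separated lines "<A>\t3", "<B>\t2" and "<A B>\t2".
emit_pairs({("A",): 3, ("A", "B"): 2, ("B",): 2})
```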
Each Map node executes the improved GSP algorithm as follows:

Operation a: for the sub-path sequence database assigned to this Map node, first scan it to obtain the 1-path sequence patterns L₁. These are the path sequences of length 1 whose support in the sub-path sequence database is not less than ξ. In general, the set of path sequences of length k whose support in the sub-path sequence database is not less than ξ is the set of k-path sequence patterns Lₖ. Set k = 1.

Operation b: generate the candidate (k+1)-path sequences Cₖ₊₁ from the k-path sequence patterns Lₖ. Scan the source sequence database (here, the sub-path sequence database) again and count the support of each candidate path sequence, which gives the (k+1)-path sequence patterns Lₖ₊₁.

Operation c: set k = k + 1 and repeat operation b until no new candidate path sequence is generated. The resulting 1-path sequence patterns L₁, 2-path sequence patterns L₂ and so on are all local path sequence patterns. The number of database scans equals the maximum length of the path sequence patterns generated.

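Operations a and b both depend on counting how many paths in the sub-database support a candidate. The sketch below assumes that a path supports a pattern when the pattern occurs in it as a contiguous run of intersections, an interpretation that reproduces the embodiment's support counts. It also assumes that ξ is a fraction of the sub-database size. `contains_path`, `support_count` and `frequent` are hypothetical names.

```python
SUB_DB_1 = [s.split() for s in (
    "A B D F H K", "A C E F G I L", "A B D F H K M N", "C E F G I L N",
)]


def contains_path(path, pattern):
    """True if pattern (a tuple) occurs in path as a contiguous run of intersections."""
    k = len(pattern)
    return any(tuple(path[i:i + k]) == pattern for i in range(len(path) - k + 1))


def support_count(pattern, db):
    """Number of paths in db that contain the pattern (each path counted once)."""
    return sum(1 for path in db if contains_path(path, pattern))


def frequent(candidates, db, xi):
    """Keep candidates whose support is not less than xi (a fraction of |db|)."""
    counts = {c: support_count(c, db) for c in candidates}
    return {c: n for c, n in counts.items() if n >= xi * len(db)}


print(frequent([("A", "B"), ("A", "C"), ("L", "N")], SUB_DB_1, 0.5))
# {('A', 'B'): 2}
```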
Candidate path sequence patterns are generated in one of the following two ways:

(1) When generating candidate 2-path sequence patterns from the 1-path sequence patterns, scan the adjacency list and check the adjacent nodes of each path sequence pattern s₁ in L₁. If an adjacent node of s₁ is also in L₁, connect s₁ with that node, i.e. append the adjacent-node item to s₁.

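Case (1) replaces GSP's all-pairs join with a lookup in the road-network adjacency list, so only pairs of physically adjacent intersections become candidates. The patent does not list the adjacency list. The one below is a hypothetical undirected network reconstructed from the C₂ table of sub-path sequence database 1 in the embodiment, and `build_adjacency` and `candidates_2` are illustrative names.

```python
# Hypothetical undirected road network, inferred from the embodiment's C2 table.
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("D", "F"), ("D", "G"),
         ("E", "F"), ("E", "H"), ("F", "G"), ("F", "H"), ("G", "I"), ("H", "K"),
         ("I", "L"), ("L", "N")]


def build_adjacency(edges):
    """Adjacency list: intersection -> set of neighbouring intersections."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def candidates_2(l1, adj):
    """Connect each 1-path pattern s1 only with neighbours that are also in L1."""
    frequent_nodes = {p[0] for p in l1}
    return sorted((s1, nb) for s1 in frequent_nodes
                  for nb in adj.get(s1, ()) if nb in frequent_nodes)


L1_DB1 = [(x,) for x in "A B C D E F G H I K L N".split()]
C2_DB1 = candidates_2(L1_DB1, build_adjacency(EDGES))
print(len(C2_DB1))  # 28
```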
(2) When generating candidate (k+1)-path sequence patterns from the k-path sequence patterns (k > 1), candidates are produced in two main steps:

First, join: take any two path sequence patterns s₁ and s₂ among the k-path sequence patterns. If the path sequence obtained by removing the first item of s₁ is identical to the one obtained by removing the last item of s₂, s₁ and s₂ can be connected by appending the last item of s₂ to s₁. Then, prune: if some sub-path sequence of a candidate path sequence pattern is not a path sequence pattern, the candidate cannot be a path sequence pattern and is deleted from the candidate path sequence patterns.

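Case (2) can be sketched as below. The hypothetical `join_and_prune` reads "sub-path sequence" as a contiguous window of length k, which is an assumption, and patterns are tuples of intersection labels. Applied to the L₂ table of sub-path sequence database 1, it produces the ten candidates of the embodiment's C₃ table.

```python
def join_and_prune(lk):
    """Candidate (k+1)-path patterns from the k-path patterns lk, for k > 1.

    Join: s1 and s2 connect when s1 minus its first item equals s2 minus its
    last item; the candidate is s1 extended by the last item of s2.
    Prune: drop a candidate if any contiguous length-k window of it is not in
    lk (an assumed reading of "sub-path sequence").
    """
    lk = set(lk)
    if not lk:
        return []
    k = len(next(iter(lk)))
    joined = {s1 + (s2[-1],) for s1 in lk for s2 in lk if s1[1:] == s2[:-1]}
    return sorted(c for c in joined
                  if all(c[i:i + k] in lk for i in range(len(c) - k + 1)))


L2_DB1 = [tuple(s.split()) for s in
          ("A B", "B D", "C E", "D F", "E F", "F G", "F H", "G I", "H K", "I L")]
for cand in join_and_prune(L2_DB1):
    print("<%s>" % " ".join(cand))
```

Under this contiguous reading, the two length-k windows of a joined candidate are exactly s₁ and s₂. Pruning therefore removes nothing in this example.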
In the embodiment the minimum support is set to 50%. Since each sub-path sequence database holds four path sequences, a pattern needs a support count of at least 2. The concrete steps of the improved GSP algorithm are shown in Figs. 5, 6 and 7. The Map node assigned sub-path sequence database 1 scans it to obtain the 1-path sequence patterns L₁ and generates the candidate 2-path sequence patterns C₂ from L₁. It then scans the sub-path sequence database again, counts the support of each candidate path sequence pattern, and obtains the 2-path sequence patterns L₂. The operation is repeated until no new candidate path sequence pattern is generated. Sub-path sequence databases 2 and 3 are processed in the same way by the Map nodes to which they are assigned.

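Putting operations a to c and both candidate-generation cases together, the following sketch mines sub-path sequence database 1 with ξ = 50%. It assumes contiguous support counting and a hypothetical road network inferred from the embodiment's C₂ table, and `improved_gsp` is an illustrative name, not the authors' code. The number of patterns it finds at each length matches the L₁ to L₆ tables for sub-path sequence database 1.

```python
from collections import Counter

SUB_DB_1 = [s.split() for s in (
    "A B D F H K", "A C E F G I L", "A B D F H K M N", "C E F G I L N",
)]

# Hypothetical road network inferred from the embodiment's C2 table.
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("D", "F"), ("D", "G"),
         ("E", "F"), ("E", "H"), ("F", "G"), ("F", "H"), ("G", "I"), ("H", "K"),
         ("I", "L"), ("L", "N")]
ADJ = {}
for a, b in EDGES:
    ADJ.setdefault(a, set()).add(b)
    ADJ.setdefault(b, set()).add(a)


def contains_path(path, pattern):
    """True if pattern occurs in path as a contiguous run of intersections."""
    k = len(pattern)
    return any(tuple(path[i:i + k]) == pattern for i in range(len(path) - k + 1))


def improved_gsp(db, xi, adj):
    """Operations a-c: level-wise mining of local path sequence patterns.

    Returns {pattern: support count}. C2 comes from the adjacency list,
    C(k+1) for k > 1 from the join-and-prune step.
    """
    min_count = xi * len(db)

    def scan(candidates):
        # Count every candidate of this level against the in-memory sub-database.
        counts = {c: sum(contains_path(p, c) for p in db) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    level = scan({(x,) for path in db for x in path})   # operation a: L1
    patterns = dict(level)
    k = 1
    while level:
        if k == 1:  # case (1): connect frequent nodes with frequent neighbours
            nodes = {c[0] for c in level}
            cands = {(u, v) for u in nodes for v in adj.get(u, ()) if v in nodes}
        else:       # case (2): join on overlap, then prune by length-k windows
            joined = {s1 + (s2[-1],) for s1 in level for s2 in level
                      if s1[1:] == s2[:-1]}
            cands = {c for c in joined
                     if all(c[i:i + k] in level for i in range(len(c) - k + 1))}
        level = scan(cands)                              # operation b
        patterns.update(level)
        k += 1                                           # operation c
    return patterns


local = improved_gsp(SUB_DB_1, 0.5, ADJ)
print(sorted(Counter(len(p) for p in local).items()))
# [(1, 12), (2, 10), (3, 8), (4, 6), (5, 4), (6, 2)]
```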
As shown in Fig. 5, processing sub-path sequence database 1 gives the following results:

L₁ (1-path sequence patterns)

Path sequence	Support count
<A>	3
<B>	2
<C>	2
<D>	2
<E>	2
<F>	4
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	2

C₂ (candidate 2-path sequence patterns)

Path sequence
<A B>
<A C>
<B A>
<B D>
<C A>
<C E>
<D B>
<D G>
<D F>
<E F>
<E H>
<E C>
<F D>
<F G>
<F H>
<F E>
<G D>
<G I>
<G F>
<H E>
<H F>
<H K>
<I G>
<I L>
<K H>
<L I>
<L N>
<N L>

L₂ (2-path sequence patterns)

Path sequence	Support count
<A B>	2
<B D>	2
<C E>	2
<D F>	2
<E F>	2
<F G>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2

C₃ (candidate 3-path sequence patterns)

Path sequence
<A B D>
<B D F>
<C E F>
<D F G>
<D F H>
<E F G>
<E F H>
<F G I>
<F H K>
<G I L>

L₃ (3-path sequence patterns)

Path sequence	Support count
<A B D>	2
<B D F>	2
<C E F>	2
<D F H>	2
<E F G>	2
<F G I>	2
<F H K>	2
<G I L>	2

C₄ (candidate 4-path sequence patterns)

Path sequence
<A B D F>
<B D F H>
<C E F G>
<D F H K>
<E F G I>
<F G I L>

L₄ (4-path sequence patterns)

Path sequence	Support count
<A B D F>	2
<B D F H>	2
<C E F G>	2
<D F H K>	2
<E F G I>	2
<F G I L>	2

C₅ (candidate 5-path sequence patterns)

Path sequence
<A B D F H>
<B D F H K>
<C E F G I>
<E F G I L>

L₅ (5-path sequence patterns)

Path sequence	Support count
<A B D F H>	2
<B D F H K>	2
<C E F G I>	2
<E F G I L>	2

C₆ (candidate 6-path sequence patterns)

Path sequence
<A B D F H K>
<C E F G I L>

L₆ (6-path sequence patterns)

Path sequence	Support count
<A B D F H K>	2
<C E F G I L>	2

As shown in Fig. 6, processing sub-path sequence database 2 gives the following results:

L₁ (1-path sequence patterns)

Path sequence	Support count
<A>	3
<B>	3
<D>	3
<F>	3
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	2

C₂ (candidate 2-path sequence patterns)

Path sequence
<A B>
<B A>
<B D>
<D B>
<D G>
<D F>
<F D>
<F G>
<F H>
<G D>
<G I>
<G F>
<H F>
<H K>
<I G>
<I L>
<K H>
<L I>
<L N>
<N L>

L₂ (2-path sequence patterns)

Path sequence	Support count
<A B>	3
<B D>	3
<D F>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2
<L N>	2

C₃ (candidate 3-path sequence patterns)

Path sequence
<A B D>
<B D F>
<D F H>
<F H K>
<G I L>
<I L N>

L₃ (3-path sequence patterns)

Path sequence	Support count
<A B D>	3
<B D F>	2
<D F H>	2
<F H K>	2
<G I L>	2
<I L N>	2

C₄ (candidate 4-path sequence patterns)

Path sequence
<A B D F>
<B D F H>
<D F H K>
<G I L N>

L₄ (4-path sequence patterns)

Path sequence	Support count
<A B D F>	2
<B D F H>	2
<D F H K>	2
<G I L N>	2

C₅ (candidate 5-path sequence patterns)

Path sequence
<A B D F H>
<B D F H K>

L₅ (5-path sequence patterns)

Path sequence	Support count
<A B D F H>	2
<B D F H K>	2

C₆ (candidate 6-path sequence patterns)

Path sequence
<A B D F H K>

L₆ (6-path sequence patterns)

Path sequence	Support count
<A B D F H K>	2

As shown in Fig. 7, processing sub-path sequence database 3 gives the following results:

L₁ (1-path sequence patterns)

Path sequence	Support count
<A>	3
<B>	3
<D>	3
<F>	3
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	3

C₂ (candidate 2-path sequence patterns)

Path sequence
<A B>
<B A>
<B D>
<D B>
<D G>
<D F>
<F D>
<F G>
<F H>
<G D>
<G I>
<G F>
<H F>
<H K>
<I G>
<I L>
<K H>
<L I>
<L N>
<N L>

L₂ (2-path sequence patterns)

Path sequence	Support count
<A B>	3
<B D>	3
<D F>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2
<L N>	2

C₃ (candidate 3-path sequence patterns)

Path sequence
<A B D>
<B D F>
<D F H>
<F H K>
<G I L>
<I L N>

L₃ (3-path sequence patterns)

Path sequence	Support count
<A B D>	3
<B D F>	2
<D F H>	2
<F H K>	2
<G I L>	2
<I L N>	2

C₄ (candidate 4-path sequence patterns)

Path sequence
<A B D F>
<B D F H>
<D F H K>
<G I L N>

L₄ (4-path sequence patterns)

Path sequence	Support count
<A B D F>	2
<B D F H>	2
<D F H K>	2
<G I L N>	2

C₅ (candidate 5-path sequence patterns)

Path sequence
<A B D F H>
<B D F H K>

L₅ (5-path sequence patterns)

Path sequence	Support count
<A B D F H>	2
<B D F H K>	2

C₆ (candidate 6-path sequence patterns)

Path sequence
<A B D F H K>

L₆ (6-path sequence patterns)

Path sequence	Support count
<A B D F H K>	2

The <key, value> pairs passed from the Map nodes to the Reduce node are listed in the following table:

key	value
<A>	3
<B>	2
<C>	2
<D>	2
<E>	2
<F>	4
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	2
<A B>	2
<B D>	2
<C E>	2
<D F>	2
<E F>	2
<F G>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2
<A B D>	2
<B D F>	2
<C E F>	2
<D F H>	2
<E F G>	2
<F G I>	2
<F H K>	2
<G I L>	2
<A B D F>	2
<B D F H>	2
<C E F G>	2
<D F H K>	2
<E F G I>	2
<F G I L>	2
<A B D F H>	2
<B D F H K>	2
<C E F G I>	2
<E F G I L>	2
<A B D F H K>	2
<C E F G I L>	2
<A>	3
<B>	3
<D>	3
<F>	3
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	2
<A B>	3
<B D>	3
<D F>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2
<L N>	2
<A B D>	3
<B D F>	2
<D F H>	2
<F H K>	2
<G I L>	2
<I L N>	2
<A B D F>	2
<B D F H>	2
<D F H K>	2
<G I L N>	2
<A B D F H>	2
<B D F H K>	2
<A B D F H K>	2
<A>	3
<B>	3
<D>	3
<F>	3
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	3
<A B>	3
<B D>	3
<D F>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2
<L N>	2
<A B D>	3
<B D F>	2
<D F H>	2
<F H K>	2
<G I L>	2
<I L N>	2
<A B D F>	2
<B D F H>	2
<D F H K>	2
<G I L N>	2
<A B D F H>	2
<B D F H K>	2
<A B D F H K>	2

Hadoop's Master node automatically splits the n sub-path sequence databases and assigns them as tasks to different Map nodes. It also coordinates the parallel execution of the Map tasks and handles any task failure that occurs. This approach is relatively simple and fast to implement.

Step 7: the Reduce node merges the <key, value> pairs passed from the Map nodes to obtain the global candidate sequence patterns. Pairs with the same key are combined, converting <key, value> pairs into <key, set of values associated with that key> pairs. The global candidate sequence patterns produced in the embodiment are shown in the following table.

key	Set of values
<A>	{3,3,3}
<B>	{2,3,3}
<C>	{2}
<D>	{2,3,3}
<E>	{2}
<F>	{4,3,3}
<G>	{2,2,2}
<H>	{2,2,2}
<I>	{2,2,2}
<K>	{2,2,2}
<L>	{2,2,2}
<N>	{2,2,3}
<A B>	{2,3,3}
<B D>	{2,3,3}
<C E>	{2}
<D F>	{2,2,2}
<E F>	{2}
<F G>	{2}
<F H>	{2,2,2}
<G I>	{2,2,2}
<H K>	{2,2,2}
<I L>	{2,2,2}
<A B D>	{2,3,3}
<B D F>	{2,2,2}
<C E F>	{2}
<D F H>	{2,2,2}
<E F G>	{2}
<F G I>	{2}
<F H K>	{2,2,2}
<G I L>	{2,2,2}
<A B D F>	{2,2,2}
<B D F H>	{2,2,2}
<C E F G>	{2}
<D F H K>	{2,2,2}
<E F G I>	{2}
<F G I L>	{2}
<A B D F H>	{2,2,2}
<B D F H K>	{2,2,2}
<C E F G I>	{2}
<E F G I L>	{2}
<A B D F H K>	{2,2,2}
<C E F G I L>	{2}
<L N>	{2,2}
<I L N>	{2,2}
<G I L N>	{2,2}

The merge is performed automatically by Hadoop. Its purpose is to ensure that identical local sequence patterns are not counted more than once.

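Step 7's merge corresponds to Hadoop's shuffle, which brings pairs with the same key together at the Reduce node. The in-memory stand-in below, with the hypothetical name `shuffle_and_group`, groups a few of the embodiment's Map outputs the same way. In a real Hadoop job the framework performs this grouping, and the order of values within a key is not guaranteed.

```python
from collections import defaultdict


def shuffle_and_group(map_outputs):
    """Merge <key, value> pairs from all Map nodes into <key, [values]>.

    map_outputs holds one list of (key, value) pairs per Map node.
    Keys keep their first-seen order.
    """
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(grouped)


node1 = [("<A>", 3), ("<C>", 2), ("<F>", 4)]
node2 = [("<A>", 3), ("<F>", 3), ("<L N>", 2)]
node3 = [("<A>", 3), ("<F>", 3), ("<L N>", 2)]
print(shuffle_and_group([node1, node2, node3]))
# {'<A>': [3, 3, 3], '<C>': [2], '<F>': [4, 3, 3], '<L N>': [2, 2]}
```

Only the keys are carried into Step 8. Each distinct pattern is then recounted exactly once, however many Map nodes reported it.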
Step 8: scan the original path sequence database stored in HDFS in Step 4 again to count the global candidate sequence patterns, and find the sequence patterns whose support is not less than the preset minimum support ξ. This second scan is needed because a local sequence pattern produced by a Map task does not necessarily satisfy the minimum support over the whole database. Each key among the global candidate path sequence patterns obtained in Step 7 is counted against the original sequence database, which yields the global path sequence patterns. In the embodiment there are 12 path sequences and ξ = 50%, so a support count of at least 6 is required. The output <key, value> pairs are shown in the following table, whose keys are the global path sequence patterns.

key	value (support count)
<A>	9
<B>	8
<D>	8
<F>	10
<G>	6
<H>	6
<I>	6
<K>	6
<L>	6
<N>	7
<A B>	8
<B D>	8
<D F>	6
<F H>	6
<G I>	6
<H K>	6
<I L>	6
<A B D>	8
<B D F>	6
<D F H>	6
<F H K>	6
<G I L>	6
<A B D F>	6
<B D F H>	6
<D F H K>	6
<A B D F H>	6
<B D F H K>	6
<A B D F H K>	6

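Step 8 is a second, global counting pass. The sketch below recounts a few candidate keys from Step 7 over all twelve path sequences with ξ = 50%. It uses the hypothetical name `global_patterns` and assumes contiguous support counting. It shows why the pass is needed: <L N> and <I L N> were local patterns in sub-path sequence databases 2 and 3, but each occurs in only 5 of the 12 paths, below the required 6.

```python
ORIGINAL_DB = [s.split() for s in (
    "A B D F H K", "A C E F G I L", "A B D F H K M N", "C E F G I L N",
    "A B D F H K", "C E F G I L N", "A B D G I L N", "A B D F H K M",
    "A B D F H K", "E F G I L N", "A B D G I L N", "A B D F H K M N",
)]


def contains_path(path, pattern):
    """True if pattern occurs in path as a contiguous run of intersections."""
    k = len(pattern)
    return any(tuple(path[i:i + k]) == pattern for i in range(len(path) - k + 1))


def global_patterns(candidate_keys, db, xi):
    """Recount each global candidate over the whole database; keep those >= xi."""
    min_count = xi * len(db)
    result = {}
    for key in candidate_keys:
        count = sum(contains_path(path, key) for path in db)
        if count >= min_count:
            result[key] = count
    return result


cands = [tuple(s.split()) for s in ("A", "C", "L N", "I L N", "A B D F H K")]
print(global_patterns(cands, ORIGINAL_DB, 0.5))
# {('A',): 9, ('A', 'B', 'D', 'F', 'H', 'K'): 6}
```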
Step 9: generate path association rules from the global path sequence patterns produced in Step 8 and compute their confidence, which gives the vehicle driving path prediction result. Path association rules are generated from the global path sequence patterns as follows. For an L-path sequence pattern (L > 1), the first j items (1 ≤ j < L) form the rule antecedent and the remaining L - j items form the consequent. The confidence of the rule is the ratio of the support of the whole path sequence pattern to the support of the antecedent. For example, <D F H K> has support 6 and <D> has support 8, so the rule <D>→<F H K> has confidence 6/8 = 75%. The path association rules generated and their confidences are shown in the following table:

Path association rule	Confidence
<A>→<B>	88.89%
<B>→<D>	100%
<D>→<F>	75%
<F>→<H>	60%
<G>→<I>	100%
<H>→<K>	100%
<I>→<L>	100%
<A>→<B D>	88.89%
<A B>→<D>	100%
<B>→<D F>	75%
<B D>→<F>	75%
<D>→<F H>	75%
<D F>→<H>	100%
<F>→<H K>	60%
<F H>→<K>	100%
<G>→<I L>	100%
<G I>→<L>	100%
<A>→<B D F>	66.67%
<A B>→<D F>	75%
<A B D>→<F>	75%
<B>→<D F H>	75%
<B D>→<F H>	75%
<B D F>→<H>	100%
<D>→<F H K>	75%
<D F>→<H K>	100%
<D F H>→<K>	100%
<A>→<B D F H>	66.67%
<A B>→<D F H>	75%
<A B D>→<F H>	75%
<A B D F>→<H>	100%
<B>→<D F H K>	75%
<B D>→<F H K>	75%
<B D F>→<H K>	100%
<B D F H>→<K>	100%
<A>→<B D F H K>	66.67%
<A B>→<D F H K>	75%
<A B D>→<F H K>	75%
<A B D F>→<H K>	100%
<A B D F H>→<K>	100%

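Rule generation in Step 9 needs only the global patterns and their counts. The sketch below uses the hypothetical name `path_rules`. It loads the Step 8 table, splits each pattern after its first j items, and computes confidence as support(pattern) / support(antecedent). It yields 39 rules, the same number as in the rule table.

```python
# Global path sequence patterns and support counts from the Step 8 table.
GLOBAL = {tuple(k.split()): v for k, v in {
    "A": 9, "B": 8, "D": 8, "F": 10, "G": 6, "H": 6, "I": 6, "K": 6, "L": 6, "N": 7,
    "A B": 8, "B D": 8, "D F": 6, "F H": 6, "G I": 6, "H K": 6, "I L": 6,
    "A B D": 8, "B D F": 6, "D F H": 6, "F H K": 6, "G I L": 6,
    "A B D F": 6, "B D F H": 6, "D F H K": 6,
    "A B D F H": 6, "B D F H K": 6, "A B D F H K": 6,
}.items()}


def path_rules(patterns):
    """Split every L-pattern (L > 1) after its first j items, 1 <= j < L.

    Returns (antecedent, consequent, confidence) triples, where
    confidence = support(whole pattern) / support(antecedent).
    """
    rules = []
    for pattern, count in patterns.items():
        for j in range(1, len(pattern)):
            antecedent, consequent = pattern[:j], pattern[j:]
            rules.append((antecedent, consequent, count / patterns[antecedent]))
    return rules


rules = path_rules(GLOBAL)
print(len(rules))  # 39
for a, c, conf in rules:
    if a == ("D",) and c == ("F", "H", "K"):
        print("<%s>-><%s>: %.2f%%" % (" ".join(a), " ".join(c), 100 * conf))
        # <D>-><F H K>: 75.00%
```

In this data every prefix of a global pattern is itself a global pattern, since a path containing <A B D> also contains <A B>. The antecedent's count is therefore always found in the same dictionary.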
In a concrete implementation, Steps 1 to 5 can be executed by the master node of the Hadoop platform. Step 6 is executed by the Map nodes to which the master node assigns tasks, and Steps 7, 8 and 9 are executed by the Reduce node of the Hadoop platform.

Correspondingly, the invention also provides a vehicle driving path prediction system comprising the following modules arranged on a Hadoop platform. A memory determination module determines, from the memory of each computer in the Hadoop platform, the memory of the node with the least memory, denoted Q;

A longest path sequence determination module scans the original path sequence database storing the vehicle driving path sequences. It denotes the number of path sequences in the original path sequence database as m, where each path sequence comprises more than one intersection, and denotes the actual storage size of the longest path sequence in the original path sequence database as P;

A sub-path sequence database division module divides the original path sequence database horizontally and evenly into n disjoint sub-path sequence databases;

Each Map node executes the improved GSP algorithm as follows:

(1) When generating candidate 2-path sequence patterns from the 1-path sequence patterns, scan the adjacency list storing the road network information and check the adjacent nodes of each path sequence pattern s₁ in L₁. If an adjacent node of s₁ is also in L₁, append that adjacent-node item to s₁;

The specific embodiment described herein merely illustrates the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiment, or substitute it in a similar manner, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. A vehicle driving path prediction method, characterized in that the following steps are carried out on a Hadoop platform:

Step 6: the master node of the Hadoop platform assigns the n sub-path sequence databases uploaded in Step 5 to different Map nodes. Each Map node executes the improved GSP algorithm: it scans the sub-path sequence database held in its memory according to the preset minimum support ξ and computes the local path sequence patterns. It then passes them to the Reduce node as <key, value> pairs, where key is a local path sequence pattern and value is its support count;

Each Map node executes the improved GSP algorithm as follows:

Operation a: for the sub-path sequence database assigned to this Map node, scan the sub-path sequence database to obtain the 1-path sequence patterns L₁, and set k = 1;

Operation b: generate the candidate (k+1)-path sequences Cₖ₊₁ from the k-path sequence patterns Lₖ. Scan the source sequence database again and count the support of each candidate path sequence to obtain the (k+1)-path sequence patterns Lₖ₊₁. The candidate (k+1)-path sequences Cₖ₊₁ are generated in one of the following two cases:

(1) If the candidate 2-path sequence patterns are generated from the 1-path sequence patterns, scan the adjacency list storing the road network information and check the adjacent nodes of each path sequence pattern s₁ in L₁. If an adjacent node of s₁ is also in L₁, append that adjacent-node item to s₁;

(2) If the candidate (k+1)-path sequence patterns are generated from the k-path sequence patterns, with k > 1:

first, take any two path sequence patterns s₁ and s₂ among the k-path sequence patterns. If the path sequence obtained by removing the first item of s₁ is identical to that obtained by removing the last item of s₂, connect s₁ and s₂;

then, prune: if some sub-path sequence contained in a candidate path sequence pattern is not a path sequence pattern, delete that candidate from the candidate path sequence patterns;

Operation c: set k = k + 1 and repeat operation b until no new candidate path sequence is generated;

Step 8: scan the original path sequence database stored in HDFS in Step 4 again to count the global candidate sequence patterns. Find the sequence patterns whose support is not less than the preset minimum support ξ, and obtain the global path sequence patterns;

2. A vehicle driving path prediction system, characterized in that the following modules are arranged on a Hadoop platform:

a local path sequence pattern module, configured to cause the master node of the Hadoop platform to assign the n sub-path sequence databases uploaded by the sub-database upload module to different Map nodes. Each Map node executes the improved GSP algorithm: it scans the sub-path sequence database held in its memory according to the preset minimum support ξ and computes the local path sequence patterns. It then passes them to the Reduce node as <key, value> pairs, where key is a local path sequence pattern and value is its support count;

Each Map node executes the improved GSP algorithm as follows:

a global path sequence pattern module, configured to scan again the original path sequence database stored in HDFS by the original database upload module and count the global candidate sequence patterns. It finds the sequence patterns whose support is not less than the preset minimum support ξ and obtains the global path sequence patterns;