CN103646541B - Vehicle congestion degree acquiring method based on Hadoop

Info

Publication number: CN103646541B
Application number: CN201310688009.1A
Authority: CN
Inventors: 廖丹; 卜思桐; 孙罡; 陆川; 虞红芳; 许都
Original assignee: University of Electronic Science and Technology of China; Guangdong Electronic Information Engineering Research Institute of UESTC
Current assignee: University of Electronic Science and Technology of China; Guangdong Electronic Information Engineering Research Institute of UESTC
Priority date: 2013-12-16
Filing date: 2013-12-16
Publication date: 2017-05-24
Anticipated expiration: 2033-12-16
Also published as: CN103646541A

Abstract

The invention discloses a Hadoop-based method for acquiring vehicle congestion degree. The collected GPS messages are processed in parallel by multiple MapReduce tasks, and a congestion degree calculation function computes the statistical congestion degree and the linear congestion degree of every region at the same moment in parallel. This overcomes the long computation time of the mobility clustering algorithm and greatly improves computational efficiency. Because it is built on Hadoop, the method also stores data replicas automatically and scales the computer cluster automatically, which gives it high scalability and high fault tolerance.

Description

A Hadoop-based method for acquiring vehicle congestion degree

Technical field

The invention belongs to the technical field of Internet of Vehicles communication and, more specifically, relates to a Hadoop-based method for acquiring vehicle congestion degree.

Background technology

For many smart-city applications, identifying hotspots of moving vehicles in urban areas is essential. A vehicle hotspot can be described as a region with a high vehicle congestion degree, and a hotspot with a high congestion degree is typically the location of a traffic jam.

Intuitively, the congestion degree of a point reflects how crowded that point is. Measuring it becomes difficult when the distribution of the vehicle set is unknown. To quantify the congestion degree of a place over an unknown set of objects, the mobility clustering algorithm was proposed.

Assume that a set N of vehicles is deployed in a two-dimensional urban area A. Among these vehicles, a small subset N_S is the group of sensors about which we have complete knowledge. We have no knowledge of the remainder, and the task is to use the reports from N_S to infer the state of the remaining objects N \ N_S, particularly their spatial attributes.

Assume that city A is a given region and that the sensor objects N_S move freely. Their mobility incidentally provides a way of sensing how crowded their surroundings in A are. The sensors report their current state regularly, every 5 seconds, in an asynchronous manner.

Each report is a 6-tuple φ = (m, x, y, v, β, t), where m ∈ N is an object identifier that uniquely identifies the sensor sending the report, (x, y) ∈ A is the current location of m, v is the instantaneous speed of m, β is a binary value indicating whether the vehicle is unloaded, and t is the time at which the report is sent. Because each sensor reports once every 5 seconds, on average one fifth of the sensors report in any given second. These reports are called sensing reports. To increase the granularity of the sample data set, vehicle states at times without a report are estimated by linear interpolation, in addition to the real sensing reports. The set of all reports is denoted Φ.
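The report tuple and the interpolation step can be illustrated with a minimal Python sketch. The class name `Report`, the function `interpolate`, the one-second granularity and the choice to carry β over from the earlier report are assumptions made for illustration; the text above only states that linear interpolation is used between real sensing reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Report:
    m: int       # object identifier of the reporting sensor
    x: float     # position, first coordinate
    y: float     # position, second coordinate
    v: float     # instantaneous speed
    beta: int    # 1 if the vehicle is unloaded, else 0
    t: int       # report time in seconds


def interpolate(a: Report, b: Report) -> List[Report]:
    """Estimate one report per whole second strictly between a and b."""
    if a.m != b.m or b.t <= a.t:
        raise ValueError("need two time-ordered reports from the same sensor")
    span = b.t - a.t
    estimates = []
    for t in range(a.t + 1, b.t):
        w = (t - a.t) / span                 # fraction of the way from a to b
        estimates.append(Report(
            m=a.m,
            x=a.x + w * (b.x - a.x),
            y=a.y + w * (b.y - a.y),
            v=a.v + w * (b.v - a.v),
            beta=a.beta,                     # assumption: hold the earlier occupancy flag
            t=t,
        ))
    return estimates
```

With two reports five seconds apart, `interpolate` returns four estimated reports, one for each intermediate second.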

Table 1 lists the notation used in this work.

Table 1: Notation and descriptions

Next, the congestion degree functions used to quantify the congestion degree of a place are defined. A simple way to quantify the congestion degree of a point is a linear function of the instantaneous speed at that point and of its minimum and maximum speeds.

Definition 1: Given a physical location ι = (x, y) and a report set Φ, the velocity spectrum V_Φ^(t)(ι) is the set of vehicle speeds of all reports in Φ at position ι at time t, i.e.

$$V_\Phi^{(t)}(\iota) = \{\, v \mid (m, x, y, v, \beta, t) \in \Phi,\ (x, y) = \iota \,\}$$

The velocity spectrum over all times can be written as V_Φ(ι), the union of V_Φ^(t)(ι) over all t. Because time is discrete and Φ contains a finite number of reports, the velocity spectrum is also finite.

Definition 2: The spot speed v^(t)(ι) is defined as the mean of the velocity spectrum at position ι at time t.
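The two definitions above translate directly into code. The following is a minimal sketch in which reports are plain tuples (m, x, y, v, β, t); the function names `velocity_spectrum` and `spot_speed` are illustrative and do not come from the source.

```python
from statistics import mean


def velocity_spectrum(reports, location, t=None):
    """Speeds of all reports at `location`; restricted to time t when t is given."""
    return [v for (m, x, y, v, beta, rt) in reports
            if (x, y) == location and (t is None or rt == t)]


def spot_speed(reports, location, t):
    """Mean of the velocity spectrum at `location` and time t."""
    spectrum = velocity_spectrum(reports, location, t)
    if not spectrum:
        raise ValueError("no reports at this location and time")
    return mean(spectrum)
```

Calling `velocity_spectrum` without `t` gives the spectrum over all times, which is the set later used for the probability P(v ≤ v^(t)(ι)).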

Definition 3: The linear congestion degree of a given place ι is defined as the exponential moving average of the complement of the relative speed, that is, of the spot speed measured against the velocity spectrum.

v_max(ι) is the maximum speed at ι, v_min(ι) is the minimum speed at ι, and α_ι and τ_ι are two parameters that capture the dynamics of position ι.

The linear congestion function rests on an implicit assumption, namely that vehicle speeds are roughly uniformly distributed over the velocity spectrum, which is not necessarily accurate in practice.
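The formula for Definition 3 is not reproduced in this text. One plausible reading of the verbal definition, offered only as a sketch, is given below. The symbol c_lin and the placement of the weight α_ι on the current observation are assumptions; only v^(t)(ι), v_max(ι), v_min(ι), α_ι and τ_ι come from the text.

```latex
c_{\mathrm{lin}}^{(t)}(\iota)
  = \alpha_\iota \left( 1 - \frac{v^{(t)}(\iota) - v_{\min}(\iota)}
                                 {v_{\max}(\iota) - v_{\min}(\iota)} \right)
  + (1 - \alpha_\iota)\, c_{\mathrm{lin}}^{(t - \tau_\iota)}(\iota)
```

Under this reading, a spot speed equal to v_max(ι) contributes 0 to the average and a spot speed equal to v_min(ι) contributes 1, which is where the assumption of uniformly distributed speeds enters.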

Definition 4: The statistical congestion degree of a given place ι = (x, y) is defined as the exponential moving average of the complementary cumulative distribution function of the instantaneous speed over the velocity spectrum.

P(v ≤ v^(t)(ι)), v ∈ V_Φ(ι), denotes the probability that a speed at position ι is less than or equal to the instantaneous average speed v^(t)(ι).
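The formula for Definition 4 is likewise not reproduced in this text. A sketch consistent with the verbal definition follows. The symbol c_stat and the placement of the weight α_ι are assumptions; the complementary cumulative distribution term 1 - P(v ≤ v^(t)(ι)) follows from the wording of the definition.

```latex
c_{\mathrm{stat}}^{(t)}(\iota)
  = \alpha_\iota \left( 1 - P\!\left( v \le v^{(t)}(\iota) \right) \right)
  + (1 - \alpha_\iota)\, c_{\mathrm{stat}}^{(t - \tau_\iota)}(\iota),
  \qquad v \in V_\Phi(\iota)
```

Unlike the linear form, this form uses the empirical distribution of speeds at ι and therefore does not depend on the speeds being uniformly distributed.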

In general, this is implemented by executing the algorithm sequentially.

However, the mobility clustering algorithm is a serial implementation. First, a MapReduce() function partitions the collected GPS information by time and by region: the GPS information at a given moment t is divided into several regions, and the congestion degree is then calculated region by region. Only after the statistical and linear congestion degrees of one region have been calculated is the next region processed, until all regions at moment t are done. The statistical and linear congestion degrees for time t+1 are calculated next, and so on in order of increasing time. This approach uses sensor-object technology to obtain dynamically the vehicle identifier, position, instantaneous speed, heading, current time and other information. As the number of traffic data sources grows, the volume of real-time traffic information becomes very large, so the calculation takes too long and is inefficient, which makes it unsuitable for analyzing traffic in a real environment.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art by providing a Hadoop-based method for acquiring vehicle congestion degree. The collected GPS messages are processed in parallel by multiple MapReduce tasks, which saves computation time while providing scalability, high efficiency and high fault tolerance.

To achieve the above object, the Hadoop-based method for acquiring vehicle congestion degree according to the invention is characterized by comprising the following steps:

(1) Process the collected data with the MapReduce programming model:

The collected GPS information is input and divided into N data fragments by the MapReduce programming model. When the Map() function reads a data fragment, it reads one record at a time, that is, one line of GPS information, and reads the next record only after the current one has been taken. The Map() function parses each line it reads into a key-value pair <K_n, V_n>, n = 1, 2, ..., N, where K_n holds the offset within the file of the first character of the line and V_n holds the content of the line. An array 1 is also declared, and the time time, longitude long, latitude lat and speed speed needed to calculate the statistical and linear congestion degrees are stored in it. Hadoop creates one Map task for each data fragment, and all Map tasks run in parallel. Hadoop also creates Reduce tasks according to the number of Reduce tasks set in the configuration file. Reduce tasks execute the Reduce() function, and multiple Reduce tasks likewise run in parallel;
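The record-reading part of step (1) can be sketched in Python as follows. The comma-separated layout, the column order in `FIELDS` (taken from the field list given for V_n), and the names `read_records`, `to_array1` and `seq` are assumptions for illustration. A real Hadoop job would implement this in a Mapper rather than in standalone functions.

```python
def read_records(text: bytes):
    """Yield (K_n, V_n): byte offset of each line's first character, and the line."""
    offset = 0
    for raw in text.splitlines(keepends=True):
        line = raw.rstrip(b"\r\n").decode("utf-8")
        if line:
            yield offset, line
        offset += len(raw)                 # advance by the full line, newline included


# Assumed column order; "seq" is a hypothetical name for the record sequence number.
FIELDS = ["no", "seq", "c-objid", "c-regnum", "c-regcolor", "time",
          "long", "lat", "speed", "direction", "height", "state"]


def to_array1(value: str):
    """Keep only the fields needed later: (time, long, lat, speed), i.e. array 1."""
    rec = dict(zip(FIELDS, value.split(",")))
    return rec["time"], float(rec["long"]), float(rec["lat"]), float(rec["speed"])
```

Keeping only four of the twelve fields in array 1 reduces the amount of data passed on to the later stages.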

(2) The first MapReduce() function performs time and region partitioning:

(2.1) In the first Map() function, the longitude and latitude in array 1 are processed and the study area is divided into a grid. The region with longitude index i and latitude index j is labeled site(i, j). The first Map() function then outputs site(i, j) as K_n and the time time and speed speed as the corresponding V_n;

(2.2) In the Reduce() function, the speed parameters of the vehicles at the same moment are calculated. All V_n whose K_n have the same longitude and latitude are merged into one list, forming the group <K_n, list(V_n)>. An array 2 is declared, and the speeds speed are extracted and stored in it. Let ι = site(i, j). The average speed v^(t)(ι) of all vehicles in region site(i, j) at moment time is calculated as the arithmetic mean of the speeds in array 2. Next, the maximum speed v_max(ι) and the minimum speed v_min(ι) of all vehicles in region site(i, j) at time time are calculated. Finally, the proportion P(v ≤ v^(t)(ι)), v ∈ V_Φ(ι), of vehicles in region site(i, j) whose speed at time time does not exceed the average speed is calculated. The first Reduce() function then outputs site(i, j) as K_n and time, v_max(ι), v_min(ι), P(v ≤ v^(t)(ι)), v ∈ V_Φ(ι) as the corresponding V_n. The output is finally stored on HDFS, the distributed file system of Hadoop;
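The grouping and per-region statistics of step (2.2) can be sketched as below. Grouping by the pair (site, time) is one interpretation of "the same moment", and the function name `first_reduce` and the output layout are assumptions. Unlike the output listed in the text, this sketch also returns the average speed, because the later congestion calculation needs it.

```python
from collections import defaultdict


def first_reduce(pairs):
    """pairs: iterable of (site, (time, speed)).

    Returns {(site, time): (mean speed, v_max, v_min, P(v <= mean))}.
    """
    groups = defaultdict(list)               # plays the role of <K_n, list(V_n)>
    for site, (time, speed) in pairs:
        groups[(site, time)].append(speed)   # "array 2" for one region and moment

    out = {}
    for key, speeds in groups.items():
        avg = sum(speeds) / len(speeds)
        share = sum(1 for v in speeds if v <= avg) / len(speeds)
        out[key] = (avg, max(speeds), min(speeds), share)
    return out
```

Because each (site, time) group is independent of the others, different groups can be handled by different Reduce tasks at the same time.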

(3) The second MapReduce() function joins the recorded GPS information:

(3.1) The output of the first MapReduce() function and the statistical and linear congestion degree values calculated for the previous moment time-1 are input to the second MapReduce() function;

(3.2) The second Map() function packs each record it reads so that the records can be joined on the Reduce side. When the second Map() function processes a record, it tags the record with its data source and then packs the record together with the tag, so that each record package contains the original record and its data-source tag. The second Map() function then sets a group key for each record package. The output of the second Map() function is a key/value pair that is partitioned by key for the join: the function outputs one record package with the group key as the join key;

(3.3) The second Reduce() function receives the output of the second Map() function and takes the full cross product of its values:

The second Reduce() function processes all records with the same join key together. It unpacks them to recover the original records and uses the tags to determine the data source of each record. It then takes the full cross product of the input values it has received. When the input values carry different tags, the cross product is the original set of these inputs, and each merged result of the cross product is passed to the Combine() function. The result obtained by the Combine() function is:

site(i, j), time, v_max(ι), v_min(ι), P(v ≤ v^(t)(ι)), v ∈ V_Φ(ι), site(i, j), time-1, and the statistical and linear congestion degrees calculated for time-1,

where t is the moment time, (t - τ_ι) denotes the previous moment time-1, τ_ι is a dynamic parameter of position ι, and the time value time is the same for every record. When the input values received by the second Reduce() function carry no tag or all carry the same tag, the Combine() function discards the merged result;
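Step (3) describes a reduce-side join, which can be sketched as follows. The tag strings "current" and "previous", the function names `tag` and `join_reduce`, and the tuple layouts are hypothetical. Keys that appear in only one data source are dropped, which corresponds to discarding results whose inputs all carry the same tag.

```python
from collections import defaultdict
from itertools import product


def tag(records, source):
    """Second Map(): wrap each (key, value) record with its data-source tag."""
    for key, value in records:
        yield key, (source, value)


def join_reduce(tagged):
    """Second Reduce(): per join key, cross-product the records of different sources."""
    groups = defaultdict(lambda: defaultdict(list))
    for key, (source, value) in tagged:
        groups[key][source].append(value)    # unpack and sort by data source

    joined = []
    for key, by_source in groups.items():
        if len(by_source) < 2:
            continue                         # a single tag only: nothing to merge
        for combo in product(*(by_source[s] for s in sorted(by_source))):
            joined.append((key, combo))
    return joined
```

In this sketch the join key is the region site(i, j), so each output row pairs the current statistics of a region with the congestion degrees of the same region at time-1.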

(4) The Map() function of the third MapReduce() function calculates the statistical and linear congestion degrees:

The V_n value of each record is input to the third Map() function.

The linear congestion degree at moment time is calculated according to the formula for the linear congestion degree;

The statistical congestion degree at moment time is then calculated according to the formula for the statistical congestion degree, where α_ι and τ_ι are the dynamic parameters of position ι.
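The calculation in step (4) can be sketched under stated assumptions, since the formulas themselves are not reproduced in this text. The sketch assumes the exponential-moving-average form implied by Definitions 3 and 4, with α weighting the current observation; the function names, the default α = 0.5, the input tuple layout and the handling of the degenerate case v_max = v_min are all illustrative choices.

```python
def linear_congestion(avg, vmax, vmin, prev, alpha):
    """EMA of the complement of the relative speed (assumed form)."""
    if vmax == vmin:
        return prev                          # assumption: no spread, keep previous value
    current = 1.0 - (avg - vmin) / (vmax - vmin)
    return alpha * current + (1.0 - alpha) * prev


def statistical_congestion(p_le_avg, prev, alpha):
    """EMA of the complementary CDF, 1 - P(v <= average speed) (assumed form)."""
    return alpha * (1.0 - p_le_avg) + (1.0 - alpha) * prev


def third_map(value, alpha=0.5):
    """value: (avg, v_max, v_min, P, previous linear, previous statistical)."""
    avg, vmax, vmin, p, prev_lin, prev_stat = value
    return (linear_congestion(avg, vmax, vmin, prev_lin, alpha),
            statistical_congestion(p, prev_stat, alpha))
```

Each joined record is processed independently of the others, so this step needs only a Map() function and no Reduce() function.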

The content of the message stored in V_n includes: the vehicle number no, the GPS message record sequence number, the vehicle ID c-objid, the license plate number c-regnum, the vehicle color c-regcolor, the time time at which the GPS message was received, the longitude long and latitude lat of the vehicle's position at that moment, the speed speed of the vehicle at that moment, the travel direction direction of the taxi, the height height of the vehicle's position, and the state indicating whether the vehicle is unloaded at that moment.

The object of the invention is achieved as follows:

In the Hadoop-based method for acquiring vehicle congestion degree according to the invention, the collected GPS messages are processed in parallel by multiple MapReduce tasks, and the congestion degree calculation function then computes the statistical and linear congestion degrees of all regions at the same moment in parallel. This overcomes the long computation time of the mobility clustering algorithm and greatly improves computational efficiency. A Hadoop-based algorithm also stores data replicas and scales the computer cluster automatically, and therefore offers high scalability and fault tolerance.

The Hadoop-based method for acquiring vehicle congestion degree according to the invention also has the following advantages:

(1) High efficiency: when processing massive data, the Hadoop-based method runs in less time than the mobility clustering algorithm, so computational efficiency is greatly improved. In addition, Hadoop can move data dynamically between nodes to keep the nodes balanced, so processing is fast;

(2) High scalability: a Hadoop-based algorithm distributes data and computing tasks across the available computer clusters, which can readily be extended to thousands of nodes;

(3) High fault tolerance: a Hadoop-based algorithm automatically keeps multiple replicas of the data and automatically reassigns failed tasks.

Brief description of the drawings

Fig. 1 is a flow chart of the Hadoop-based method for acquiring vehicle congestion degree according to the invention;

Fig. 2 is a bar chart comparing the program running time of the invention with that of the mobility clustering algorithm.

Specific embodiments

Specific embodiments of the invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.

Embodiment

Fig. 1 is a flow chart of the Hadoop-based method for acquiring vehicle congestion degree according to the invention.

In this embodiment, as shown in Fig. 1, the Hadoop-based method for acquiring vehicle congestion degree comprises the following steps:

S101: Process the collected data with the MapReduce programming model. The collected GPS information is input and divided into N data fragments by the MapReduce programming model. When the Map() function reads a data fragment, it reads one record at a time, that is, one line of GPS messages, and reads the next record only after the current one has been taken. The Map() function parses each line it reads into a key-value pair <K_n, V_n>, n = 1, 2, ..., N, where K_n holds the offset within the file of the first character of the line and V_n holds the content of the line. The content stored in V_n includes: the vehicle number no, the GPS message record sequence number, the vehicle ID c-objid, the license plate number c-regnum, the vehicle color c-regcolor, the time time at which the GPS message was received, the longitude long and latitude lat of the vehicle's position at that moment, the speed speed of the vehicle at that moment, the travel direction direction of the taxi, the height height of the vehicle's position, and the state indicating whether the vehicle is unloaded at that moment. An array 1 is also declared, and the time time, longitude long, latitude lat and speed speed needed to calculate the statistical and linear congestion degrees are stored in it. Hadoop creates one Map task for each data fragment, and all Map tasks run in parallel. Hadoop also creates Reduce tasks according to the number of Reduce tasks set in the configuration file. Reduce tasks execute the Reduce() function, and multiple Reduce tasks likewise run in parallel;
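The "one Map task per data fragment, all running in parallel" arrangement of S101 can be imitated on a single machine. The sketch below uses a thread pool purely for illustration; Hadoop schedules Map tasks across cluster nodes and derives its fragments from the input files, which this sketch does not attempt to reproduce. The names `split`, `map_task` and `run_map_phase` and the placeholder per-record work are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split(lines, n):
    """Cut the input into at most n contiguous fragments of near-equal size."""
    size = max(1, -(-len(lines) // n))       # ceiling division
    return [lines[k:k + size] for k in range(0, len(lines), size)]


def map_task(fragment):
    """One Map task: handle its fragment one record at a time."""
    return [(len(line), line) for line in fragment]   # placeholder per-record work


def run_map_phase(lines, n):
    """Run one map_task per fragment concurrently and concatenate the outputs."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = list(pool.map(map_task, split(lines, n)))
    return [pair for part in parts for pair in part]
```

`pool.map` returns results in fragment order, so the concatenated output preserves the order of the input records even though the tasks overlap in time.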

S102: The first Map() function performs time and region partitioning. In the first Map() function, the longitude and latitude in array 1 are processed and the study area is divided into a grid. The region with longitude index i and latitude index j is labeled site(i, j). The first Map() function then outputs site(i, j) as K_n and the time time and speed speed as the corresponding V_n;
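The gridding of S102 can be sketched as a mapping from coordinates to integer cell indices. The grid origin and the cell size of 0.01 degrees are assumed values chosen for illustration, since the text does not specify how the grid is laid out; the function names are likewise hypothetical.

```python
import math


def site(longitude, latitude, origin=(0.0, 0.0), cell=0.01):
    """Map a coordinate to grid indices (i, j); origin and cell size are assumptions."""
    i = math.floor((longitude - origin[0]) / cell)
    j = math.floor((latitude - origin[1]) / cell)
    return i, j


def first_map(array1):
    """First Map(): emit site(i, j) as the key and (time, speed) as the value."""
    time, longitude, latitude, speed = array1
    return site(longitude, latitude), (time, speed)
```

All vehicles whose coordinates fall in the same cell receive the same key, which is what allows the following Reduce() step to gather them into one group.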

S103: In the first Reduce() function, the speed parameters of the vehicles at the same moment are calculated. All V_n whose K_n have the same longitude and latitude are merged into one list, forming the group <K_n, list(V_n)>. An array 2 is declared, and the speeds speed are extracted and stored in it. Let ι = site(i, j). The average speed v^(t)(ι) of all vehicles in region site(i, j) at moment time is calculated as the arithmetic mean of the speeds in array 2. Next, the maximum speed v_max(ι) and the minimum speed v_min(ι) of all vehicles in region site(i, j) at time time are calculated. Finally, the proportion P(v ≤ v^(t)(ι)), v ∈ V_Φ(ι), of vehicles in region site(i, j) whose speed at time time does not exceed the average speed is calculated. The first Reduce() function then outputs site(i, j) as K_n and time, v_max(ι), v_min(ι), P(v ≤ v^(t)(ι)), v ∈ V_Φ(ι) as the corresponding V_n. The output is finally stored on HDFS, the distributed file system of Hadoop;

S104: The second Map() function prepares the recorded GPS information for joining. The output of the first MapReduce() function and the statistical and linear congestion degree values calculated for the previous moment time-1 are input to the second MapReduce() function. The second Map() function packs each record it reads so that the records can be joined on the Reduce side. When the second Map() function processes a record, it tags the record with its data source and then packs the record together with the tag, so that each record package contains the original record and its data-source tag. The second Map() function then sets a group key for each record package. The output of the second Map() function is a key/value pair that is partitioned by key for the join: the function outputs one record package with the group key as the join key;

S105: The second Reduce() function receives the output of the second Map() function and takes the full cross product of its values. The second Reduce() function processes all records with the same join key together. It unpacks them to recover the original records and uses the tags to determine the data source of each record. It then takes the full cross product of the input values it has received. When the input values carry different tags, the cross product is the original set of these inputs. Each merged result of the cross product is passed to the Combine() function, which outputs the merged record for region site(i, j) at moment time together with the congestion degrees calculated for time-1;

S106: The Map() function of the third MapReduce() function calculates the statistical and linear congestion degrees. The V_n value of each record is input to the third Map() function, and the linear congestion degree at moment time is calculated according to the formula for the linear congestion degree;

In this embodiment, as shown in Fig. 2, series 1 is the method based on mobility clustering and series 2 is the method of the invention. The abscissa corresponds, in order, to calculating the congestion degree for 1, 3, 6, 10, 14, 18 and 23 hours of data, and the ordinate gives the time used, in milliseconds. It can be seen that the larger the amount of data, the greater the time saved by the invention relative to the mobility clustering method, so its efficiency is higher and its advantage more pronounced.

Although illustrative specific embodiments of the invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these specific embodiments. To those of ordinary skill in the art, various changes are obvious as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all innovations that make use of the inventive concept are protected.

Claims

1. A Hadoop-based method for acquiring vehicle congestion degree, characterized by comprising the following steps:

(1) Process the collected data with the MapReduce programming model:

The collected GPS information is input and divided into N data fragments by the MapReduce programming model. When the Map() function reads a data fragment, it reads one record at a time, that is, one line of GPS information, and reads the next record only after the current one has been taken. The Map() function parses each line it reads into a key-value pair <K_n, V_n>, n = 1, 2, ..., N, where K_n holds the offset within the file of the first character of the line and V_n holds the content of the line. An array 1 is also declared, and the time time, longitude long, latitude lat and speed speed needed to calculate the statistical and linear congestion degrees are stored in it. Hadoop creates one Map task for each data fragment, and all Map tasks run in parallel. Hadoop also creates Reduce tasks according to the number of Reduce tasks set in the configuration file. Reduce tasks execute the Reduce() function, and multiple Reduce tasks likewise run in parallel;

(2) The first MapReduce() function performs time and region partitioning:

(2.1) In the first Map() function, the longitude and latitude in array 1 are processed and the study area is divided into a grid. The region with longitude index i and latitude index j is labeled site(i, j). The first Map() function then outputs site(i, j) as K_n and the time time and speed speed as the corresponding V_n;

(2.2) In the Reduce() function, the speed parameters of the vehicles at the same moment are calculated. All V_n whose K_n have the same longitude and latitude are merged into one list, forming the group <K_n, list(V_n)>. An array 2 is declared, and the speeds speed are extracted and stored in it. Let ι = site(i, j). The average speed v^(t)(ι) of all vehicles in region site(i, j) at moment time is calculated as the arithmetic mean of the speeds in array 2. Next, the maximum speed v_max(ι) and the minimum speed v_min(ι) of all vehicles in region site(i, j) at time time are calculated. Finally, the proportion P(v ≤ v^(t)(ι)), v ∈ V_Φ(ι), of vehicles in region site(i, j) whose speed at time time does not exceed the average speed is calculated. The first Reduce() function then outputs site(i, j) as K_n and time, v_max(ι), v_min(ι), P(v ≤ v^(t)(ι)), v ∈ V_Φ(ι) as the corresponding V_n. The output is finally stored on HDFS, the distributed file system of Hadoop;

(3) The second MapReduce() function joins the recorded GPS information:

(3.1) The output of the first MapReduce() function and the statistical and linear congestion degree values calculated for the previous moment time-1 are input to the second MapReduce() function;

(3.2) The second Map() function packs each record it reads so that the records can be joined on the Reduce side. When the second Map() function processes a record, it tags the record with its data source and then packs the record together with the tag, so that each record package contains the original record and its data-source tag. The second Map() function then sets a group key for each record package. The output of the second Map() function is a key/value pair that is partitioned by key for the join: the function outputs one record package with the group key as the join key;

(3.3) The second Reduce() function receives the output of the second Map() function and takes the full cross product of its values:

The second Reduce() function processes all records with the same join key together. It unpacks them to recover the original records and uses the tags to determine the data source of each record. It then takes the full cross product of the input values it has received. When the input values carry different tags, the cross product is the original set of these inputs, and each merged result of the cross product is passed to the Combine() function. The result obtained by the Combine() function is:

site(i, j), time, v_max(ι), v_min(ι), P(v ≤ v^(t)(ι)), v ∈ V_Φ(ι), site(i, j), time-1, and the statistical and linear congestion degrees calculated for time-1,

where t is the moment time, (t - τ_ι) denotes the previous moment time-1, τ_ι is a dynamic parameter of position ι, and the time value time is the same for every record. When the input values received by the second Reduce() function carry no tag or all carry the same tag, the Combine() function discards the merged result;

(4) The Map() function of the third MapReduce() function calculates the statistical and linear congestion degrees:

The V_n value of each record is input to the third Map() function.

The linear congestion degree at moment time is calculated according to the formula for the linear congestion degree;

The statistical congestion degree at moment time is then calculated according to the formula for the statistical congestion degree, where α_ι and τ_ι are the dynamic parameters of position ι.

2. The Hadoop-based method for acquiring vehicle congestion degree according to claim 1, characterized in that the content of the message stored in V_n includes: the vehicle number no, the GPS message record sequence number, the vehicle ID c-objid, the license plate number c-regnum, the vehicle color c-regcolor, the time time at which the GPS message was received, the longitude long and latitude lat of the vehicle's position at that moment, the speed speed of the vehicle at that moment, the travel direction direction of the taxi, the height height of the vehicle's position, and the state indicating whether the vehicle is unloaded at that moment.