A Hadoop-based method for obtaining vehicle crowding
Technical field
The invention belongs to the technical field of Internet of Vehicles communication and, more specifically, relates to a Hadoop-based method for obtaining vehicle crowding.
Background
For many smart-city applications it is essential to identify the hotspots of moving vehicles in an urban area. A vehicle hotspot can be described as a region of high vehicle crowding, and a highly crowded hotspot is typically the location of a traffic jam.
Intuitively, the crowding of a point reflects how crowded that point is. Measuring crowding becomes difficult when the distribution of the vehicle set is unknown. To quantify the crowding of a location for a set of unknown objects, the mobility clustering algorithm has been proposed.
Assume that a group of N vehicles is deployed in a two-dimensional urban area A. Among these N vehicles, a small subset N_S ⊂ N is the sensor group, about which we have complete knowledge. We have no knowledge of the remainder; our task is to use the reports of N_S to infer the state of the remaining objects N \ N_S, in particular their spatial attributes.
Assume that city A is a given region and that the sensor objects N_S move arbitrarily. Their mobility incidentally provides a way of sensing how crowded the places in A are. These sensors report their current state asynchronously, once every five seconds.
A report is a 6-tuple of the form φ = (m, x, y, v, β, t), where m ∈ N is an object identifier that uniquely identifies the sensor sending the report, (x, y) ∈ A is the current location of m, v is the instantaneous speed of m, β is a binary value indicating whether the vehicle is unloaded, and t is the time at which the report is sent. Because a report is sent back every five seconds, on average we obtain one fifth of a report per sensor per second. These reports are called sensing reports. To increase the granularity of the sample data set, in addition to these real sensing reports we use linear interpolation to estimate vehicle states during the intervals without reports. We denote the set of all reports by Φ.
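The interpolation step described above might look like the following minimal sketch. The Report class, the interpolate function, the one-second step, and the choice to carry β forward from the earlier report (a binary value cannot be interpolated) are illustrative assumptions, not details given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Report:
    """One report (m, x, y, v, beta, t); field names follow the 6-tuple."""
    m: int       # sensor (vehicle) identifier
    x: float     # position, first coordinate
    y: float     # position, second coordinate
    v: float     # instantaneous speed
    beta: int    # 1 if the vehicle is unloaded, 0 otherwise
    t: int       # report time in seconds


def interpolate(a: Report, b: Report, step: int = 1) -> List[Report]:
    """Estimate the states strictly between two real reports of one sensor."""
    if a.m != b.m or b.t <= a.t:
        raise ValueError("need two time-ordered reports from the same sensor")
    estimated = []
    for t in range(a.t + step, b.t, step):
        w = (t - a.t) / (b.t - a.t)  # fraction of the interval elapsed
        estimated.append(Report(
            m=a.m,
            x=a.x + w * (b.x - a.x),
            y=a.y + w * (b.y - a.y),
            v=a.v + w * (b.v - a.v),
            beta=a.beta,  # assumption: keep the last known load state
            t=t,
        ))
    return estimated
```

With reports five seconds apart, this yields four estimated states, one for each intermediate second.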
Table 1 lists the notation commonly used in this work.
Table 1: Common notation and descriptions
Next we define the crowding functions used to quantify the crowding of a place. A simple way to quantify the crowding of a point is to use a linear function of the instantaneous speed at that point and its minimum and maximum speeds.
Definition 1: Given a physical location ι = (x, y) and a report set Φ, the speed spectrum V_Φ^(t)(ι) is the set of vehicle speeds of all reports in Φ at position ι and time t, i.e. V_Φ^(t)(ι) = { v | (m, x, y, v, β, t) ∈ Φ and (x, y) = ι }.
The speed spectrum over all times can be written as V_Φ(ι) = ∪_t V_Φ^(t)(ι). Because time is discrete and Φ contains a finite number of reports, this speed spectrum is also finite.
Definition 2: The spot speed v̄^(t)(ι) is defined as the mean of the speed spectrum at position ι and time t.
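Definitions 1 and 2 can be illustrated with a short sketch. It assumes reports are plain (m, x, y, v, β, t) tuples and keeps the speed spectrum as a list rather than a set, so that two vehicles reporting the same speed both count toward the mean; the function names are hypothetical.

```python
from statistics import mean


def speed_spectrum(reports, loc, t):
    """Speeds of all reports in `reports` made at location `loc` and time `t`."""
    return [v for (m, x, y, v, beta, rt) in reports
            if (x, y) == loc and rt == t]


def spot_speed(reports, loc, t):
    """Mean of the speed spectrum, or None when no report matches."""
    spectrum = speed_spectrum(reports, loc, t)
    return mean(spectrum) if spectrum else None


# Illustrative data only: three vehicles at location (3, 4), one elsewhere.
sample_reports = [
    (1, 3, 4, 10.0, 0, 100),
    (2, 3, 4, 20.0, 1, 100),
    (3, 3, 4, 40.0, 0, 105),
    (4, 0, 0, 50.0, 0, 100),
]
```

Returning None for an empty spectrum is a design choice of this sketch; the text does not say how a location without reports should be treated.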
Definition 3: The linear crowding of a given location ι is defined as the exponential moving average of the complement of the relative position of the spot speed within the speed spectrum, where v_max(ι) is the maximum speed at ι, v_min(ι) is the minimum speed at ι, and α_ι and τ_ι are two parameters that capture the dynamics of location ι.
The linear crowding function rests on an implicit assumption, namely that vehicle speeds are roughly uniformly distributed over the speed spectrum, which does not necessarily hold in practice.
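One plausible reading of Definition 3 is sketched below. The formula itself is not reproduced in the text, so the exact form is an assumption: the instantaneous term is taken as 1 − (v̄ − v_min)/(v_max − v_min), α_ι is taken to weight the new observation, and the value computed τ_ι earlier is passed in as `previous`. The function name and the error handling are likewise hypothetical.

```python
def linear_crowding(spot_speed, v_min, v_max, previous, alpha):
    """Exponential moving average of the complement of the normalized spot speed.

    Assumed form (not confirmed by the source):
        c(t) = alpha * (1 - (spot - v_min) / (v_max - v_min))
               + (1 - alpha) * c(t - tau)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if v_max <= v_min:
        raise ValueError("v_max must exceed v_min")
    # 1.0 when traffic crawls at v_min, 0.0 when it flows at v_max.
    instant = 1.0 - (spot_speed - v_min) / (v_max - v_min)
    return alpha * instant + (1.0 - alpha) * previous
```

Under this reading, a spot speed halfway between v_min and v_max contributes an instantaneous crowding of 0.5, which is exactly where the uniform-distribution assumption mentioned above enters.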
Definition 4: The statistical crowding of a given location ι = (x, y) is defined as the exponential moving average of the complementary cumulative distribution function of the speed spectrum evaluated at the instantaneous speed.
Here P(v ≤ v̄^(t)(ι)), v ∈ V_Φ(ι), denotes the probability that a speed at location ι is less than or equal to the instantaneous average speed v̄^(t)(ι).
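Definition 4 can be sketched in the same way. The empirical cumulative distribution function below stands in for P(v ≤ v̄^(t)(ι)); as with the linear case, the moving-average form and the role of α_ι as the weight on the new observation are assumptions, and the function names are hypothetical.

```python
def empirical_cdf(spectrum, value):
    """Fraction of speeds in `spectrum` that are less than or equal to `value`."""
    if not spectrum:
        raise ValueError("the speed spectrum must not be empty")
    return sum(1 for v in spectrum if v <= value) / len(spectrum)


def statistical_crowding(spectrum, spot_speed, previous, alpha):
    """Exponential moving average of the complementary CDF at the spot speed.

    Assumed form (not confirmed by the source):
        c(t) = alpha * (1 - P(v <= spot)) + (1 - alpha) * c(t - tau)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    instant = 1.0 - empirical_cdf(spectrum, spot_speed)
    return alpha * instant + (1.0 - alpha) * previous
```

Unlike the linear function, this version makes no assumption about how speeds are distributed: a low spot speed yields a high crowding value only if most observed speeds at that location are higher.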
The algorithm is usually implemented by executing its steps in sequence.
However, the mobility clustering algorithm is a serial implementation. A MapReduce() function first processes the collected GPS information by time and region, dividing the GPS information of a given moment t into several regions. The crowding of each region is then calculated in turn: once the statistical crowding and linear crowding of one region have been calculated, the next region is calculated, until all regions at moment t have been processed. The statistical crowding and linear crowding at moment t+1 are calculated next, and so on in order of increasing time. This approach uses sensor objects to dynamically obtain information such as vehicle identifier, position, instantaneous speed, heading, and timestamp. As the number of traffic data sources grows, the volume of real-time traffic information becomes very large, so the calculation takes too long and is inefficient, which makes the approach unsuitable for analyzing traffic in real environments.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a Hadoop-based method for obtaining vehicle crowding, in which the collected GPS messages are processed in parallel by multiple MapReduce tasks, thereby saving calculation time while offering scalability, high efficiency, and high fault tolerance.
To achieve the above object, the Hadoop-based method for obtaining vehicle crowding of the present invention is characterized in that it comprises the following steps:
(1) Processing the collected data with the MapReduce programming model;
The collected GPS information is input and divided into N data fragments by the MapReduce programming model. When the Map() function reads a data fragment, it reads one record at a time, i.e. one line of GPS information of the fragment, and reads the next record after the current one has been taken. The Map() function parses each line it reads into a key-value pair <K_n, V_n>, n = 1, 2, …, N, where K_n holds the offset within the file of the first character of the line and V_n holds the content of the line. An array, array 1, is declared at the same time, and the time (time), longitude (long), latitude (lat), and speed (speed) needed to calculate the statistical crowding and linear crowding are stored in array 1. Hadoop creates one Map task for each data fragment, and all Map tasks run in parallel. Hadoop also creates Reduce tasks according to the number of Reduce tasks set in the configuration file; the Reduce tasks execute the Reduce() function, and multiple Reduce tasks likewise run in parallel;
(2) The first MapReduce() function performs time and region processing:
(2.1) In the first Map() function, the longitude and latitude in array 1 are processed and the region under study is divided into a grid. The grid region with longitude i and latitude j is labeled site(i, j). The first Map() function then outputs K_n as site(i, j) and the corresponding V_n as the time (time) and speed (speed);
(2.2) In the Reduce() function, the speed parameters of the vehicles at the same moment are calculated. All V_n whose K_n have the same longitude and latitude are merged into one list, forming a <K_n, list(V_n)> group. An array, array 2, is declared at the same time, and the speeds (speed) are extracted and stored in array 2. Let ι = site(i, j). The average speed of all vehicles in region site(i, j) at moment time is obtained as the mean v̄^(t)(ι) = (1/|V_Φ^(t)(ι)|) Σ_{v ∈ V_Φ^(t)(ι)} v. The maximum speed v_max(ι) and minimum speed v_min(ι) of all vehicles in region site(i, j) at moment time are then calculated. Finally, the proportion P(v ≤ v̄^(t)(ι)), v ∈ V_Φ(ι), of vehicles in region site(i, j) whose speed at moment time does not exceed the average speed is calculated. The first Reduce() function then outputs K_n as site(i, j) and the corresponding V_n as time, v_max(ι), v_min(ι), P(v ≤ v̄^(t)(ι)), v ∈ V_Φ(ι). Finally, the output is stored on the Hadoop distributed file system HDFS;
(3) The second MapReduce() function joins the recorded GPS information:
(3.1) The output of the first MapReduce() function and the values of the statistical crowding and linear crowding calculated at the previous moment time-1 are input to the second MapReduce() function;
(3.2) The second Map() function packs each record it reads so that the record can be joined on the Reduce side;
When processing each record it reads, the second Map() function tags the record with its data source and then packs the record together with the tag, so that the record package contains the original record and the data source tag. The second Map() function then sets a group key for each record package. The output of the second Map() function is a key/value pair that is partitioned by key for joining; that is, the second Map() function outputs a record package with the group key as the join key;
(3.3) The second Reduce() function receives the output data of the second Map() function and performs a full cross product on its values:
The second Reduce() function processes all records with the same join key together, recovers the original records by unpacking, and determines the data source of each record from its tag. The second Reduce() function performs a full cross product on the input values it receives. When the input values received by the second Reduce() function carry different tags, the cross product is simply the original set of these input values. Each combination produced by the cross product is passed to the Combine() function, and the result obtained by the Combine() function is:
site(i, j), time, v_max(ι), v_min(ι), P(v ≤ v̄^(t)(ι)), v ∈ V_Φ(ι), site(i, j), time-1, and the statistical crowding and linear crowding at moment time-1,
where t is the moment time, (t − τ_ι) denotes the previous moment time-1, τ_ι is the dynamic parameter of location ι, and the time (time) of each record is identical. When the input values received by the second Reduce() function carry no tag or carry the same tag, the Combine() function discards the combined result;
(4) The Map() function of the third MapReduce() function calculates the statistical crowding and linear crowding:
The V_n value of each record is input to the third Map() function:
site(i, j), time, v_max(ι), v_min(ι), P(v ≤ v̄^(t)(ι)), v ∈ V_Φ(ι), site(i, j), time-1, and the statistical crowding and linear crowding at moment time-1.
The linear crowding at moment time is calculated according to the linear crowding formula of Definition 3;
the statistical crowding at moment time is then calculated according to the statistical crowding formula of Definition 4; where α_ι and τ_ι are the dynamic parameters of location ι.
The content of the message stored in V_n comprises: the number no (the sequence number of the GPS message record), the vehicle ID c-objid, the license plate number c-regnum, the vehicle color c-regcolor, the time (time) at which the GPS message was received, the longitude (long) of the vehicle's position at that moment, the latitude (lat) of the vehicle's position at that moment, the speed (speed) of the vehicle at that moment, the travel direction (direction) of the taxi, the height (height) of the vehicle's position, and the state indicating whether the vehicle is unloaded at that moment.
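Steps (1) and (2) can be simulated in plain Python, with ordinary functions standing in for the Hadoop Map and Reduce tasks. This is a sketch under stated assumptions, not the Hadoop implementation: the comma-separated layout and field order of a GPS line, the CELL_DEG grid size, grouping by time inside the reducer, and all function names are hypothetical. The reducer also emits the average speed, which the later crowding calculation needs even though the text does not list it among the output values.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # hypothetical grid cell size, in degrees


def map1(offset, line):
    """Parse one GPS line and emit (site(i, j), (time, speed)).

    Assumed field order: no, c-objid, c-regnum, c-regcolor, time,
    long, lat, speed, direction, height, state.
    """
    fields = line.strip().split(",")
    time = fields[4]
    lon, lat, speed = float(fields[5]), float(fields[6]), float(fields[7])
    site = (math.floor(lon / CELL_DEG), math.floor(lat / CELL_DEG))
    yield site, (time, speed)


def reduce1(site, values):
    """Emit (site, (time, average, v_max, v_min, P)) for each moment."""
    by_time = defaultdict(list)
    for time, speed in values:
        by_time[time].append(speed)
    for time in sorted(by_time):
        speeds = by_time[time]
        average = sum(speeds) / len(speeds)
        # Proportion of speeds not exceeding the average speed.
        p = sum(1 for v in speeds if v <= average) / len(speeds)
        yield site, (time, average, max(speeds), min(speeds), p)


def run_job1(lines):
    """Serial stand-in for the shuffle: group map output by key, then reduce."""
    groups = defaultdict(list)
    offset = 0
    for line in lines:
        for key, value in map1(offset, line):
            groups[key].append(value)
        offset += len(line) + 1  # offset of the next line's first character
    output = []
    for site in sorted(groups):
        output.extend(reduce1(site, groups[site]))
    return output
```

In a real cluster each call to map1 or reduce1 would run inside a separate task; the grouping dictionary here only mimics what the framework's shuffle phase provides.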
The object of the invention is achieved as follows:
In the Hadoop-based method for obtaining vehicle crowding of the present invention, the collected GPS messages are processed in parallel by multiple MapReduce tasks, and the crowding calculation function then calculates the statistical crowding and linear crowding of all regions at the same moment concurrently. This overcomes the long calculation time of the mobility clustering algorithm and substantially improves computational efficiency. In addition, a Hadoop-based algorithm automatically keeps replicas of the data and can extend the computer cluster, so it offers both high scalability and fault tolerance.
The Hadoop-based method for obtaining vehicle crowding of the present invention also has the following advantages:
(1) High efficiency. When processing large volumes of data, the Hadoop-based method for obtaining vehicle crowding runs in less time than the mobility clustering algorithm, greatly improving computational efficiency. In addition, Hadoop can move data dynamically between nodes to keep the nodes balanced, so processing is fast;
(2) High scalability. A Hadoop-based algorithm distributes data across the available computer cluster to complete the computing tasks, and such clusters can readily be extended to thousands of nodes;
(3) High fault tolerance. A Hadoop-based algorithm automatically keeps multiple replicas of the data and automatically reassigns failed tasks.
Brief description of the drawings
Fig. 1 is a flow chart of the Hadoop-based method for obtaining vehicle crowding of the present invention;
Fig. 2 is a bar chart comparing the program running times of the present invention and of the mobility clustering algorithm.
Detailed description
Specific embodiments of the invention are described below with reference to the accompanying drawings so that those skilled in the art may better understand the invention. It should be noted in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.
Embodiment
Fig. 1 is a flow chart of the Hadoop-based method for obtaining vehicle crowding of the present invention.
In this embodiment, as shown in Fig. 1, the Hadoop-based method for obtaining vehicle crowding of the present invention comprises the following steps:
S101: Processing the collected data with the MapReduce programming model. The collected GPS information is input and divided into N data fragments by the MapReduce programming model. When the Map() function reads a data fragment, it reads one record at a time, i.e. one line of GPS messages of the fragment, and reads the next record after the current one has been taken. The Map() function parses each line it reads into a key-value pair <K_n, V_n>, n = 1, 2, …, N, where K_n holds the offset within the file of the first character of the line and V_n holds the content of the line. The content of the message stored in V_n comprises: the number no (the sequence number of the GPS message record), the vehicle ID c-objid, the license plate number c-regnum, the vehicle color c-regcolor, the time (time) at which the GPS message was received, the longitude (long) of the vehicle's position at that moment, the latitude (lat) of the vehicle's position at that moment, the speed (speed) of the vehicle at that moment, the travel direction (direction) of the taxi, the height (height) of the vehicle's position, and the state indicating whether the vehicle is unloaded at that moment. An array, array 1, is declared at the same time, and the time (time), longitude (long), latitude (lat), and speed (speed) needed to calculate the statistical crowding and linear crowding are stored in array 1. Hadoop creates one Map task for each data fragment, and all Map tasks run in parallel. Hadoop also creates Reduce tasks according to the number of Reduce tasks set in the configuration file; the Reduce tasks execute the Reduce() function, and multiple Reduce tasks likewise run in parallel;
S102: The first Map() function performs time and region processing. In the first Map() function, the longitude and latitude in array 1 are processed and the region under study is divided into a grid. The grid region with longitude i and latitude j is labeled site(i, j). The first Map() function then outputs K_n as site(i, j) and the corresponding V_n as the time (time) and speed (speed);
S103: In the first Reduce() function, the speed parameters of the vehicles at the same moment are calculated. All V_n whose K_n have the same longitude and latitude are merged into one list, forming a <K_n, list(V_n)> group. An array, array 2, is declared at the same time, and the speeds (speed) are extracted and stored in array 2. Let ι = site(i, j). The average speed of all vehicles in region site(i, j) at moment time is obtained as the mean v̄^(t)(ι) = (1/|V_Φ^(t)(ι)|) Σ_{v ∈ V_Φ^(t)(ι)} v. The maximum speed v_max(ι) and minimum speed v_min(ι) of all vehicles in region site(i, j) at moment time are then calculated. Finally, the proportion P(v ≤ v̄^(t)(ι)), v ∈ V_Φ(ι), of vehicles in region site(i, j) whose speed at moment time does not exceed the average speed is calculated. The first Reduce() function then outputs K_n as site(i, j) and the corresponding V_n as time, v_max(ι), v_min(ι), P(v ≤ v̄^(t)(ι)), v ∈ V_Φ(ι). Finally, the output is stored on the Hadoop distributed file system HDFS;
S104: The second Map() function prepares the recorded GPS information for joining. The output of the first MapReduce() function and the values of the statistical crowding and linear crowding calculated at the previous moment time-1 are input to the second MapReduce() function. The second Map() function packs each record it reads so that the record can be joined on the Reduce side. When processing each record it reads, the second Map() function tags the record with its data source and then packs the record together with the tag, so that the record package contains the original record and the data source tag. The second Map() function then sets a group key for each record package. The output of the second Map() function is a key/value pair that is partitioned by key for joining; that is, the second Map() function outputs a record package with the group key as the join key;
S105: The second Reduce() function receives the output data of the second Map() function and performs a full cross product on its values. The second Reduce() function processes all records with the same join key together, recovers the original records by unpacking, and determines the data source of each record from its tag. The second Reduce() function performs a full cross product on the input values it receives. When the input values received by the second Reduce() function carry different tags, the cross product is simply the original set of these input values. Each combination produced by the cross product is passed to the Combine() function, and the result obtained by the Combine() function is:
site(i, j), time, v_max(ι), v_min(ι), P(v ≤ v̄^(t)(ι)), v ∈ V_Φ(ι), site(i, j), time-1, and the statistical crowding and linear crowding at moment time-1,
where t is the moment time, (t − τ_ι) denotes the previous moment time-1, τ_ι is the dynamic parameter of location ι, and the time (time) of each record is identical. When the input values received by the second Reduce() function carry no tag or carry the same tag, the Combine() function discards the combined result;
S106: The Map() function of the third MapReduce() function calculates the statistical crowding and linear crowding. The V_n value of each record is input to the third Map() function:
site(i, j), time, v_max(ι), v_min(ι), P(v ≤ v̄^(t)(ι)), v ∈ V_Φ(ι), site(i, j), time-1, and the statistical crowding and linear crowding at moment time-1.
The linear crowding at moment time is calculated according to the linear crowding formula of Definition 3;
the statistical crowding at moment time is then calculated according to the statistical crowding formula of Definition 4; where α_ι and τ_ι are the dynamic parameters of location ι.
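The reduce-side join of S104 and S105 and the final calculation of S106 can be sketched in plain Python as follows. Everything here is illustrative: the tag names "current" and "previous", the record layouts, the moving-average form with α_ι weighting the new observation, and the handling of v_max = v_min are assumptions, and ordinary functions stand in for the Hadoop tasks.

```python
from collections import defaultdict


def map2(tag, record):
    """Tag a record with its data source and emit it under the group key (site)."""
    site = record[0]
    yield site, (tag, record[1:])


def combine(current, previous):
    """Merge one current-moment record with one previous-moment record."""
    return current + previous


def reduce2(site, tagged_values):
    """Cross product of records with different tags; same-tag input is discarded."""
    by_tag = defaultdict(list)
    for tag, payload in tagged_values:
        by_tag[tag].append(payload)
    if len(by_tag) < 2:
        return  # only one data source for this site: nothing to join
    for current in by_tag["current"]:
        for previous in by_tag["previous"]:
            yield site, combine(current, previous)


def map3(site, joined, alpha):
    """Compute (statistical, linear) crowding for the current moment."""
    time, average, v_max, v_min, p, _prev_time, prev_stat, prev_lin = joined
    # Assumption: with a single observed speed the linear term is taken as 0.
    span = v_max - v_min
    instant_lin = 1.0 - (average - v_min) / span if span > 0 else 0.0
    instant_stat = 1.0 - p
    stat = alpha * instant_stat + (1.0 - alpha) * prev_stat
    lin = alpha * instant_lin + (1.0 - alpha) * prev_lin
    yield site, (time, stat, lin)


def run_join_and_score(current, previous, alpha):
    """Serial stand-in for the second and third MapReduce passes."""
    groups = defaultdict(list)
    for tag, records in (("current", current), ("previous", previous)):
        for record in records:
            for key, value in map2(tag, record):
                groups[key].append(value)
    results = []
    for site in sorted(groups):
        for key, joined in reduce2(site, groups[site]):
            results.extend(map3(key, joined, alpha))
    return results
```

Here a current record is (site, time, average, v_max, v_min, P) and a previous record is (site, time-1, statistical crowding, linear crowding). A site that appears in only one of the two inputs produces no output, mirroring the rule that combinations without distinct tags are discarded.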
Fig. 2 is a bar chart comparing the program running times of the present invention and of the mobility clustering algorithm.
In this embodiment, as shown in Fig. 2, series one is the method based on mobility clustering and series two is the method of the present invention. The abscissa gives, in order, the cases of calculating the crowding for 1, 3, 6, 10, 14, 18, and 23 hours of data; the ordinate is the time taken, in milliseconds. It can be seen that the larger the volume of data to be calculated, the greater the difference between the time taken by the present invention and that taken by the mobility clustering method, and the higher the efficiency and the more evident the advantage of the present invention.
Although illustrative specific embodiments of the invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of the specific embodiments. To those of ordinary skill in the art, as long as various changes fall within the spirit and scope of the invention as defined and determined by the appended claims, these changes are obvious, and all innovations and creations that make use of the inventive concept fall within the scope of protection.