Background
Space-time data are tied to spatial entities and spatial phenomena that vary in time, space and attributes, so space-time big data exhibit complex multi-dimensional, semantic and dynamic space-time associations. Space-time big data comprise information in three dimensions (time, space and attributes) and are characterized by multiple sources, massive volume and rapid updating. Space-time data have become a key element of smart city resources. By studying the formal expression of multi-dimensional association descriptions of space-time big data, the dynamic modeling of association relations and multi-scale association analysis methods, the information in space-time big data can be mined and optimally configured, so that the allocation of city resources is optimized, excessive consumption and waste of city resources are reduced, and decision support is provided.
With the rapid development of mobile internet, spatial positioning, location service, big data and cloud computing technologies, Intelligent Transportation Systems (ITS) have become increasingly important in daily life. At present, various traffic data acquisition technologies acquire massive space-time data in real time, and the positions of moving objects are predicted from these data, so that intelligent decisions and services are provided for traffic planning, traffic supervision and scheduling, more detailed, accurate and efficient services are provided for users, and the coordinated development of technology, society and people is achieved.
The Markov model is a statistical analysis model: each of its states is described by a probability distribution, and transitions between states occur according to corresponding probabilities.
However, the value of space-time big data in current intelligent traffic has not been fully mined: the space-time data in an intelligent traffic system cannot be efficiently stored, retrieved, analyzed and mined, and prediction and assessment of user behavior and of traffic situation trajectories are lacking. Intelligent traffic contains space-time big data, yet the problem of how to effectively predict the track and position of a mobile user during the planning, construction and supervision of intelligent traffic, so as to provide decision support for traffic road planning, traffic scheduling and management, city planning, public service selection and security engineering, has remained unsolved.
The motion trajectory prediction technology mainly comprises motion trajectory data acquisition, denoising, characteristic parameter extraction, prediction model establishment, prediction identification decision and the like. According to the mobile position prediction method, mobile position data of a user are collected through mobile equipment, characteristic information of the user is extracted through data preprocessing, denoising, and a density-based combined clustering model and algorithm to construct an interest point sequence, and the position of the mobile user is predicted through constructing a mobile Markov model.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a prediction method based on a mobile Markov model under space-time big data, so as to address the large data storage and processing load and the unsatisfactory prediction accuracy and precision, and to improve the accuracy and precision of position prediction for mobile users in a space-time big data environment.
In order to achieve the above object, the technical solution adopted by the present invention is as follows:
a prediction method based on a mobile Markov model under space-time big data is characterized by comprising the following steps:
step one: carrying out denoising processing on the collected historical position data; denoising the collected historical position data, filtering interference data, filtering dynamic moving tracks and keeping static moving tracks;
step two: clustering the denoised data; performing joint density-based clustering processing on the static moving track obtained after denoising in the first step through a joint density clustering algorithm to obtain a cluster;
step three: establishing interest points aiming at the clustering clusters; extracting the behavior characteristics of the mobile user aiming at the cluster obtained in the second step, thereby establishing the interest point of the user;
step four: denoising the interest points; calculating the interest points in the third step, calculating the radius, interval time and density of each interest point, and simultaneously clustering each de-noised moving track contained in the interest points again, further filtering noise data in the interest points and reserving real interest points;
step five: establishing a mobile Markov model; establishing a state transition probability and a state transition probability matrix for each real interest point obtained in the fourth step;
step six: predicting a next location; after the mobile user data is collected, extracting the interest points of the mobile user through the first step, the second step and the fourth step, and realizing the prediction of the next position of the mobile user according to the mobile Markov model established in the fifth step.
Further, the denoising method in the first step includes: firstly, a static moving track is reserved, namely the speed of the static moving track is less than delta, wherein the delta is a predefined constant; when the speed of the moving track is larger than delta, the moving track is the dynamic moving track, and all the dynamic moving tracks are deleted;
further, cluster merging is performed on the clusters in step two, the cluster merging method being as follows: given cluster $C_1 = \{c_1, c_3, c_7, c_9\}$ and cluster $C_2 = \{c_9, c_{11}, c_{12}\}$, the two clusters are merged into one cluster: $C_1 \cup C_2 = \{c_1, c_3, c_7, c_9, c_{11}, c_{12}\}$.
The beneficial effects produced by the invention are as follows:
the invention extracts the behavior characteristics of the user by denoising the position data, clustering based on joint density and cluster merging, thereby constructing the position point, and meanwhile constructing the mobile Markov model with the memory function based on the position point, and predicting the position of the mobile object according to the position point. A large amount of noise data and track data irrelevant to the predicted position can be removed through the first step, the second step, the third step and the fourth step, and the position prediction is realized only by depending on the interest points, so that the memory space of processing data needed by predicting a moving object in a space-time big data environment can be reduced, a prediction system is simplified, the prediction speed is improved, and in addition, the precision and the accuracy of the position prediction can be improved through a mobile Markov model with a memory function.
Detailed Description
The invention will be described in more detail below with reference to the drawings and specific examples, but the scope of the invention is not limited thereto.
As shown in fig. 1, a prediction method based on a mobile Markov model under spatio-temporal big data includes the following steps:
Step one: data cleaning. The collected historical position data are denoised: dynamic moving tracks are filtered out and static moving tracks are kept.
Because the speed of the moving object changes and the precision of the positioning equipment is limited, the acquired trajectory data of the mobile user do not fully match the real situation; in addition, owing to stability problems of the device, the collected data often contain a certain amount of noise. Since the trajectory of the mobile user is a continuous signal in time while the mobile Markov chain is a discrete random process, the collected moving track data of the mobile user need to be discretized and the noise data filtered out, that is, data cleaning is needed.
Because of the huge amount of spatio-temporal data, the raw data must be preprocessed before further processing. Given the Euclidean space, the trajectory sequence $T = \{t_1, t_2, \dots, t_n\}$ is a set of discrete track points arranged in time order, where $t_i$ is the i-th track point, $t_i = (x_i, y_i, t_i)$, $1 < i < n$.
The track segment sequence $TS = \{ts_1, ts_2, \dots, ts_k\}$ is a discrete track sequence arranged in time order; each track segment is an ordered discrete line segment formed by consecutive track points.
The speed of a discrete track segment, written here as $v_j$, is the velocity of the j-th track segment, namely the average velocity of the discrete track segment composed of the nearest n discrete track points. To filter out noisy data, the static moving tracks are retained first: if the j-th moving track segment satisfies $v_j < \delta$, the segment is a static moving track, where $\delta$ is a predefined constant. Because the moving track of the mobile user changes dynamically, this constant is a local average value, so as to adapt to the changing data; that is, $\delta$ is the average of the speeds of the nearest n moving track segments, where $v_j$ is the speed of the j-th of those segments.
At the same time, all dynamic moving tracks are filtered out, i.e. the moving track data with $v_j > \delta$ are filtered out, and $ts_j$ is merged with the previous moving track segment $ts_{j-1}$ into one moving track segment.
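The speed test of step one can be sketched as follows. This is a minimal illustration, assuming planar coordinates, timestamps in seconds and a fixed threshold `delta` (the text describes a local average; the fixed value is a simplification). The function names `segment_speeds` and `keep_static` are hypothetical.

```python
import math

def segment_speeds(points):
    """Average speed of each segment between consecutive (x, y, t) points."""
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        dist = math.hypot(x1 - x0, y1 - y0)
        # A zero time gap is treated as infinitely fast, i.e. never static.
        speeds.append(dist / dt if dt > 0 else float("inf"))
    return speeds

def keep_static(points, delta):
    """Return the segments whose speed is below delta (static moving tracks)."""
    segments = list(zip(points, points[1:]))
    speeds = segment_speeds(points)
    return [seg for seg, v in zip(segments, speeds) if v < delta]

# Illustrative data only: slow, fast, slow.
track = [(0, 0, 0), (1, 0, 10), (50, 0, 20), (51, 0, 30)]
static = keep_static(track, delta=1.0)
```

With these sample points the middle segment is dropped as a dynamic moving track and the two slow segments are kept.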
Step two: cluster the static moving tracks retained in step one using a joint density clustering algorithm, extract the behavior characteristics of the mobile user so as to establish the interest points of the user, and merge the interest points so that they share the maximum common interest point.
1) Processing procedure of the joint-density-based clustering method
To classify the retained static moving tracks, the concept of a neighborhood is first defined: for a given track point p, the track points lying within radius r of p, taken as the center of a circle, form the r-neighborhood of the track point p;
for the set T of all track points, where q is any track point, the density-based neighborhood of a given track point p is: $N(p) = \{q \in T \mid \mathrm{dist}(p, q) \le r\}$.
The number of track points in the r-neighborhood of the track point p is: $\mathrm{size}(N(p))$.
Joint density clustering then proceeds as follows.
The main idea of density clustering is to partition the position big data into blocks and to filter the noise data out of the track points of mobile users, so as to reduce the amount of data to be processed. The core implementation process is as follows: the track points of each mobile user in the position big data are traversed, and clusters are generated by the joint-density-based clustering method. If the number of points in the neighborhood of a track point p satisfies $\mathrm{size}(N(p)) \ge \lambda$, where $\lambda$ is a predefined constant that represents the minimum number of track points in a cluster and is determined according to the specific problem, a new cluster C is created and the track point p is the core object of the cluster; if $\mathrm{size}(N(p)) < \lambda$, the track point p is noise data and needs to be filtered out. Finally, cluster merging is carried out according to the joint-density clustering method.
The specific process is as follows: initialize the cluster number n to 0; traverse each track point p in the track point set T; if $\mathrm{size}(N(p)) < \lambda$, the track point is noise and needs to be filtered out; if $\mathrm{size}(N(p)) \ge \lambda$, establish a new cluster $C_i$; traverse the track points p of the new cluster $C_i$ depth-first and perform joint-density cluster merging based on their density neighborhoods to obtain a cluster $C_n$. The cluster merging method is as follows: suppose cluster $C_1 = \{c_1, c_3, c_7, c_9\}$ and cluster $C_2 = \{c_9, c_{11}, c_{12}\}$; the two clusters are merged into the cluster $C_n = C_1 \cup C_2 = \{c_1, c_3, c_7, c_9, c_{11}, c_{12}\}$, where $c_i$ denotes the i-th element in the cluster.
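The clustering and merging procedure can be illustrated with a short sketch. It assumes 2-D points and Euclidean distance, seeds a cluster from every point whose neighborhood holds at least `lam` points, and merges clusters that share a point, as in the $C_1 \cup C_2$ example. The names `neighborhood` and `joint_density_clusters` are hypothetical, and treating every point left outside all clusters as noise is a simplification of this sketch.

```python
import math

def neighborhood(points, i, r):
    """Indices of points within distance r of points[i] (includes i itself)."""
    px, py = points[i]
    return {j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= r}

def joint_density_clusters(points, r, lam):
    """Seed clusters from dense neighborhoods and merge overlapping ones."""
    clusters = []  # pairwise disjoint sets of point indices
    for i in range(len(points)):
        n_p = neighborhood(points, i, r)
        if len(n_p) < lam:
            continue  # too sparse to seed a cluster
        merged = set(n_p)
        remaining = []
        for c in clusters:
            if c & merged:
                merged |= c          # shared point: take the union
            else:
                remaining.append(c)
        clusters = remaining + [merged]
    assigned = set().union(*clusters) if clusters else set()
    noise = set(range(len(points))) - assigned
    return clusters, noise

# Illustrative data only: two dense groups and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
clusters, noise = joint_density_clusters(pts, r=1.5, lam=3)
```

Because the stored clusters are kept disjoint, one pass over them per seed point is enough to find every cluster that must be merged.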
2) Constructing interest points: the radius, interval time and density of each interest point are calculated, and each denoised track point is clustered again, so as to further filter the noise data in the interest points and keep the real interest points.
Once the clusters are formed, the radius, access time interval, density, etc. of each cluster are determined, wherein: the radius is the distance from the cluster center to the farthest track point; the access time interval is the interval between the earliest access time and the latest access time; the density is the number of track points of the mobile user within the cluster.
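The three cluster attributes can be computed as in the sketch below. It assumes each track point is an `(x, y, t)` tuple and takes the cluster center to be the centroid of the points, which is an assumption of this sketch; the name `cluster_features` is hypothetical.

```python
import math

def cluster_features(points):
    """Radius, access time interval and density of one cluster of (x, y, t) points."""
    cx = sum(p[0] for p in points) / len(points)  # centroid used as cluster center
    cy = sum(p[1] for p in points) / len(points)
    radius = max(math.hypot(x - cx, y - cy) for x, y, _ in points)
    times = [t for _, _, t in points]
    return {
        "radius": radius,                     # center to farthest track point
        "interval": max(times) - min(times),  # earliest to latest access time
        "density": len(points),               # number of track points
    }

# Illustrative data only.
features = cluster_features([(0, 0, 100), (2, 0, 160), (1, 0, 400)])
```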
According to the behavior characteristics of the place the user visits, corresponding semantic information is attached to each cluster to form points of interest (POIs), namely positions. For example, a cluster may be marked as home, etc.; each cluster corresponds to a state in the mobile Markov model. In the marking process, the radius and density of each cluster are calculated and the interest points $C_i$ are clustered again. If some static moving tracks do not belong to any cluster, they are marked as unknown, and all static moving tracks marked as unknown are then removed. All consecutive static moving tracks that share the same label are merged into a single event, and one event corresponds to one state. For example, 6 consecutive static moving tracks that share one label are regarded as one static moving track marking point. Finally, when all static moving tracks have been marked, the number m of marked points is recorded, and the state transition probabilities among the m interest points form a state transition probability matrix.
The constructed points of interest are shown in fig. 2. In fig. 2, H denotes Home, W denotes Work, S denotes Sport, L denotes Leisure, and a connecting line with an arrow between points of interest denotes the transition probability between them, that is, a state transition probability.
The POIs are arranged in descending order according to the density.
Step three: for the real interest points retained in step two, construct a mobile Markov model based on them.
To predict the interest points of the mobile user, the probability of the user being at a given interest point must be determined. If there are m interest points, the vector of interest points can be expressed as $D_m = \{d_{m,1}, d_{m,2}, \dots, d_{m,i}\}$; when the i-th state is $\mu$, $d_{m,i,\mu} = 1$; otherwise, $d_{m,i,\mu} = 0$.
Fig. 3 shows the point of interest transfer case. There are 5 points of interest D1-D5 in fig. 3: the point of interest at the first time point is D3, the point of interest at the second time point is transferred to D1, the point of interest at the third time point is transferred to D5, and the point of interest at the fourth time point is transferred to D2.
FIG. 4 illustrates a state vector diagram constructed according to the point of interest state transition scenario of FIG. 3, where D1-D5 correspond to the first-fifth elements in the vector, respectively, for example, the point of interest at the first time point is labeled D3, and the corresponding 3 rd element in the first vector is 1; the point of interest at the second point in time is labeled D1, corresponding to a 1 in the first element in the second vector; the point of interest at the third point in time is labeled D5, and the corresponding fifth element in the third vector is 1; the point of interest at the fourth point in time is labeled D2, and the corresponding second element in the fourth vector is 1.
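The state vectors described for fig. 4 can be reproduced with a small sketch: each visited point of interest becomes a vector with a single 1 at the position of its label. The function name `one_hot_states` is hypothetical.

```python
def one_hot_states(visits, labels):
    """One 0/1 vector per time point, with a 1 at the visited point of interest."""
    index = {label: k for k, label in enumerate(labels)}
    vectors = []
    for v in visits:
        vec = [0] * len(labels)
        vec[index[v]] = 1
        vectors.append(vec)
    return vectors

labels = ["D1", "D2", "D3", "D4", "D5"]
# Visit order taken from the fig. 3 description: D3, D1, D5, D2.
vectors = one_hot_states(["D3", "D1", "D5", "D2"], labels)
```

The first vector has its third element set, the second its first element, the third its fifth element and the fourth its second element, matching the description of fig. 4.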
Step four: establish the state transition probability, the state transition probability matrix and the n-MMC (n-MMCs, n Mobility Markov Chains) state transition probability matrix according to the mobile Markov model established in step three.
According to the training data, a mobile Markov model can be constructed on the basis of the data cleaning, the joint-density-based clustering method and the construction of interest points described above. Each interest point corresponds to an event, and each event corresponds to a state in the mobile Markov model. If there are m interest points, the overall state transition probability is as follows:
From this a state transition probability matrix can be constructed:
As shown in fig. 2, the state transition probability matrix of the mobile Markov model finally constructed from the interest points is as follows:
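One common way to estimate a state transition probability matrix from a labelled visit sequence is to count transitions and normalise each row. The sketch below uses that counting estimator as an assumption; it is not taken from the formulas of this description, and the visit sequence is invented for illustration. The name `transition_matrix` is hypothetical.

```python
from collections import Counter, defaultdict

def transition_matrix(sequence, states):
    """Row-normalised transition counts between consecutive points of interest."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(sequence, sequence[1:]):
        counts[cur][nxt] += 1
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        # Rows with no observed transitions are left at zero.
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return matrix

# Invented sequence over the four labels of fig. 2.
visits = ["H", "W", "H", "W", "S", "H", "L", "H"]
P = transition_matrix(visits, ["H", "W", "S", "L"])
```

Here `P["H"]["W"]` is the estimated probability of moving from Home to Work; every row with at least one observed transition sums to 1.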
Standard Mobility Markov Chains (MMC) are memoryless: the prediction of the future position depends only on the current position. This does not match reality well, since a person chooses the next action according to habit and memory, that is, the decision is based on history. The memoryless property can therefore have a negative impact on the accuracy of future location prediction. To solve this problem, the concept of n-MMCs is introduced, in which the state considers not only the current point of interest but also the n-1 previously visited points of interest.
According to the training data, an n-MMCs state transition probability matrix is further constructed. To explain the concept of n-MMCs-based prediction, consider the interest points constructed by the joint density clustering algorithm, as shown in fig. 2: GPS data from Xiaoqiang's phone are collected, and Xiaoqiang's track information is obtained through learning. In 2-MMCs, four different states are considered, namely Home (H), Work (W), Leisure (L) and Sport (S), and the aim is to predict the location at the next time based on the 2 most recently visited locations. The rows of the state transition probability matrix thus represent all possible state combinations of the n most recently visited points of interest, while the columns represent the next position in the n-MMCs. For example, if the previous position is H, the current position is W, and the next position is H, then a state transition from HW to WH occurs and the state transition probability matrix is updated accordingly; after the transition the previous position is W and the current position is H. The corresponding state transition probabilities are as follows:
where $\mu$ is the previous state, $\nu$ is the current state and $\sigma$ is the next state; $d_{m,i-1,\mu}$ indicates that, with m points of interest in total, the (i-1)-th state is $\mu$; $d_{m,i,\nu}$ indicates that, with m points of interest in total, the i-th state is $\nu$; $d_{m,i+1,\sigma}$ indicates that, with m points of interest in total, the (i+1)-th state is $\sigma$; $d_{n,j-1,\mu}$ indicates that, with n points of interest in total, the (j-1)-th state is $\mu$; $d_{n,j,\nu}$ indicates that, with n points of interest in total, the j-th state is $\nu$; and $d_{n,j+1,\sigma}$ indicates that, with n points of interest in total, the (j+1)-th state is $\sigma$. The partial state transition probability matrix of the finally constructed 2-MMC mobile Markov model is shown in Table 1:
Table 1. 2-MMC mobile Markov state transition probability table

| Previous, current | W | H | S | L |
|---|---|---|---|---|
| HW | 0.1 | 0.8 | 0.1 | 0 |
| HS | 0 | 0.8 | 0.13 | 0.07 |
| HL | 0 | 0.9 | 0.04 | 0.06 |
| WH | 0.71 | 0.24 | 0.03 | 0.02 |
| WS | 0.26 | 0.59 | 0.11 | 0.04 |
| WL | 0.32 | 0.68 | 0 | 0 |
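How the rows of an n-MMC matrix arise from a visit sequence can be sketched as follows: each row key is the tuple of the n most recently visited points of interest, and each column is the next one. The counting estimator and the invented visit sequence are assumptions of this sketch (it does not reproduce the values of Table 1), and the name `n_mmc_matrix` is hypothetical.

```python
from collections import Counter, defaultdict

def n_mmc_matrix(sequence, n):
    """Rows: tuples of the n most recent points of interest; columns: the next one."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - n):
        history = tuple(sequence[i:i + n])
        counts[history][sequence[i + n]] += 1
    # Normalise each row so that its probabilities sum to 1.
    return {h: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for h, row in counts.items()}

# Invented sequence for illustration only.
visits = ["H", "W", "H", "W", "H", "S", "H", "W", "S"]
P2 = n_mmc_matrix(visits, n=2)
```

In this sample, the history (H, W) is followed by H twice and by S once, so `P2[("H", "W")]["H"]` is 2/3.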
Step five: look up the n-MMC state transition probability matrix established in step four to predict and determine the next position of the mobile user.
According to the constructed mobile Markov model, the next position is predicted as follows: the row corresponding to the previous state and the current state is looked up in the n-MMC state transition probability matrix; within that row, the column with the highest probability is taken as the next position of the moving object; and the corresponding rows and columns in the n-MMC state transition probability matrix are updated at the same time.
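The lookup of step five can be sketched with the values of Table 1: find the row for the previous and current positions, then take the column with the highest probability. The dictionary layout and the name `predict_next` are hypothetical, and returning `None` for a row that is not in the partial table is a choice of this sketch.

```python
# Values copied from Table 1 (partial 2-MMC matrix); keys are previous + current.
TABLE_1 = {
    "HW": {"W": 0.10, "H": 0.80, "S": 0.10, "L": 0.00},
    "HS": {"W": 0.00, "H": 0.80, "S": 0.13, "L": 0.07},
    "HL": {"W": 0.00, "H": 0.90, "S": 0.04, "L": 0.06},
    "WH": {"W": 0.71, "H": 0.24, "S": 0.03, "L": 0.02},
    "WS": {"W": 0.26, "H": 0.59, "S": 0.11, "L": 0.04},
    "WL": {"W": 0.32, "H": 0.68, "S": 0.00, "L": 0.00},
}

def predict_next(previous, current, table):
    """Most probable next position for the (previous, current) row, or None."""
    row = table.get(previous + current)
    if row is None:
        return None  # row not present in the partial table
    return max(row, key=row.get)

nxt_from_hw = predict_next("H", "W", TABLE_1)
nxt_from_wh = predict_next("W", "H", TABLE_1)
```

For a user who was at Home and is now at Work, the highest entry in row HW is H (0.8), so Home is predicted as the next position; for row WH the prediction is W (0.71).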
It should be noted that the above-mentioned embodiments illustrate rather than limit the technical solutions of the present invention, and that equivalent substitutions or other modifications made by persons skilled in the art according to the prior art are included in the scope of the claims of the present invention as long as they do not exceed the spirit and scope of the technical solutions of the present invention.