CN114817669A - Intelligent distributed extended crowd movement track space-time prediction model - Google Patents

Intelligent distributed extended crowd movement track space-time prediction model

Info

Publication number
CN114817669A
Authority
CN
China
Prior art keywords
crowd
space
time
fitting
track
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210429821.1A
Other languages
Chinese (zh)
Inventor
石德省
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202210429821.1A
Publication of CN114817669A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The method takes crowd movement track data in distributed sensing as its object. It mines the data set at different scales using space-time division modes of different granularities, integrates and improves several crowd intelligent fitting algorithms, and constructs a spatial distribution model of crowd movement tracks under time division and a time distribution model under space division. From these models it extracts the periodic rules of crowd space-time distribution in distributed sensing, extends the crowd movement track space-time framework, and provides information recommendation, preference analysis and behavior prediction for users. The application applies distributed sensing to crowd movement track space-time prediction: by mining user movement mode information from the track data and analyzing that data effectively, it helps a sensing system obtain distributed space-time information, serves sensing activities and space-time hot spot prediction, and supports efficient crowd track space-time hot spot prediction applications.

Description

Intelligent distributed extended crowd movement track space-time prediction model
Technical Field
The application relates to a crowd movement track space-time prediction model, in particular to an intelligent distributed extended crowd movement track space-time prediction model, and belongs to the technical field of big data hotspot trend analysis.
Background
With the rapid development of computer and sensor technologies, mobile intelligent devices represented by smart phones have increasingly powerful functions, and the intelligent devices are loaded with embedded sensors such as speed sensors, pressure sensors, gravity sensors and GPS (global positioning system), so that the mobile intelligent devices not only have sensing functions of communication, voice, photographing, recording, positioning and the like, but also can sense relevant data such as temperature, direction, blood pressure, heart rate, air quality and the like. In recent years, a mobile intelligent sensing technology represented by a wireless sensor network has been developed rapidly, and the wireless sensor network is formed by arranging a series of static sensor nodes in a monitoring task area and self-organizing into the wireless sensor network, and sensing related data through devices such as sensors and the like so as to provide related application services.
However, there is still some distance between the theoretical development of wireless sensor networks and wide urban sensing applications. Networking cost and sensor cost limit deployment over large urban areas, and the sensor types deployed in a network are tightly tied to a particular application, so a sensor network lacks flexibility and reusability across different types of applications. Judging from current trends in mobile sensing technology, the limitations of wireless sensor networks are mainly the following. First, the application range is small: at the present stage a sensor network has few nodes, and the number of nodes limits the monitoring range, so the network cannot be applied to large-scale monitoring. Second, the cost is high: in a large-scale wireless sensor network (for example monitoring of forests and other ecological environments, or of urban traffic conditions), sensor node deployment and network access are expensive; in long-term use, bad weather or insufficient battery power causes node faults, so later maintenance is complicated and costly. Third, the objects a wireless sensor network serves are limited: in some areas (monitoring of marine environments, forests, rivers and the like), the sensing data are collected directly by states, government agencies and similar bodies, which analyze the data and then release messages. For the public, the sensing is not people-centered, and the public cannot obtain the information it is interested in from the sensing data directly and in time.
Wireless sensor networks therefore fall short in large-scale sensing applications. To enlarge the scale of sensing, make the nodes more flexible and bring people's initiative into play, a new form of sensing, distributed sensing, has appeared: people carrying intelligent devices such as mobile phones actively collect information about the environment (temperature, position, altitude, time, air quality, weather conditions and the like) and upload it to a server (as data, pictures, text descriptions, videos and the like); the information is then analyzed and processed, and finally published dynamically, so that users can see the information they care about more intuitively. Distributed sensing borrows the way wireless sensor networks sense data, but differs from them as follows. First, the sensing nodes of distributed sensing can be fixed sensors, or can change position along with the sensor devices, mobile phones, smart bracelets and the like carried by users. Second, the topology of distributed sensing is more variable: because sensing nodes move with people, the topology of the sensing nodes is more flexible than that of a wireless sensor network. Third, the main objects of distributed sensing differ from those of a traditional wireless sensor network: distributed sensing is people-centered, depends mainly on people, and directly serves individuals, groups or the people in a certain area.
Distributed sensing emphasizes people-centered sensing, covering the sensing of both the user and the environment the user is in; people interact with and influence one another, and likewise people and the environment. Distributed sensing collects data with many types of devices: smart phones, GPS positioning devices, temperature and humidity sensing systems and the like can all collect and upload data. Because distributed sensing nodes are generally carried by people, the nodes are mobile, and two or more nodes can communicate to share and exchange data resources.
Distributed sensing can better serve the public and adapt to its needs: the sensing range is large, and the positions of the sensing nodes can change at any time. By sensing purpose, applications can be divided into public, social and personal sensing (or group and personal sensing). In general, distributed sensing has already been applied to aspects of infrastructure, the environment and social life.
In conclusion, distributed sensing has broad application prospects, but using it for crowd movement track space-time prediction raises technical difficulties that urgently need to be solved, including:
(1) firstly, the validity of data is not high, because distributed sensing data is sensed by people and is actively uploaded, the uploaded data has certain subjective components, and if the data is not real or is wrong, the final processing result is greatly influenced, and the validity of the data uploaded by a user cannot be ensured in the prior art; secondly, the distributed sensing nodes present irregular dynamic characteristics, mobile users and the sensing nodes are not fixed, the scale of the nodes is large, and the sensing nodes are influenced by many factors, such as user travel modes, population density of positions where the users are located, space-time distribution and the like, so that the dynamic distribution of the nodes is irregular. The prior art lacks of the grasp of the space-time distribution characteristic of the node, and the collection and processing quality of the data is not high; thirdly, the difficulty of processing the perception data is increased, the traditional mobile perception technology such as a wireless sensor network is distinguished, the dynamics of the perception data is stronger, the perception scale of the data is large, and the data is continuously updated in real time.
(2) The crowd moving trajectory data in distributed sensing contains user moving mode information, but the moving trajectory data cannot be effectively analyzed in the prior art, a sensing system is very difficult to acquire distributed space-time related information, cannot effectively serve sensing activities, and cannot provide efficient sensing application, the distributed sensing in the prior art lacks a model taking the crowd moving trajectory data as an object, lacks data mining with different scales for a data set by adopting space-time division modes with different granularities, does not have a proper trajectory fitting algorithm, cannot construct a space distribution model of the crowd moving trajectory under time division and a time distribution model under space division, cannot extract a periodic rule of the crowd space-time distribution in distributed sensing from the space-time distribution model, cannot expand a crowd moving trajectory space-time framework, and cannot provide effective information recommendation, preference analysis and behavior prediction for users, and the method cannot be used for the space-time prediction of the movement locus of the crowd.
(3) In the prior art, a method for extracting longitude and latitude and time and a data preprocessing method are lacked in a crowd moving track, and errors including clock errors, multipath effect influences and the like exist in a positioning system of intelligent equipment such as a mobile phone carried by a user in a distributed sensing mode; the mobile equipment is in failure, the randomness of actively uploading data is high, and partial data is incomplete, so that some mobile track points are partially lost in time and space; the active participation data transmission has large human factors, the data is influenced by the subjective activity of participants, personal preference and the like, and if the noise is not processed, the result is biased or the result of analyzing the time-space prediction is completely opposite; and the trajectory data lacks granularity division of time and space, lacks spatial distribution under the granularity division of time and lacks time distribution under the granularity division of space, and the crowd movement trajectory space-time prediction lacks accuracy and intuition.
(4) In the prior art, the defects that track analysis is sensitive to initially set track prediction factors and center points, local optimal solutions are easy to obtain and the like exist, proper initial fitting centers and track prediction factors cannot be selected by utilizing a hierarchical fitting method, and visual analysis is lacked; the prior art has the problems of sensitivity to an initial center and easiness in local convergence, and is lack of bidirectional fitting prediction optimization; in the design of space-time distribution division, the design of time granularity and space granularity is unreasonable, the space distribution under the time granularity division and the time distribution under the space granularity division cannot be analyzed, a periodical mode contained in the constructed moving track is lacked, the space hot spot of the moving track under the time granularity division mode and the time hot spot under the space granularity division mode cannot be analyzed and predicted, and the space-time prediction of the crowd moving track cannot be performed by utilizing distributed sensing.
Disclosure of Invention
The application takes crowd movement track data in distributed sensing as its object. It mines the data set at different scales using space-time division modes of different granularities, integrates and improves different crowd fitting algorithms, constructs a spatial distribution model of crowd movement tracks under time division and a time distribution model under space division, extracts the periodic rules of crowd space-time distribution in distributed sensing, extends the crowd movement track space-time architecture, and provides effective information recommendation, preference analysis and behavior prediction for users. It applies distributed sensing, a new wireless sensing scenario that has emerged with the popularization of mobile intelligent devices and the development of wireless sensing technology, to crowd movement track space-time prediction, giving full play to the role of people in the sensing process. By mining user movement mode information from the crowd movement track data and analyzing that data effectively, it helps the sensing system obtain distributed space-time information, serves sensing activities and space-time hot spot prediction, and supports efficient crowd track space-time hot spot prediction applications.
In order to realize the technical characteristics and effects, the technical scheme adopted by the application is as follows:
An intelligent distributed extended crowd movement track space-time prediction model is provided, comprising: first, a movement track data prediction process; second, distributed movement track preprocessing, which comprises crowd sensing denoising and track normalization standardization; third, the space-time division design of the movement track; fourth, crowd-sourcing fitting prediction under the space-time division, which comprises the calculation of space-time track point similarity, a crowd-sourcing hierarchy fitting algorithm, a crowd-sourcing distance fitting algorithm, and the selection of the track prediction factor and the initial crowd-sourcing fitting center; fifth, two-way pair crowd-sourcing fitting; and sixth, a crowd-sourcing fitting algorithm based on singular value decomposition;
the crowd moving track data in distributed sensing is taken as an object, different-scale data mining is carried out on a data set by adopting a space-time division mode with different granularity, different crowd fitting algorithms are integrated, the crowd fitting algorithm is improved, a space distribution model of the crowd moving track under time division and a time distribution model under space division are constructed, a periodic rule of the crowd space-time distribution in distributed sensing is extracted, a crowd moving track space-time framework is expanded, and effective information recommendation, preference analysis and behavior prediction are provided for a user;
the distributed extended crowd moving track space-time prediction process comprises the following steps:
step 1: extracting the valid longitude, latitude and time data from the data set and denoising them;
step 2: carrying out time and space granularity division according to the space-time division modes of the application, and then normalizing the data;
step 3: selecting a track prediction factor and an initial crowd-sourcing fitting center;
step 4: performing crowd-sourcing fitting prediction, using the track prediction factor and the crowd-sourcing fitting center obtained in step 3 as the initial values;
step 5: performing fitting with the two-way pair crowd-sourcing fitting method, an improvement of crowd-sourcing fitting, to obtain a prediction result;
step 6: applying singular value decomposition to the extracted data, and then performing crowd-sourcing fitting prediction to obtain the crowd-sourcing fitting centers;
step 7: visualizing the results graphically, and combining the fitting results of prototype crowd-sourcing fitting, the two-way pair crowd-sourcing fitting algorithm and the crowd-sourcing fitting algorithm after singular value decomposition to obtain the periodicity of the crowd movement track and its future space-time trend.
Preferably, the analysis of the movement track data set comprises crowd sensing denoising and track normalization standardization of the original data set, the space-time division design of the movement track, and crowd-sourcing fitting prediction for space-time analysis on the basis of that division, using crowd-sourcing hierarchy fitting, prototype crowd-sourcing fitting, the two-way pair crowd-sourcing fitting algorithm and the crowd-sourcing fitting algorithm after singular value decomposition:
firstly, extracting longitude and latitude and time, preprocessing data in a filtering mode, then carrying out time and space granularity division on the obtained movement track data, dividing the movement track data in time by seconds, hours, days, working days/rest days, and dividing the movement track data in space by one area, nine areas and one hundred areas;
secondly, track analysis is performed with a crowd-sourcing fitting algorithm. Because this method is sensitive to the initially set track prediction factor and center points and easily falls into a locally optimal solution, a crowd-sourcing hierarchy fitting method is used to select a suitable initial crowd-sourcing fitting center and track prediction factor, and visual analysis is carried out after crowd-sourcing fitting;
thirdly, to address the crowd-sourcing fitting algorithm's sensitivity to the initial center and its tendency to converge locally, the algorithm is further optimized for fitting prediction by the two-way pair method;
Fourthly, an improved crowd-sourcing fitting algorithm is provided, the data is processed through singular value decomposition, and then the crowd-sourcing fitting algorithm is utilized for crowd-sourcing fitting;
fifthly, in the space-time distribution division design, time granularity is divided according to seconds, hours, days, working days/rest days, and space granularity is divided according to regions; and the improved crowd-sourcing fitting algorithm provided by the application is utilized to analyze the spatial distribution under the time granularity division and the time distribution under the space granularity division respectively, construct a periodic mode contained in the moving track, and analyze and predict the space hot spot of the moving track in the time granularity division mode and the time hot spot in the space granularity division mode.
Preferably, the crowd sensing denoising process is as follows: noise points are found by setting a critical value; if some quantity computed from two adjacent track points exceeds the critical value, those track points are presumed to be noise points. Specifically, the moving speed is calculated from the distance and the time interval between a point and the next point, and if the speed exceeds the set critical value the point is treated as a noise point. Points treated as noise are then analyzed further: if they persist for a period of time, and the time interval is long or the points exceed a certain distance in space, they are judged not to be noise points, and their positions simply reflect the behavior pattern of the user's movement.
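The first stage of this denoising rule (the speed threshold) can be sketched as follows. This is a minimal illustration, not the application's implementation: the haversine distance, the 150 km/h default threshold and the names `haversine_km` and `flag_noise` are all assumptions, and the second-stage check that reclassifies sustained runs as genuine movement is left out.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_noise(track, max_speed_kmh=150.0):
    """Return indices of candidate noise points.

    track: list of (lat, lon, t_seconds) tuples sorted by time.
    A point is flagged when the speed from it to the next point exceeds
    max_speed_kmh (a hypothetical critical value).
    """
    flagged = []
    for i in range(len(track) - 1):
        lat1, lon1, t1 = track[i]
        lat2, lon2, t2 = track[i + 1]
        hours = (t2 - t1) / 3600.0
        if hours <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = haversine_km(lat1, lon1, lat2, lon2) / hours
        if speed > max_speed_kmh:
            flagged.append(i)
    return flagged


track = [
    (39.9000, 116.4000, 0),
    (39.9001, 116.4001, 60),
    (45.0000, 120.0000, 120),   # implausible jump
    (39.9002, 116.4002, 180),
]
print(flag_noise(track))
```

Because the rule compares each point with its successor, an isolated jump flags both the point before the jump and the jump itself; a fuller version would resolve that in the follow-up analysis the text describes.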
Track normalization: the movement track data comprise longitude, latitude and time, which are measured on different scales: latitude ranges from -90 to 90, longitude ranges from -180 to 180, and time ranges from 0 to 24 when measured in hours and from 1 to 7 when measured by day of the week. When the space-time distribution of the movement track is analyzed, these scales are unified, and normalization places the results in a common interval so that they are easier to analyze;
for the movement trajectory data x(F), F = 1, 2, 3, …, N, the normalized value y(F) is given by formula 1:
(Formula 1: equation image BDA0003611306580000051, not reproduced in this text)
which transforms the measuring standards of longitude, latitude and time to lie between -1 and 1.
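Formula 1 is not legible in this text, so the following is only a sketch of one common normalization that matches the stated target interval of -1 to 1: a min-max rescaling. Whether the application uses exactly this form is an assumption, as are the function name `normalize` and its parameters.

```python
def normalize(values, lo=-1.0, hi=1.0):
    """Min-max rescale a sequence of numbers into [lo, hi].

    Assumed form: y = lo + (hi - lo) * (x - min) / (max - min).
    A constant sequence is mapped to the midpoint of the interval.
    """
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]


hours = [0, 6, 12, 24]
print(normalize(hours))
```

Applying the same function separately to the longitude, latitude and time columns puts all three on a common scale before fitting.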
Preferably, the movement trajectory space-time division design is as follows: the distributed expanded movement track is divided according to the difference of time and space granularity, the space-time granularity is the hierarchical size of time and space, the space large granularity is defined as the larger interval of a space region, the time large granularity is defined as the long interval of a time period, and the space-time distribution condition of a user in a certain large-range activity region and a certain week is mastered; the small space-time granularity corresponds to the regional activity condition of a certain office building and the space-time distribution condition of a certain hour;
dividing the time by seconds, hours, days, working days/rest days respectively, and corresponding to the distribution condition of time granularity on the space; in space, track normalization standardization is utilized, the space is subjected to longitude and latitude expansion and reduction, the space is divided into regions according to grids, the time distribution design in the regions not only considers the space effect on time distribution, but also considers the time modes of different region division, the moving track data is processed and analyzed by a method combining space-time division and crowd fitting, the specific result of space-time division is obtained, and the crowd moving track mode is obtained through a space-time conclusion;
when the space-time division uses the first method, that is, time is divided by seconds, hours, days and working days/rest days, crowd-sourcing fitting is carried out within each time division to observe the effect of the division, and the spatial distribution pattern of users under each time granularity is derived from the result;
when the space-time division uses the second method, that is, the grid granularity is expanded or contracted in space and the space is divided into regions by longitude and latitude, the spatial dimension is ignored within each region once the regions are fixed, and crowd-sourcing fitting over time is carried out in each region to determine the time distribution under each space granularity.
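The two division methods can be sketched as simple grouping functions. This is an illustrative sketch under stated assumptions: timestamps are taken to be seconds counted from a Monday at 00:00, the region is a rectangular bounding box split into an n × n grid (n = 1, 3, 10 would give the one, nine and one hundred areas mentioned earlier), and all function names are hypothetical.

```python
from collections import defaultdict


def time_bucket(t_seconds, granularity):
    """Map a timestamp (seconds since a Monday 00:00, an assumption) to a bucket."""
    if granularity == "hour":
        return (t_seconds // 3600) % 24
    if granularity == "day":
        return (t_seconds // 86400) % 7
    if granularity == "workday":
        return "rest" if (t_seconds // 86400) % 7 >= 5 else "work"
    raise ValueError("unknown granularity: " + granularity)


def grid_cell(lat, lon, bounds, n):
    """Map a point to (row, col) in an n x n grid over bounds.

    bounds = (lat_min, lat_max, lon_min, lon_max); points on the upper
    edge are clamped into the last row or column.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    row = min(int((lat - lat_min) / (lat_max - lat_min) * n), n - 1)
    col = min(int((lon - lon_min) / (lon_max - lon_min) * n), n - 1)
    return row, col


def split_by_time(points, granularity):
    """Method 1: group positions by time bucket, ready for spatial fitting."""
    groups = defaultdict(list)
    for lat, lon, t in points:
        groups[time_bucket(t, granularity)].append((lat, lon))
    return dict(groups)


def split_by_space(points, bounds, n):
    """Method 2: group timestamps by grid cell, ready for fitting over time."""
    groups = defaultdict(list)
    for lat, lon, t in points:
        groups[grid_cell(lat, lon, bounds, n)].append(t)
    return dict(groups)


points = [(0.10, 0.10, 3600), (0.90, 0.90, 7200), (0.15, 0.20, 3700)]
print(split_by_time(points, "hour"))
print(split_by_space(points, (0.0, 1.0, 0.0, 1.0), 3))
```

Each group returned by `split_by_time` holds only positions and each group returned by `split_by_space` holds only timestamps, mirroring the idea that one dimension is fixed while the other is fitted.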
Preferably, the space-time track point similarity is calculated as follows: time and space are treated as a sequence for measuring similarity. Similarity is measured from the numerical values of two objects, and when judging track similarity it is judged through the distance between two points. The space-time track point similarity calculation is based on the cosine distance: on the basis of vectors, similarity is determined by the cosine of the angle between them, and the smaller the result, the less similar the objects; conversely, the larger the result, the more similar they are. The calculation is given by formula 2, where x_1 and y_1 are the abscissa and ordinate of the first object, and x_2 and y_2 are the abscissa and ordinate of the second object:
$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}} \qquad \text{(formula 2)}$$
Cosine similarity depends on the direction of the vectors rather than on their magnitude, and the analysis of the movement track uses this insensitivity to absolute numerical values to measure track point similarity.
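As a small illustration of the cosine measure described here, the sketch below computes the cosine of the angle between two coordinate vectors. The function name and the choice to raise an error for a zero vector are assumptions made for the example.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length coordinate vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Same direction, different magnitude: similarity close to 1.
print(cosine_similarity((1.0, 2.0), (2.0, 4.0)))
# Perpendicular directions: similarity 0.
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))
```

The first example shows the insensitivity to magnitude that the text relies on: doubling a vector does not change its similarity to other vectors.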
Preferably, the crowd-sourcing hierarchy fitting algorithm is as follows: initially, each sample point is taken as its own track class, and the classes are then merged step by step according to a rule. If the distance between a sample point in track class C1 and a sample point in track class C2 is the shortest Euclidean distance between sample points of any two different classes, then C1 and C2 are considered similar and are merged. The procedure is:
input: F, the target number of track classes; G, the sample point set
output: F track classes
step one: take each sample point in G as a track class;
step two: calculate the distance between every two track classes;
step three: merge the two classes with the minimum distance into one track class;
step four: repeat steps two and three until the number of track classes is F.
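The merging procedure above resembles agglomerative clustering with single linkage (closest pair of points between two classes). A minimal sketch under that reading follows; the function name `hierarchy_fit` is hypothetical, and the brute-force pair search is chosen for clarity rather than speed.

```python
import math


def hierarchy_fit(points, f):
    """Merge singleton classes until only f track classes remain.

    points: list of coordinate tuples (the sample set G).
    The distance between two classes is the smallest Euclidean distance
    between any point of one and any point of the other.
    """
    clusters = [[p] for p in points]  # step one: one class per sample point

    def class_distance(c1, c2):
        return min(math.dist(a, b) for a in c1 for b in c2)

    while len(clusters) > f:
        best = None
        # step two: distance between every two classes
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = class_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        # step three: merge the closest pair
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


sample = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(hierarchy_fit(sample, 2))
```

Recording the distance of each merge would give the heights needed to draw the dendrogram mentioned later in the text.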
Preferably, the crowd-sourcing distance fitting algorithm is as follows: the Euclidean distance is used as the similarity measure, based on the distance from each sample point to its fitting center, and the sum of squared errors CB is used as the criterion function. Using iterative optimization based on a greedy strategy, the algorithm approximately solves for the fitting centers that minimize the sum of squared errors. The algorithm process is:
process one: read in the set of M samples and the number F of track classes;
process two: randomly select F of the M samples as seed points;
process three: calculate the distance from every other point to each seed point, and assign each point to the track class of its nearest seed point;
process four: calculate the new center point of each track class by averaging;
process five: repeat processes three and four until the new center points coincide with the previous ones or differ only slightly, then exit the loop;
The termination condition of crowd-sourcing distance fitting is convergence. The fitting itself guarantees convergence, and the criterion for convergence is the sum-of-squared-errors function CB, defined by formula 3:
$$CB = \sum_{i=1}^{F} \sum_{p \in C_i} \lVert p - m_i \rVert^2 \qquad \text{(formula 3)}$$
where C is the given data set, containing F fitting subsets C_1, C_2, …, C_F; the numbers of samples in the fitting subsets are n_1, n_2, …, n_F; and their center points are m_1, m_2, …, m_F.
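The five processes and the CB criterion read like the standard k-means (Lloyd) iteration, and the sketch below is written under that assumption. The names `distance_fit` and `sse`, the optional `seeds` argument, and the tolerance and iteration cap are illustrative choices, not details from the application.

```python
import math
import random


def mean(points):
    """Coordinate-wise average of a non-empty list of tuples."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))


def sse(clusters, centers):
    """Sum of squared errors CB: squared distance of each point to its center."""
    return sum(math.dist(p, m) ** 2
               for cluster, m in zip(clusters, centers) for p in cluster)


def distance_fit(points, f, seeds=None, tol=1e-9, max_iter=100, rng=None):
    """Assign points to the nearest center and re-average until centers settle."""
    rng = rng or random.Random(0)
    # process two: pick F seed points (or use the ones supplied)
    centers = list(seeds) if seeds else rng.sample(points, f)
    clusters = []
    for _ in range(max_iter):
        # process three: assign every point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[k].append(p)
        # process four: recompute centers; an empty class keeps its old center
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        # process five: stop when the centers barely move
        if shift <= tol:
            break
    return clusters, centers


data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
clusters, centers = distance_fit(data, 2, seeds=[(0.0, 0.0), (10.0, 0.0)])
print(centers, sse(clusters, centers))
```

Passing explicit `seeds` makes the run reproducible and is also the hook for supplying initial centers chosen by another procedure.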
Preferably, the track prediction factor and the initial crowd-sourcing fitting center are selected as follows: crowd-sourcing hierarchy fitting is carried out on samples of the data to find a suitable track prediction factor and initial center points, with a dendrogram showing the suitable track prediction factor visually. The specific selection method is:
step one: sample the original data;
step two: carry out crowd-sourcing hierarchy fitting on the sampled points to obtain a suitable track prediction factor and the corresponding crowd-sourcing fitting centers for the data set;
step three: repeat steps one and two. While the number of loops is less than a critical value k, continue the loop; once it exceeds the critical value, exit the loop and output the final value. The final value is chosen from the statistics of the track prediction factors obtained in each round: the F that occurs most often is selected as the most suitable F, and one group of crowd-sourcing fitting centers corresponding to that track prediction factor is then selected at random as the initial seed points.
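The voting part of this selection can be sketched as below. The text does not say how a single round of hierarchy fitting turns a dendrogram into a value of F, so that step is passed in as a hypothetical callback `estimate(sample)` returning `(f, centers)`; the function name and parameters are likewise assumptions.

```python
import random
from collections import Counter


def choose_f_and_seeds(data, estimate, sample_size, k, rng=None):
    """Repeat sampling k times and keep the most frequent F.

    estimate: hypothetical callable mapping a sample to (f, centers),
    for example hierarchy fitting followed by a dendrogram-based cut.
    Returns the winning F and one randomly chosen set of centers that
    was produced together with that F.
    """
    rng = rng or random.Random(0)
    rounds = []
    for _ in range(k):
        sample = rng.sample(data, sample_size)   # step one: sample the data
        rounds.append(estimate(sample))          # step two: fit the sample
    # step three: majority vote over the F values from all rounds
    best_f = Counter(f for f, _ in rounds).most_common(1)[0][0]
    candidates = [centers for f, centers in rounds if f == best_f]
    return best_f, rng.choice(candidates)


# Stub estimator for demonstration only: returns canned results in turn.
canned = iter([(2, ["centers A"]), (3, ["centers B"]), (2, ["centers C"])])
best_f, seeds = choose_f_and_seeds(list(range(10)), lambda s: next(canned), 4, 3)
print(best_f, seeds)
```

The returned centers can then serve as the initial seed points for the distance fitting step.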
Preferably, the two-way pair crowd-sourcing fitting is as follows:
step 1: initialize the raw data as a single class;
step 2: set F to 2, and divide the class into two classes using the crowd-sourcing fitting algorithm;
step 3: select one track class from the result of step 2 and divide it into two classes by the method of step 2, giving three classes;
step 4: select one of the three classes from step 3 and repeat the operation of step 2;
step 5: repeat steps 4 and 2 until the required number F of track classes is reached, then exit the loop;
when selecting a class to divide in step 3, a suitable class must be found among those produced by step 2: the CB value of each track class is calculated, and the class with the largest CB value is selected for division.
Preferably, the crowd-sourcing fitting algorithm based on singular value decomposition: dimension reduction simplifies the data, reduces the noise of the trajectory data, reduces discrete points, accelerates global convergence and makes the data easier to analyze. Singular value decomposition is given by formula 4:
$$A = U \Sigma V^{T} \qquad \text{(formula 4)}$$
In the formula, A is an n × m matrix, U is an n × n matrix, Σ is an n × m diagonal matrix whose diagonal entries Σ_i are the singular values, and V^T is the transpose of V, an m × m matrix. This equation is called the singular value decomposition of A.
After singular value decomposition, crowd-sourcing trajectory fitting is performed on the basis of the singular value decomposition.
Compared with the prior art, the innovation points and advantages of the application are as follows:
(1) The application takes crowd movement trajectory data in distributed sensing as its object, mines the data set at different scales using space-time division at different granularities, integrates and improves several crowd-sourcing fitting algorithms, constructs a spatial distribution model of crowd movement trajectories under time division and a time distribution model under space division, extracts the periodic rules of the crowd space-time distribution in distributed sensing, extends the space-time architecture of crowd movement trajectories, and provides effective information recommendation, preference analysis and behavior prediction for users. It applies the new wireless sensing scene that has arisen with the popularization of mobile intelligent devices and the development of wireless sensing technology to the space-time prediction of crowd movement trajectories, and gives full play to the role of people in the sensing process. By mining the user movement pattern information contained in the trajectory data and analyzing that data effectively, it helps the sensing system obtain distributed space-time information, serves sensing activities and space-time hotspot prediction, and supports efficient prediction of crowd trajectory space-time hotspots.
(2) The application addresses the low validity of distributed sensing data: crowd sensing denoising and trajectory normalization ensure the validity of the data uploaded by users, the space-time distribution characteristics of the nodes can be grasped efficiently, and the quality of data collection and processing is high. To cope with sensing data that is increasingly difficult to process, strongly dynamic, large in scale and updated in real time, longitude, latitude and time are extracted and the data are preprocessed by filtering; the resulting movement trajectory data are then divided by time and space granularity, with time divided by seconds, hours, days and working days/rest days, and space divided into one zone, nine zones and one hundred zones. The data processing is therefore well targeted to trajectory space-time prediction and performs well when applied to extended models such as crowd movement trajectory space-time prediction.
(3) Because the crowd-sourcing fitting algorithm is sensitive to the initially set trajectory prediction factor and center points and easily falls into a local optimum, a method is provided that uses crowd-sourcing hierarchical fitting to select suitable initial fitting centers and a suitable trajectory prediction factor, with visual analysis carried out after the fitting. Because the algorithm is sensitive to the initial centers and prone to local convergence, two-way crowd-sourcing fitting is used to further optimize the fitting prediction. An improved crowd-sourcing fitting algorithm is also provided, in which the data are first processed by singular value decomposition and then fitted with the crowd-sourcing fitting algorithm. According to the experimental results, the fitting based on singular value decomposition is superior to both the original and the two-way crowd-sourcing fitting algorithms, so the quality of trajectory fitting is enhanced layer by layer through the successive improvements.
(4) In the space-time distribution division design, time granularity is divided by seconds, hours, days and working days/rest days, and space granularity is divided by regions. The improved crowd-sourcing fitting algorithm is used to analyze the spatial distribution under time granularity division and the time distribution under space granularity division, to construct the periodic patterns contained in the movement trajectories, and to analyze and predict the spatial hotspots of the trajectories under time division and the time hotspots under space division. The periodic patterns, hotspot regions, overall trajectories and users with similar behavior contained in the trajectories are thereby analyzed and mined, which gives full play to the advantages of the crowd movement trajectory space-time prediction model and is of great value for location-based group behavior analysis and applications.
Drawings
FIG. 1 is a flow chart of distributed extended crowd movement trajectory spatiotemporal prediction.
FIG. 2 is a flow chart of the selection of trajectory predictors and initial crowd-sourcing fitting centers.
Fig. 3 is a schematic diagram of track point clustering space distribution based on the DP algorithm.
Fig. 4 is a schematic diagram of track point clustering spatial distribution of the improved crowd-sourcing fitting algorithm.
Fig. 5 is a schematic diagram of the fitting of the original spatial distribution of the trace points.
FIG. 6 is a schematic diagram of the two-way crowd-sourcing fitting.
FIG. 7 is a diagram illustrating the effects of crowd-sourcing fitting under singular value decomposition.
FIG. 8 is a detailed view of trajectory class 3 formed by crowd-sourcing fit under singular value decomposition.
Fig. 9 is a graph of the effect of power-law distribution curve fitting of the intra-class trace points around the fitting center.
FIG. 10 is a diagram showing the distribution of hot spot regions at 1 PM of the spatial-temporal prediction model of the movement locus.
FIG. 11 is a graph of the distribution of crowd-sourced fitted centers at one hour intervals over 24 hours of the predictive model.
FIG. 12 is a graph showing the distribution of crowd-sourced fit centers for each day of the seven days of the week for the predictive model.
Detailed description of the invention
The following describes in detail a specific implementation of the intelligent distributed extended crowd movement trajectory spatio-temporal prediction model in combination with the accompanying drawings, so that those skilled in the art can better understand and implement the present application. Those skilled in the art may make similar generalizations without departing from the spirit and scope of the present application, and the present application is therefore not limited to the specific embodiments disclosed below.
Distributed sensing is a revolutionary wireless sensing scene generated along with the popularization of mobile intelligent devices, the development of wireless sensing technology and the like. People play an important role in the perception process, and perception data collected by users is more and more as distributed perception and mobile perception develop. The crowd moving track data in distributed sensing contains user moving mode information, the moving track data is effectively analyzed, a sensing system is facilitated to obtain distributed space-time related information, sensing activities are effectively served, and efficient sensing application is provided. According to the crowd moving track time-space prediction method based on the distributed sensing, crowd moving track data in the distributed sensing are taken as objects, data mining with different scales is carried out on a data set in a time-space division mode with different granularities, different crowd fitting algorithms are integrated, the crowd fitting algorithms are improved, a space distribution model of the crowd moving track under time division and a time distribution model under space division are constructed, periodic rules of the crowd time-space distribution in the distributed sensing are extracted, a crowd moving track time-space framework is expanded, and effective information recommendation, preference analysis and behavior prediction are provided for users.
Firstly, extracting longitude and latitude and time, preprocessing data in a filtering mode, then carrying out time and space granularity division on the obtained movement track data, dividing the movement track data in time by seconds, hours, days, working days/rest days, and dividing the movement track data in space by one area, nine areas and one hundred areas;
secondly, performing track analysis based on a crowd-sourcing fitting algorithm, and aiming at the defects that the method is sensitive to initially set track prediction factors and central points and is easy to obtain a local optimal solution, and the like, providing a method for selecting proper initial crowd-sourcing fitting centers and track prediction factors by using a crowd-sourcing hierarchical fitting method, and further performing visual analysis after crowd-sourcing fitting;
thirdly, because the crowd-sourcing fitting algorithm is sensitive to the initial centers and prone to local convergence, two-way crowd-sourcing fitting is used to further optimize the fitting prediction;
Fourthly, an improved crowd-sourcing fitting algorithm is provided, the data is processed through singular value decomposition, and then the crowd-sourcing fitting algorithm is utilized for crowd-sourcing fitting;
fifthly, in the space-time distribution division design, time granularity is divided according to seconds, hours, days, working days/rest days, and space granularity is divided according to regions; and the improved crowd-sourcing fitting algorithm provided by the application is utilized to analyze the spatial distribution under the time granularity division and the time distribution under the space granularity division respectively, construct a periodic mode contained in the moving track, and analyze and predict the space hot spot of the moving track in the time granularity division mode and the time hot spot in the space granularity division mode.
First, moving track data prediction process
The analysis process of the movement trajectory data set comprises the following aspects: crowd sensing denoising and trajectory normalization of the original data set, space-time division design of the movement trajectory, and crowd-sourcing fitting prediction for space-time analysis on the basis of the space-time division. In the crowd-sourcing fitting design for space-time analysis, the prototype crowd-sourcing fitting algorithm, crowd-sourcing hierarchical fitting, the improved crowd-sourcing fitting (namely the two-way crowd-sourcing fitting algorithm), and the crowd-sourcing fitting algorithm after singular value decomposition are adopted.
The flow chart of the distributed extended crowd movement trajectory space-time prediction is shown in fig. 1, and according to the flow chart:
step 1: extracting effective longitude, latitude and time data of the data set and carrying out denoising treatment;
step 2: respectively carrying out space-time granularity division according to the time-space division mode of the application, and then carrying out data normalization standardization;
step 3: selecting a trajectory prediction factor and initial crowd-sourcing fitting centers;
step 4: performing crowd-sourcing fitting prediction with the trajectory prediction factor and the crowd-sourcing fitting centers obtained in step 3 as the initial values;
step 5: applying the improved method, two-way crowd-sourcing fitting, to obtain a prediction result;
step 6: performing singular value decomposition on the extracted data, and then performing crowd-sourcing fitting prediction to obtain the crowd-sourcing fitting centers;
step 7: carrying out graphic visualization, and synthesizing the fitting results of the prototype crowd-sourcing fitting, the two-way crowd-sourcing fitting algorithm and the crowd-sourcing fitting algorithm after singular value decomposition to obtain the periodicity of the crowd movement trajectory and the future space-time trend of the trajectory.
Second, distributed moving track preprocessing
(I) Crowd sensing denoising process
The noise source of the moving track data in the crowd sensing includes the following aspects: firstly, errors exist in positioning systems of intelligent devices such as mobile phones carried by distributed sensing users, wherein the errors include clock errors, multipath effect influences and the like; secondly, mobile equipment faults may exist, and randomness of actively uploading data in crowd sensing is large, so that partial data is incomplete, and some moving track points are partially lost in time and space; thirdly, the data transmission is actively participated in, the human factors of the data transmission are large, and the data is influenced by the subjective initiative of participants, personal preference and the like.
The above factors cause noise in the moving track points, and if the noise is not processed, the data is directly subjected to demand analysis, and the result is deviated or a completely opposite conclusion is obtained through analysis, so that the denoising processing of the moving track data is particularly important.
Noise points are searched for by setting a critical value: if a quantity computed from two track points exceeds the set critical value, the track point is presumed to be a noise point. Specifically, the moving speed is calculated from the distance and the time interval between a point and the next point; if the moving speed is greater than the set critical value, the point is considered a noise point. Points flagged in this way are then analyzed further: if they persist for a period of time with a long time interval, or extend beyond a certain distance in space, they are judged not to be noise points, since their positions merely reflect the user's movement behavior.
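The speed check described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the haversine distance, the function names and the 50 m/s threshold are assumptions introduced here, and the follow-up analysis of flagged points is not covered.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius used by the haversine formula

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def flag_speed_outliers(points, max_speed_mps=50.0):
    """points: list of (lat, lon, t_seconds) sorted by time.
    Returns the indices i whose speed towards point i + 1 exceeds the threshold.
    The threshold value is a hypothetical choice, not taken from the application."""
    flagged = []
    for i in range(len(points) - 1):
        lat1, lon1, t1 = points[i]
        lat2, lon2, t2 = points[i + 1]
        dt = t2 - t1
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = haversine_m(lat1, lon1, lat2, lon2) / dt
        if speed > max_speed_mps:
            flagged.append(i)
    return flagged
```

Flagged indices are only candidates; the persistence check described in the text would still have to decide whether they are genuine noise.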
(II) normalization of trajectories
The movement trajectory data comprise longitude, latitude and time, whose measurement scales differ: longitude ranges from -180 to 180, latitude from -90 to 90, time measured in hours from 0 to 24, and time measured in days of the week from 1 to 7. When the space-time distribution of the movement trajectory is analyzed, these scales need to be unified, so normalization is required. Normalization facilitates analysis of the results within a uniform interval.
For the movement trajectory data x(F), F = 1, 2, 3, …, N, the normalized value y(F) is given by formula 1:
[Formula 1, equation image BDA0003611306580000111]
which transforms the scales of longitude, latitude and time to the range -1 to 1.
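Formula 1 itself is not reproduced in this text, so the following is only a sketch under an assumption: a linear min-max rescaling, which is one common way to map a series onto the range -1 to 1. The function name is hypothetical.

```python
def normalize_to_unit_range(values):
    """Linearly rescale values so the minimum maps to -1 and the maximum to 1.
    Assumed form: y = 2 * (x - min) / (max - min) - 1 (not confirmed by the source)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # a constant series has no spread; map every value to the midpoint
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
```

For example, hours 0, 12 and 24 would map to -1, 0 and 1, which puts time on the same scale as normalized longitude and latitude.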
Third, space-time division design of movement locus
The distributed expanded movement track is divided according to the difference of time and space granularity, the space-time granularity is the level size of time and space, so that the space-time relationship can be better mastered by the division, the space large granularity is defined as the space with larger interval in a space region, the time large granularity is defined as the time interval with long time interval, and the space-time distribution condition of a user in a certain large-range activity region and a certain week can be mastered; the small space-time granularity corresponds to the regional activity of an office building and the space-time distribution of an hour.
In time, the data are divided by seconds, hours, days and working days/rest days, and the spatial distribution is examined at each time granularity. In space, after trajectory normalization, the longitude and latitude ranges are expanded or reduced and the space is divided into regions on a grid; the time distribution is then examined in each region. This design considers both the effect of space on the time distribution and the time patterns of the different region divisions. The movement trajectory data are processed and analyzed by combining space-time division with crowd-sourcing fitting to obtain specific space-time division results, from which the crowd movement trajectory pattern is derived.
With the first division method, that is, division in time by seconds, hours, days and working days/rest days, crowd-sourcing fitting is performed to observe the spatial effect, from which the spatial distribution pattern of users under different time granularities is inferred. For example, the spatial distribution between 12:00 and 13:00 shows which region users gather in during this period, and that location can be presumed to be a canteen or restaurant; as another example, with division by working days and rest days, the workplace can be inferred from the working-day spatial distribution, and the places that participating users frequently visit for leisure can be seen from the rest-day distribution.
With the second division method, that is, expanding or contracting the grid granularity in space, the space is divided into regions by longitude and latitude; after this division the spatial dimension is set aside and crowd-sourcing fitting over time is performed in each region, so that the time distribution under different space granularities can be determined. If region A is a tourist area, the analysis may infer the time of day at which region A is popular or crowded, for example 8:00 to 10:00.
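The two division methods can be illustrated with a small sketch. The helper names, the use of seconds since midnight as the timestamp, and the equal-angle grid over the whole globe are assumptions made for illustration; `cells_per_side` values of 1, 3 and 10 would correspond to the one-zone, nine-zone and one-hundred-zone divisions.

```python
from collections import defaultdict

def hour_bin(t_seconds):
    """Hour of day (0 to 23) for a timestamp in seconds counted from a midnight."""
    return int(t_seconds // 3600) % 24

def grid_cell(lat, lon, cells_per_side):
    """Map (lat, lon) in degrees to a (row, col) cell of a square grid over the globe."""
    row = min(int((lat + 90.0) / 180.0 * cells_per_side), cells_per_side - 1)
    col = min(int((lon + 180.0) / 360.0 * cells_per_side), cells_per_side - 1)
    return row, col

def group_by_hour(points):
    """Method 1 (time division): hour -> list of (lat, lon) positions to fit in space."""
    groups = defaultdict(list)
    for lat, lon, t in points:
        groups[hour_bin(t)].append((lat, lon))
    return dict(groups)

def group_by_cell(points, cells_per_side=3):
    """Method 2 (space division): grid cell -> list of hours to fit in time."""
    groups = defaultdict(list)
    for lat, lon, t in points:
        groups[grid_cell(lat, lon, cells_per_side)].append(hour_bin(t))
    return dict(groups)
```

Each group produced this way would then be passed to the fitting step, in space for method 1 and in time for method 2.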
Fourth, crowd-sourcing fit prediction under space-time partition
(I) calculation of space-time trace point approximation
The approximation of track points is calculated by treating the time space as a sequence and measuring similarity from the numerical values of two objects; when the similarity of trajectories is judged, the distance between two points is used. In the present application the calculation is based on the cosine distance: taking vectors as the reference, similarity is determined by the cosine of the angle between them. The smaller the resulting value, the more dissimilar the objects; the larger, the more similar. The calculation is given by formula 2, where x_1 and y_1 are the abscissa and ordinate of the first object, and x_2 and y_2 are the abscissa and ordinate of the second object:
$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}} \qquad \text{(formula 2)}$$
the cosine similarity is measured by depending on the space orientation of the object, and the track point approximation is measured by utilizing the characteristic that the cosine similarity is insensitive to numerical values in the analysis process of the moving track.
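A minimal sketch of the cosine measure for two planar points follows; the function name and the handling of zero vectors are choices made here for illustration.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between the 2-D vectors p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    norm = math.hypot(x1, y1) * math.hypot(x2, y2)
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return (x1 * x2 + y1 * y2) / norm
```

Because both vectors are divided by their lengths, (1, 2) and (2, 4) score as fully similar, which is the insensitivity to numerical magnitude mentioned in the text.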
(II) crowd-sourcing hierarchical fitting algorithm
For the crowd-sourcing fitting algorithm, the following aspects are discussed and improved: the efficiency of the algorithm is suitable for analyzing the crowd track big data; whether different types of data can be processed for different data symbol compositions; whether different categories can be distinguished; whether there is adaptivity for an abnormal data point; different data inputs have an effect on the algorithm.
Initially, each sample point is taken as a track class, and the classes are then gradually merged according to a rule: if the distance between a sample point in track class C1 and a sample point in track class C2 is the shortest Euclidean distance among all pairs of sample points in different classes, track classes C1 and C2 are considered similar and are merged. The algorithm is described as follows:
Input: F, the target number of track classes; G, the sample point set. Output: a set of F track classes.
The steps are: first, take each sample point in G as a track class;
second, calculate the distance between every two track classes;
third, merge the two classes with the minimum distance into one track class;
fourth, repeat the second and third steps
until the number of track classes is F.
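The merging procedure above corresponds to what is usually called single-linkage agglomerative clustering. A brute-force sketch, with hypothetical names and no attempt at efficiency, might look like this:

```python
import math

def single_linkage(points, n_classes):
    """Merge the two closest classes repeatedly until n_classes remain.
    The distance between two classes is the shortest Euclidean distance
    between any pair of their sample points."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    clusters = [[p] for p in points]  # every sample point starts as its own class

    def class_distance(a, b):
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_classes:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = class_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge class j into class i
        del clusters[j]
    return clusters
```

The repeated pairwise distance computation is what makes the approach expensive on large data sets, which is consistent with running it on samples rather than on the full data.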
(III) crowd-sourcing distance fitting algorithm
Taking the Euclidean distance as the similarity measure, the algorithm examines the distance from each sample point to its fitting center and takes the error sum of squares CB as the criterion function; through iterative optimization based on a greedy strategy, it approximately finds the fitting centers that minimize the error sum of squares. The algorithm process is:
process one: reading in the M samples and the number F of track classes;
process two: randomly selecting F of the M samples as seed points;
process three: calculating the distances from all other points to the seed points, comparing them, and assigning each point to the track class of its nearest seed point;
process four: calculating the new center point of each track class by averaging;
process five: repeating process three and process four until the new center points coincide with the previous center points or differ only slightly, then exiting the loop;
wherein the termination condition of the crowd-sourcing distance fitting is convergence; the fitting itself ensures convergence, and convergence is measured by the error sum of squares CB, defined as formula 3:
$$CB = \sum_{i=1}^{F} \sum_{p \in C_i} \lVert p - m_i \rVert^{2} \qquad \text{(formula 3)}$$
where C is the given data set, which contains F fitting subsets C_1, C_2, …, C_F; p is a sample point of C_i; the numbers of samples in the fitting subsets are n_1, n_2, …, n_F; and the center points are m_1, m_2, …, m_F respectively.
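The five processes match the familiar k-means iteration. The sketch below is illustrative only: the function names, the optional `seeds` argument, the tolerance and the fixed random seed are assumptions, and `sse` computes the error sum of squares in the usual sense.

```python
import math
import random

def sse(clusters, centers):
    """Error sum of squares: squared distance of every point to its class center."""
    return sum(math.dist(p, m) ** 2 for c, m in zip(clusters, centers) for p in c)

def mean_point(cluster):
    """Coordinate-wise average of the points in one class."""
    n = len(cluster)
    return tuple(sum(axis) / n for axis in zip(*cluster))

def distance_fit(points, n_classes, seeds=None, max_iter=100, tol=1e-9, rng=None):
    rng = rng or random.Random(0)
    # process two: random seed points unless explicit seeds are supplied
    centers = list(seeds) if seeds else rng.sample(points, n_classes)
    clusters = []
    for _ in range(max_iter):
        # process three: assign each point to the nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # process four: recompute centers by averaging (empty classes keep their center)
        new_centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        # process five: stop once the centers no longer move appreciably
        if shift <= tol:
            break
    return clusters, centers
```

Passing `seeds` explicitly is how centers chosen by a separate selection step could be supplied instead of random seed points.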
(IV) selection of trajectory prediction factor and initial crowd-sourcing fitting center
The crowd-sourcing fitting algorithm has many advantages, but its main disadvantage is sensitivity to the initial seed points: different initial values directly affect the final result, so with relatively large data volumes the original crowd-sourcing fitting cannot be applied directly. It is therefore necessary to select a suitable number of classes (that is, how many classes to divide into, referred to as the trajectory prediction factor) and suitable initial centers (seed points). Because the crowd-sourcing hierarchical fitting algorithm is computationally complex, it is carried out on samples to find a suitable trajectory prediction factor and initial center points, and the suitable trajectory prediction factor is shown intuitively by a dendrogram. The specific selection method is as follows:
step one: sampling the original data;
step two: carrying out crowd-sourcing hierarchical fitting on the sampled points, and obtaining a suitable trajectory prediction factor and the corresponding crowd-sourcing fitting centers of the data set through the crowd-sourcing hierarchical fitting;
step three: repeating step one and step two; while the number of loops is less than a critical value k, the loop continues, and when the number of loops exceeds the critical value, the loop exits; the trajectory prediction factor obtained in each loop is tallied, the F occurring most often is selected as the most suitable F, and a group of crowd-sourcing fitting centers corresponding to that trajectory prediction factor is then randomly selected as the initial seed points.
The specific flow chart is shown in fig. 2.
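One way to read the three steps is sketched here. How F is read off the dendrogram in each round is not specified at this point in the text, so the sketch assumes the dendrogram is cut at a fixed distance; `cut_distance`, `sample_size` and the function names are hypothetical.

```python
import math
import random
from collections import Counter

def cut_single_linkage(points, cut_distance):
    """Classes obtained by cutting a single-linkage dendrogram at cut_distance,
    computed as connected components of points no farther apart than the cut."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cut_distance:
                parent[find(i)] = find(j)
    classes = {}
    for i, p in enumerate(points):
        classes.setdefault(find(i), []).append(p)
    return list(classes.values())

def choose_factor_and_seeds(points, sample_size, cut_distance, k=20, rng=None):
    """Repeat sampling and hierarchical fitting k times, keep the most frequent F,
    then pick at random one set of centers that was found with that F."""
    rng = rng or random.Random(0)
    runs = []  # one (F, centers) pair per round
    for _ in range(k):
        sample = rng.sample(points, sample_size)           # step one
        classes = cut_single_linkage(sample, cut_distance)  # step two
        centers = [tuple(sum(axis) / len(cl) for axis in zip(*cl)) for cl in classes]
        runs.append((len(classes), centers))
    best_f = Counter(f for f, _ in runs).most_common(1)[0][0]  # step three: majority vote
    seeds = rng.choice([c for f, c in runs if f == best_f])
    return best_f, seeds
```

The returned seeds could then serve as the initial centers of the distance fitting instead of randomly chosen samples.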
Fifth, two-way crowd-sourcing fit
The application utilizes another algorithm to overcome the defect of local convergence and sensitivity to the initial value, namely bidirectional crowd-sourcing fitting, and the specific method is as follows:
step 1: initializing raw data into a class;
step 2: setting F to 2, and dividing the original class into two classes with the crowd-sourcing fitting algorithm;
step 3: selecting one track class from the result of step 2 and dividing it into two classes by the method of step 2, so that three classes now exist;
step 4: selecting one of the three classes from step 3 and repeating the operation of step 2;
step 5: repeating step 4 and step 2 until the number of track classes reaches the required F, then exiting the loop;
in step 3, a suitable class must be chosen from the result of step 2 for division: the CB value of each track class is calculated, and the class with the largest CB value is selected for division.
The bidirectional crowd sourcing fitting algorithm is low in time complexity, simple in operation and easy to understand, and overcomes the defects that an original crowd sourcing fitting algorithm is sensitive to an initial value and the like.
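The splitting scheme resembles what is commonly called bisecting k-means. The following sketch makes that reading explicit; the function names, the retry on a failed split and the fixed random seed are assumptions for illustration.

```python
import math
import random

def mean_point(cluster):
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def sse(cluster):
    """CB value of one class: squared distances of its points to its own center."""
    m = mean_point(cluster)
    return sum(math.dist(p, m) ** 2 for p in cluster)

def two_means(cluster, rng, max_iter=50):
    """Step 2: split one class into two with a small distance-fitting loop."""
    centers = rng.sample(cluster, 2)
    halves = ([], [])
    for _ in range(max_iter):
        halves = ([], [])
        for p in cluster:
            idx = 1 if math.dist(p, centers[1]) < math.dist(p, centers[0]) else 0
            halves[idx].append(p)
        if not halves[0] or not halves[1]:
            break  # degenerate split, for example two identical seed points
        new_centers = [mean_point(h) for h in halves]
        if new_centers == centers:
            break
        centers = new_centers
    return [h for h in halves if h]

def bisecting_fit(points, n_classes, rng=None):
    rng = rng or random.Random(0)
    clusters = [list(points)]  # step 1: all data start in one class
    while len(clusters) < n_classes:
        # choose the class with the largest CB value for the next split
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        if sse(clusters[idx]) == 0:
            break  # remaining classes cannot be split further
        parts = two_means(clusters[idx], rng)
        if len(parts) < 2:
            continue  # unlucky seeds; draw again
        clusters.pop(idx)
        clusters.extend(parts)
    return clusters
```

Because every split is a two-class problem, each individual fit is cheap, although a single split can still settle on a poor partition for some seed choices.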
Sixthly, crowd-sourcing fitting algorithm based on singular value decomposition
Dimension reduction simplifies the data, reduces the noise of the trajectory data, reduces discrete points, accelerates global convergence and makes the data easier to analyze. Singular value decomposition is given by formula 4:
$$A = U \Sigma V^{T} \qquad \text{(formula 4)}$$
In the formula, A is an n × m matrix, U is an n × n matrix, Σ is an n × m diagonal matrix whose diagonal entries Σ_i are the singular values, and V^T is the transpose of V, an m × m matrix. This equation is called the singular value decomposition of A.
After singular value decomposition, crowd-sourcing trajectory fitting is performed on the basis of the singular value decomposition.
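To illustrate how a decomposition of this kind can reduce dimension before fitting, the sketch below extracts only the dominant singular triplet with power iteration and forms the rank-one approximation of A. This is a simplified stand-in chosen so the example stays dependency-free; in practice a numerical library's full SVD routine would normally be used, and the source does not say how many singular values are retained.

```python
import math

def dominant_singular_triplet(a, iters=200):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value.
    Assumes the start vector is not orthogonal to the dominant right singular vector."""
    n, m = len(a), len(a[0])
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(m)) for i in range(n)]   # A v
        w = [sum(a[i][j] * av[i] for i in range(n)) for j in range(m)]   # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # A is the zero matrix
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(m)) for i in range(n)]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av] if sigma else [0.0] * n
    return sigma, u, v

def rank_one_approximation(a):
    """Keep only the largest singular component: sigma * u * v^T."""
    sigma, u, v = dominant_singular_triplet(a)
    return [[sigma * ui * vj for vj in v] for ui in u]
```

The rows of the approximation, or the one-dimensional coordinates sigma * u, would then be the input to the fitting step.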
Seventh, Experimental results and analysis
Firstly, in order to verify that crowd-sourcing fitting is better suited to crowd movement trajectories than a density-clustering-based algorithm, a typical density-clustering-based algorithm (the GP algorithm) is selected for a comparison experiment with the crowd-sourcing fitting algorithm. Because the GP algorithm needs to calculate the distances between all points in advance and the data set used in the experiment is large, the memory cost of the full distance matrix is very high; the data are therefore sampled first, and the spatial distribution of the sampled track points is then fitted. With 4000 sampled track points, a decision graph is obtained using the decision formula of the GP algorithm. The decision graph shows that if the threshold is set too large, the GP algorithm clusters all the track points into one class, while otherwise there are many cluster centers; under a more reasonable setting, the number of track classes obtained is 2, and fig. 3 shows the clustering spatial distribution corresponding to the decision graph. Fig. 4 shows the spatial distribution obtained with the improved crowd-sourcing fitting algorithm (F = 3 was obtained using crowd-sourcing hierarchical fitting). Comparing fig. 3 and fig. 4, the crowd-sourcing fitting algorithm performs better than the GP algorithm, mainly because the classes formed by the data set are convex and the densities around the fitting centers differ too much, which suits crowd-sourcing fitting but gives poor results with density-based clustering.
Before fitting, according to the moving track data analysis process and the selection of a crowd-sourcing fitting algorithm, firstly preprocessing the moving track data, then obtaining a proper crowd-sourcing fitting number (namely a track prediction factor) and an initial crowd-sourcing fitting center by using crowd-sourcing hierarchical fitting, and finding that the proper crowd-sourcing fitting number is 3 (making a straight line parallel to a horizontal axis at a position with a distance of 40, and then viewing from top to bottom to obtain three types, namely the proper number is 3). And finally, obtaining a final crowd sourcing fitting result by utilizing a crowd sourcing fitting algorithm. Fig. 5 is a fitting of the original spatial distribution of the trace points, wherein the abscissa is latitude, the ordinate is longitude, and the black cross represents the center of each class.
The center positions of the three classes are (113.1489° E, 27.1044° N), (-72.2885° W, 43.5222° N) and (116.1255° E, 39.8597° N) in sequence, which indicates that these three positions and their nearby areas are hot spot areas.
In order to obtain a better fitting effect, the two-way crowd-sourcing fitting algorithm is applied; its effect is shown in fig. 6. The graph shows no substantial improvement, because the two-way algorithm proceeds by dividing a class into two, which is unsuitable for data sets in which large and small classes coexist. Although the trajectory data set is filtered in advance, filtering is effective only for some abnormal points, and the many discrete points remaining in the trajectory data set still affect the final fitting effect.
Therefore, the trajectory data is selected to be subjected to singular value decomposition first and then to crowd-sourcing fitting. The singular value decomposition reduces the noise of the track data, can also reduce the influence of discrete points and subclasses on the crowd-sourcing fitting effect, and can reduce the dimension of the data by the singular value decomposition, thereby being easy for data analysis. The effect of the crowd-sourcing fit under singular value decomposition is shown in figure 7. FIG. 8 is a detail of the trajectory class 3 formed by the crowd-sourcing fit, showing the shape of the trajectory class. The central positions of the three classes are obtained as (117.3310 ° W,42.8048 ° N), (6.7114 ° E,44.7804 ° N), (115.8998 ° E,38.8928 ° N) in sequence, which indicates that the three positions and the nearby area are hot spot areas. The crowd fitting formed by the crowd fitting algorithm under the singular value decomposition has the characteristic of coexistence of large and small classes, and the spatial distribution condition of the actual crowd track is better reflected.
To compare the crowd-sourcing fitting effects more accurately, a fitting effect measurement model is used to calculate the GB index and the Gunn index after fitting with the crowd-sourcing fitting algorithm, the two-way pair crowd-sourcing fitting algorithm and the crowd-sourcing fitting algorithm under singular value decomposition, respectively. The results show that the GB index of the crowd-sourcing fitting algorithm is the largest, that of the two-way pair crowd-sourcing fitting algorithm is second, and that of the crowd-sourcing fitting algorithm under singular value decomposition is the smallest; the Gunn index of the crowd-sourcing fitting algorithm under singular value decomposition is the largest, that of the two-way pair crowd-sourcing fitting algorithm is second, and that of the crowd-sourcing fitting algorithm is the smallest. A smaller GB index and a larger Gunn index both correspond to a better fit here. Therefore, for the trajectory data set of the present application, the crowd-sourcing fitting algorithm under singular value decomposition has the best effect, the two-way pair crowd-sourcing fitting algorithm is second, and the crowd-sourcing fitting algorithm has the worst effect. Accordingly, the crowd-sourcing fitting algorithm under singular value decomposition is selected for the space-time analysis of the moving trajectory data.
As can be seen from fig. 7 and 8, each class has more track points near its crowd-sourcing fitting center and fewer track points farther from the center, and this change is clearly non-linear. To obtain the distribution characteristics of the track points within a class, the track points in the class are counted and curve fitting is performed with the MATLAB Curve Fitting Toolbox. The fitting effects of Gaussian, exponential, polynomial and power functions are compared, and the power function is found to give the best curve fit and the highest correlation, as shown in fig. 9, where the x axis represents the (relative) distance from the fitting center and the y axis represents the number of track points. The fitting model is f(x) = a · x^b with a = 1.942e+07 and b = -3.654; the fitting correlation reaches 99.82%, so the distribution follows a power law.
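As a rough illustration of the power function fit described above, the following Python sketch estimates a and b by ordinary least squares on log-transformed data. It is a stand-in under stated assumptions: the source used the MATLAB Curve Fitting Toolbox, whose nonlinear fit may weight the points differently, and the function name `fit_power_law` and the synthetic, noise-free data are hypothetical and are not the patent's data.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x**b by linear least squares in log-log space."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # log y = log a + b * log x, so the slope is b and the intercept is log a.
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(log_a)), float(b)

# Synthetic data generated from the reported parameters (illustration only).
x = np.linspace(10.0, 100.0, 50)
y = 1.942e7 * x ** -3.654
a_hat, b_hat = fit_power_law(x, y)
```

On real counts, a log-log fit requires strictly positive x and y, so empty distance bins would have to be dropped first.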
To perform space-time analysis on the moving tracks, the space-time data are divided at multiple time granularities and multiple space granularities. The data are first divided by hour of day (24 hours), and the distribution of hot spot areas at different times (measured in hours) is obtained through crowd-sourcing fitting. Fig. 10 shows the distribution of hot spot areas at 1 pm (two classes are formed by crowd-sourcing fitting), and fig. 11 shows the distribution of crowd-sourcing fitting centers for each hour within 24 hours. It can be seen that the hot spot areas change with the time of day, but the area around (116° E, 39° N) remains a hot spot almost continuously, especially at night, indicating that the area is likely to be a residential area.
Then, the space-time data are divided by day of the week, and the distribution of hot spot areas from Monday to Sunday is obtained through crowd-sourcing fitting; fig. 12 shows the distribution of crowd-sourcing fitting centers for each of the seven days. It can be seen that the hot spot areas change from Monday to Sunday, but the area around (116° E, 39° N) remains a hot spot almost every day, which again supports the possibility that it is a residential area. In addition, the other hot spot area on Saturday and Sunday differs significantly from the other hot spot area on weekdays: the former may be a local venue for leisure and sports, and the latter may be a local venue for business.
When the whole area is divided into nine regions and time crowd-sourcing fitting is performed for one of the regions, the three time hotspots of that region are, in turn, about six pm on Monday, about eleven o'clock on Monday and about fifteen o'clock on Friday, and the time aggregation of the region tends to appear in the evening. Time crowd-sourcing fitting is also performed after dividing the whole area into one hundred regions: the time distribution of the tracks in each region is fitted into three classes, and the largest class is selected as the final "time hot spot". In the resulting plot, the horizontal axis represents the divided regions (from the first region to the hundredth region) and the vertical axis represents time (from Monday to Sunday, with a resolution of 1 second); a value of 1 indicates that a region is sparsely visited, so that crowd-sourcing fitting cannot be carried out because there are too few track points or there is no "time hot spot". The "time hot spots" differ from region to region and reflect the different properties of the regions (such as office regions, residential regions and the like).
Crowds are then divided according to their tracks, with the result obtained by crowd-sourcing fitting of each user's overall track. The result shows that the 182 users can be divided into 5 classes; user behaviors (here, behavior means track) within each class are similar, while users in different classes differ greatly. In practice, similar-user recommendation can be performed according to the crowds obtained by this division, which is another application of the crowd movement track space-time prediction model.

Claims (10)

1. An intelligent distributed extended crowd movement track space-time prediction model, characterized by comprising: first, a movement track data prediction process; second, distributed movement track preprocessing, comprising crowd sensing denoising and track normalization; third, the space-time division design of the movement track; fourth, crowd-sourcing fitting prediction under the space-time division, comprising calculation of the space-time track point similarity, a crowd-sourcing hierarchy fitting algorithm, a crowd-sourcing distance fitting algorithm, and selection of a track prediction factor and an initial crowd-sourcing fitting center; fifth, two-way pair crowd-sourcing fitting; sixth, a crowd-sourcing fitting algorithm based on singular value decomposition;
the crowd moving track data in distributed sensing is taken as an object, different-scale data mining is carried out on a data set by adopting a space-time division mode with different granularity, different crowd fitting algorithms are integrated, the crowd fitting algorithm is improved, a space distribution model of the crowd moving track under time division and a time distribution model under space division are constructed, a periodic rule of the crowd space-time distribution in distributed sensing is extracted, a crowd moving track space-time framework is expanded, and effective information recommendation, preference analysis and behavior prediction are provided for a user;
the distributed extended crowd moving track space-time prediction process comprises the following steps:
step 1: extracting valid longitude, latitude and time data from the data set and carrying out denoising;
step 2: carrying out space-time granularity division according to the space-time division modes of the application, and then normalizing the data;
step 3: selecting a track prediction factor and an initial crowd-sourcing fitting center;
step 4: performing crowd-sourcing fitting prediction, taking the track prediction factor and the crowd-sourcing fitting center obtained in step 3 as the initial values;
step 5: performing two-way pair crowd-sourcing fitting, an improved crowd-sourcing fitting method, to obtain a prediction result;
step 6: performing singular value decomposition on the extracted data, and then performing crowd-sourcing fitting prediction to obtain the crowd-sourcing fitting centers;
step 7: carrying out graphic visualization, and combining the fitting results of the prototype crowd-sourcing fitting, the two-way pair crowd-sourcing fitting algorithm and the crowd-sourcing fitting algorithm after singular value decomposition to obtain the periodicity of the crowd moving tracks and the future space-time trend of the tracks.
2. The intelligent distributed extended crowd movement track space-time prediction model according to claim 1, characterized in that the analysis process of the movement track data set comprises the design of crowd sensing denoising and track normalization, the design of the movement track space-time division, and crowd-sourcing fitting prediction for space-time analysis based on the space-time division, wherein crowd-sourcing hierarchy fitting, prototype crowd-sourcing fitting, the two-way pair crowd-sourcing fitting algorithm and the crowd-sourcing fitting algorithm after singular value decomposition are adopted on the basis of the space-time division:
firstly, extracting longitude and latitude and time, preprocessing data in a filtering mode, then carrying out time and space granularity division on the obtained movement track data, dividing the movement track data in time by seconds, hours, days, working days/rest days, and dividing the movement track data in space by one area, nine areas and one hundred areas;
secondly, performing track analysis based on the crowd-sourcing fitting algorithm; since this algorithm is sensitive to the initially set track prediction factor and center points and easily falls into a locally optimal solution, a crowd-sourcing hierarchy fitting method is used to select a suitable initial crowd-sourcing fitting center and track prediction factor, and visual analysis is carried out after crowd-sourcing fitting;
thirdly, since the crowd-sourcing fitting algorithm is sensitive to the initial centers and prone to converging to a local optimum, the fitting prediction is further optimized by using the two-way pair crowd-sourcing fitting algorithm;
fourthly, an improved crowd-sourcing fitting algorithm is provided, the data is processed through singular value decomposition, and then the crowd-sourcing fitting algorithm is utilized for crowd-sourcing fitting;
fifthly, in the space-time distribution division design, time granularity is divided according to seconds, hours, days, working days/rest days, and space granularity is divided according to regions; and the improved crowd-sourcing fitting algorithm provided by the application is utilized to analyze the spatial distribution under the time granularity division and the time distribution under the space granularity division respectively, construct a periodic mode contained in the moving track, and analyze and predict the space hot spot of the moving track in the time granularity division mode and the time hot spot in the space granularity division mode.
3. The intelligent distributed extended crowd movement track space-time prediction model according to claim 1, characterized in that the crowd sensing denoising process is as follows: noise points are searched for by setting a critical value, and if a quantity computed from two adjacent track points is larger than the set critical value, the track point is presumed to be a noise point; specifically, the moving speed is calculated from the distance and the time interval between a point at a certain moment and the next point, and if the moving speed is larger than the set critical value, the point is determined to be a noise point; points regarded as noise points are then analyzed further, and if such points persist over a long time interval or extend beyond a certain distance in space, they are judged not to be noise points, their positions simply reflecting the behavior pattern of the user's movement;
track normalization: the moving track data comprise longitude, latitude and time, whose measuring standards differ: the longitude ranges from -180 to 180, the latitude ranges from -90 to 90, and the time ranges from 0 to 24 when measured by hour and from 1 to 7 when measured by day of the week; when the space-time distribution of the moving tracks is analyzed, the measuring standards are unified by normalization so that the results can be analyzed within a common interval;
for the movement track data x(F), F = 1, 2, 3, …, N, the normalized value y(F) is given by formula 1:
Figure FDA0003611306570000021
thereby transforming the measuring standards of longitude, latitude and time into the interval from -1 to 1.
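The two preprocessing steps of this claim can be sketched in Python as follows. Several choices here are assumptions rather than the patent's method: the haversine distance, the `max_kmh` threshold value, comparing each point with the last kept point (the claim compares a point with the next one), and the min-max form y = 2(x - min)/(max - min) - 1 for the mapping to [-1, 1], since formula 1 is not reproduced in the text. The second-stage check that rescues points forming a sustained excursion is omitted, and all function names are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (assumed distance measure)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def drop_speed_outliers(points, max_kmh=200.0):
    """points: list of (t_seconds, lat, lon) sorted by time.
    A point is dropped when the speed from the last kept point exceeds max_kmh."""
    kept = []
    for t, lat, lon in points:
        if kept:
            t0, lat0, lon0 = kept[-1]
            dt_h = (t - t0) / 3600.0
            if dt_h <= 0:
                continue  # skip duplicate or out-of-order timestamps
            if haversine_km(lat0, lon0, lat, lon) / dt_h > max_kmh:
                continue  # implied speed too high: treat as a noise point
        kept.append((t, lat, lon))
    return kept

def normalize(values):
    """Min-max scaling of a list of numbers to the interval [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: map everything to 0
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

# Hypothetical track: the third point jumps hundreds of kilometres in one minute.
track = [(0, 39.9, 116.3), (60, 39.9005, 116.3005),
         (120, 45.0, 120.0), (180, 39.901, 116.301)]
clean = drop_speed_outliers(track)
hours_scaled = normalize([0, 12, 24])
```

Because the filter trusts the first point, a noisy first sample would need separate handling.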
4. The intelligent distributed extended crowd movement track space-time prediction model according to claim 1, characterized in that the movement track space-time partition design is as follows: the distributed expanded movement track is divided according to the difference of time and space granularity, the space-time granularity is the hierarchical size of time and space, the space large granularity is defined as the larger interval of a space region, the time large granularity is defined as the long interval of a time period, and the space-time distribution condition of a user in a certain large-range activity region and a certain week is mastered; the small space-time granularity corresponds to the regional activity condition of a certain office building and the space-time distribution condition of a certain hour;
in time, the data are divided by seconds, hours, days and working days/rest days respectively, which yields the spatial distribution at each time granularity; in space, the longitude and latitude are scaled by track normalization and the space is divided into regions according to a grid, so that the effect of space on the time distribution and the time patterns of the different regions are considered; the moving track data are processed and analyzed by combining space-time division with crowd-sourcing fitting to obtain the specific results of the space-time division, and the crowd moving track pattern is obtained from the space-time conclusions;
in the first division method, the time is divided by seconds, hours, days and working days/rest days respectively, crowd-sourcing fitting is carried out to observe the effect of the division, and the spatial distribution pattern of users under the different time granularities is deduced from it;
in the second division method, the grid granularity is scaled in space and the space is divided into regions according to longitude and latitude; after the regions are divided according to the space granularity, the spatial information is ignored and crowd-sourcing fitting of time is carried out within each region, so that the time distribution under different space granularities can be determined.
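The two division methods of this claim can be pictured as computing grouping keys for each track point, as in the Python sketch below. It assumes coordinates already normalized to [-1, 1], a square g × g grid (g = 3 gives nine regions, g = 10 gives one hundred), and that Saturday and Sunday are the rest days; public holidays are ignored. The names `grid_region` and `time_keys` are hypothetical.

```python
from datetime import datetime

def grid_region(x, y, g):
    """Map normalized coordinates x, y in [-1, 1] to a region index 0..g*g-1."""
    def cell(v):
        i = int((v + 1.0) / 2.0 * g)
        return min(max(i, 0), g - 1)  # clamp so that v == 1.0 falls in the last cell
    return cell(y) * g + cell(x)

def time_keys(ts):
    """Return the hour-of-day, day-of-week and rest-day keys for a timestamp."""
    weekday = ts.isoweekday()  # 1 = Monday ... 7 = Sunday
    return {"hour": ts.hour, "weekday": weekday, "rest_day": weekday >= 6}

keys = time_keys(datetime(2022, 4, 23, 13, 5))  # a Saturday
```

Grouping points by a time key and fitting their coordinates corresponds to the first method; grouping by `grid_region` and fitting their timestamps corresponds to the second.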
5. The intelligent distributed extended crowd movement track space-time prediction model according to claim 1, characterized in that the space-time track point similarity is calculated as follows: time and space are treated as a sequence for measuring similarity, and similarity is measured through the numerical values of two objects; whereas track similarity is usually judged through the distance between two points, the space-time track point similarity calculation here is based on the cosine distance, which treats the objects as vectors and determines their similarity from the cosine of the angle between the vectors; the smaller the resulting value, the more dissimilar the objects, and the larger the value, the more similar; the calculation is given by formula 2, where x1 and y1 are the abscissa and ordinate of the first object, and x2 and y2 are the abscissa and ordinate of the second object:
Figure FDA0003611306570000031
the cosine similarity is measured by depending on the space orientation of the object, and the track point approximation is measured by utilizing the characteristic that the cosine similarity is insensitive to numerical values in the analysis process of the moving track.
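For two-dimensional objects (x1, y1) and (x2, y2), the standard cosine similarity is (x1·x2 + y1·y2) / (sqrt(x1² + y1²) · sqrt(x2² + y2²)); this is assumed here to be what formula 2 expresses, since the formula image is not reproduced in the text. A minimal Python sketch, with a hypothetical function name:

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between 2-D vectors p = (x1, y1) and q = (x2, y2)."""
    dot = p[0] * q[0] + p[1] * q[1]
    norm = math.hypot(p[0], p[1]) * math.hypot(q[0], q[1])
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm

same_direction = cosine_similarity((1.0, 1.0), (2.0, 2.0))   # close to 1
perpendicular = cosine_similarity((1.0, 0.0), (0.0, 1.0))    # 0
opposite = cosine_similarity((1.0, 0.0), (-1.0, 0.0))        # -1
```

The first example shows the insensitivity to magnitude mentioned in the claim: scaling a vector does not change the result.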
6. The intelligent distributed extended crowd movement track space-time prediction model according to claim 1, characterized in that the crowd-sourcing hierarchy fitting algorithm is as follows: initially, each sample point is taken as a track class, and the classes are then merged step by step according to a rule: if the distance between a sample point in track class C1 and a sample point in track class C2 is the shortest Euclidean distance among all pairs of sample points in different classes, track classes C1 and C2 are considered similar and are merged, which is specifically described as follows:
input: F, the target number of track classes; G, the sample point set;
output: F track class sets;
step one: taking each sample point in G as a track class;
step two: calculating the distance between every two track classes;
step three: merging the two classes with the minimum distance into one track class;
step four: repeating step two and step three until the number of track classes is F.
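The merging rule in this claim (merge the two classes whose closest pair of points is nearest) corresponds to what is commonly called single-linkage agglomerative clustering. The following naive Python sketch follows the listed steps directly; it recomputes all class distances on every pass, so it is only suitable for small samples, and the function name is hypothetical.

```python
import math

def hierarchy_fitting(points, f):
    """Merge classes bottom-up until f classes remain (single linkage)."""
    if f < 1:
        raise ValueError("f must be at least 1")
    classes = [[p] for p in points]  # step one: every point is its own class

    def class_distance(c1, c2):
        # Shortest Euclidean distance between any point of c1 and any point of c2.
        return min(math.dist(a, b) for a in c1 for b in c2)

    while len(classes) > f:
        best = None
        # Step two: distances between every two classes.
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                d = class_distance(classes[i], classes[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        # Step three: merge the closest pair.
        _, i, j = best
        classes[i] = classes[i] + classes[j]
        del classes[j]
    return classes

sample = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
result = hierarchy_fitting(sample, 3)
```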
7. The intelligent distributed extended crowd movement track space-time prediction model according to claim 1, characterized in that the crowd-sourcing distance fitting algorithm is as follows: the Euclidean distance is taken as the similarity measure, the distance from each sample point to the fitting centers is examined, the sum of squared errors CB is taken as the criterion function, and the fitting centers that minimize the sum of squared errors are approximated through iterative optimization based on a greedy strategy; the algorithm proceeds as follows:
process one: reading in the M samples and the number F of track classes;
process two: randomly selecting F of the M samples as seed points;
process three: calculating the distances from all other points to the seed points, comparing them, and assigning each point to the track class of its nearest seed point;
process four: calculating the new center point of every track class by averaging;
process five: repeating process three and process four until the new center points coincide with the previous center points or differ from them only slightly, then exiting the cycle;
wherein the end condition of the crowd-sourcing distance fitting is convergence, which the fitting itself ensures; the condition for ensuring convergence is that the definition of the sum of squared errors CB is satisfied, which is given by formula 3:
Figure FDA0003611306570000041
where C is the given data set; C contains F fitting subsets C1, C2, …, CF; the numbers of samples in the fitting subsets are n1, n2, …, nF; and the center points are m1, m2, …, mF respectively.
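The five processes above match the familiar k-means iteration, and the criterion is assumed here to be the usual sum of squared errors, CB = Σ over classes i of Σ over points p in Ci of |p - mi|², since formula 3 is not reproduced in the text. A minimal NumPy sketch follows; the function name, the optional `init_centers` argument, the `seed`, `max_iter` and `tol` parameters are all hypothetical.

```python
import numpy as np

def distance_fitting(samples, f, init_centers=None, seed=0, max_iter=100, tol=1e-9):
    """Return (centers, labels, cb) for f track classes."""
    x = np.asarray(samples, dtype=float)
    if init_centers is None:
        # Process two: pick f distinct samples at random as seed points.
        rng = np.random.default_rng(seed)
        centers = x[rng.choice(len(x), size=f, replace=False)]
    else:
        centers = np.asarray(init_centers, dtype=float)
    for _ in range(max_iter):
        # Process three: assign every point to its nearest center.
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Process four: recompute each center as the mean of its class
        # (an empty class keeps its previous center).
        new_centers = np.array([
            x[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(f)
        ])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift <= tol:  # process five: stop when the centers no longer move
            break
    dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    cb = float(((x - centers[labels]) ** 2).sum())  # sum of squared errors
    return centers, labels, cb

data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, labels, cb = distance_fitting(data, 2, init_centers=[[0, 0], [10, 10]])
```

Passing `init_centers` is how seed points chosen by another procedure could be supplied instead of random ones.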
8. The intelligent distributed extended crowd movement track space-time prediction model according to claim 1, characterized in that the track prediction factor and the initial crowd-sourcing fitting center are selected as follows: crowd-sourcing hierarchy fitting is carried out on sampled data to find a suitable track prediction factor and initial center points, the suitable track prediction factor being reflected visually by a dendrogram; the specific selection method is:
step one: sampling the original data;
step two: carrying out crowd-sourcing hierarchy fitting on the sampled points, and obtaining from it a suitable track prediction factor and the corresponding crowd-sourcing fitting centers for the data set;
step three: repeating step one and step two; while the number of cycles is less than a critical value k, the cycle continues, and when the number of cycles exceeds the critical value, the cycle exits and the final value is output; on exit, the track prediction factors obtained in the individual cycles are tallied, the F that occurs most often is selected as the most suitable F, and one group of crowd-sourcing fitting centers is then selected at random, from the groups corresponding to that track prediction factor, as the suitable initial seed points.
9. The intelligent distributed extended crowd movement track space-time prediction model according to claim 1, characterized in that the two-way pair crowd-sourcing fitting is as follows:
step 1: initializing the raw data as one class;
step 2: setting F to 2, and dividing the original class into two classes by using the crowd-sourcing fitting algorithm;
step 3: selecting one track class from the result of step 2 and dividing it into two classes by the method of step 2, so that three classes now exist;
step 4: selecting one of the three classes from step 3, and repeating the operation of step 2;
step 5: repeating step 4 (and thus the operation of step 2) until the number of track classes reaches the required number F, then exiting the cycle;
in step 3, a class must be selected for division: the CB value of each track class is calculated, and the class with the largest CB value is selected for division.
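The steps of this claim resemble what is usually called bisecting k-means: always split the class with the largest sum of squared errors CB. The NumPy sketch below is one possible reading. To keep it deterministic, the two-class split is seeded with the point farthest from the class centroid and the point farthest from that one; this seeding is a substitution made for the example and is not the random seeding the claims describe. Function names are hypothetical.

```python
import numpy as np

def sse(x):
    """Sum of squared errors (CB) of one class around its own mean."""
    return float(((x - x.mean(axis=0)) ** 2).sum())

def split_in_two(x, n_iter=50):
    """Two-class distance fitting with deterministic farthest-point seeds."""
    s1 = x[np.linalg.norm(x - x.mean(axis=0), axis=1).argmax()]
    s2 = x[np.linalg.norm(x - s1, axis=1).argmax()]
    centers = np.array([s1, s2])
    labels = None
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        if new_labels.min() == new_labels.max():
            break  # one side would be empty: keep the previous split
        labels = new_labels
        centers = np.array([x[labels == k].mean(axis=0) for k in (0, 1)])
    return labels

def bisecting_fitting(samples, f):
    """Split the class with the largest CB until f classes exist."""
    classes = [np.asarray(samples, dtype=float)]  # step 1: one class
    while len(classes) < f:
        cb = [sse(c) for c in classes]
        i = int(np.argmax(cb))
        if cb[i] == 0.0:
            break  # nothing left that can be split
        target = classes.pop(i)
        labels = split_in_two(target)
        classes.extend([target[labels == 0], target[labels == 1]])
    return classes

data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
        [100, 100], [100, 101], [101, 100]]
parts = bisecting_fitting(data, 3)
```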
10. The intelligent distributed extended crowd movement track space-time prediction model according to claim 1, characterized in that the crowd-sourcing fitting algorithm based on singular value decomposition is as follows: through dimension reduction and simplification of the data, the noise of the track data is reduced, discrete points are reduced, global convergence is accelerated and the data become easier to analyze; the singular value decomposition is given by formula 4:
A = UΣV^T (formula 4)
In the formula, U is an n × n matrix; Σ is an n × m diagonal matrix whose diagonal entries are the singular values σi; V^T is the transpose of V and is an m × m matrix; A is an n × m matrix; and the formula is the singular value decomposition of A;
after singular value decomposition, crowd-sourcing trajectory fitting is performed on the basis of the singular value decomposition.
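How the decomposition feeds the fitting step is not spelled out in the text, so the following NumPy sketch shows only one common reading: keep the r largest singular values to obtain a denoised low-rank approximation and a reduced r-column representation, which could then be passed to a fitting routine. The function name and the choice of r are hypothetical. Note that `full_matrices=False` returns the economy-size factors rather than the full n × n matrix U of formula 4.

```python
import numpy as np

def truncated_svd(a, r):
    """Return (rank-r approximation of a, reduced coordinates, singular values)."""
    a = np.asarray(a, dtype=float)
    # a = u @ diag(s) @ vt, with singular values s sorted in descending order.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    reduced = u[:, :r] * s[:r]     # each row expressed in the top-r directions
    approx = reduced @ vt[:r, :]   # low-rank reconstruction in the original space
    return approx, reduced, s

# A rank-1 matrix is reproduced exactly by its first singular triplet.
a = np.outer([1.0, 2.0, 3.0], [1.0, 1.0])
approx, reduced, s = truncated_svd(a, 1)
```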
CN202210429821.1A 2022-04-23 2022-04-23 Intelligent distributed extended crowd movement track space-time prediction model Pending CN114817669A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210429821.1A CN114817669A (en) 2022-04-23 2022-04-23 Intelligent distributed extended crowd movement track space-time prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210429821.1A CN114817669A (en) 2022-04-23 2022-04-23 Intelligent distributed extended crowd movement track space-time prediction model

Publications (1)

Publication Number Publication Date
CN114817669A true CN114817669A (en) 2022-07-29

Family

ID=82505437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210429821.1A Pending CN114817669A (en) 2022-04-23 2022-04-23 Intelligent distributed extended crowd movement track space-time prediction model

Country Status (1)

Country Link
CN (1) CN114817669A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117520906A (en) * 2024-01-05 2024-02-06 北京航空航天大学 Crowd classification method and system based on different characteristics of crowd travel activity entropy
CN117520906B (en) * 2024-01-05 2024-03-12 北京航空航天大学 Crowd classification method and system based on different characteristics of crowd travel activity entropy

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination