CN112765120A - Method for analyzing and extracting user movement track based on mobile phone signaling - Google Patents
- Publication number
- CN112765120A
- Authority
- CN
- China
- Prior art keywords
- data
- user
- time
- track
- mobile phone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/02—Services making use of location information
- H04W4/029—Location-based management or tracking services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/20—Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel
Abstract
The invention provides a method for analyzing and extracting a user movement track based on mobile phone signaling, belonging to the field of big data analysis. In the method, user signaling data is collected through a Flume acquisition system. For historical track analysis, invalid, drifting and repeated data are deleted, and the data is then further cleaned with k-means and LOF outlier detection algorithms, which improves data quality and analysis accuracy; the data is loaded into a Hive warehouse for offline calculation, and the results are stored in a MySQL database and an ES search engine for subsequent query. For real-time track extraction, Kafka is used to ingest the data, which is passed to a stream computing framework such as Storm; the result is cached in Redis, and the user's track is displayed in real time through GIS map software. The scheme can meet the requirements of different scenarios and has high practicability and flexibility.
Description
Technical Field
The invention relates to the field of big data analysis, and in particular to a method for analyzing and extracting a target's movement track based on mobile phone signaling.
Background
With the arrival of the big data era, large amounts of data are being generated at great speed in every field, covering social security, family planning, employment, travel and other areas; this continuous stream of data brings both opportunities and challenges to public security organs. Traditional databases scale poorly, and their performance drops sharply once the data volume reaches a certain scale. The emergence of various big data frameworks offers a good solution to this problem.
The development of mobile communication and related technologies has made smartphones increasingly widespread, and the arrival of the 5G era is causing mobile phone signaling data to grow explosively. According to statistics from the Ministry of Industry and Information Technology, as of April 2019 the total number of mobile phone subscribers of the three operators reached 1.59 billion, up 7.3 percent year on year; in 2019, the number of mobile phone base stations in China increased by 1.74 million to a total of 8.41 million, of which 5.44 million were 4G base stations. With the start of 5G commercial service, construction of 5G base stations is accelerating, and network coverage continues to expand and improve. The greatest strength of mobile phone signaling is its wide coverage; it contains rich information such as the mobile phone serial number, timestamp and position, and can be applied to urban population distribution, travel origin-destination (OD) analysis and similar tasks. Using this data, and the position information in particular, to extract the track of a specific person has important practical significance for fighting crime and improving case-handling efficiency.
Disclosure of Invention
To address the above problems, the invention provides a target track analysis system that can analyze and extract the track of a specific person in different scenarios, such as offline query and real-time query, and can present the result to the user through map visualization software.
The technical scheme of the invention is as follows: a method for analyzing and extracting a user movement track based on mobile phone signaling, comprising the following specific steps:
step (1.1), a mobile phone signaling data source is collected and stored through a Flume data acquisition system;
step (1.2), for analysis of the target's historical track, the Flume data acquisition system collects the stored historical data and writes it into an HDFS distributed file system;
step (1.3), the historical data stored in the HDFS distributed file system is cleaned;
step (1.4), after cleaning, the data is loaded into a Hive data warehouse, and the offline historical track of each user is calculated and analyzed from that user's data;
step (1.5), the user's offline historical track data is stored in a MySQL relational database, partitioned by day, for subsequent query, or loaded into an ES search engine for fast retrieval;
step (1.6), for analysis of the target's real-time track, the real-time signaling data is subscribed to through a Kafka message system;
step (1.7), the real-time signaling data is analyzed through a Storm stream computing framework, and the state of the user, namely the current location and the time of appearance, is recorded; when a new record arrives and the target's position has changed, the user's state information is updated and the user's real-time track sequence data is calculated;
step (1.8), the user's real-time track sequence data is cached in Redis, and the user's track is displayed in time order through GIS map software.
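The state update described in step (1.7) can be illustrated with a minimal Python sketch. It is not the Storm topology itself: in-memory dictionaries stand in for the per-user state and for the Redis cache of step (1.8), and the names `TrackState`, `update`, `current` and `tracks` are hypothetical.

```python
class TrackState:
    """Keeps the latest position per user and the ordered track built from it."""

    def __init__(self):
        self.current = {}  # imsi -> ((lon, lat), time of appearance)
        self.tracks = {}   # imsi -> list of (timestamp, lon, lat), in arrival order

    def update(self, imsi, timestamp, lon, lat):
        """Process one signaling record; return True if the position changed."""
        position = (lon, lat)
        last = self.current.get(imsi)
        if last is not None and last[0] == position:
            # Same base station as before: the user's state is unchanged.
            return False
        # New user or new position: update the state and extend the track.
        self.current[imsi] = (position, timestamp)
        self.tracks.setdefault(imsi, []).append((timestamp, lon, lat))
        return True
```

Because a track entry is appended only when the position changes, the list held per user is already the time-ordered sequence that a map front end would draw.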
Further, in step (1.1), the real-time data source includes device data, system data sets and other data.
Further, in step (1.3), the specific operation steps of cleaning the historical data stored in the HDFS distributed file system are as follows:
(1.3.1), data with missing fields: the mobile phone signaling data comprises the imsi mobile phone serial number, a timestamp and the base station longitude and latitude; records in which any of these fields is missing are deleted;
(1.3.2), drift data: a threshold is set first; the user's speed is obtained from the distance between two base stations and the time difference between the records, and is compared with the threshold; if the speed exceeds the threshold, the user is considered not to have left the range of the current base station;
(1.3.3), repeated data: among the records of a user's signaling that repeat the same longitude and latitude, only the earliest and the latest are retained, namely the user's appearance time and departure time within the base station's signal range, and the remaining records are deleted;
(1.3.4), outlier data points: preprocessing is carried out with a k-means clustering algorithm to filter out non-outlier data, then outliers are detected in the remaining data with the LOF outlier detection algorithm and deleted.
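As a rough illustration of step (1.3.1), the sketch below drops records with missing fields and then groups the rest by imsi in ascending time order, as the detailed description does. The record layout (a dict with keys `imsi`, `timestamp`, `lon`, `lat`) and the function name `clean_and_group` are assumptions made for the example, not part of the method as filed.

```python
from collections import defaultdict

# Assumed field names for one signaling record.
REQUIRED_FIELDS = ("imsi", "timestamp", "lon", "lat")


def clean_and_group(records):
    """Drop incomplete records, then return {imsi: records sorted by time}."""
    grouped = defaultdict(list)
    for rec in records:
        # A field counts as missing if it is absent, None or an empty string.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        grouped[rec["imsi"]].append(rec)
    for user_records in grouped.values():
        user_records.sort(key=lambda r: r["timestamp"])
    return dict(grouped)
```

The later cleaning steps (drift, repeated records, outliers) all work on one user's time-ordered records, which is why grouping and sorting are done once up front.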
Further, in step (1.4), the historical track of the user includes the imsi, the location of the base station, and the user's appearance time and departure time.
Further, in step (1.6), the Kafka messaging system is a distributed high-throughput messaging publish-subscribe system for storing real-time data.
The beneficial effects of the invention are: (1) by cleaning and preprocessing the source data, missing, repeated, drifting and noise data are removed from the data set, which guarantees high data quality, improves the accuracy of track analysis and reduces the amount of calculation;
(2) the results of real-time calculation are cached in a Redis database, so client queries can be answered quickly;
(3) the offline track and the real-time track coexist, so possible omissions can be made up by comparing the two; either can be selected according to the scenario, giving greater flexibility.
Drawings
FIG. 1 is a flow chart of the architecture of the present invention;
FIG. 2 is a diagram illustrating a method for detecting outliers in a data set according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the technical solution of the present invention, the following detailed description is made with reference to the accompanying drawings:
The invention provides a method for extracting a user's track from mobile phone signaling big data; either the real-time track or the offline track of a target can be extracted, depending on the scenario.
As shown in fig. 1, step (1.1), a mobile phone signaling data source is collected and stored through a Flume data acquisition system;
step (1.2), for analysis of the target's historical track, the Flume data acquisition system collects the stored historical data and writes it into an HDFS distributed file system;
step (1.3), the historical data stored in the HDFS distributed file system is cleaned;
step (1.4), after cleaning, the data is loaded into a Hive data warehouse, and the offline historical track of each user is calculated and analyzed from that user's data;
step (1.5), the user's offline historical track data is stored in a MySQL relational database, partitioned by day, for subsequent query, or loaded into an ES search engine for fast retrieval;
step (1.6), for analysis of the target's real-time track, the real-time signaling data is subscribed to through a Kafka message system;
step (1.7), the real-time signaling data is analyzed through a Storm stream computing framework, and the state of the user, namely the current location and the time of appearance, is recorded;
when a new record arrives and the target's position has changed, the user's state information is updated and the user's real-time track sequence data is calculated;
step (1.8), the user's real-time track sequence data is cached in Redis, and the user's track is displayed in time order through GIS map software.
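Step (1.5) partitions the offline track by day before it is stored. A minimal sketch of that bucketing is shown here; it does not touch MySQL or ES, and it assumes each track entry is a dict whose `arrive` field is an epoch timestamp in seconds and that days are taken in UTC. The name `partition_by_day` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_by_day(stays):
    """Group track entries by the calendar day (UTC) of their arrival time."""
    partitions = defaultdict(list)
    for stay in stays:
        arrival = datetime.fromtimestamp(stay["arrive"], tz=timezone.utc)
        partitions[arrival.strftime("%Y-%m-%d")].append(stay)
    return dict(partitions)
```

Each key could then serve as the partition value when the entries are written to storage, so a query for one day reads only that day's entries.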
Further, in step (1.1), the real-time data source includes device data, system data sets and other data.
Further, in step (1.3), the specific operation steps of cleaning the historical data stored in the HDFS distributed file system are as follows:
(1.3.1), data with missing fields: the imsi is taken as the unique identifier of the user, the data is grouped by imsi, and each group is sorted in ascending order of time, giving the user's time-ordered signaling data;
the signaling data comprises the imsi mobile phone serial number, a timestamp and the base station longitude and latitude; records in which any of these fields is missing are deleted;
(1.3.2), drift data: for data drift caused by base station handover, a threshold is set first; the user's speed is then obtained from the distance between two base stations and the time difference between the records, and is compared with the threshold; if it exceeds the threshold, the user is considered not to have left the range of the current base station;
specifically, when a long-distance base station handover causes drift, the apparent speed is very high, so drift can be identified from the speed; assuming the longitude and latitude coordinates of the two base stations are (lonA, latA) and (lonB, latB) respectively, the distance between them can be calculated by the following formula:
C = sin(latA)*sin(latB)*cos(lonA-lonB) + cos(latA)*cos(latB)
Distance = R*Arccos(C)*π/180
wherein R is the radius of the earth, R = 6371 km;
after the distance is obtained, the speed v is calculated from the time interval between the i-th record and the (i+1)-th record; if v is greater than the set threshold V, where V is 120 km/h, the (i+1)-th record is considered drift data and is deleted; this process is repeated, traversing all the data of each user until the end.
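The drift check can be sketched in Python as follows. Two points are assumptions rather than statements from the text. First, the sketch uses the standard spherical law of cosines with geographic latitudes in radians, C = sin(latA)·sin(latB) + cos(latA)·cos(latB)·cos(lonA − lonB); the formula printed above attaches cos(lonA − lonB) to the sine product, which matches the standard form only if latA and latB are read as co-latitudes (90° minus latitude). Second, each record is compared with the last record that was kept, so that a deleted drift point is not used as a reference. The names `distance_km` and `remove_drift` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # R in the text
SPEED_LIMIT_KMH = 120.0    # threshold V in the text


def distance_km(lon_a, lat_a, lon_b, lat_b):
    """Great-circle distance between two points given in degrees."""
    lat_a, lat_b = math.radians(lat_a), math.radians(lat_b)
    delta_lon = math.radians(lon_a - lon_b)
    c = (math.sin(lat_a) * math.sin(lat_b)
         + math.cos(lat_a) * math.cos(lat_b) * math.cos(delta_lon))
    c = max(-1.0, min(1.0, c))  # guard acos against rounding error
    return EARTH_RADIUS_KM * math.acos(c)


def remove_drift(records, limit_kmh=SPEED_LIMIT_KMH):
    """records: one user's time-ordered (timestamp_seconds, lon, lat) tuples."""
    kept = []
    for rec in records:
        if kept:
            t0, lon0, lat0 = kept[-1]
            t1, lon1, lat1 = rec
            dist = distance_km(lon0, lat0, lon1, lat1)
            hours = (t1 - t0) / 3600.0
            # A position jump with no elapsed time, or an implied speed
            # above the limit, is treated as drift and dropped.
            if dist > 0 and (hours <= 0 or dist / hours > limit_kmh):
                continue
        kept.append(rec)
    return kept
```

For example, a record that places the user about 90 km away one minute after the previous record implies a speed of several thousand km/h and is dropped, while the records before and after it are kept.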
(1.3.3), repeated data: within each group, records with repeated longitude and latitude indicate that the user has not moved and that the base station is a stay point of the user; only the earliest and the latest of these records are retained, namely the user's appearance time and departure time within the base station's signal range, and the remaining records are deleted;
(1.3.4), outlier data points: that is, noise data in the data set is filtered out; preprocessing is performed with a k-means clustering algorithm to filter out data points that are non-outliers, and the LOF outlier detection algorithm is then used to find the outliers, i.e. the noise data, in the remaining data and delete them.
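Step (1.3.3) can be sketched as collapsing each run of records at the same base station into a single stay with an appearance time and a departure time. The sketch assumes that only consecutive repeats are merged, so that a later return to the same base station remains a separate stay; the text does not state this explicitly. The function name `collapse_stays` and the output keys are hypothetical.

```python
from itertools import groupby


def collapse_stays(records):
    """records: one user's time-ordered (timestamp, lon, lat) tuples.

    Returns one entry per consecutive run at the same coordinates, keeping
    only the earliest (arrive) and latest (leave) timestamps of the run.
    """
    stays = []
    for (lon, lat), run in groupby(records, key=lambda r: (r[1], r[2])):
        run = list(run)
        stays.append({
            "lon": lon,
            "lat": lat,
            "arrive": run[0][0],
            "leave": run[-1][0],
        })
    return stays
```

Each entry, together with the user's imsi, holds the fields that step (1.4) lists for the historical track.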
Further, in step (1.4), the historical track of the user includes the imsi, the location of the base station, and the user's appearance time and departure time.
Further, in step (1.6), the Kafka messaging system is a distributed high-throughput messaging publish-subscribe system for storing real-time data.
A specific embodiment is as follows: as shown in fig. 2, for a given user, the k-means-improved LOF algorithm is used to detect outlier noise data, with the following specific steps:
Step one: select k samples as the initial cluster centers a = {a1, a2, …, ak};
Step two: for each sample xi in the data set, calculate its distances to the k cluster centers and assign it to the class of the cluster center at the smallest distance;
Step three: for each class, recalculate its cluster center, i.e. the centroid of all samples of the class; in Euclidean space, the sum of squared errors is used as the clustering objective function;
Step four: for each cluster, obtain the farthest distance between a data point in the cluster and the cluster center, denoted m; then, for any data point in the cluster, denote its distance from the cluster center as d; the outlier coefficient α of the data point is defined in terms of d and m;
Step five: set an outlier coefficient threshold R; data points whose outlier coefficient α is smaller than the threshold R are judged to be non-outliers and do not take part in the subsequent calculation;
Step six: repeat steps three, four and five until the maximum number of iterations is reached;
Step seven: based on the k-means result, obtain a candidate set of outliers, and then detect the outliers with the LOF algorithm;
Step eight: calculate the local outlier factor of each data point O;
if this value is greater than 1, the density of O is lower than the density of its neighborhood points, and O may be an outlier;
Step nine: set an outlier factor threshold P; if the local outlier factor calculated in the previous step is greater than P, the data point is determined to be an outlier and is deleted.
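The two phases can be sketched in plain Python as below. This is a simplified illustration under stated assumptions, not the procedure as filed: the outlier coefficient is assumed to be α = d / m, since its formula is not reproduced in the text; k-means is run for a fixed number of iterations and α is evaluated once afterwards, instead of pruning points inside the loop as step six describes; and the LOF part follows the standard definition (k-distance, reachability distance, local reachability density). The function names and the parameters `alpha_threshold`, `iterations` and `seed` are hypothetical.

```python
import math
import random


def kmeans_candidates(points, k, alpha_threshold, iterations=10, seed=0):
    """Return the points whose assumed coefficient alpha = d / m is >= threshold."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the centroid; keep the old one if empty.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    candidates = []
    for center, members in zip(centers, clusters):
        if not members:
            continue
        m = max(math.dist(p, center) for p in members)  # farthest distance
        for p in members:
            alpha = math.dist(p, center) / m if m > 0 else 0.0
            if alpha >= alpha_threshold:
                candidates.append(p)
    return candidates


def local_outlier_factors(points, k):
    """Standard LOF score for every point; requires len(points) > k."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    k_distance, neighbors = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_distance.append(kd)
        neighbors.append([j for j in order if dist[i][j] <= kd])
    density = []  # local reachability density of each point
    for i in range(n):
        reach = sum(max(k_distance[j], dist[i][j]) for j in neighbors[i])
        density.append(len(neighbors[i]) / reach if reach > 0 else math.inf)
    scores = []
    for i in range(n):
        if math.isinf(density[i]):
            scores.append(1.0)  # duplicated points: treat as inliers
            continue
        ratios = [density[j] / density[i] for j in neighbors[i]]
        scores.append(sum(ratios) / len(ratios))
    return scores
```

In use, `local_outlier_factors` would be applied to the candidate set returned by `kmeans_candidates`, and the points whose score exceeds the threshold P would be deleted. The pre-filter pays off because LOF needs all pairwise distances, so shrinking its input reduces the cost roughly quadratically.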
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of embodiments of the present invention; other variations are possible within the scope of the invention; thus, by way of example, and not limitation, alternative configurations of embodiments of the invention may be considered consistent with the teachings of the present invention; accordingly, the embodiments of the invention are not limited to the embodiments explicitly described and depicted.
Claims (5)
1. A method for analyzing and extracting a user movement track based on mobile phone signaling, characterized by comprising the following specific steps:
step (1.1), a mobile phone signaling data source is collected and stored through a Flume data acquisition system;
step (1.2), for analysis of the target's historical track, the Flume data acquisition system collects the stored historical data and writes it into an HDFS distributed file system;
step (1.3), the historical data stored in the HDFS distributed file system is cleaned;
step (1.4), after cleaning, the data is loaded into a Hive data warehouse, and the offline historical track of each user is calculated and analyzed from that user's data;
step (1.5), the user's offline historical track data is stored in a MySQL relational database, partitioned by day, for subsequent query, or loaded into an ES search engine for fast retrieval;
step (1.6), for analysis of the target's real-time track, the real-time signaling data is subscribed to through a Kafka message system;
step (1.7), the real-time signaling data is analyzed through a Storm stream computing framework, and the state of the user, namely the current location and the time of appearance, is recorded; when a new record arrives and the target's position has changed, the user's state information is updated and the user's real-time track sequence data is calculated;
step (1.8), the user's real-time track sequence data is cached in Redis, and the user's track is displayed in time order through GIS map software.
2. The method for analyzing and extracting a user movement track based on mobile phone signaling as claimed in claim 1, wherein in step (1.1), the real-time data source includes device data, system data sets and other data.
3. The method for analyzing and extracting a user movement track based on mobile phone signaling as claimed in claim 1, wherein in step (1.3), the specific operation steps of cleaning the historical data stored in the HDFS distributed file system are as follows:
(1.3.1), data with missing fields: the mobile phone signaling data comprises the imsi mobile phone serial number, a timestamp and the base station longitude and latitude; records in which any of these fields is missing are deleted;
(1.3.2), drift data: a threshold is set first; the user's speed is obtained from the distance between two base stations and the time difference between the records, and is compared with the threshold; if the speed exceeds the threshold, the user is considered not to have left the range of the current base station;
(1.3.3), repeated data: among the records of a user's signaling that repeat the same longitude and latitude, only the earliest and the latest are retained, namely the user's appearance time and departure time within the base station's signal range, and the remaining records are deleted;
(1.3.4), outlier data points: preprocessing is carried out with a k-means clustering algorithm to filter out non-outlier data, then outliers are detected in the remaining data with the LOF outlier detection algorithm and deleted.
4. The method for analyzing and extracting a user movement track based on mobile phone signaling as claimed in claim 1, wherein in step (1.4), the historical track of the user comprises the imsi, the location of the base station, and the user's appearance time and departure time.
5. The method for analyzing and extracting a user movement track based on mobile phone signaling as claimed in claim 1, wherein in step (1.6), the Kafka messaging system is a distributed high-throughput message publish-subscribe system for storing real-time data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011237478.8A CN112765120A (en) | 2020-11-09 | 2020-11-09 | Method for analyzing and extracting user movement track based on mobile phone signaling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011237478.8A CN112765120A (en) | 2020-11-09 | 2020-11-09 | Method for analyzing and extracting user movement track based on mobile phone signaling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112765120A (en) | 2021-05-07 |
Family
ID=75693071
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011237478.8A Withdrawn CN112765120A (en) | 2020-11-09 | 2020-11-09 | Method for analyzing and extracting user movement track based on mobile phone signaling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112765120A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114245314A (en) * | 2021-12-17 | 2022-03-25 | 高创安邦(北京)技术有限公司 | Personnel trajectory correction method and device, storage medium and electronic equipment |
CN116822779A (en) * | 2023-02-06 | 2023-09-29 | 长安大学 | Expressway motor vehicle carbon emission calculation method based on mobile phone signaling data |
- 2020-11-09: CN application CN202011237478.8A (patent/CN112765120A/en), status: Withdrawn
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114245314A (en) * | 2021-12-17 | 2022-03-25 | 高创安邦(北京)技术有限公司 | Personnel trajectory correction method and device, storage medium and electronic equipment |
CN114245314B (en) * | 2021-12-17 | 2024-01-05 | 高创安邦(北京)技术有限公司 | Personnel track correction method and device, storage medium and electronic equipment |
CN116822779A (en) * | 2023-02-06 | 2023-09-29 | 长安大学 | Expressway motor vehicle carbon emission calculation method based on mobile phone signaling data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106931974B (en) | Method for calculating personal commuting distance based on mobile terminal GPS positioning data record | |
CN106530716B (en) | The method for calculating express highway section average speed based on mobile phone signaling data | |
CN108536851B (en) | User identity recognition method based on moving track similarity comparison | |
CN112182410B (en) | User travel mode mining method based on space-time track knowledge graph | |
CN110188093A (en) | A kind of data digging system being directed to AIS information source based on big data platform | |
CN109446186B (en) | Social relation judgment method based on movement track | |
CN109885643B (en) | Position prediction method based on semantic track and storage medium | |
CN106156528B (en) | A kind of track data stops recognition methods and system | |
CN109684384B (en) | Trajectory data space-time density analysis system and analysis method thereof | |
CN107277765A (en) | A kind of mobile phone signaling track preprocess method based on cluster Outlier Analysis | |
CN110457315A (en) | A kind of group's accumulation mode analysis method and system based on user trajectory data | |
CN112013862B (en) | Pedestrian network extraction and updating method based on crowdsourcing trajectory | |
CN108882172B (en) | Indoor moving trajectory data prediction method based on HMM model | |
CN112765120A (en) | Method for analyzing and extracting user movement track based on mobile phone signaling | |
CN105243148A (en) | Checkin data based spatial-temporal trajectory similarity measurement method and system | |
CN111209457B (en) | Target typical activity pattern deviation warning method | |
Li et al. | Robust inferences of travel paths from GPS trajectories | |
CN108566620B (en) | Indoor positioning method based on WIFI | |
CN111954160A (en) | Method for converting two-dimensional mobile phone signaling data into three-dimensional space trajectory data | |
CN110275911A (en) | Private car trip hotspot path method for digging based on Frequent Sequential Patterns | |
CN111475746B (en) | Point-of-interest mining method, device, computer equipment and storage medium | |
CN116132923A (en) | High-precision space-time track restoration method based on mobile phone signaling data | |
CN116403139A (en) | Visual tracking and positioning method based on target detection | |
Moreira et al. | The impact of data quality in the context of pedestrian movement analysis | |
CN110232067B (en) | Co-generation group discovery method based on BHR-Tree index |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20210507 |