CN112765120A - Method for analyzing and extracting user movement track based on mobile phone signaling - Google Patents

Info

Publication number
CN112765120A
Authority
CN
China
Prior art keywords
data
user
time
track
mobile phone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011237478.8A
Other languages
Chinese (zh)
Inventor
何利文
赵金城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202011237478.8A
Publication of CN112765120A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/029 Location-based management or tracking services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/20 Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel

Abstract

The invention provides a method for analyzing and extracting a user movement track based on mobile phone signaling, and belongs to the field of big data analysis. User signaling data are collected through a Flume acquisition system. For historical track analysis, invalid, drifting and repeated records are deleted, and the data are then cleaned further with k-means clustering and the LOF outlier detection algorithm, which improves data quality and analysis accuracy; the cleaned data are loaded into a Hive warehouse for offline calculation, and the results are stored in a MySQL database and an ES search engine for subsequent query. For real-time track extraction, Kafka receives the incoming data and passes them to a stream computing framework such as Storm, the results are cached in Redis, and the user's track is displayed in real time through GIS map software. The scheme meets the requirements of different scenarios and is practical and flexible.

Description

Method for analyzing and extracting user movement track based on mobile phone signaling
Technical Field
The invention relates to the field of big data analysis, in particular to a method for analyzing and extracting a target track, and more particularly to a method for analyzing and extracting a target movement track based on mobile phone signaling.
Background
With the arrival of the big data era, every field generates large volumes of data at high speed. The data cover social security, family planning, employment, travel and other areas, and this continuous stream of data brings both opportunities and challenges to public security organs. Traditional databases scale poorly, and their performance degrades sharply once the data volume reaches a certain scale. The emergence of various big data frameworks offers a good solution to this problem.
The development of mobile communication and related technologies has made smartphones increasingly widespread, and the arrival of the 5G era is driving explosive growth in mobile phone signaling data. According to statistics from the Ministry of Industry and Information Technology, as of April 2019 the three operators had a total of 1.59 billion mobile phone users, a year-on-year increase of 7.3 percent; in 2019, 1.74 million mobile phone base stations were added in China, bringing the total to 8.41 million, of which 5.44 million were 4G base stations. With the start of commercial 5G service, construction of 5G base stations is accelerating, and network coverage continues to expand and improve. The most notable characteristic of mobile phone signaling is its wide coverage; it contains rich information such as the mobile phone serial number, timestamp and position, and can be applied to urban population distribution, travel OD (origin-destination) analysis and similar tasks. Using these data, and the position information in particular, to extract the track of a specific person is of practical significance for fighting crime and improving case-handling efficiency.
Disclosure of Invention
To address these problems, the invention provides a target-specific track analysis system that supports analysis and extraction of a specific person's track in different scenarios, such as offline query and real-time query, and presents the result to the user through map visualization software.
The technical scheme of the invention is as follows: a method for analyzing and extracting a user movement track based on mobile phone signaling, comprising the following specific steps:
step (1.1), first, collecting and storing the mobile phone signaling data source through a Flume data acquisition system;
step (1.2), for analysis of the target's historical track, the Flume data acquisition system collects the stored historical data and writes it to an HDFS distributed file system;
step (1.3), cleaning the historical data stored in the HDFS distributed file system;
step (1.4), after cleaning is finished, loading the data into a Hive data warehouse, and calculating and analyzing the offline historical track of each user from that user's data;
step (1.5), storing the users' offline historical track data in a MySQL relational database, partitioned by day, for subsequent query, or loading it into an ES search engine for fast retrieval;
step (1.6), for analysis of the target's real-time track, subscribing to the real-time signaling data through a Kafka message system;
step (1.7), analyzing the real-time signaling data through the Storm stream computing framework and recording the state of each user, i.e. the current location and the time of appearance; when a new record arrives and the target's position has changed, updating the user's state information and calculating the user's real-time track sequence data;
step (1.8), caching the user's real-time track sequence data in Redis, and displaying the user's track in chronological order through GIS map software.
Further, in step (1.1), the real-time data source includes device data, system data sets and other data.
Further, in step (1.3), the historical data stored in the HDFS distributed file system is cleaned as follows:
(1.3.1), records with missing fields: the mobile phone signaling data comprises the imsi mobile phone serial number, a timestamp and the longitude and latitude of the base station; records missing any of these fields are deleted;
(1.3.2), drift data: a speed threshold is set first; the user's speed is obtained from the distance between two base stations and the time difference between the records and compared with the threshold; a speed greater than the threshold indicates that the user has not actually left the range of the current base station, so the record is treated as drift;
(1.3.3), repeated data: among records of a user that repeat the same longitude and latitude, only the earliest and the latest are kept, namely the user's appearance time and departure time within the signal range of the base station, and the remaining records are deleted;
(1.3.4), outlier data points: the data are preprocessed with a k-means clustering algorithm to filter out non-outlier data, then outliers are detected in the remaining data with the LOF outlier detection algorithm and deleted.
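The k-means pre-filtering half of step (1.3.4) might look roughly like the sketch below. It is a simplification under stated assumptions, not the patented procedure: it runs plain k-means to completion and filters once afterwards, and it takes the outlier coefficient to be the ratio d/m of a point's distance d from its cluster center to the largest such distance m in that cluster. That ratio, the function names `kmeans` and `outlier_candidates`, and the 2-D tuple representation of points are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct samples as initial centers

    def nearest(p):
        # Index of the center closest to point p.
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, [nearest(p) for p in points]


def outlier_candidates(points, k, threshold, **kmeans_args):
    """Keep only points whose assumed coefficient d/m reaches the threshold."""
    centers, labels = kmeans(points, k, **kmeans_args)
    dists = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    # m for each cluster: the farthest member distance from the center.
    farthest = [0.0] * k
    for d, lab in zip(dists, labels):
        farthest[lab] = max(farthest[lab], d)
    candidates = []
    for p, d, lab in zip(points, dists, labels):
        m = farthest[lab]
        alpha = d / m if m > 0 else 0.0  # assumed definition of the coefficient
        if alpha >= threshold:
            candidates.append(p)
    return candidates
```

Points close to their cluster center are dropped as clear non-outliers, so the more expensive LOF pass only has to score the remaining candidates.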
Further, in step (1.4), the historical track of the user includes the imsi, the location of the base station, and the user's appearance time and departure time.
Further, in step (1.6), the Kafka messaging system is a distributed, high-throughput publish-subscribe messaging system used to store the real-time data.
The invention has the following beneficial effects: (1) cleaning and preprocessing the source data removes missing, repeated, drifting and noisy records from the data set, which ensures high data quality, improves the accuracy of track analysis and reduces the amount of computation;
(2) the results of real-time calculation are cached in a Redis database, so client queries receive a fast response;
(3) the offline track and the real-time track coexist, so possible omissions in one can be made up by comparison with the other, and either can be selected according to the scenario, giving greater flexibility.
Drawings
FIG. 1 is a flow chart of the architecture of the present invention;
FIG. 2 is a diagram illustrating a method for detecting outliers in a data set according to an embodiment of the present invention.
Detailed Description
In order to illustrate the technical solution of the invention more clearly, a detailed description is given below with reference to the accompanying drawings:
The invention provides a method for extracting a user's track from mobile phone signaling big data; the real-time track and the offline track of a target can be extracted according to the scenario.
As shown in fig. 1, step (1.1), first, collecting and storing the mobile phone signaling data source through a Flume data acquisition system;
step (1.2), for analysis of the target's historical track, the Flume data acquisition system collects the stored historical data and writes it to an HDFS distributed file system;
step (1.3), cleaning the historical data stored in the HDFS distributed file system;
step (1.4), after cleaning is finished, loading the data into a Hive data warehouse, and calculating and analyzing the offline historical track of each user from that user's data;
step (1.5), storing the users' offline historical track data in a MySQL relational database, partitioned by day, for subsequent query, or loading it into an ES search engine for fast retrieval;
step (1.6), for analysis of the target's real-time track, subscribing to the real-time signaling data through a Kafka message system;
step (1.7), analyzing the real-time signaling data through the Storm stream computing framework and recording the state of each user, i.e. the current location and the time of appearance; when a new record arrives and the target's position has changed, updating the user's state information and calculating the user's real-time track sequence data;
step (1.8), caching the user's real-time track sequence data in Redis, and displaying the user's track in chronological order through GIS map software.
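The per-user state update of step (1.7) can be illustrated with a small in-memory sketch. It deliberately avoids any Kafka, Storm or Redis API: a plain Python dict stands in for the cache and a list of tuples stands in for the subscribed stream. The record layout `(imsi, timestamp, lon, lat)` and the names `UserState` and `update_states` are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class UserState:
    location: tuple  # (lon, lat) of the base station the user is currently at
    since: int       # timestamp at which the user appeared there
    track: list = field(default_factory=list)  # [((lon, lat), since), ...]


def update_states(states, record):
    """Apply one signaling record (imsi, timestamp, lon, lat) to the state table."""
    imsi, ts, lon, lat = record
    loc = (lon, lat)
    state = states.get(imsi)
    if state is None:
        # First record seen for this user: start a new track.
        states[imsi] = UserState(loc, ts, [(loc, ts)])
    elif loc != state.location:
        # Position changed: update the state and extend the track sequence.
        state.location, state.since = loc, ts
        state.track.append((loc, ts))
    # Same position as before: the state is left unchanged.
    return states


# Usage: feed records in arrival order, as a stream consumer would.
states = {}
stream = [
    ("460001", 100, 118.78, 32.04),
    ("460001", 160, 118.78, 32.04),
    ("460001", 220, 118.80, 32.06),
]
for rec in stream:
    update_states(states, rec)
```

In a deployment along the lines of the text, the `track` list is what would be written to the cache and read back by the map front end in time order.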
Further, in step (1.1), the real-time data source includes device data, system data sets and other data.
Further, in step (1.3), the historical data stored in the HDFS distributed file system is cleaned as follows:
(1.3.1), records with missing fields: the imsi is taken as the unique identifier of a user; the data are grouped by imsi and sorted in ascending time order, which yields each user's time-ordered signaling data;
the signaling data comprises the imsi mobile phone serial number, a timestamp and the longitude and latitude of the base station; records missing any of these fields are deleted;
(1.3.2), drift data: to handle data drift caused by base station switching, a speed threshold is set first; the user's speed is then obtained from the distance between two base stations and the time difference between the records and compared with the threshold; a speed greater than the threshold indicates that the user has not actually left the range of the current base station;
specifically, drift arises from a handover to a distant base station, so the apparent speed is very high when the data drifts, and drift can be identified from the speed; assuming the longitude and latitude coordinates of two base stations are (lonA, latA) and (lonB, latB), the distance between them can be calculated by the following formula:
$$C = \sin(\mathrm{latA})\,\sin(\mathrm{latB}) + \cos(\mathrm{latA})\,\cos(\mathrm{latB})\,\cos(\mathrm{lonA} - \mathrm{lonB})$$
$$\mathrm{Distance} = R \cdot \arccos(C) \cdot \frac{\pi}{180}$$
where arccos(C) is expressed in degrees and R is the radius of the earth, R = 6371 km;
after the distance is obtained, the speed v is calculated from the time interval between the i-th record and the (i+1)-th record; if v is greater than the set threshold V, where V = 120 km/h, the (i+1)-th record is considered drift data and deleted; this process is repeated over all the data of each user until the end.
(1.3.3), repeated data: records within a group that repeat the same longitude and latitude indicate that the user has not moved and that the base station is a stop point of the user; only the earliest and the latest of these records are kept, namely the user's appearance time and departure time within the signal range of the base station, and the remaining records are deleted;
(1.3.4), outlier data points: that is, noise in the data set is filtered out; the data are preprocessed with a k-means clustering algorithm to filter out the non-outlier data points, and the outliers, i.e. the noise data, are then found among the remaining data with the LOF outlier detection algorithm and deleted.
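The drift rule of step (1.3.2) can be sketched as follows. The distance function uses the standard spherical law of cosines with latitude measured from the equator, working in radians so that no π/180 factor is needed after the arccosine. The record layout `(timestamp_seconds, lon, lat)`, the function names, and the choice to compare each record against the last record that was kept (so that one drifted record does not contaminate the next comparison) are assumptions for illustration; the 6371 km radius and 120 km/h threshold come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0


def distance_km(lon_a, lat_a, lon_b, lat_b):
    """Great-circle distance via the spherical law of cosines (inputs in degrees)."""
    la, lb = math.radians(lat_a), math.radians(lat_b)
    dlon = math.radians(lon_a - lon_b)
    c = math.sin(la) * math.sin(lb) + math.cos(la) * math.cos(lb) * math.cos(dlon)
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return EARTH_RADIUS_KM * math.acos(c)


def drop_drift(records, max_speed_kmh=120.0):
    """records: time-sorted list of (timestamp_seconds, lon, lat) for one user."""
    kept = []
    for rec in records:
        if kept:
            t0, lon0, lat0 = kept[-1]
            t1, lon1, lat1 = rec
            hours = (t1 - t0) / 3600.0
            dist = distance_km(lon0, lat0, lon1, lat1)
            # A position change in zero time, or faster than the threshold,
            # is treated as drift and the record is skipped.
            if dist > 0 and (hours <= 0 or dist / hours > max_speed_kmh):
                continue
        kept.append(rec)
    return kept
```

For example, a jump of one degree of longitude within 60 seconds implies a speed of several thousand km/h and is discarded, while a move of roughly 1 km in two minutes is kept.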
Further, in step (1.4), the historical track of the user includes the imsi, the location of the base station, and the user's appearance time and departure time.
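A track record of this shape can be derived from raw signaling rows by combining the missing-field rule, the grouping and sorting by imsi, and the repeated-record rule. The sketch below is a plain-Python illustration rather than a Hive job; the dict keys `imsi`, `timestamp`, `lon`, `lat` and the function name `build_tracks` are assumed for the example.

```python
from itertools import groupby
from operator import itemgetter

REQUIRED = ("imsi", "timestamp", "lon", "lat")


def build_tracks(records):
    """records: iterable of dicts.

    Returns {imsi: [(lon, lat, appearance_time, departure_time), ...]}.
    """
    # Rule (1.3.1): drop records with any missing field.
    complete = [r for r in records if all(r.get(f) is not None for f in REQUIRED)]
    # Group by imsi and order each user's records by ascending time.
    complete.sort(key=itemgetter("imsi", "timestamp"))
    tracks = {}
    for imsi, rows in groupby(complete, key=itemgetter("imsi")):
        stays = []
        for r in rows:
            loc = (r["lon"], r["lat"])
            if stays and stays[-1][:2] == loc:
                # Rule (1.3.3): same base station again, so only the
                # departure time of the current stay moves forward.
                lon, lat, appeared, _ = stays[-1]
                stays[-1] = (lon, lat, appeared, r["timestamp"])
            else:
                # New base station: open a stay whose appearance and
                # departure times both start at this record.
                stays.append((*loc, r["timestamp"], r["timestamp"]))
        tracks[imsi] = stays
    return tracks
```

Collapsing consecutive repeats into one stay keeps exactly the earliest and latest timestamps per stop, which is the information the track needs.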
Further, in step (1.6), the Kafka messaging system is a distributed, high-throughput publish-subscribe messaging system used to store the real-time data.
The specific embodiment is as follows: as shown in fig. 2, for a given user, the LOF algorithm improved with k-means is used to detect outlier noise data; the specific steps are as follows:
step one: selecting k samples as the initial cluster centers $a = \{a_1, a_2, \dots, a_k\}$;
step two: for each sample $x_i$ in the data set, calculating its distances to the k cluster centers and assigning it to the class of the cluster center with the minimum distance;
step three: for each class, recalculating its cluster center, i.e. the centroid of all samples in the class; in Euclidean space the sum of squared errors is used as the clustering objective function:
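$$E = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - a_j \rVert^2$$
where $C_j$ is the set of samples assigned to cluster center $a_j$;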
step four: for each cluster, obtaining the farthest distance between any data point in the cluster and the cluster center, denoted m; then, for any data point in the cluster, denoting its distance from the cluster center as d, the outlier coefficient α is defined as:
[Equation image RE-GDA0002974248030000051 not reproduced: definition of the outlier coefficient α in terms of d and m]
step five: setting an outlier coefficient threshold R; data points whose outlier coefficient α is smaller than R are judged to be non-outliers and do not take part in the subsequent calculation;
step six: repeating steps three, four and five until the maximum number of iterations is reached;
step seven: obtaining a candidate set of outliers from the k-means result, then detecting outliers in it with the LOF algorithm;
step eight: calculating a local outlier factor for each data point O:
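$$\mathrm{LOF}(O) = \frac{1}{|N(O)|} \sum_{P \in N(O)} \frac{\mathrm{lrd}(P)}{\mathrm{lrd}(O)}$$
where $N(O)$ is the neighborhood of O and lrd denotes the local reachability density;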
if this value is greater than 1, the density of O is lower than the density of its neighborhood points, and O may be an outlier;
step nine: setting an outlier factor threshold P; if the local outlier factor calculated in the previous step is greater than P, the data point is determined to be an outlier and is deleted.
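Steps eight and nine can be illustrated with a compact pure-Python version of the standard local outlier factor. This is a sketch of the textbook LOF definition (local reachability density compared with that of the nearest neighbours), not necessarily the exact variant used in the embodiment. The parameter `n_neighbors`, the function names and the assumption that all points are distinct 2-D tuples are illustrative choices; ties at the neighbourhood boundary are ignored for brevity.

```python
import math


def lof_scores(points, n_neighbors):
    """Local outlier factor of every point (assumes distinct points)."""
    n = len(points)
    # For each point, its n_neighbors nearest (distance, index) pairs.
    neigh = []
    for i in range(n):
        ds = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neigh.append(ds[:n_neighbors])
    # k-distance: distance to the farthest of those neighbours.
    k_dist = [nb[-1][0] for nb in neigh]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        return len(reach) / sum(reach)

    dens = [lrd(i) for i in range(n)]
    # LOF: average ratio of the neighbours' density to the point's own density.
    return [sum(dens[j] for _, j in neigh[i]) / (len(neigh[i]) * dens[i])
            for i in range(n)]


def drop_outliers(points, n_neighbors, threshold):
    """Step nine: delete points whose factor exceeds the threshold P."""
    scores = lof_scores(points, n_neighbors)
    return [p for p, s in zip(points, scores) if s <= threshold]
```

A factor close to 1 means a point is about as dense as its neighbours; isolated points score well above 1, which is why the threshold P is chosen greater than 1.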
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of embodiments of the present invention; other variations are possible within the scope of the invention; thus, by way of example, and not limitation, alternative configurations of embodiments of the invention may be considered consistent with the teachings of the present invention; accordingly, the embodiments of the invention are not limited to the embodiments explicitly described and depicted.

Claims (5)

1. A method for analyzing and extracting a user movement track based on mobile phone signaling, characterized by comprising the following specific steps:
step (1.1), first, collecting and storing the mobile phone signaling data source through a Flume data acquisition system;
step (1.2), for analysis of the target's historical track, the Flume data acquisition system collects the stored historical data and writes it to an HDFS distributed file system;
step (1.3), cleaning the historical data stored in the HDFS distributed file system;
step (1.4), after cleaning is finished, loading the data into a Hive data warehouse, and calculating and analyzing the offline historical track of each user from that user's data;
step (1.5), storing the users' offline historical track data in a MySQL relational database, partitioned by day, for subsequent query, or loading it into an ES search engine for fast retrieval;
step (1.6), for analysis of the target's real-time track, subscribing to the real-time signaling data through a Kafka message system;
step (1.7), analyzing the real-time signaling data through the Storm stream computing framework and recording the state of each user, i.e. the current location and the time of appearance; when a new record arrives and the target's position has changed, updating the user's state information and calculating the user's real-time track sequence data;
step (1.8), caching the user's real-time track sequence data in Redis, and displaying the user's track in chronological order through GIS map software.
2. The method for analyzing and extracting a user movement track based on mobile phone signaling as claimed in claim 1, wherein in step (1.1), the real-time data source includes device data, system data sets and other data.
3. The method for analyzing and extracting a user movement track based on mobile phone signaling as claimed in claim 1, wherein in step (1.3), the historical data stored in the HDFS distributed file system is cleaned as follows:
(1.3.1), records with missing fields: the mobile phone signaling data comprises the imsi mobile phone serial number, a timestamp and the longitude and latitude of the base station; records missing any of these fields are deleted;
(1.3.2), drift data: a speed threshold is set first; the user's speed is obtained from the distance between two base stations and the time difference between the records and compared with the threshold; a speed greater than the threshold indicates that the user has not actually left the range of the current base station, so the record is treated as drift;
(1.3.3), repeated data: among records of a user that repeat the same longitude and latitude, only the earliest and the latest are kept, namely the user's appearance time and departure time within the signal range of the base station, and the remaining records are deleted;
(1.3.4), outlier data points: the data are preprocessed with a k-means clustering algorithm to filter out non-outlier data, then outliers are detected in the remaining data with the LOF outlier detection algorithm and deleted.
4. The method for analyzing and extracting a user movement track based on mobile phone signaling as claimed in claim 1, wherein in step (1.4), the historical track of the user includes the imsi, the location of the base station, and the user's appearance time and departure time.
5. The method for analyzing and extracting a user movement track based on mobile phone signaling as claimed in claim 1, wherein in step (1.6), the Kafka messaging system is a distributed, high-throughput publish-subscribe messaging system used to store the real-time data.
CN202011237478.8A 2020-11-09 2020-11-09 Method for analyzing and extracting user movement track based on mobile phone signaling Withdrawn CN112765120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011237478.8A CN112765120A (en) 2020-11-09 2020-11-09 Method for analyzing and extracting user movement track based on mobile phone signaling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011237478.8A CN112765120A (en) 2020-11-09 2020-11-09 Method for analyzing and extracting user movement track based on mobile phone signaling

Publications (1)

Publication Number Publication Date
CN112765120A true CN112765120A (en) 2021-05-07

Family

ID=75693071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011237478.8A Withdrawn CN112765120A (en) 2020-11-09 2020-11-09 Method for analyzing and extracting user movement track based on mobile phone signaling

Country Status (1)

Country Link
CN (1) CN112765120A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114245314A (en) * 2021-12-17 2022-03-25 高创安邦(北京)技术有限公司 Personnel trajectory correction method and device, storage medium and electronic equipment
CN114245314B (en) * 2021-12-17 2024-01-05 高创安邦(北京)技术有限公司 Personnel track correction method and device, storage medium and electronic equipment
CN116822779A (en) * 2023-02-06 2023-09-29 长安大学 Expressway motor vehicle carbon emission calculation method based on mobile phone signaling data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20210507)