CN113613174A - Method, device and storage medium for identifying occupational sites based on mobile phone signaling data - Google Patents


Info

Publication number
CN113613174A
Authority
CN
China
Prior art keywords
base station
user
grid
cluster
time
Prior art date
Legal status
Pending
Application number
CN202110778497.XA
Other languages
Chinese (zh)
Inventor
蔡铭
杨颖坤
熊宸
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202110778497.XA
Publication of CN113613174A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/029 Location-based management or tracking services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/18 Network planning tools
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/20 Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a method, device and storage medium for identifying job-housing locations (occupational sites, i.e. workplaces and residences) based on mobile phone signaling data. The method proceeds in the following steps:

- It identifies the coverage area of each base station from mobile phone signaling data.
- It grids the coverage areas to obtain base station coverage grids.
- It calculates the user's accumulated stay time in each coverage grid.
- It clusters the grids with the Hartigan Leader clustering algorithm to obtain a cluster set.
- It calculates the accumulated stay time of each cluster in the cluster set.
- It extracts candidate job-housing clusters according to the user's accumulated stay time in each cluster.
- It identifies each candidate cluster as a workplace or a residence by dividing the day into time windows.

The invention can identify the locations of users' workplaces and residences accurately, effectively and quickly, and provides a method and data support for applications such as urban development planning and transportation facility planning. The method can be widely applied in the technical field of mobile phone signaling data mining.

Description

Method, device and storage medium for identifying occupational sites based on mobile phone signaling data
Technical Field
The invention relates to the technical field of mobile phone signaling data mining, and in particular to a method, device and storage medium for identifying workplaces and residences (occupational sites) based on mobile phone signaling data.
Background
Mobile communication technology has developed and the number of smartphone users has grown. As a result, mobile phones have become an effective means of data collection in studies such as travel surveys and user distribution surveys. Mobile operators bill users for network usage and analyze user activity. To do so, they record the location of the base station a user connects to and the time of connection whenever the user makes or receives a call, sends or receives a message, or connects to the network. These records constitute mobile phone signaling data.
Because of the way it is generated, mobile phone signaling data has several advantages:

- it is strongly real-time and dynamic;
- it covers a wide area;
- it is cheap to acquire;
- it is updated frequently.

It can be used to extract individuals' continuous spatio-temporal travel information across a whole city at large scale. Traditional studies, by contrast, collect residents' location data through questionnaire surveys and similar methods. These surveys are costly to run and cover only short periods. Newer data sources such as floating cars and video surveillance tend to yield traffic flow parameters rather than human travel patterns when used to extract residents' travel characteristics. Mobile phone signaling data therefore has clear advantages for studying residents' locations and travel characteristics.
The distribution of workplaces and residences is an important component of urban planning and transportation planning. Knowing where residents work and live helps in understanding urban spatial characteristics in terms of job-housing balance, job-housing separation and commuting behavior. It supports optimizing urban land use and rationally guiding public transport construction. Analyzing the distribution of commuting intensity, the spatial layout of the central urban area and its operating characteristics can also improve the spatial operating performance of large cities. Analyzing job-housing distributions with large volumes of signaling data brings further benefits:

- It yields large-scale, wide-coverage samples.
- It increases the diversity of study subjects and the completeness of regional coverage.
- It covers long continuous observation periods. This reduces errors caused by the randomness and periodicity of residents' travel.

An effective method for identifying residents' workplaces and residences from mobile phone signaling data is therefore needed to obtain this location information accurately.
Disclosure of Invention
The present invention aims to solve at least one of the problems in the prior art. To this end, it provides a method, device and storage medium for identifying workplaces and residences based on mobile phone signaling data.
The technical scheme adopted by the invention is as follows:
In one aspect, an embodiment of the present invention provides a method for identifying workplaces and residences based on mobile phone signaling data, comprising:
acquiring mobile phone signaling data;
acquiring a base station set from the mobile phone signaling data, the base station set comprising all base stations that appear in the signaling records;
identifying the coverage area of each base station from the distribution of that base station and its surrounding base stations in the base station set, in combination with a cellular network model;
dividing the urban area into grids, and identifying the grids that intersect a base station's coverage area as that base station's coverage grids;
calculating the user's accumulated stay time in each of the base station coverage grids;
clustering the grids with the Hartigan Leader clustering algorithm, based on the spatial position of each grid and the user's accumulated stay time in it, to obtain a cluster set;
calculating the user's accumulated stay time in each cluster of the cluster set;
extracting candidate job-housing clusters according to the user's accumulated stay time in each cluster of the cluster set;
identifying each candidate job-housing cluster as a workplace or a residence by dividing the day into time windows.
Further, the step of identifying the base station coverage area from the distribution of each base station and its surrounding base stations in the base station set, in combination with a cellular network model, comprises:
searching the base station set for the several adjacent base stations closest to a target base station to obtain an adjacent base station set, the target base station being any base station in the base station set;
calculating a first distance, the first distance being the average of the distances between each adjacent base station in the adjacent base station set and the target base station;
calculating the base station coverage radius from the first distance based on the cellular network model;
and delineating the base station coverage area centered on the target base station with the base station coverage radius.
Further, the coverage radius of the base station is calculated by the following formula:
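$$r_{bs} = \frac{d_{bs}}{1.5} = \frac{2}{3} d_{bs}$$
where $r_{bs}$ is the base station coverage radius and $d_{bs}$ is the first distance. The factor 1.5 follows from the ideal cellular model, in which the inter-site distance is $d = 1.5r$.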
Further, the step of calculating the user's accumulated stay time in each base station coverage grid comprises:
acquiring a job-housing point of interest set, the job-housing points of interest comprising points of interest with workplace-related attributes and points of interest with residence-related attributes;
obtaining the number of job-housing points of interest in each base station coverage grid from the job-housing point of interest set;
and allocating the user's accumulated stay time at each base station to the grids according to the number of job-housing points of interest in each grid, to obtain the user's accumulated stay time in each grid.
Further, the user's accumulated stay time at each base station is allocated to the grids according to the number of job-housing points of interest in each grid. The resulting accumulated stay time of the user in each grid is given by the following formula:
$$T_j = \sum_{bs \in BS_j} T_{bs} \cdot \frac{N_j}{\sum_{J \in bs} N_J}$$
The symbols are defined as follows:

- $T_j$ is the user's accumulated stay time in grid $j$.
- $T_{bs}$ is the user's accumulated stay time at base station $bs$.
- $BS_j$ is the set of base stations covering grid $j$.
- $N_j$ is the number of points of interest in grid $j$.
- $\sum_{J \in bs} N_J$ is the number of points of interest within the coverage area of base station $bs$.
Further, the step of calculating the accumulated stay time of the user in each cluster in the cluster set includes:
acquiring the grids contained in each cluster of the cluster set;
calculating the user's accumulated stay time in each of these grids;
and summing the user's accumulated stay time over these grids to obtain the user's accumulated stay time in each cluster.
Further, the step of extracting candidate job-housing clusters according to the user's accumulated stay time in each cluster of the cluster set comprises:
sorting the clusters in the cluster set by the user's accumulated stay time in each cluster to obtain a sorted list;
and extracting the candidate job-housing clusters from the sorted list. The candidates are a preset number of clusters at the front of the sorted list, or a preset number of clusters at its end, depending on the sort order.
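The cluster-level steps above can be sketched as follows. The sketch sums grid stay times per cluster and then keeps the clusters with the longest stays. It assumes a descending sort and a hypothetical `top_n` preset of 2; the patent does not state the preset number in this section. The dictionary layouts and function names are also illustrative assumptions.

```python
def cluster_stay_times(clusters, grid_stay):
    """Sum the user's accumulated stay time over the grids of each cluster.

    clusters: mapping cluster_id -> iterable of grid ids
    grid_stay: mapping grid id -> accumulated stay time in seconds
    (grids missing from grid_stay contribute 0).
    """
    return {cid: sum(grid_stay.get(g, 0) for g in grids)
            for cid, grids in clusters.items()}


def candidate_clusters(cluster_stay, top_n=2):
    """Sort clusters by stay time in descending order and keep the first top_n.

    top_n is a hypothetical preset; sorting ascending and taking the last
    top_n clusters would select the same candidates.
    """
    ranked = sorted(cluster_stay.items(), key=lambda kv: kv[1], reverse=True)
    return [cid for cid, _ in ranked[:top_n]]
```

For example, clusters `c1 = {g1, g2}`, `c2 = {g3}` and `c3 = {g4}` with grid stays of 3600, 1800, 7200 and 600 s give cluster totals of 5400, 7200 and 600 s. The candidates are then `["c2", "c1"]`.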
Further, the step of identifying each candidate job-housing cluster as a workplace or a residence by dividing the day into time windows comprises:
dividing a day evenly into 24 one-hour time windows;
calculating the user's accumulated stay time in the candidate job-housing cluster within each time window;
obtaining the representative time window of the candidate cluster from the user's accumulated stay time in it within each time window;
obtaining the time period to which the representative time window of the candidate cluster belongs;
and identifying the candidate cluster as a workplace or a residence according to the time period to which its representative time window belongs.
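The time-window procedure can be sketched as follows. Stay intervals in a candidate cluster are split across 24 hourly windows, and the window with the most accumulated stay is taken as the representative window. The sketch makes two assumptions that this section does not define:

- The representative window is the one with the maximum stay.
- The hypothetical `WORKING_HOURS` range (09:00 to 18:00) stands in for the working period.

A representative window outside that range is labelled a residence.

```python
from datetime import datetime, timedelta


def hourly_stay(intervals):
    """Accumulate stay seconds into 24 one-hour windows (index = hour 0..23).

    intervals: iterable of (start, end) datetime pairs during which the user
    stayed in the candidate cluster; an interval crossing an hour boundary
    (including midnight) is split across the windows it touches.
    """
    bins = [0.0] * 24
    for start, end in intervals:
        t = start
        while t < end:
            # Start of the next hour window after t.
            next_hour = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            chunk_end = min(end, next_hour)
            bins[t.hour] += (chunk_end - t).total_seconds()
            t = chunk_end
    return bins


# Hypothetical working period (hours 9..17); not specified in this section.
WORKING_HOURS = range(9, 18)


def classify_cluster(intervals):
    """Return (representative hour, label) for one candidate cluster."""
    bins = hourly_stay(intervals)
    # max() returns the earliest hour among ties.
    representative = max(range(24), key=lambda h: bins[h])
    label = "workplace" if representative in WORKING_HOURS else "residence"
    return representative, label
```

A stay from 09:30 to 12:00 puts 1800 s in window 9 and 3600 s each in windows 10 and 11. The representative window is therefore hour 10, which falls inside the assumed working period.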
In another aspect, an embodiment of the present invention further provides a device for identifying workplaces and residences based on mobile phone signaling data, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method for identifying workplaces and residences based on mobile phone signaling data.
In a further aspect, an embodiment of the present invention provides a computer-readable storage medium storing a processor-executable program which, when executed by a processor, implements the method for identifying workplaces and residences based on mobile phone signaling data.
The invention has the beneficial effects that:
The invention provides a method for identifying workplaces and residences based on mobile phone signaling data, comprising the following steps:

- acquiring mobile phone signaling data;
- acquiring a base station set from the signaling data, comprising all base stations that appear in the signaling records;
- identifying each base station's coverage area from the distribution of that base station and its surrounding base stations, in combination with a cellular network model;
- dividing the urban area into grids and identifying the grids that intersect a base station's coverage area as its coverage grids;
- calculating the user's accumulated stay time in each coverage grid;
- clustering the grids with the Hartigan Leader clustering algorithm, based on each grid's spatial position and accumulated stay time, to obtain a cluster set;
- calculating the accumulated stay time of each cluster in the cluster set;
- extracting candidate job-housing clusters according to the user's accumulated stay time in each cluster;
- identifying each candidate cluster as a workplace or a residence by dividing the day into time windows.

The invention can identify the locations of users' workplaces and residences accurately, effectively and quickly, and provides a method and data support for applications such as urban development planning and transportation facility planning.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Fig. 1 is a flowchart of the steps of a method for identifying workplaces and residences based on mobile phone signaling data according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a cellular network coverage model according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a user's stay locations within a certain time period according to an embodiment of the present invention;
Fig. 4 is a 3D interpolation plot of a user's daily grid stay time according to an embodiment of the present invention;
Fig. 5 is a diagram of the relationship between the clustering range and the average time ratio according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the user's stay time in each time window in the candidate job-housing clusters according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a device for identifying workplaces and residences based on mobile phone signaling data according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
The following conventions apply in the description of the present invention:

- "Several" means one or more.
- "A plurality of" means two or more.
- "Greater than", "less than", "exceeding" and the like exclude the stated number.
- "Above", "below", "within" and the like include the stated number.

Where "first" and "second" are used, they serve only to distinguish technical features. They are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or their order.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
The embodiments of the present application will be further explained with reference to the drawings.
Referring to Fig. 1, an embodiment of the present invention provides a method for identifying workplaces and residences based on mobile phone signaling data, including but not limited to the following steps:
S100, acquiring mobile phone signaling data;
S200, acquiring a base station set from the mobile phone signaling data, the base station set comprising all base stations that appear in the signaling records;
S300, identifying the coverage area of each base station from the distribution of that base station and its surrounding base stations in the base station set, in combination with a cellular network model;
S400, dividing the urban area into grids, and identifying the grids that intersect a base station's coverage area as that base station's coverage grids;
S500, calculating the user's accumulated stay time in each base station coverage grid;
S600, clustering the grids with the Hartigan Leader clustering algorithm, based on the spatial position of each grid and the user's accumulated stay time in it, to obtain a cluster set;
S700, calculating the accumulated stay time of each cluster in the cluster set;
S800, extracting candidate job-housing clusters according to the user's accumulated stay time in each cluster of the cluster set;
S900, identifying each candidate job-housing cluster as a workplace or a residence by dividing the day into time windows.
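Step S600 names the Hartigan Leader clustering algorithm but this section does not give its parameters. The sketch below shows the classic leader scheme: each grid joins the first existing leader within a distance threshold, otherwise it becomes the leader of a new cluster. Three choices are illustrative assumptions, not the patent's stated settings:

- grids are visited in descending order of stay time, so long-stay grids seed clusters;
- coordinates are metres in a projected system;
- the threshold value is supplied by the caller.

```python
import math


def leader_clustering(grids, threshold):
    """Hartigan's leader algorithm over grid cells.

    grids: list of (grid_id, x, y, stay_seconds), with x/y in metres.
    threshold: clustering range in metres (hypothetical parameter).
    Returns a dict leader_grid_id -> list of member grid ids.
    """
    leaders = []   # (grid_id, x, y) of each leader, in creation order
    clusters = {}
    # Assumption: visit grids with the longest stay first.
    for gid, x, y, _stay in sorted(grids, key=lambda g: g[3], reverse=True):
        for lid, lx, ly in leaders:
            if math.hypot(x - lx, y - ly) <= threshold:
                clusters[lid].append(gid)   # join the first leader in range
                break
        else:
            leaders.append((gid, x, y))     # no leader in range: start a new cluster
            clusters[gid] = [gid]
    return clusters
```

A single pass with no iteration keeps the cost at roughly O(n × number of clusters). The result does depend on the visiting order, which is why the ordering assumption matters.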
In this embodiment, mobile phone signaling data refers to the data generated by communication between a user's mobile phone and the base stations. A signaling record may be left when the user makes a call, sends a message, connects to the network, switches base stations, and so on. It typically contains the user number, user age, user gender, connected base station number and record time. As shown in Table 1, the fields of the signaling data are:

- user code isdn;
- user gender gender;
- user age age;
- connected base station number cid;
- record timestamp time, in the format yyyyMMddHHmmss.

The signaling data is then fused with the base station data. The base station information table is looked up by base station number cid to obtain the longitude lng and latitude lat of the base station that the user connected to.
Table 1 original mobile phone signaling data schematic table
isdn gender age time cid
4117*** M 30 20180715071145 23740
5701*** M 63 20180723053558 25815
5717*** M 20 20180620142020 19086
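The parsing and fusion step described above can be sketched as follows. The sketch parses one Table 1 row and attaches base station coordinates by cid. The `BASE_STATIONS` table and its coordinates are hypothetical placeholders, not data from the patent. Dropping records whose cid is not in the table is also an assumed policy.

```python
from datetime import datetime

# Hypothetical base station information table: cid -> (lng, lat).
# Coordinates are illustrative only.
BASE_STATIONS = {23740: (113.2644, 23.1291), 25815: (113.3245, 23.0990)}


def parse_record(line, stations):
    """Parse a whitespace-separated record laid out as in Table 1
    (isdn gender age time cid) and attach the connected base station's
    longitude/latitude. Returns None if cid is not in the station table,
    since such a record cannot be located (assumed handling)."""
    isdn, gender, age, time_str, cid = line.split()
    cid = int(cid)
    if cid not in stations:
        return None
    lng, lat = stations[cid]
    return {
        "isdn": isdn,
        "gender": gender,
        "age": int(age),
        # 14-digit timestamp: yyyyMMddHHmmss
        "time": datetime.strptime(time_str, "%Y%m%d%H%M%S"),
        "cid": cid,
        "lng": lng,
        "lat": lat,
    }
```

For example, the first row of Table 1 parses to a record with `time == datetime(2018, 7, 15, 7, 11, 45)` and the coordinates of cid 23740.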
In this embodiment, the mobile phone signaling data is further preprocessed after step S100 to improve data quality. Preprocessing comprises the following steps:
(1) Abnormal data processing: abnormal data refers to records containing missing values, records with incorrect data formats and records with abnormal timestamps. Abnormal data is deleted;
(2) Duplicate data processing: duplicate data refers to identical records generated for the same user at the same time and position because of communication or transmission errors. One copy is kept and the rest are deleted;
(3) Co-location data processing: co-location data refers to the many records at the same location within a short time that a user leaves while active in one area, because the phone repeatedly connects to the same base station. These records are extracted, and the number and coordinates of the connected base station are recorded. The user's stay time at the current base station is the difference between the first record timestamp at the next connected base station and the first record timestamp at the current base station, $time_{location2} - time_{location1}$. The co-location records are then merged, keeping the timestamp of the first record at the current location;
(4) Drift data processing: drift data refers to erroneous records whose position jumps to a distant location when the user is in an area with dense buildings or an unstable signal. Each signaling record is compared with the next record as follows:

- the timestamp difference between them is calculated as $time_2 - time_1$;
- the distance between the two recorded positions, $distance_{1-2}$, is calculated from the coordinates of the recording base stations;
- the user's movement speed is calculated as

$$v = \frac{distance_{1-2}}{time_2 - time_1}$$

If the movement speed exceeds an upper threshold, the latter record is regarded as drift data and deleted. The threshold may be taken as 120 km/h, the maximum driving speed in the city;
(5) Ping-pong data processing: ping-pong data refers to the records a user leaves when moving along a base station boundary, as the phone keeps switching connections among several base stations. Records in which the user switches back and forth between base stations within a short time are identified as a group of ping-pong data. The most frequently connected base station number in the group, $cid_{frequent}$, is determined. The records in the group whose base station is $cid_{frequent}$ are kept and the rest are deleted.
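Preprocessing steps (4) and (3) can be sketched as follows. The sketch makes these assumptions:

- Distances use the haversine great-circle formula on base station longitude/latitude, a common choice the patent does not specify.
- Each record is compared with the previous record that was kept, so a deleted drift record does not affect the next comparison.
- The record layout is a list of time-sorted dicts with `cid`, `time`, `lng` and `lat`.
- The last stay's duration is left as `None` because it has no successor.

```python
import math
from datetime import datetime  # record timestamps are datetime objects


def haversine_km(lng1, lat1, lng2, lat2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def remove_drift(records, v_max_kmh=120.0):
    """Step (4): drop a record whose implied speed from the previous kept
    record exceeds v_max_kmh. A position jump with zero elapsed time is
    also treated as drift."""
    kept = []
    for rec in records:
        if kept:
            prev = kept[-1]
            hours = (rec["time"] - prev["time"]).total_seconds() / 3600.0
            dist = haversine_km(prev["lng"], prev["lat"], rec["lng"], rec["lat"])
            if dist > 0 and (hours <= 0 or dist / hours > v_max_kmh):
                continue
        kept.append(rec)
    return kept


def merge_colocation(records):
    """Step (3): collapse consecutive records at the same base station into
    one stay that keeps the first timestamp. stay_s = first timestamp at the
    next base station minus first timestamp at the current one."""
    stays = []
    for rec in records:
        if stays and stays[-1]["cid"] == rec["cid"]:
            continue  # same base station: keep the earlier record only
        if stays:
            stays[-1]["stay_s"] = (rec["time"] - stays[-1]["time"]).total_seconds()
        stays.append({**rec, "stay_s": None})
    return stays
```

Drift removal runs before merging here, so that a spurious jump does not split one stay into two. That ordering is also a design assumption.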
In this embodiment, the format of the preprocessed mobile phone signaling data is shown in Table 2.
Table 2 Schematic table of preprocessed mobile phone signaling data [table provided as an image in the original document]
In this embodiment, as Table 2 shows, the mobile phone signaling data contains base station information, so a base station set can be obtained from it. The base station set comprises all base stations that appear in the signaling records.
In this embodiment, step S300 identifies the base station coverage area from the distribution of each base station and its surrounding base stations in the base station set, in combination with the cellular network model. It comprises:
S301, searching the base station set for the several adjacent base stations closest to a target base station to obtain an adjacent base station set, the target base station being any base station in the base station set;
S302, calculating a first distance, the first distance being the average of the distances between each adjacent base station in the adjacent base station set and the target base station;
S303, calculating the base station coverage radius from the first distance based on the cellular network model;
S304, delineating the base station coverage area centered on the target base station with the base station coverage radius.
In this embodiment, consider the set $BS$ of base stations with signaling records. For each base station $bs$ in $BS$, the four base stations nearest to $bs$ are found and recorded as its adjacent base station set $bs_{adjacent}$. The ideal inter-site distance of the base station (the first distance) and its coverage radius are then calculated from the distance between each adjacent base station in $bs_{adjacent}$ and $bs$. The ideal inter-site distance is calculated as:
$$d_{bs} = \frac{1}{4}\sum_{i=1}^{4} d_i$$
and the coverage radius of the base station is calculated as:
$$r_{bs} = \frac{d_{bs}}{1.5} = \frac{2}{3} d_{bs}$$
Here $d_{bs}$, the ideal inter-site distance, is the average of the distances from each adjacent base station in $bs_{adjacent}$ to $bs$, and $d_i$ is the distance between the $i$-th adjacent base station and $bs$. The coverage radius $r_{bs}$ is derived from the ideal inter-site distance based on the cellular network model. The coverage area of the base station is then delineated as the region centered on the base station with radius $r_{bs}$.
Specifically, most base stations in China are 3-sector directional sites, and base station deployment follows the cellular network coverage model shown in Fig. 2. Ideally, if the coverage radius of a base station is $r$, the ideal inter-site distance is $d = 1.5r$. The model is adapted to realistic base station distribution patterns and location data in three steps:

- For each base station $bs$, the four nearest base stations are found and recorded as its adjacent base station set $bs_{adjacent}$.
- The ideal inter-site distance $d_{bs}$ is calculated from the distances between these four adjacent base stations and $bs$.
- The coverage radius is then calculated as

$$r_{bs} = \frac{d_{bs}}{1.5} = \frac{2}{3} d_{bs}$$

This radius determines the base station coverage area.
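The neighbour-based radius estimate described above can be sketched as follows. The sketch assumes station coordinates have already been projected to metres, so plain Euclidean distance applies. It uses a brute-force O(n²) neighbour search for clarity; a spatial index such as a k-d tree would scale better to city-wide station sets. `k = 4` follows the text, and the function name is illustrative.

```python
import math


def coverage_radii(stations, k=4):
    """Estimate each base station's ideal inter-site distance d_bs and
    coverage radius r_bs.

    stations: dict cid -> (x, y) in a projected metric system (metres).
    d_bs is the mean distance to the k nearest other stations;
    r_bs = d_bs / 1.5, from d = 1.5 r in the cellular model.
    Returns dict cid -> (d_bs, r_bs).
    """
    radii = {}
    for cid, (x, y) in stations.items():
        dists = sorted(
            math.hypot(x - ox, y - oy)
            for other, (ox, oy) in stations.items()
            if other != cid
        )
        nearest = dists[:k]
        if not nearest:
            continue  # an isolated station has no neighbours to measure against
        d_bs = sum(nearest) / len(nearest)
        radii[cid] = (d_bs, d_bs / 1.5)
    return radii
```

For a station with four neighbours each 1000 m away, the sketch yields $d_{bs} = 1000$ m and $r_{bs} \approx 666.7$ m.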
In this embodiment, step S400 grids the base station coverage areas obtained above. The city is divided into grids of 100 m × 100 m, a size chosen with the spatial distribution of the base stations in mind. Each grid that intersects a base station coverage area is identified as a coverage grid of that base station. In other words, each coverage area is represented by grids: if the coverage area of a base station overlaps a grid, the grid is considered covered by that base station.
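The circle-versus-grid intersection test can be sketched as follows. The sketch treats the coverage area as a circle of radius $r_{bs}$ in projected metre coordinates, with the grid origin at (0, 0). Both are assumptions for illustration. A grid cell intersects the circle when the cell's point closest to the centre lies within the radius, so cells that only touch the circle boundary also count.

```python
import math

CELL = 100.0  # grid size in metres (100 m x 100 m)


def covered_grids(cx, cy, r, cell=CELL):
    """Return (col, row) indices of grid cells intersecting the circle of
    radius r centred at (cx, cy); cell (col, row) spans
    [col*cell, (col+1)*cell] x [row*cell, (row+1)*cell]."""
    cells = []
    for col in range(math.floor((cx - r) / cell), math.floor((cx + r) / cell) + 1):
        for row in range(math.floor((cy - r) / cell), math.floor((cy + r) / cell) + 1):
            # Point of the cell closest to the circle centre (clamp the centre into the cell).
            nx = min(max(cx, col * cell), (col + 1) * cell)
            ny = min(max(cy, row * cell), (row + 1) * cell)
            if math.hypot(nx - cx, ny - cy) <= r:
                cells.append((col, row))
    return cells
```

A small circle around a cell centre, such as `covered_grids(50, 50, 10)`, covers only cell `(0, 0)`. A circle centred on a shared corner, such as `covered_grids(100, 100, 10)`, covers all four adjacent cells.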
In this embodiment, step S500, that is, the step of calculating the accumulated staying time of the user in each cell of the cell covered by the base station, includes:
s501, acquiring a job and dwelling interest point set, wherein the job and dwelling interest points comprise relevant attribute interest points of a workplace and relevant attribute interest points of a dwelling place;
s502, acquiring the number of position interest points in each grid in a base station coverage grid according to the position interest point set;
s503, distributing the accumulated stay time of the user in the base station to grids according to the number of the occupied interest points in each grid, and obtaining the accumulated stay time of the user in each grid.
In this embodiment, the relevant attribute interest points of the workplace and the relevant attribute interest points of the residential site are screened to form a working and residential site interest point set, the working and residential site interest points in each grid are screened, the accumulated residence time of the user in the base station is allocated to the grid according to the number of the interest points in the grid, and if the grid is covered by a plurality of base stations, the residence time allocated by each base station is accumulated, that is, the accumulated residence time of the grid of the user is calculated:
$$T_j = \sum_{bs \in BS_j} T_{bs} \cdot \frac{N_j}{\sum_{J \in bs} N_J}$$
where T_j is the user's accumulated stay time in grid j, T_bs is the user's accumulated stay time at base station bs, BS_j is the set of base stations covering grid j, N_j is the number of job-housing POIs in grid j, and \sum_{J \in bs} N_J is the number of job-housing POIs within the coverage area of base station bs.
When the job-housing POIs are extracted, the workplace-related attributes comprise automotive services, catering services, shopping services, lifestyle services, sports and leisure services, accommodation services, healthcare services, government agencies, science, education and cultural services, financial institutions, and companies and enterprises; the residence-related attributes comprise commercial housing and residential areas.
Specifically, POI information for the study area is crawled from an online map service, and the POI records related to workplaces and residences are retained. A POI is a data point representing a characteristic location on a map that is closely related to residents' activities, such as a building, station or landmark. Each POI record contains the fields POI number POI_ID, longitude POI_lng, latitude POI_lat and type POI_type; a sample of the POI data is shown in Table 3.
Table 3. Sample point-of-interest data (longitude and latitude in decimal degrees)
POI_ID POI_lng POI_lat POI_type
B02F5077VF 113.210716 22.875081 Cultural service
BOFFL1FT5P 112.880293 23.180195 Company enterprise
BOFFKUJWAO 113.002889 23.240415 Company enterprise
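Counting POIs per grid requires mapping each record's POI_lng and POI_lat to a grid index. The sketch below assumes a local equirectangular projection around a reference point (`lon0`, `lat0`), which is adequate over a city-sized area but is a stand-in for whatever projection the authors actually used; the `keep_types` filter and all function names are hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius
CELL_SIZE_M = 100.0           # grid size from step S400


def lonlat_to_xy(lon, lat, lon0, lat0):
    """Approximate planar offset in metres of (lon, lat) from (lon0, lat0).

    Local equirectangular projection (an assumption, fine for city scale).
    """
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def count_pois_per_cell(pois, lon0, lat0, keep_types=None, cell=CELL_SIZE_M):
    """Count POIs per (col, row) grid cell.

    pois: iterable of (POI_ID, POI_lng, POI_lat, POI_type) tuples.
    keep_types: optional set of POI_type values to keep (hypothetical filter
    for the job-housing categories); None keeps every record.
    """
    counts = Counter()
    for _poi_id, lng, lat, poi_type in pois:
        if keep_types is not None and poi_type not in keep_types:
            continue
        x, y = lonlat_to_xy(lng, lat, lon0, lat0)
        counts[(math.floor(x / cell), math.floor(y / cell))] += 1
    return counts
```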
Next, the number of POIs in each grid is counted from the POI coordinates. For each signaling record, the stay time and the set of grids covered by the connected base station are extracted, the share of the base station's POIs falling in each grid is calculated, and the stay time is distributed to the grids in proportion to those shares; the distributed stay times are then summed per grid to obtain the stay time of each grid. Fig. 3 is a schematic diagram of a user's stay locations during a certain time period: the user is connected to base station A and base station B, whose coverage areas are outlined by black boxes, and the dots in the grids mark the positions of POIs. As shown in Fig. 3, grid j_1 is covered by both base station A and base station B, and the coverage areas of A and B contain 7 and 3 POIs, respectively. Assuming that the user's stay times at base stations A and B are 420 s and 135 s, respectively, and since grid j_1 contains one POI, the user's stay time in grid j_1 is 105 s, calculated as follows:
$$T_{j_1} = 420 \times \frac{1}{7} + 135 \times \frac{1}{3} = 60 + 45 = 105\ \text{s}$$
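The allocation formula can be checked with a short sketch that reproduces the Fig. 3 example. It is a minimal illustration, not the authors' code: the grid identifiers `a1` and `b1` (holding the other 6 POIs of A and the other 2 POIs of B) are hypothetical, and dropping the stay time of a base station whose coverage contains no job-housing POIs is an assumption, since the text does not cover that case.

```python
from collections import defaultdict


def allocate_stay_to_cells(station_stay, station_cells, cell_pois):
    """Apply T_j = sum over bs in BS_j of T_bs * N_j / (POIs in coverage of bs).

    station_stay: {bs: T_bs in seconds}
    station_cells: {bs: iterable of grid ids covered by bs}
    cell_pois: {grid id: N_j, number of job-housing POIs}
    Returns {grid id: T_j}.
    """
    cell_stay = defaultdict(float)
    for bs, t_bs in station_stay.items():
        cells = list(station_cells[bs])
        total_pois = sum(cell_pois.get(c, 0) for c in cells)
        if total_pois == 0:
            # Assumption: with no job-housing POIs the stay time is dropped.
            continue
        for c in cells:
            n_j = cell_pois.get(c, 0)
            if n_j:
                # Summing over base stations handles grids covered by several.
                cell_stay[c] += t_bs * n_j / total_pois
    return dict(cell_stay)


stay = allocate_stay_to_cells(
    {"A": 420, "B": 135},
    {"A": ["j1", "a1"], "B": ["j1", "b1"]},
    {"j1": 1, "a1": 6, "b1": 2},
)
print(stay["j1"])  # 105.0
```

Because each base station's stay time is split by POI shares that sum to one, the total allocated time (here 555 s) equals the total stay time at stations that have at least one POI.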
In this embodiment, step S600 focuses on the grids with longer accumulated stay times. Without presetting the number of clusters, the grids are clustered into the user's main activity clusters as follows:
(1) For each user, sort the grids j in the stay grid set I in descending order of accumulated stay time, select the first grid as the initial cluster center c, mark it as visited, and record the number of the cluster it belongs to;
(2) Define the cluster coverage Area as the circle centered on the cluster center c with a radius of 150 meters;
(3) Visit the next grid in sequence. If the center of the grid lies within Area, mark the grid as visited, record the number of the cluster it belongs to, update the cluster center c to the stay-time-weighted centroid of the grids in the cluster, and redefine Area as the circle centered on the new c with a radius of 150 meters;
(4) Repeat step (3) until every grid has been traversed once;
(5) Among all grids not yet marked as visited, select the first in the sorted order as a new cluster center c, mark it as visited, and record the number of the cluster it belongs to;
(6) Repeat steps (2) to (5) until every grid is marked as visited and its cluster number has been recorded.
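Steps (1) to (6) amount to a single-pass leader clustering whose centre moves to the stay-weighted centroid as grids join. The following is a minimal sketch of those steps, assuming grid cells are keyed by their centre coordinates in metres and that stay times are non-negative; the function name and data layout are hypothetical.

```python
import math

CLUSTER_RADIUS_M = 150.0  # clustering range parameter from the text


def leader_cluster(cell_stay, radius=CLUSTER_RADIUS_M):
    """Cluster grid cells following steps (1)-(6).

    cell_stay: {(x, y): stay_seconds}, keyed by grid-centre coordinates in
    metres. Returns {(x, y): cluster_id}.
    """
    # Step (1): sort grids by accumulated stay time, longest first.
    order = sorted(cell_stay, key=lambda g: cell_stay[g], reverse=True)
    label = {}
    cluster_id = 0
    while len(label) < len(order):
        # Steps (1)/(5): the first unvisited grid seeds a new cluster.
        seed = next(g for g in order if g not in label)
        label[seed] = cluster_id
        cx, cy = seed
        weight = cell_stay[seed]
        # Steps (2)-(4): one pass over the remaining unvisited grids.
        for g in order:
            if g in label:
                continue
            if math.hypot(g[0] - cx, g[1] - cy) <= radius:
                label[g] = cluster_id
                w = cell_stay[g]
                if weight + w > 0:
                    # Move the centre to the stay-weighted centroid.
                    cx = (cx * weight + g[0] * w) / (weight + w)
                    cy = (cy * weight + g[1] * w) / (weight + w)
                weight += w
        cluster_id += 1
    return label
```

Because the centre moves as grids join, the result depends on the visiting order; sorting by stay time first, as step (1) prescribes, anchors each cluster on its busiest grid.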
Specifically, a user with a fixed workplace and residence spends most of the time moving near those places, so the grids near them show concentrated, high stay times. Fig. 4 is a 3D interpolation plot of one user's grid stay times over a day; the stay time shows two peaks, one near the workplace and one near the residence.
The data of all users are then processed: for each candidate value of the clustering range parameter, the stay time of each user in the cluster with the longest stay time is extracted, its share of the user's total stay time is calculated, and the average share over all users is computed. As shown in Fig. 5, the average share increases with the clustering range parameter; it rises sharply up to 150 meters and then levels off, indicating that 150 meters is the typical range of movement of a user within one stay area. The clustering range parameter is therefore set to 150 meters.
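The radius sweep behind Fig. 5 can be sketched as follows. This is a hedged illustration: `cluster_fn` is a hypothetical callable standing in for an implementation of steps (1) to (6) with a given radius, a user with zero total stay time is assumed to contribute a share of 0, and picking the point where the curve levels off is left to inspection, as in the text.

```python
from statistics import mean


def top_cluster_share(cell_stay, labels):
    """Share of a user's total stay time that falls in their longest-stay cluster."""
    totals = {}
    for cell, cid in labels.items():
        totals[cid] = totals.get(cid, 0.0) + cell_stay[cell]
    total = sum(cell_stay.values())
    # Assumption: users with no recorded stay time contribute a share of 0.
    return max(totals.values()) / total if total > 0 else 0.0


def mean_share_by_radius(users, radii, cluster_fn):
    """Average top-cluster share over all users for each candidate radius.

    users: non-empty list of {cell: stay_seconds}
    cluster_fn(cell_stay, radius) -> {cell: cluster_id}
    """
    return {
        r: mean(top_cluster_share(u, cluster_fn(u, r)) for u in users)
        for r in radii
    }
```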
Each user's multi-day data are then clustered with the Hartigan Leader algorithm using this clustering range parameter, giving the user's activity-area clusters.
In this embodiment, step S700, that is, the step of calculating the user's accumulated stay time in each cluster of the cluster set, includes:
S701, acquiring the grids contained in each cluster of the cluster set;
S702, calculating the user's accumulated stay time in each grid;
S703, summing the user's accumulated stay times over the grids of each cluster to obtain the user's accumulated stay time in that cluster.
In this embodiment, after the user's accumulated stay time in each cluster has been calculated, step S800 is executed, that is, the step of extracting candidate job-housing clusters according to the user's accumulated stay time in each cluster of the cluster set, which includes:
S801, sorting the clusters in the cluster set by the user's accumulated stay time in each cluster to obtain a sorted list;
S802, extracting candidate job-housing clusters from the sorted list, wherein the candidate job-housing clusters are the clusters with the longest accumulated stay times, i.e., a preset number of clusters at the head of the list when it is sorted in descending order, or at the tail when it is sorted in ascending order.
In this embodiment, the total accumulated stay time of the grids in each cluster is calculated and recorded as the accumulated stay time of that cluster. The clusters are sorted in descending order of accumulated stay time, the top two are marked as candidate job-housing clusters, and the remaining clusters are marked as general activity clusters.
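Steps S700 and S800 reduce to a per-cluster sum followed by a ranking. A minimal sketch, assuming the grid stay times and cluster labels come from the previous steps as plain dictionaries; the function names are hypothetical and `k=2` mirrors the two candidate clusters of this embodiment.

```python
from collections import defaultdict


def cluster_stay(cell_stay, labels):
    """S701-S703: sum each cluster's grid stay times. labels: {cell: cluster_id}."""
    totals = defaultdict(float)
    for cell, cid in labels.items():
        totals[cid] += cell_stay[cell]
    return dict(totals)


def candidate_clusters(cell_stay, labels, k=2):
    """S801-S802: rank clusters by stay time, longest first.

    Returns (candidate job-housing clusters, general activity clusters).
    """
    totals = cluster_stay(cell_stay, labels)
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:k], ranked[k:]
```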
In this embodiment, step S900, that is, the step of identifying each candidate job-housing cluster as a workplace or a residence by dividing time windows, includes:
S901, dividing one day equally into 24 time windows;
S902, calculating the user's accumulated stay time in each candidate job-housing cluster in each time window;
S903, determining the representative time window of each candidate job-housing cluster from those per-window accumulated stay times;
S904, determining the period to which the representative time window of each candidate job-housing cluster belongs;
S905, identifying each candidate job-housing cluster as a workplace or a residence according to the period of its representative time window.
In this embodiment, the job-housing attributes of the candidate job-housing clusters are labeled by dividing time windows, as follows:
(1) Divide one day equally into 24 time windows. For each of the user's two candidate job-housing clusters, calculate the user's stay time in that cluster in each of the 24 windows, and select the window with the longest stay time as the cluster's representative time window;
(2) If the representative time window of the first candidate cluster falls within 9:00-18:00 and that of the second falls within 18:00-9:00, label the first cluster as the workplace cluster and the second as the residence cluster; conversely, if the first falls within 18:00-9:00 and the second within 9:00-18:00, label the first as the residence cluster and the second as the workplace cluster. If neither case occurs, label the user as a user without a fixed workplace and residence.
Specifically, each day is divided into 24 time windows, the records of the base stations connecting the user to the two candidate job-housing clusters are screened, the user's stay time in each of the two clusters is calculated for each time window, and the windows are sorted by stay time. Fig. 6 is a schematic diagram of a user's stay time in the candidate job-housing clusters in each time window. The two clusters are then labeled by the representative-window rule of step (2) above.
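The labeling rule can be sketched as below. This is a minimal illustration under assumptions: hourly window h is taken to span h:00 to (h+1):00, so it lies within 9:00-18:00 exactly when 9 <= h <= 17; ties for the longest stay resolve to the earliest window; and the function names and return values are hypothetical.

```python
DAY_START, DAY_END = 9, 18  # 9:00-18:00 boundary from the text


def representative_window(hourly_stay):
    """Index (0-23) of the hourly window with the longest stay; ties -> earliest."""
    if len(hourly_stay) != 24:
        raise ValueError("expected 24 hourly time windows")
    return max(range(24), key=lambda h: hourly_stay[h])


def is_daytime(hour):
    """True if window [hour:00, hour+1:00) lies within 9:00-18:00."""
    return DAY_START <= hour < DAY_END


def label_candidates(hourly_first, hourly_second):
    """Label two candidate clusters from their 24-window stay profiles.

    Returns ("work", "home"), ("home", "work"), or None when the user has
    no fixed workplace and residence pattern.
    """
    day_1 = is_daytime(representative_window(hourly_first))
    day_2 = is_daytime(representative_window(hourly_second))
    if day_1 and not day_2:
        return ("work", "home")
    if day_2 and not day_1:
        return ("home", "work")
    return None
```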
The method for identifying job-housing locations based on mobile phone signaling data has the following technical effects:
The embodiment of the invention provides a job-housing location identification method based on mobile phone signaling data, comprising: acquiring mobile phone signaling data; acquiring a base station set according to the signaling data, the set comprising all base stations with signaling records; identifying the coverage area of each base station from the distribution of that base station and its surrounding base stations in combination with a cellular network model; dividing the urban area into grids and identifying the grids that intersect a base station's coverage area as that base station's coverage grids; calculating the user's accumulated stay time in each base station coverage grid; clustering the grids with the Hartigan Leader clustering algorithm, based on their spatial positions and accumulated stay times, to obtain a cluster set; calculating the user's accumulated stay time in each cluster of the cluster set; extracting candidate job-housing clusters according to those accumulated stay times; and identifying each candidate job-housing cluster as a workplace or a residence by dividing time windows. The invention can identify users' workplaces and residences accurately, effectively and quickly, and provides methods and data support for related applications such as urban development planning and transportation facility planning.
Referring to Fig. 7, an embodiment of the present invention further provides a device 200 for identifying job-housing locations based on mobile phone signaling data, which specifically includes:
at least one processor 210;
at least one memory 220 for storing at least one program;
wherein, when the at least one program is executed by the at least one processor 210, the at least one processor 210 implements the method shown in Fig. 1.
The memory 220, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs and non-transitory computer-executable programs. The memory 220 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 220 may optionally include remote memory located remotely from processor 210, and such remote memory may be connected to processor 210 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It will be understood that the device structure shown in fig. 7 does not constitute a limitation of device 200, and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
In the device 200 shown in Fig. 7, the processor 210 may retrieve the program stored in the memory 220 and execute, without limitation, the steps of the embodiment shown in Fig. 1.
The above-described embodiments of the apparatus 200 are merely illustrative, and the units illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purposes of the embodiments.
Embodiments of the present invention also provide a computer-readable storage medium, which stores a program executable by a processor, and the program executable by the processor is used for implementing the method shown in fig. 1 when being executed by the processor.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
It will be understood that all or some of the steps of the methods and the systems disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, as is known to those skilled in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (10)

1. A method for identifying job-housing locations based on mobile phone signaling data, characterized by comprising the following steps:
acquiring mobile phone signaling data;
acquiring a base station set according to the mobile phone signaling data, wherein the base station set comprises all base stations containing mobile phone signaling data records;
identifying a base station coverage area according to the distribution of each base station and the surrounding base stations in the base station set by combining a cellular network model;
dividing the urban area into grids, and identifying the grids intersected with the coverage area of the base station as the coverage grids of the base station;
calculating the accumulated stay time of the user in each grid of the base station coverage grid;
based on the grid space position and the accumulated stay time of each grid in the base station coverage grid, clustering the grids by using a Hartigan Leader clustering algorithm to obtain a cluster set;
calculating the user's accumulated stay time in each cluster of the cluster set;
extracting candidate job-housing clusters according to the user's accumulated stay time in each cluster of the cluster set;
identifying each candidate job-housing cluster as a workplace or a residence by dividing time windows.
2. The method of claim 1, wherein the step of identifying the base station coverage area according to the distribution of each base station and its surrounding base stations in the base station set in combination with a cellular network model comprises:
searching for a plurality of adjacent base stations closest to a target base station in the base station set to obtain an adjacent base station set, wherein the target base station is any base station in the base station set;
calculating a first distance, wherein the first distance is the average of the distances between the target base station and each adjacent base station in the adjacent base station set;
calculating the base station coverage radius from the first distance based on the cellular network model;
taking the area centered on the target base station with the base station coverage radius as the base station coverage area.
3. The method of claim 2, wherein the base station coverage radius is calculated by the following formula:
[Formula image not reproduced in the source text: r_bs expressed in terms of d_bs]
where r_bs is the base station coverage radius and d_bs is the first distance.
4. The method of claim 1, wherein the step of calculating the user's accumulated stay time in each grid of the base station coverage grids comprises:
acquiring a job-housing POI set, wherein the job-housing POIs comprise POIs with workplace-related attributes and POIs with residence-related attributes;
acquiring, according to the job-housing POI set, the number of job-housing POIs in each grid of the base station coverage grids;
allocating the user's accumulated stay time at each base station to its grids according to the number of job-housing POIs in each grid, to obtain the user's accumulated stay time in each grid.
5. The method of claim 4, wherein the user's accumulated stay time at the base stations is allocated to the grids according to the number of job-housing POIs in each grid by the following formula:
$$T_j = \sum_{bs \in BS_j} T_{bs} \cdot \frac{N_j}{\sum_{J \in bs} N_J}$$
where T_j is the user's accumulated stay time in grid j, T_bs is the user's accumulated stay time at base station bs, BS_j is the set of base stations covering grid j, N_j is the number of job-housing POIs in grid j, and \sum_{J \in bs} N_J is the number of job-housing POIs within the coverage area of base station bs.
6. The method of claim 1, wherein the step of calculating the user's accumulated stay time in each cluster of the cluster set comprises:
acquiring the grids contained in each cluster of the cluster set;
calculating the user's accumulated stay time in each grid;
summing the user's accumulated stay times over the grids of each cluster to obtain the user's accumulated stay time in that cluster.
7. The method of claim 6, wherein the step of extracting candidate job-housing clusters according to the user's accumulated stay time in each cluster of the cluster set comprises:
sorting the clusters in the cluster set by the user's accumulated stay time in each cluster to obtain a sorted list;
extracting candidate job-housing clusters from the sorted list, wherein the candidate job-housing clusters are a preset number of clusters at the head or a preset number of clusters at the tail of the sorted list.
8. The method for identifying job-housing locations based on mobile phone signaling data of claim 1, wherein the step of identifying each candidate job-housing cluster as a workplace or a residence by dividing time windows comprises:
dividing a day equally into 24 time windows;
calculating the user's accumulated stay time in each candidate job-housing cluster in each time window;
determining the representative time window of each candidate job-housing cluster according to those accumulated stay times;
determining the period to which the representative time window of each candidate job-housing cluster belongs;
identifying each candidate job-housing cluster as a workplace or a residence according to the period of its representative time window.
9. A device for identifying job-housing locations based on mobile phone signaling data, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the method of any one of claims 1-8.
10. A computer-readable storage medium storing a processor-executable program which, when executed by a processor, carries out the method of any one of claims 1-8.
CN202110778497.XA 2021-07-09 2021-07-09 Method, device and storage medium for identifying occupational sites based on mobile phone signaling data Pending CN113613174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110778497.XA CN113613174A (en) 2021-07-09 2021-07-09 Method, device and storage medium for identifying occupational sites based on mobile phone signaling data

Publications (1)

Publication Number Publication Date
CN113613174A true CN113613174A (en) 2021-11-05

Family

ID=78304312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110778497.XA Pending CN113613174A (en) 2021-07-09 2021-07-09 Method, device and storage medium for identifying occupational sites based on mobile phone signaling data

Country Status (1)

Country Link
CN (1) CN113613174A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105142106A (en) * 2015-07-29 2015-12-09 西南交通大学 Traveler home-work location identification and trip chain depicting method based on mobile phone signaling data
CN106464706A (en) * 2014-04-18 2017-02-22 意大利电信股份公司 Method and system for identifying significant locations through data obtainable from telecommunication network
CN106792514A (en) * 2016-11-30 2017-05-31 南京华苏科技有限公司 User's duty residence analysis method based on signaling data
CN106897420A (en) * 2017-02-24 2017-06-27 东南大学 A kind of resident Activity recognition method of user's trip based on mobile phone signaling data
CN109688532A (en) * 2017-10-16 2019-04-26 中移(苏州)软件技术有限公司 A kind of method and device dividing city function region
CN110324787A (en) * 2019-06-06 2019-10-11 东南大学 A kind of duty residence acquisition methods of mobile phone signaling data
CN110990443A (en) * 2019-10-28 2020-04-10 上海城市交通设计院有限公司 Mobile phone signaling-based professional and living population characteristic estimation method
CN111770452A (en) * 2020-05-27 2020-10-13 中山大学 Mobile phone signaling stop point identification method based on personal travel track characteristics
CN112579718A (en) * 2020-12-14 2021-03-30 深圳市城市交通规划设计研究中心股份有限公司 Urban land function identification method and device and terminal equipment

Non-Patent Citations (1)

Title
He Meihui (何美惠): "User behavior analysis based on operator log data", Beijing Jiaotong University *

Cited By (12)

Publication number Priority date Publication date Assignee Title
CN114036623A (en) * 2021-11-19 2022-02-11 清华大学 Graphic design method based on constructed space human factor data
CN114036623B (en) * 2021-11-19 2024-05-28 清华大学 Graphic design method based on artificial factor data of built-up space
CN115002680A (en) * 2022-07-28 2022-09-02 北京融信数联科技有限公司 Crowd occupation type acquisition method, system and storage medium based on mobile phone signaling
CN115002680B (en) * 2022-07-28 2022-12-27 北京融信数联科技有限公司 Crowd occupation type obtaining method and system based on mobile phone signaling and storage medium
CN115086879A (en) * 2022-08-22 2022-09-20 广州市城市规划勘测设计研究院 Method, device and equipment for identifying passenger flow characteristics and connection mode of rail transit station
CN115086879B (en) * 2022-08-22 2022-12-16 广州市城市规划勘测设计研究院 Method, device and equipment for identifying passenger flow characteristics and connection mode of rail transit station
CN116128128A (en) * 2023-01-17 2023-05-16 北京融信数联科技有限公司 Urban job-living balance prediction method, system and medium based on intelligent agent map
CN117202106A (en) * 2023-10-19 2023-12-08 北京融信数联科技有限公司 Regional space place attribute labeling method, system and medium based on signaling data
CN117202106B (en) * 2023-10-19 2024-05-14 北京融信数联科技有限公司 Regional space place attribute labeling method, system and medium based on signaling data
CN117336683A (en) * 2023-12-01 2024-01-02 北京航空航天大学 Method and system for identifying typical stay of large-scale personnel based on signaling data
CN117336683B (en) * 2023-12-01 2024-02-13 北京航空航天大学 Method and system for identifying typical stay of large-scale personnel based on signaling data
CN117671965A (en) * 2024-02-02 2024-03-08 北京大也智慧数据科技服务有限公司 Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211105