CN115002680A - Crowd occupation type acquisition method, system and storage medium based on mobile phone signaling - Google Patents

Info

Publication number
CN115002680A
Authority
CN
China
Prior art keywords
signaling
mobile phone
base station
user
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210895704.4A
Other languages
Chinese (zh)
Other versions
CN115002680B (en
Inventor
于笑博
成立立
杨占军
张广志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beiling Rongxin Datalnfo Science and Technology Ltd
Original Assignee
Beiling Rongxin Datalnfo Science and Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beiling Rongxin Datalnfo Science and Technology Ltd
Priority to CN202210895704.4A
Publication of CN115002680A
Application granted
Publication of CN115002680B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/029 Location-based management or tracking services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/023 Services making use of location information using mutual or relative location information between multiple location based services [LBS] targets or of distance thresholds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/20 Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/35 Services specially adapted for particular environments, situations or purposes for the management of goods or merchandise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W64/00 Locating users or terminals or network equipment for network management purposes, e.g. mobility management
    • H04W64/006 Locating users or terminals or network equipment for network management purposes, e.g. mobility management with additional information processing, e.g. for direction or speed determination
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a method, a system and a storage medium for acquiring crowd occupation types based on mobile phone signaling. The method comprises the following steps: acquiring base station data corresponding to each user from user signaling data, the base station data comprising the signaling dwell time and the base station longitude and latitude; for the different signaling dwell times, acquiring a target difference of the base station longitude and latitude of each user's mobile phone within a calendar day; acquiring the calling data and called data of the user's mobile phone within a preset time period, and counting the number of calls and the number of positions; clustering the target difference, the number of calls and the number of positions to obtain a clustering result; and acquiring the crowd occupation type by matching the clustering result against type constraint conditions. By collecting the mobile phone signaling data of a specific user group, the invention distinguishes and classifies the occupation types of that group. It can accurately reflect the occupation type over a continuous time period, does not depend on the spatial position of the mobile phone user at a single time point, and makes a qualitative judgment of mobile phone users' occupation types possible.

Description

Crowd occupation type obtaining method and system based on mobile phone signaling and storage medium
Technical Field
The invention relates to the technical field of big data analysis, in particular to a crowd occupation type acquisition method, a crowd occupation type acquisition system and a storage medium based on mobile phone signaling.
Background
With the rapid development of mobile communication services in China, mobile communication devices have become indispensable. Jobs of the same nature often share common characteristics and can be grouped into one occupational category on that basis. Such classification helps the state manage the workforce by category: management methods such as recording, allocation, assessment, training, and reward and punishment can be adapted to the characteristics and working requirements of each occupation, which makes management more targeted. Occupational classification also defines, for each occupation, the job responsibilities and the professional qualities needed to fulfil them and complete the work, which provides a basis for a post responsibility system.
Modern occupational classification is a product of the industrial revolution and also reflects the modern humanistic spirit. Its objectivity and scientific character have gradually replaced the feudal and hierarchical nature inherent in traditional social occupational classification. Occupational classification reflects not only the external characteristics of an occupation (social demand characteristics) but also its internal characteristics (individual development characteristics).
Disclosure of Invention
The invention aims to provide a method, a system and a storage medium for acquiring crowd occupation types based on mobile phone signaling.
In a first aspect, the invention provides a crowd occupation type acquisition method based on mobile phone signaling, which comprises the following steps:
acquiring base station data corresponding to each user based on user signaling data, wherein the base station data comprises the signaling dwell time and the base station longitude and latitude;
acquiring, for the different signaling dwell times, a target difference of the base station longitude and latitude of each user's mobile phone within a calendar day;
acquiring the calling data and called data of the user's mobile phone within a preset time period, grouping them by mobile phone number and signaling identifier, and counting the number of calls and the number of positions;
clustering the target difference, the number of calls and the number of positions to obtain a clustering result;
and acquiring the crowd occupation type by matching the clustering result against preset type constraint conditions.
In this scheme, acquiring the base station data corresponding to each user based on the user signaling data specifically includes:
acquiring the signaling data from the user's mobile phone operator;
acquiring the information of the base station sectors between which the user's IMSI is handed over, to obtain the mobile phone signaling dwell time and a base station list;
cutting the mobile phone signaling dwell time into time groups to obtain the signaling dwell time;
and, based on the base station list, associating preset reference values to obtain the base station longitude and latitude, wherein the reference values comprise a Lac value and a Ci value, the Lac value represents the network standard of the user's mobile phone, and the Ci value represents the base station number.
In this scheme, acquiring the target difference of the base station longitude and latitude of each user's mobile phone within a calendar day for the different signaling dwell times specifically includes:
extracting the user's mobile phone signaling corresponding to each signaling dwell-time group;
acquiring the corresponding base station longitude and latitude based on the mobile phone signaling;
and extracting the maximum and minimum base station longitude and latitude of the user within the calendar day and taking their difference to obtain the target difference.
In this scheme, acquiring the calling data and called data of the user's mobile phone within the preset time period, grouping them by mobile phone number and signaling identifier, and counting the number of calls and the number of positions specifically includes:
obtaining the number of calls corresponding to the mobile phone number based on the calling data and the called data;
and obtaining the number of times the mobile phone signaling changes position within the signaling dwell time, which gives the number of positions.
In this scheme, the clustering the target difference, the number of times of call, and the number of positions to obtain a clustering result specifically includes:
performing data clustering based on a preset clustering mode, wherein the clustering mode comprises k-means clustering;
and clustering the three types of data including the target difference value, the call times and the position number to obtain the corresponding clustering result.
In this scheme, acquiring the crowd occupation type by matching the clustering result against the preset type constraint conditions specifically includes:
matching the clustering result against the preset type constraint conditions, wherein the type constraint conditions cover three preset occupations, namely courier, ride-hailing driver and bus driver;
the crowd occupation type of a clustering result that matches the first constraint condition is courier;
the crowd occupation type of a clustering result that matches the second constraint condition is ride-hailing driver;
and the crowd occupation type of a clustering result that matches the third constraint condition is bus driver.
In a second aspect, the invention further provides a crowd occupation type acquisition system based on mobile phone signaling. The system includes a memory and a processor; the memory stores a program of the crowd occupation type acquisition method based on mobile phone signaling, and the program, when executed by the processor, implements the following steps:
acquiring base station data corresponding to each user based on user signaling data, wherein the base station data comprises the signaling dwell time and the base station longitude and latitude;
acquiring, for the different signaling dwell times, a target difference of the base station longitude and latitude of each user's mobile phone within a calendar day;
acquiring the calling data and called data of the user's mobile phone within a preset time period, grouping them by mobile phone number and signaling identifier, and counting the number of calls and the number of positions;
clustering the target difference, the number of calls and the number of positions to obtain a clustering result;
and acquiring the crowd occupation type by matching the clustering result against preset type constraint conditions.
In this scheme, acquiring the base station data corresponding to each user based on the user signaling data specifically includes:
acquiring the signaling data from the user's mobile phone operator;
acquiring the information of the base station sectors between which the user's IMSI is handed over, to obtain the mobile phone signaling dwell time and a base station list;
cutting the mobile phone signaling dwell time into time groups to obtain the signaling dwell time;
and, based on the base station list, associating preset reference values to obtain the base station longitude and latitude, wherein the reference values comprise a Lac value and a Ci value, the Lac value represents the network standard of the user's mobile phone, and the Ci value represents the base station number.
In this scheme, acquiring the target difference of the base station longitude and latitude of each user's mobile phone within a calendar day for the different signaling dwell times specifically includes:
extracting the user's mobile phone signaling corresponding to each signaling dwell-time group;
acquiring the corresponding base station longitude and latitude based on the mobile phone signaling;
and extracting the maximum and minimum base station longitude and latitude of the user within the calendar day and taking their difference to obtain the target difference.
In this scheme, acquiring the calling data and called data of the user's mobile phone within the preset time period, grouping them by mobile phone number and signaling identifier, and counting the number of calls and the number of positions specifically includes:
obtaining the number of calls corresponding to the mobile phone number based on the calling data and the called data;
and obtaining the number of times the mobile phone signaling changes position within the signaling dwell time, which gives the number of positions.
In this scheme, clustering the target difference, the number of calls and the number of positions to obtain a clustering result specifically includes:
performing data clustering in a preset clustering mode, wherein the clustering mode comprises k-means clustering;
and clustering the three types of data, namely the target difference, the number of calls and the number of positions, to obtain the corresponding clustering result.
In this scheme, acquiring the crowd occupation type by matching the clustering result against the preset type constraint conditions specifically includes:
matching the clustering result against the preset type constraint conditions, wherein the type constraint conditions cover three preset occupations, namely courier, ride-hailing driver and bus driver;
the crowd occupation type of a clustering result that matches the first constraint condition is courier;
the crowd occupation type of a clustering result that matches the second constraint condition is ride-hailing driver;
and the crowd occupation type of a clustering result that matches the third constraint condition is bus driver.
In a third aspect, the invention provides a computer-readable storage medium that stores a program of the crowd occupation type acquisition method based on mobile phone signaling; when the program is executed by a processor, the steps of the crowd occupation type acquisition method based on mobile phone signaling described in any one of the above are implemented.
With the method, system and storage medium for acquiring crowd occupation types based on mobile phone signaling disclosed by the invention, the occupation types of a specific user group are distinguished and classified by collecting that group's mobile phone signaling data. The occupation type can be accurately reflected over a continuous time period without depending on the spatial position of the mobile phone user at a single time point, which makes a qualitative judgment of mobile phone users' occupation types possible.
Drawings
FIG. 1 is a flow chart of a crowd occupation type acquisition method based on mobile phone signaling according to the present invention;
FIG. 2 is a clustering flow chart of a crowd occupation type acquisition method based on mobile phone signaling according to the present invention;
FIG. 3 is a schematic diagram illustrating a clustering result of the crowd occupation type obtaining method based on mobile phone signaling according to the present invention;
FIG. 4 is a block diagram of a crowd occupation type acquisition system based on mobile phone signaling according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Judging crowd occupation types from mobile phone signaling mostly relies on agreed statistical criteria: the activity range, activity radius, number of calls and similar measures of the crowd must be known, and when the criteria are unified, the specific value ranges for each specific group must be defined. Obtaining the spatial positions of mobile phone users at different time points therefore makes a qualitative judgment of their occupation types possible.
In the present application, cluster analysis is performed using k-means in SPSS. K-means clustering works as follows: n numerical variables take part in the clustering, so each sample is a point in n-dimensional space. With K as the required number of classes, K points are selected as the initial cluster centers (condensation points); every other sample is assigned to the cluster center at the smallest Euclidean distance, which gives an initial classification, and the center position (mean) of each initial class is computed. Clustering is then repeated with the computed center positions until the centers change little (or a convergence criterion is met). In the present application, the key occupation types to be distinguished are courier, ride-hailing driver and bus driver, so k takes the value "3".
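The k-means procedure described above can be sketched in a few lines. This is a minimal illustration only, not the SPSS implementation used in the application: the function name `kmeans`, the explicit `init_centers` argument and the sample values are assumptions made for the example, and each sample is taken to be a (target difference, number of calls, number of positions) triple.

```python
import math


def kmeans(samples, init_centers, max_iter=100, tol=1e-6):
    """Plain k-means; returns (centers, labels). Hypothetical helper for illustration."""
    centers = [list(c) for c in init_centers]  # the k initial condensation points
    k = len(centers)
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assignment step: each sample joins the center at the smallest Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in samples]
        # Update step: each center moves to the mean of the samples assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(samples, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its old center
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # convergence criterion: centers barely moved
            break
    return centers, labels


# Made-up samples: (target difference in km, number of calls, number of positions).
samples = [
    (12.0, 20, 30), (11.0, 18, 28), (13.0, 22, 33),
    (25.0, 5, 60), (27.0, 4, 65), (26.0, 6, 62),
    (8.0, 2, 10), (9.0, 1, 12), (8.5, 3, 11),
]
centers, labels = kmeans(samples, [samples[0], samples[3], samples[6]])
print(labels)
```

Because the three features are in different units, a practical pipeline would probably standardize them before clustering; that step is left out here to keep the sketch short.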
Fig. 1 shows a flowchart of a crowd occupation type obtaining method based on mobile phone signaling according to the present application.
As shown in FIG. 1, the application discloses a crowd occupation type acquisition method based on mobile phone signaling, which comprises the following steps:
S102, acquiring base station data corresponding to each user based on user signaling data, wherein the base station data comprises the signaling dwell time and the base station longitude and latitude;
S104, acquiring, for the different signaling dwell times, a target difference of the base station longitude and latitude of each user's mobile phone within a calendar day;
S106, acquiring the calling data and called data of the user's mobile phone within a preset time period, grouping them by mobile phone number and signaling identifier, and counting the number of calls and the number of positions;
S108, clustering the target difference, the number of calls and the number of positions to obtain a clustering result;
and S110, acquiring the crowd occupation type by matching the clustering result against preset type constraint conditions.
It should be noted that the base station data corresponding to each user is obtained from the user signaling data of the three domestic operators. The base station data includes the information of all base station sectors between which the IMSI is handed over, and the sector information includes the sector position and the times of entering and leaving the sector. The signaling dwell time and the base station longitude and latitude are obtained from this sector information; then, for each user's mobile phone, the target difference of the base station longitude and latitude within the calendar day, the number of calls and the number of positions are obtained. These three types of data (the target difference, the number of calls and the number of positions) are clustered to obtain the clustering result, and the corresponding crowd occupation type is obtained by matching the clustering result against the preset type constraint conditions. There are three type constraint conditions, corresponding to the courier, the ride-hailing driver and the bus driver respectively, so that the crowd occupation type of the user can be identified.
According to the embodiment of the present invention, acquiring the base station data corresponding to each user based on the user signaling data specifically includes:
acquiring the signaling data from the user's mobile phone operator;
acquiring the information of the base station sectors between which the user's IMSI is handed over, to obtain the mobile phone signaling dwell time and a base station list;
cutting the mobile phone signaling dwell time into time groups to obtain the signaling dwell time;
and, based on the base station list, associating preset reference values to obtain the base station longitude and latitude, wherein the reference values comprise a Lac value and a Ci value, the Lac value represents the network standard of the user's mobile phone, and the Ci value represents the base station number.
It should be noted that the base station data includes the signaling dwell time and the base station longitude and latitude. Specifically, the signaling data may be obtained from the user's mobile phone operator. From the information of the base station sectors between which the user's IMSI is handed over, the time that the mobile phone signaling stays in each sector and the corresponding base station number are obtained, which gives the mobile phone signaling dwell time and the base station list. The mobile phone signaling dwell time is then cut into time groups to obtain the signaling dwell time, the groups being "10, 20, 30, 60, 120, 240 and 480" minutes. Finally, the Lac value and the Ci value are associated according to the base station list to obtain the base station longitude and latitude.
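As an illustration of this step, the sketch below derives a dwell-time group from the times of entering and leaving a sector and looks up the base station coordinates by (Lac, Ci). The record layout, the names `dwell_bucket`, `enrich` and `STATIONS`, the coordinate values, and the reading of the groups as upper bounds in minutes are all assumptions made for the example; the source does not specify how a dwell is assigned to a group.

```python
from bisect import bisect_left
from datetime import datetime

# Dwell-time groups named in the text, in minutes.
DWELL_BUCKETS_MIN = (10, 20, 30, 60, 120, 240, 480)

# Hypothetical base station list: (Lac, Ci) -> (longitude, latitude).
STATIONS = {
    (4301, 20011): (116.40, 39.90),
    (4301, 20012): (116.45, 39.93),
}


def dwell_bucket(minutes):
    """Smallest group that covers the dwell (assumed rule); None if longer than 480 min."""
    i = bisect_left(DWELL_BUCKETS_MIN, minutes)
    return DWELL_BUCKETS_MIN[i] if i < len(DWELL_BUCKETS_MIN) else None


def enrich(records, stations):
    """Attach a dwell group and base station coordinates to each sector visit."""
    rows = []
    for imsi, lac, ci, enter, leave in records:
        lon_lat = stations.get((lac, ci))
        if lon_lat is None:
            continue  # sector not present in the base station list
        minutes = (leave - enter).total_seconds() / 60
        rows.append({"imsi": imsi, "bucket": dwell_bucket(minutes), "lon_lat": lon_lat})
    return rows


visits = [
    ("460001234567890", 4301, 20011, datetime(2022, 7, 1, 9, 0), datetime(2022, 7, 1, 9, 25)),
    ("460001234567890", 4301, 99999, datetime(2022, 7, 1, 9, 25), datetime(2022, 7, 1, 9, 40)),
]
print(enrich(visits, STATIONS))
```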
According to the embodiment of the present invention, acquiring the target difference of the base station longitude and latitude of each user's mobile phone within a calendar day for the different signaling dwell times specifically includes:
extracting the user's mobile phone signaling corresponding to each signaling dwell-time group;
acquiring the corresponding base station longitude and latitude based on the mobile phone signaling;
and extracting the maximum and minimum base station longitude and latitude of the user within the calendar day and taking their difference to obtain the target difference.
It should be noted that, taking a signaling dwell time of "120" minutes as an example in this embodiment, the maximum and minimum base station longitude and latitude among all the base stations that the user passes through within the "120" minutes are extracted, and their difference gives the target difference. The distance between the maximum and minimum base station longitude and latitude is then obtained from the target difference, which gives the user's activity range.
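A minimal sketch of this computation follows. The function names and coordinates are hypothetical, and the great-circle (haversine) formula with a mean Earth radius of 6371 km is an assumption used here to turn the coordinate extremes into kilometres; the source only states that a distance value is derived from the target difference.

```python
import math


def target_difference(points):
    """Return ((d_lon, d_lat), (min_lon, min_lat), (max_lon, max_lat)) for (lon, lat) points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    low = (min(lons), min(lats))
    high = (max(lons), max(lats))
    return (high[0] - low[0], high[1] - low[1]), low, high


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Base stations passed through within one dwell-time group (made-up coordinates).
stations_visited = [(116.30, 39.90), (116.40, 39.95), (116.35, 39.98)]
span, low, high = target_difference(stations_visited)
print(span, round(haversine_km(low, high), 1))
```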
It is worth mentioning that a function $\mathrm{dist}(\cdot,\cdot)$ is used to obtain the distance value. If the function $\mathrm{dist}(\cdot,\cdot)$ is a distance measure, it satisfies some basic properties:
Non-negativity: $\mathrm{dist}(x_i, x_j) \ge 0$;
Identity: $\mathrm{dist}(x_i, x_j) = 0$ if and only if $x_i = x_j$;
Symmetry: $\mathrm{dist}(x_i, x_j) = \mathrm{dist}(x_j, x_i)$;
Triangle inequality: $\mathrm{dist}(x_i, x_j) \le \mathrm{dist}(x_i, x_k) + \mathrm{dist}(x_k, x_j)$.
Given samples $x_i = (x_{i1}; x_{i2}; \ldots; x_{in})$ and $x_j = (x_{j1}; x_{j2}; \ldots; x_{jn})$, the most commonly used measure is the Minkowski distance:
$$\mathrm{dist}_{mk}(x_i, x_j) = \left( \sum_{u=1}^{n} |x_{iu} - x_{ju}|^{p} \right)^{1/p}$$
For $p \ge 1$, the above equation obviously satisfies the basic properties of a distance measure.
When $p = 2$, the Minkowski distance is the Euclidean distance:
$$\mathrm{dist}_{ed}(x_i, x_j) = \lVert x_i - x_j \rVert_2 = \sqrt{\sum_{u=1}^{n} |x_{iu} - x_{ju}|^{2}}$$
When $p = 1$, the Minkowski distance is the Manhattan distance:
$$\mathrm{dist}_{man}(x_i, x_j) = \lVert x_i - x_j \rVert_1 = \sum_{u=1}^{n} |x_{iu} - x_{ju}|$$
In practical applications, attributes are often divided into "continuous attributes" and "discrete attributes": the former have infinitely many possible values in their domain, the latter a finite number. For distance calculation, however, what matters more is whether an "order" relationship is defined on the attribute. For example, a discrete attribute with the domain {1, 2, 3} is close in nature to a continuous attribute, and distances can be computed directly on its values: "1" is closer to "2" and farther from "3". Such an attribute is called an "ordinal attribute". A discrete attribute with a domain such as {plane, train, ship} does not allow distances to be computed directly on its values and is called a "non-ordinal attribute". The Minkowski distance is evidently applicable to ordinal attributes. Since this is a conventional way for those skilled in the art to calculate a distance value, it is not described further here.
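For reference, the Minkowski distance and its two special cases can be written as a short function. This is a minimal sketch; the function name `minkowski` and the sample points are chosen only for illustration.

```python
def minkowski(x, y, p):
    """Minkowski distance between two equal-length numeric vectors, for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a distance measure")
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
euclidean = minkowski(a, b, 2)  # p = 2 gives the Euclidean distance
manhattan = minkowski(a, b, 1)  # p = 1 gives the Manhattan distance
print(euclidean, manhattan)
```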
According to the embodiment of the invention, acquiring the calling data and called data of the user's mobile phone within the preset time period, grouping them by mobile phone number and signaling identifier, and counting the number of calls and the number of positions specifically includes:
obtaining the number of calls corresponding to the mobile phone number based on the calling data and the called data;
and obtaining the number of times the mobile phone signaling changes position within the signaling dwell time, which gives the number of positions.
It should be noted that the preset time period is "8:00-21:00" (on a 24-hour clock). Within this period, records are grouped by mobile phone number and signaling identifier, and the user's calling and called data are screened to obtain the number of calls. The number of positions is obtained from the number of times the mobile phone signaling changes position within the signaling dwell time. The number of calls and the number of positions are then combined for the subsequent clustering.
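The two counts can be sketched as follows. The record layouts, the function names and the phone numbers are hypothetical, and the sketch simplifies the grouping to the mobile phone number alone; counting a position change whenever two consecutive signaling events come from different cells is likewise an assumption about how "changing the position" is measured.

```python
from collections import Counter, defaultdict
from datetime import datetime, time

WINDOW = (time(8, 0), time(21, 0))  # preset time period from the text


def in_window(ts):
    return WINDOW[0] <= ts.time() <= WINDOW[1]


def count_calls(call_records):
    """Calls (calling and called) per mobile phone number inside the window."""
    counts = Counter()
    for number, ts, _direction in call_records:
        if in_window(ts):
            counts[number] += 1
    return counts


def count_position_changes(signaling_records):
    """Number of cell changes per mobile phone number inside the window."""
    events = defaultdict(list)
    for number, ts, cell in signaling_records:
        if in_window(ts):
            events[number].append((ts, cell))
    changes = {}
    for number, seq in events.items():
        seq.sort(key=lambda e: e[0])  # order the events by timestamp
        cells = [cell for _, cell in seq]
        changes[number] = sum(1 for prev, cur in zip(cells, cells[1:]) if prev != cur)
    return changes


day = lambda h, m: datetime(2022, 7, 1, h, m)
calls = [
    ("13800000001", day(7, 59), "out"),
    ("13800000001", day(8, 0), "out"),
    ("13800000001", day(12, 30), "in"),
    ("13800000001", day(21, 1), "in"),
    ("13800000002", day(10, 0), "out"),
]
signaling = [
    ("13800000001", day(9, 0), "A"),
    ("13800000001", day(9, 10), "A"),
    ("13800000001", day(9, 20), "B"),
    ("13800000001", day(9, 40), "A"),
    ("13800000001", day(22, 0), "C"),
]
print(count_calls(calls), count_position_changes(signaling))
```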
According to the embodiment of the present invention, the clustering the target difference, the number of times of calling, and the position number to obtain a clustering result specifically includes:
performing data clustering based on a preset clustering mode, wherein the clustering mode comprises k-means clustering;
and clustering the three types of data including the target difference value, the call times and the position number to obtain the corresponding clustering result.
It should be noted that the movement track of the mobile phone user is obtained from the signaling data, and k-means clustering is applied in combination with the base station list. Assume that a sample set $D = \{x_1, x_2, \ldots, x_m\}$ contains $m$ unlabeled samples and that each sample $x_i = (x_{i1}; x_{i2}; \ldots; x_{in})$ is an $n$-dimensional feature vector. Clustering then divides the sample set $D$ into $k$ disjoint clusters $\{C_l \mid l = 1, 2, \ldots, k\}$, where $C_{l'} \cap C_l = \varnothing$ for $l' \ne l$ and $D = \bigcup_{l=1}^{k} C_l$. Accordingly, $\lambda_j \in \{1, 2, \ldots, k\}$ denotes the "cluster label" of sample $x_j$, i.e. $x_j \in C_{\lambda_j}$. The clustering result can then be represented by a cluster label vector $\lambda = (\lambda_1; \lambda_2; \ldots; \lambda_m)$ containing $m$ elements.
Specifically, in the present embodiment, the sample set $D$ contains "$m$" unlabeled samples, and each sample $x_i$ is a 3-dimensional feature vector corresponding to the signaling data of one user, comprising the target difference, the number of calls and the number of positions. Because $k$ takes the value "3" in the present embodiment, the clusters obtained after clustering are $C_1$, $C_2$ and $C_3$, and the cluster label $\lambda_j$ of each sample takes one of the values 1, 2 or 3.
According to the embodiment of the present invention, acquiring the crowd occupation type by matching the clustering result against the preset type constraint conditions specifically includes:
matching the clustering result against the preset type constraint conditions, wherein the type constraint conditions cover three preset occupations, namely courier, ride-hailing driver and bus driver;
the crowd occupation type of a clustering result that matches the first constraint condition is courier;
the crowd occupation type of a clustering result that matches the second constraint condition is ride-hailing driver;
and the crowd occupation type of a clustering result that matches the third constraint condition is bus driver.
It should be noted that, the first constraint condition corresponding to the courier is: the occurrence frequency of the signaling retention time of 20 minutes is more than 3 times, and the occurrence frequency of the signaling retention time of 30 minutes is more than 5 times; diameter of the user's range of motion: more than 10 kilometers; number of callers "15" and above; after the calling, the distance range of the called user is within 1 kilometer.
The second constraint condition corresponding to the net car booking driver is as follows: the number of occurrences of the signaling dwell time of "20" minutes is greater than "10" times and the number of occurrences of the signaling dwell time of "30" minutes is less than "5" times and the number of occurrences of the signaling dwell time of "60" minutes is less than "3" times; the diameter of the user's range of motion is "15" kilometers or more.
The third constraint condition corresponding to the bus driver is as follows: and the data of the courier and the network car booking driver are removed on the basis that the data of the courier and the network car booking driver appear for 5 times or more under the same base station and the data of the 5 streets or more appear.
In particular toAs shown in fig. 2, the final clustering center, i.e., the clustering result, is based on the cluster label vector
Figure 307114DEST_PATH_IMAGE041
And
Figure 808634DEST_PATH_IMAGE042
performing matching, wherein the vectors are marked with clusters
Figure 133436DEST_PATH_IMAGE043
For example, if a cluster marks a vector
Figure 706500DEST_PATH_IMAGE044
The occurrence frequency of the signaling stay time length of 20 minutes is 5 times, the occurrence frequency of the signaling stay time length of 30 minutes is 6 times, and the diameter of the user activity range is 15 kilometers; the number of calling times is '17' and after the calling, the distance range of the called user is less than '1' kilometer, then the cluster mark vector
Figure 862675DEST_PATH_IMAGE045
The represented user group corresponding to the crowd occupation type is the courier.
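The matching of a cluster against the three constraint conditions can be sketched as a simple rule function. This is only an illustration under stated assumptions: the `ClusterProfile` fields, the function name `match_occupation` and the returned strings are hypothetical names that do not appear in the original text, and "within 1 kilometer" is read here as "at most 1 kilometer".

```python
from dataclasses import dataclass


@dataclass
class ClusterProfile:
    """Summary of one cluster; all field names are illustrative assumptions."""
    dwell_counts: dict            # dwell-time group (minutes) -> number of occurrences
    activity_diameter_km: float   # diameter of the user's activity range
    outgoing_calls: int           # number of calls made as the calling party
    callee_distance_km: float     # distance to the called user after a call
    same_station_visits: int = 0  # appearances under the same base station
    streets_visited: int = 0      # number of distinct streets


def match_occupation(p: ClusterProfile) -> str:
    d = p.dwell_counts
    # First constraint condition: courier.
    if (d.get(20, 0) > 3 and d.get(30, 0) > 5
            and p.activity_diameter_km > 10
            and p.outgoing_calls >= 15
            and p.callee_distance_km <= 1):
        return "courier"
    # Second constraint condition: online ride-hailing driver.
    if (d.get(20, 0) > 10 and d.get(30, 0) < 5 and d.get(60, 0) < 3
            and p.activity_diameter_km >= 15):
        return "ride-hailing driver"
    # Third constraint condition: bus driver. It is only reached once the
    # courier and ride-hailing rules have been excluded.
    if p.same_station_visits >= 5 and p.streets_visited >= 5:
        return "bus driver"
    return "unmatched"


# The worked example from the text: 5 stays of 20 minutes, 6 stays of
# 30 minutes, a 15 km activity diameter, 17 outgoing calls, callee < 1 km away.
example = ClusterProfile({20: 5, 30: 6}, 15.0, 17, 0.8)
print(match_occupation(example))
```

Evaluating the rules in a fixed order mirrors the third constraint condition, which is stated relative to the data left after couriers and ride-hailing drivers are removed.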
It is worth mentioning that the method further includes performing data cleaning on the signaling data, which specifically includes:
performing interpolation compensation for missing signaling in the signaling data;
performing time cutting on the signaling data of the same user to obtain that user's signaling data at different base stations;
and standardizing the signaling data, and removing illogical data or adjusting it into logically consistent, usable data.
It should be noted that interpolation compensation is performed for missing signaling records of entering and leaving a base station in order to ensure data integrity. If, within the statistical time period, a user has a time of entering a certain sector but no time of leaving it, or a time of leaving a certain sector but no time of entering it, the missing data must be interpolated, and the interpolated time points are the start time and the end time of the statistical time period. For example, user A enters sector X at "May 1, 23:00:00" and leaves sector X at "May 2, 7:00:00", then enters sector Y at "May 2, 23:00:00" and leaves sector Y at "May 3, 7:00:00". When the information of user A on "May 2" is collected, the time point of entering sector X and the time point of leaving sector Y are missing, so the time of entering sector X is interpolated as "May 2, 00:00:00" and the time of leaving sector Y is interpolated as "May 2, 23:59:59".
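The interpolation rule above can be illustrated with a short sketch. The record layout (dictionaries with `sector`, `enter` and `leave` keys), the function name `fill_missing_boundaries` and the year 2022 in the sample data are assumptions made only for this example.

```python
from datetime import datetime


def fill_missing_boundaries(records, period_start, period_end):
    """Fill a missing enter time with the period start and a missing
    leave time with the period end; the input records are not modified."""
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data stays untouched
        if rec["enter"] is None:
            rec["enter"] = period_start
        if rec["leave"] is None:
            rec["leave"] = period_end
        filled.append(rec)
    return filled


# User A on May 2: the entry into sector X and the exit from sector Y
# fall outside the statistical period and are therefore missing.
day_start = datetime(2022, 5, 2, 0, 0, 0)
day_end = datetime(2022, 5, 2, 23, 59, 59)
records = [
    {"sector": "X", "enter": None, "leave": datetime(2022, 5, 2, 7, 0, 0)},
    {"sector": "Y", "enter": datetime(2022, 5, 2, 23, 0, 0), "leave": None},
]
for rec in fill_missing_boundaries(records, day_start, day_end):
    print(rec["sector"], rec["enter"], rec["leave"])
```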
Time periods of repeated signaling that overlap in time are also divided, that is, time cutting is performed on the signaling data of the same user in the base station data. If the user appears at two entirely different base stations during the overlapping period, the entry time at the later base station is used as the cutting time, and the period is divided into two. For example, if the user is in sector A during "May 1, 10:00:00-10:30:00" but in sector B during "May 1, 10:25:00-10:50:00", then "May 1, 10:25:00" is taken as the dividing time point that separates the overlapping region.
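A minimal sketch of this time cutting follows, assuming each stay is a `(sector, start, end)` tuple for one user. The function name `cut_overlaps` and the tuple layout are hypothetical. The sketch only handles the case described in the text, where the later stay begins before the earlier one ends.

```python
from datetime import datetime


def cut_overlaps(stays):
    """Trim overlapping stays of one user so that an earlier stay ends
    at the moment the user enters the next, different sector."""
    ordered = sorted(stays, key=lambda s: s[1])  # sort by entry time
    result = []
    for sector, start, end in ordered:
        if result:
            prev_sector, prev_start, prev_end = result[-1]
            if prev_sector != sector and start < prev_end:
                # Cut at the entry time of the later base station.
                result[-1] = (prev_sector, prev_start, start)
        result.append((sector, start, end))
    return result


stays = [
    ("A", datetime(2022, 5, 1, 10, 0, 0), datetime(2022, 5, 1, 10, 30, 0)),
    ("B", datetime(2022, 5, 1, 10, 25, 0), datetime(2022, 5, 1, 10, 50, 0)),
]
for stay in cut_overlaps(stays):
    print(stay)
```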
The base station data is standardized, that is, non-standard or illogical data is adjusted. For example, if times are normally represented as "20220501000000" but a certain record instead writes the time out in a long form with separate year, month, day, hour, minute and second fields, such irregular data is adjusted to the regular form "20220501000000". Non-standard data is standardized in this way, and illogical data is either removed or adjusted into usable data that is consistent with the records before and after it.
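As an illustration of the timestamp standardization, the sketch below converts a long-form time into the 14-digit form. It assumes that an irregular value always contains six numeric fields in year, month, day, hour, minute, second order. That assumption, and the function name `normalize_timestamp`, are not taken from the original text.

```python
import re


def normalize_timestamp(raw: str) -> str:
    """Return the time as a 14-digit 'YYYYMMDDhhmmss' string."""
    value = raw.strip()
    if re.fullmatch(r"\d{14}", value):
        return value  # already in the regular form
    fields = re.findall(r"\d+", value)
    if len(fields) != 6:
        # Assumption: anything else is treated as illogical data and
        # left for the caller to remove or repair.
        raise ValueError(f"cannot interpret timestamp: {raw!r}")
    year, month, day, hour, minute, second = (int(f) for f in fields)
    return f"{year:04d}{month:02d}{day:02d}{hour:02d}{minute:02d}{second:02d}"


print(normalize_timestamp("20220501000000"))
print(normalize_timestamp("2022-5-1 00:00:00"))
```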
It is worth mentioning that obtaining the action track of the mobile phone user based on the signaling data specifically includes:
obtaining the activity track of the user within the range of each base station based on the mobile phone signaling dwell time and the base station list;
and integrating these activity tracks to obtain the action track of the mobile phone user.
It should be noted that, because the range of the sector corresponding to each base station is limited, the action track of the mobile phone user can be obtained by integrating the user's activity tracks within the range of each base station.
It is worth mentioning that the k-means algorithm is as follows. Given a sample set $D = \{x_1, x_2, \ldots, x_m\}$, the k-means algorithm minimizes the squared error of the resulting cluster partition $C = \{C_1, C_2, \ldots, C_k\}$:
$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert_2^2$$
It should be noted that $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$ is the mean vector of cluster $C_i$. Intuitively, the above formula describes, to some extent, how tightly the samples in a cluster surround the cluster mean vector: the smaller the value of $E$, the higher the similarity of the samples within the cluster. Minimizing the above formula is not easy, because finding its optimal solution requires examining all possible cluster partitions of the sample set $D$; k-means therefore approximates the solution iteratively.
It is worth mentioning that, as shown in fig. 3, the clustering process using k-means is as follows:
Input:
sample set $D = \{x_1, x_2, \ldots, x_m\}$;
number of clusters $k$.
Process:
randomly select $k$ samples from $D$ as the initial mean vectors $\{\mu_1, \mu_2, \ldots, \mu_k\}$
repeat
  let $C_i = \varnothing$ $(1 \leq i \leq k)$
  for $j = 1, 2, \ldots, m$ do
    compute the distance between sample $x_j$ and each mean vector $\mu_i$ $(1 \leq i \leq k)$: $d_{ji} = \lVert x_j - \mu_i \rVert_2$;
    determine the cluster label of $x_j$ from the nearest mean vector: $\lambda_j = \arg\min_{i \in \{1, 2, \ldots, k\}} d_{ji}$;
    assign sample $x_j$ to the corresponding cluster: $C_{\lambda_j} = C_{\lambda_j} \cup \{x_j\}$
  end for
  for $i = 1, 2, \ldots, k$ do
    compute the new mean vector $\mu_i' = \frac{1}{|C_i|} \sum_{x \in C_i} x$
    if $\mu_i' \neq \mu_i$ then
      update the current mean vector $\mu_i$ to $\mu_i'$
    else
      keep the current mean vector $\mu_i$ unchanged
    end if
  end for
until none of the current mean vectors is updated
Output:
cluster partition $C = \{C_1, C_2, \ldots, C_k\}$
It should be noted that, assuming the number of clusters $k = 3$, the algorithm starts by randomly selecting three samples as the initial mean vectors $\mu_1$, $\mu_2$ and $\mu_3$. Each sample $x_j$ is examined to see which of the three mean vectors it is closest to, and it is assigned to the cluster of that mean vector. After all samples in the data set have been examined once in this way, the cluster division of the first iteration, $C_1$, $C_2$ and $C_3$, is obtained, and new mean vectors $\mu_1'$, $\mu_2'$ and $\mu_3'$ are then computed from $C_1$, $C_2$ and $C_3$ respectively. After the current mean vectors are updated, the above process is repeated. In theory, with a user-defined number of iterations, whether the iteration result is valid can be judged by whether it converges. In practice, however, k-means cannot guarantee convergence to the global optimum, and the choice of the initial centers directly affects the clustering result. Once the cluster centers no longer move much between successive iterations, each sample is assigned to the cluster of its nearest center, which gives the final cluster division. Preferably, when considering whether the boundaries between clusters are clear, a performance metric needs to be introduced for evaluation: the higher the intra-cluster similarity and the lower the inter-cluster similarity of the clustering result, the more effective the clustering. Introducing a performance metric to evaluate whether the boundaries are clear is a conventional technical means for those skilled in the art and is not described further here.
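The iterative procedure described above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the `max_iter` cap, the fixed random seed, the handling of empty clusters and the sample values are all assumptions. Each sample is a tuple of (target difference, number of calls, number of positions), and no feature scaling is applied.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(samples, k, max_iter=100, seed=0):
    """Return (labels, means) following the repeat/until loop above."""
    rng = random.Random(seed)
    # Randomly select k samples as the initial mean vectors.
    means = [list(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for j, x in enumerate(samples):
            dists = [euclidean(x, mu) for mu in means]
            labels[j] = dists.index(min(dists))  # nearest mean vector
            clusters[labels[j]].append(x)
        updated = False
        for i, members in enumerate(clusters):
            if not members:
                continue  # assumption: an empty cluster keeps its old mean
            new_mean = [sum(col) / len(members) for col in zip(*members)]
            if new_mean != means[i]:
                means[i] = new_mean
                updated = True
        if not updated:  # until none of the mean vectors is updated
            break
    return labels, means


# Hypothetical feature vectors: (target difference, calls, positions).
samples = [
    (0.10, 18, 40), (0.12, 20, 42), (0.11, 17, 39),
    (0.30, 3, 80), (0.32, 2, 85), (0.29, 4, 78),
    (0.20, 1, 15), (0.21, 0, 14), (0.19, 2, 16),
]
labels, means = kmeans(samples, k=3)
print(labels)
```

Because the initial mean vectors are chosen at random, different seeds can give different partitions, which is the sensitivity to initial centers noted in the text.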
Fig. 4 shows a block diagram of a crowd occupation type acquisition system based on mobile phone signaling according to the present invention.
As shown in fig. 4, the present invention discloses a crowd occupation type acquisition system based on mobile phone signaling, which includes a memory and a processor, wherein the memory stores a program of the crowd occupation type acquisition method based on mobile phone signaling, and the program, when executed by the processor, implements the following steps:
acquiring base station data corresponding to each user based on user signaling data, wherein the base station data comprises signaling dwell time and base station longitude and latitude;
acquiring a target difference value of the longitude and latitude of the base station of each user mobile phone in a natural day based on different signaling stay time lengths;
acquiring calling data and called data of a user mobile phone in a preset time period, grouping based on a mobile phone number and a signaling identifier, and counting the number of times of calls and the number of positions;
clustering the target difference value, the call times and the position number to obtain a clustering result;
and acquiring the occupation type of the crowd based on the clustering result matched with a preset type constraint condition.
It should be noted that the base station data corresponding to each user is obtained based on user signaling data from the three operators, where the base station data includes information on all base station sectors between which the user's IMSI has switched, and the base station sector information includes the sector position and the times of entering and leaving the sector. The signaling dwell time and the base station longitude and latitude are obtained from the base station sector information. Then the target difference of the base station longitude and latitude of each user's mobile phone within a natural day, together with the user's number of calls and number of positions, is obtained. The three types of data (the target difference, the number of calls and the number of positions) are clustered to obtain the clustering result, and the corresponding occupation type of the crowd is obtained by matching the clustering result against the preset type constraint conditions. The type constraint conditions correspond to three cases, namely the constraint conditions for couriers, online ride-hailing drivers and bus drivers, so that the occupation type of the crowd to which a user belongs can be identified.
According to the embodiment of the present invention, the acquiring base station data corresponding to each user based on user signaling data specifically includes:
acquiring the signaling data based on a user mobile phone operator;
acquiring the base station sector information switched by the user's IMSI to obtain the mobile phone signaling dwell time and a base station list;
performing time cutting and grouping on the mobile phone signaling dwell time to obtain the signaling dwell time;
and associating preset reference values based on the base station list to obtain the base station longitude and latitude, wherein the reference values comprise a Lac value and a Ci value, the Lac value represents the network standard of the user's mobile phone, and the Ci value represents the base station number.
It should be noted that the base station data includes the signaling dwell time and the base station longitude and latitude. Specifically, the signaling data can be obtained from the user's mobile phone operator. From the base station sector information switched by the user's IMSI, the time during which the mobile phone signaling stays in each sector and the corresponding base station number are obtained, which yields the mobile phone signaling dwell time and the base station list. The mobile phone signaling dwell time is then cut and grouped by time to obtain the signaling dwell time, where the signaling dwell time groups are "10, 20, 30, 60, 120, 240 and 480" minutes. Finally, the Lac value and the Ci value are associated according to the base station list to obtain the base station longitude and latitude.
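The grouping of dwell times and the lookup of base station coordinates can be sketched as below. The text does not state how a stay is mapped to a group, so this sketch assumes a stay is counted in the largest group that does not exceed its duration. The names `dwell_group` and `station_location`, the `(lac, ci)` dictionary key and the sample coordinates are likewise hypothetical.

```python
import bisect

# Dwell-time groups in minutes, as listed in the text.
DWELL_GROUPS_MIN = [10, 20, 30, 60, 120, 240, 480]


def dwell_group(minutes):
    """Largest group not exceeding the stay; None for stays under 10 minutes.
    (Assumed rule: the source does not define the grouping boundaries.)"""
    idx = bisect.bisect_right(DWELL_GROUPS_MIN, minutes) - 1
    return DWELL_GROUPS_MIN[idx] if idx >= 0 else None


def station_location(lac, ci, station_table):
    """Look up (longitude, latitude) for a base station by its Lac and Ci values."""
    return station_table.get((lac, ci))


# Hypothetical base station list keyed by (Lac, Ci).
station_table = {(4301, 20011): (116.40, 39.90), (4301, 20012): (116.42, 39.93)}

print(dwell_group(25))
print(station_location(4301, 20011, station_table))
```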
According to the embodiment of the present invention, the obtaining of the target difference of the latitude and longitude of the base station of each user mobile phone in the natural day based on the different signaling residence time specifically includes:
extracting the mobile phone signaling of the user corresponding to the retention time of each group of signaling;
acquiring the longitude and latitude of the corresponding base station based on the mobile phone signaling;
and extracting the maximum base station longitude and latitude and the minimum base station longitude and latitude corresponding to the user in the natural day, and taking their difference to obtain the target difference.
It should be noted that, in this embodiment, taking the signaling dwell time of "120" minutes as an example, the maximum base station longitude and latitude and the minimum base station longitude and latitude among all base stations that the user passes through within "120" minutes are extracted, and their difference is taken to obtain the target difference. The distance between the maximum and minimum base station longitude and latitude is then obtained from the target difference, which gives the user's activity range.
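A possible reading of this step is sketched below: the target difference is taken as the spread of longitude and latitude over the visited base stations, and the activity range as the distance between the two extreme corners. The haversine formula, the Earth radius constant and the function names are assumptions chosen for illustration. The text itself does not prescribe which geographic distance formula is used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def target_difference(points):
    """points: list of (longitude, latitude); returns (d_lon, d_lat) in degrees."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return max(lons) - min(lons), max(lats) - min(lats)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def activity_range_km(points):
    """Distance between the minimum and maximum longitude/latitude corners."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return haversine_km(min(lons), min(lats), max(lons), max(lats))


# Hypothetical base stations visited within one dwell-time group.
visited = [(116.30, 39.90), (116.50, 40.00), (116.40, 39.95)]
print(target_difference(visited))
print(round(activity_range_km(visited), 1))
```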
It is worth mentioning that a function $\mathrm{dist}(\cdot, \cdot)$ is used to obtain the distance value. If the function $\mathrm{dist}(\cdot, \cdot)$ is a distance measure, it satisfies some basic properties:
non-negativity: $\mathrm{dist}(x_i, x_j) \geq 0$;
identity: $\mathrm{dist}(x_i, x_j) = 0$ if and only if $x_i = x_j$;
symmetry: $\mathrm{dist}(x_i, x_j) = \mathrm{dist}(x_j, x_i)$;
triangle inequality: $\mathrm{dist}(x_i, x_j) \leq \mathrm{dist}(x_i, x_k) + \mathrm{dist}(x_k, x_j)$.
Given samples $x_i = (x_{i1}; x_{i2}; \ldots; x_{in})$ and $x_j = (x_{j1}; x_{j2}; \ldots; x_{jn})$, the most commonly used measure is the Minkowski distance:
$$\mathrm{dist}_{mk}(x_i, x_j) = \left( \sum_{u=1}^{n} |x_{iu} - x_{ju}|^{p} \right)^{1/p}$$
For $p \geq 1$, the above formula clearly satisfies the basic properties of a distance measure.
When $p = 2$, the Minkowski distance is the Euclidean distance:
$$\mathrm{dist}_{ed}(x_i, x_j) = \lVert x_i - x_j \rVert_2 = \sqrt{\sum_{u=1}^{n} |x_{iu} - x_{ju}|^{2}}$$
When $p = 1$, the Minkowski distance is the Manhattan distance:
$$\mathrm{dist}_{man}(x_i, x_j) = \lVert x_i - x_j \rVert_1 = \sum_{u=1}^{n} |x_{iu} - x_{ju}|$$
In practical applications, attributes are often divided into "continuous attributes" and "discrete attributes": the former have infinitely many possible values in their domain, while the latter have a finite number of values. When discussing distance calculation, however, what matters more is whether an "order" relation is defined on the attribute. For example, a discrete attribute with the domain {1, 2, 3} is closer in nature to a continuous attribute, and distance can be computed directly on its values: "1" is closer to "2" and farther from "3". Such an attribute is called an "ordinal attribute". A discrete attribute with a domain such as {plane, train, ship} does not allow distance to be computed directly on its values and is called a "non-ordinal attribute". Clearly, the Minkowski distance can be used for ordinal attributes. Since this is a conventional means by which those skilled in the art calculate a distance value, it is not described further here.
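For reference, the Minkowski distance and its two special cases can be written as a short function. The function name `minkowski` and the sample vectors are illustrative only.

```python
def minkowski(x, y, p=2):
    """Minkowski distance between two equal-length numeric sequences.
    p = 2 gives the Euclidean distance, p = 1 the Manhattan distance."""
    if p < 1:
        raise ValueError("p must be at least 1 for a valid distance measure")
    if len(x) != len(y):
        raise ValueError("samples must have the same dimension")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski(a, b, p=2))  # Euclidean distance
print(minkowski(a, b, p=1))  # Manhattan distance
```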
According to the embodiment of the invention, the obtaining of the calling data and the called data of the user mobile phone in the preset time period, grouping based on the mobile phone number and the signaling identification, and counting the number of times of calls and the number of positions specifically comprise:
obtaining the number of calls corresponding to the mobile phone number based on the calling data and the called data;
and obtaining the number of times of changing the position of the mobile phone signaling in the signaling residence time to obtain the position number.
It should be noted that the preset time period is "8:00-21:00" (on a 24-hour clock). Within this time period, records are grouped by mobile phone number and signaling identifier; the calling and called data of the mobile phone user are screened to obtain the number of calls, and the number of positions is obtained from the number of times the mobile phone signaling changes position within the signaling dwell time. The number of calls and the number of positions are then combined for subsequent clustering.
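The counting described above can be sketched as follows, assuming call records of the form `(phone_number, signaling_id, timestamp)` and location records of the form `(phone_number, signaling_id, timestamp, station_id)`. These layouts, the function names and the sample values are hypothetical. A "position change" is counted here whenever two consecutive records of the same user refer to different base stations.

```python
from collections import defaultdict
from datetime import datetime, time

WINDOW_START, WINDOW_END = time(8, 0), time(21, 0)  # preset period 8:00-21:00


def in_window(ts):
    return WINDOW_START <= ts.time() <= WINDOW_END


def count_calls(call_records):
    """Number of calls (calling or called) per (phone number, signaling id)."""
    counts = defaultdict(int)
    for phone, signaling_id, ts in call_records:
        if in_window(ts):
            counts[(phone, signaling_id)] += 1
    return dict(counts)


def count_position_changes(location_records):
    """Number of base station changes per (phone number, signaling id)."""
    by_user = defaultdict(list)
    for phone, signaling_id, ts, station in location_records:
        if in_window(ts):
            by_user[(phone, signaling_id)].append((ts, station))
    changes = {}
    for key, rows in by_user.items():
        rows.sort()  # order by time
        stations = [s for _, s in rows]
        changes[key] = sum(1 for a, b in zip(stations, stations[1:]) if a != b)
    return changes


calls = [
    ("13800000000", "s1", datetime(2022, 5, 1, 9, 15)),
    ("13800000000", "s1", datetime(2022, 5, 1, 22, 30)),  # outside the window
]
locations = [
    ("13800000000", "s1", datetime(2022, 5, 1, 9, 0), "A"),
    ("13800000000", "s1", datetime(2022, 5, 1, 9, 40), "B"),
    ("13800000000", "s1", datetime(2022, 5, 1, 10, 5), "B"),
    ("13800000000", "s1", datetime(2022, 5, 1, 11, 0), "C"),
]
print(count_calls(calls))
print(count_position_changes(locations))
```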
According to the embodiment of the present invention, the clustering the target difference, the number of times of call, and the number of locations to obtain a clustering result specifically includes:
performing data clustering based on a preset clustering mode, wherein the clustering mode comprises k-means clustering;
and clustering the three types of data including the target difference value, the call times and the position number to obtain the corresponding clustering result.
It should be noted that the action track of the mobile phone user is obtained based on the signaling data, and k-means clustering is applied in combination with the base station list. Assume a sample set $D = \{x_1, x_2, \ldots, x_m\}$ containing $m$ unlabeled samples, where each sample $x_i = (x_{i1}; x_{i2}; \ldots; x_{in})$ is an $n$-dimensional feature vector. Clustering then divides the sample set $D$ into $k$ disjoint clusters $\{C_l \mid l = 1, 2, \ldots, k\}$, where $C_{l'} \cap C_l = \varnothing$ for $l' \neq l$ and $D = \bigcup_{l=1}^{k} C_l$. Accordingly, $\lambda_j \in \{1, 2, \ldots, k\}$ denotes the "cluster label" of sample $x_j$, i.e., $x_j \in C_{\lambda_j}$. The clustering result can then be represented by a cluster label vector $\lambda = (\lambda_1; \lambda_2; \ldots; \lambda_m)$ containing $m$ elements.
Specifically, in the present embodiment, the sample set $D$ contains $m$ unlabeled samples, and each sample $x_i$ is a feature vector corresponding to the signaling data of one user, comprising the target difference, the number of calls and the number of positions. Since the number of clusters $k$ is "3" in the present embodiment, three clusters are obtained after clustering, each with its corresponding cluster label vector.
According to the embodiment of the present invention, acquiring the occupation type of the crowd based on the clustering result matched with preset type constraint conditions specifically includes:
matching the clustering result against preset type constraint conditions, wherein the type constraint conditions cover three preset occupations, namely courier, online ride-hailing driver and bus driver;
the occupation type of the crowd whose clustering result matches the first constraint condition is courier;
the occupation type of the crowd whose clustering result matches the second constraint condition is online ride-hailing driver;
and the occupation type of the crowd whose clustering result matches the third constraint condition is bus driver.
It should be noted that the first constraint condition, corresponding to the courier, is: the signaling dwell time of "20" minutes occurs more than "3" times, and the signaling dwell time of "30" minutes occurs more than "5" times; the diameter of the user's activity range is more than "10" kilometers; the number of outgoing calls is "15" or more; and after a call is made, the called user is within "1" kilometer.
The second constraint condition, corresponding to the online ride-hailing driver, is: the signaling dwell time of "20" minutes occurs more than "10" times, the signaling dwell time of "30" minutes occurs fewer than "5" times, and the signaling dwell time of "60" minutes occurs fewer than "3" times; and the diameter of the user's activity range is "15" kilometers or more.
The third constraint condition, corresponding to the bus driver, is: after the data of couriers and online ride-hailing drivers have been removed, the user appears "5" or more times under the same base station and appears on "5" or more streets.
Specifically, as shown in fig. 2, the final cluster centers, i.e., the clustering result, are matched against the constraint conditions based on the three cluster label vectors. Taking one cluster label vector as an example: if, for that cluster label vector, the signaling dwell time of "20" minutes occurs "5" times, the signaling dwell time of "30" minutes occurs "6" times, the diameter of the user's activity range is "15" kilometers, the number of outgoing calls is "17", and after a call is made the called user is less than "1" kilometer away, then the occupation type of the user group represented by that cluster label vector is courier.
It is worth mentioning that the method further includes performing data cleaning on the signaling data, specifically including:
performing interpolation compensation for missing signaling in the signaling data;
performing time cutting on the signaling data of the same user to obtain that user's signaling data at different base stations;
and standardizing the signaling data, and removing illogical data or adjusting it into logically consistent, usable data.
It should be noted that interpolation compensation is performed for missing signaling records of entering and leaving a base station in order to ensure data integrity. If, within the statistical time period, a user has a time of entering a certain sector but no time of leaving it, or a time of leaving a certain sector but no time of entering it, the missing data must be interpolated, and the interpolated time points are the start time and the end time of the statistical time period. For example, user A enters sector X at "May 1, 23:00:00" and leaves sector X at "May 2, 7:00:00", then enters sector Y at "May 2, 23:00:00" and leaves sector Y at "May 3, 7:00:00". When the information of user A on "May 2" is collected, the time point of entering sector X and the time point of leaving sector Y are missing, so the time of entering sector X is interpolated as "May 2, 00:00:00" and the time of leaving sector Y is interpolated as "May 2, 23:59:59".
Time periods of repeated signaling that overlap in time are also divided, that is, time cutting is performed on the signaling data of the same user in the base station data. If the user appears at two entirely different base stations during the overlapping period, the entry time at the later base station is used as the cutting time, and the period is divided into two. For example, if the user is in sector A during "May 1, 10:00:00-10:30:00" but in sector B during "May 1, 10:25:00-10:50:00", then "May 1, 10:25:00" is taken as the dividing time point that separates the overlapping region.
The base station data is standardized, that is, non-standard or illogical data is adjusted. For example, if times are normally represented as "20220501000000" but a certain record instead writes the time out in a long form with separate year, month, day, hour, minute and second fields, such irregular data is adjusted to the regular form "20220501000000". Non-standard data is standardized in this way, and illogical data is either removed or adjusted into usable data that is consistent with the records before and after it.
It is worth mentioning that obtaining the action track of the mobile phone user based on the signaling data specifically includes:
obtaining the activity track of the user within the range of each base station based on the mobile phone signaling dwell time and the base station list;
and integrating these activity tracks to obtain the action track of the mobile phone user.
It should be noted that, because the range of the sector corresponding to each base station is limited, the action track of the mobile phone user can be obtained by integrating the user's activity tracks within the range of each base station.
It is worth mentioning that the k-means algorithm is as follows. Given a sample set $D = \{x_1, x_2, \ldots, x_m\}$, the k-means algorithm minimizes the squared error of the resulting cluster partition $C = \{C_1, C_2, \ldots, C_k\}$:
$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert_2^2$$
It should be noted that $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$ is the mean vector of cluster $C_i$. Intuitively, the above formula describes, to some extent, how tightly the samples in a cluster surround the cluster mean vector: the smaller the value of $E$, the higher the similarity of the samples within the cluster. Minimizing the above formula is not easy, because finding its optimal solution requires examining all possible cluster partitions of the sample set $D$; k-means therefore approximates the solution iteratively.
It is worth mentioning that, as shown in fig. 3, the clustering process using k-means is as follows:
Input:
sample set $D = \{x_1, x_2, \ldots, x_m\}$;
number of clusters $k$.
Process:
randomly select $k$ samples from $D$ as the initial mean vectors $\{\mu_1, \mu_2, \ldots, \mu_k\}$
repeat
  let $C_i = \varnothing$ $(1 \leq i \leq k)$
  for $j = 1, 2, \ldots, m$ do
    compute the distance between sample $x_j$ and each mean vector $\mu_i$ $(1 \leq i \leq k)$: $d_{ji} = \lVert x_j - \mu_i \rVert_2$;
    determine the cluster label of $x_j$ from the nearest mean vector: $\lambda_j = \arg\min_{i \in \{1, 2, \ldots, k\}} d_{ji}$;
    assign sample $x_j$ to the corresponding cluster: $C_{\lambda_j} = C_{\lambda_j} \cup \{x_j\}$
  end for
  for $i = 1, 2, \ldots, k$ do
    compute the new mean vector $\mu_i' = \frac{1}{|C_i|} \sum_{x \in C_i} x$
    if $\mu_i' \neq \mu_i$ then
      update the current mean vector $\mu_i$ to $\mu_i'$
    else
      keep the current mean vector $\mu_i$ unchanged
    end if
  end for
until none of the current mean vectors is updated
Output:
cluster partition $C = \{C_1, C_2, \ldots, C_k\}$
It should be noted that, assuming the number of clusters $k = 3$, the algorithm starts by randomly selecting three samples as the initial mean vectors $\mu_1$, $\mu_2$ and $\mu_3$. Each sample $x_j$ is examined to see which of the three mean vectors it is closest to, and it is assigned to the cluster of that mean vector. After all samples in the data set have been examined once in this way, the cluster division of the first iteration, $C_1$, $C_2$ and $C_3$, is obtained, and new mean vectors $\mu_1'$, $\mu_2'$ and $\mu_3'$ are then computed from $C_1$, $C_2$ and $C_3$ respectively. After the current mean vectors are updated, the above process is repeated. In theory, with a user-defined number of iterations, whether the iteration result is valid can be judged by whether it converges. In practice, however, k-means cannot guarantee convergence to the global optimum, and the choice of the initial centers directly affects the clustering result. Once the cluster centers no longer move much between successive iterations, each sample is assigned to the cluster of its nearest center, which gives the final cluster division. Preferably, when considering whether the boundaries between clusters are clear, a performance metric needs to be introduced for evaluation: the higher the intra-cluster similarity and the lower the inter-cluster similarity of the clustering result, the more effective the clustering. Introducing a performance metric to evaluate whether the boundaries are clear is a conventional technical means for those skilled in the art and is not described further here.
A third aspect of the present invention provides a computer-readable storage medium that stores a program of the crowd occupation type obtaining method based on mobile phone signaling. When the program is executed by a processor, the steps of the crowd occupation type obtaining method based on mobile phone signaling described in any one of the above are implemented.
According to the method, system and storage medium for acquiring crowd occupation types based on mobile phone signaling disclosed herein, the occupation types of a crowd are distinguished by acquiring the mobile phone signaling data of a specific user group. Because the data cover a continuous time period rather than the spatial position of a mobile phone user at a single time point, they reflect the occupation type more accurately and make it possible to judge the occupation type of a mobile phone user qualitatively.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be made through interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or of another form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a removable storage device, a ROM, a RAM, a magnetic disk, an optical disk, or any other medium that can store program code.

Claims (10)

1. A crowd occupation type obtaining method based on mobile phone signaling, characterized by comprising the following steps:
acquiring base station data corresponding to each user based on user signaling data, wherein the base station data comprises a signaling dwell time and a base station longitude and latitude;
acquiring, based on the different signaling dwell times, a target difference value of the base station longitude and latitude of each user's mobile phone within a natural day;
acquiring calling data and called data of the user's mobile phone in a preset time period, grouping them based on mobile phone number and signaling identifier, and counting the number of calls and the number of positions;
clustering the target difference value, the number of calls and the number of positions to obtain a clustering result;
and acquiring the crowd occupation type by matching the clustering result against a preset type constraint condition.
2. The method according to claim 1, wherein acquiring the base station data corresponding to each user based on the user signaling data specifically comprises:
acquiring the signaling data through the user's mobile phone operator;
acquiring information on the base station sectors between which the user's IMSI switches, to obtain a mobile phone signaling stay time and a base station list;
dividing the mobile phone signaling stay time into time-sliced groups to obtain the signaling dwell time;
and obtaining the base station longitude and latitude by associating the base station list with preset reference values, wherein the reference values comprise a Lac value and a Ci value, the Lac value represents the network type of the user's mobile phone, and the Ci value represents a base station number.
3. The method according to claim 2, wherein acquiring the target difference value of the base station longitude and latitude of each user's mobile phone within a natural day based on the different signaling dwell times specifically comprises:
extracting the user's mobile phone signaling corresponding to each group of signaling dwell times;
acquiring the corresponding base station longitude and latitude based on the mobile phone signaling;
and extracting the maximum base station longitude and latitude and the minimum base station longitude and latitude corresponding to the user within the natural day, and taking their difference to obtain the target difference value.
4. The method according to claim 2, wherein acquiring the calling data and the called data of the user's mobile phone in the preset time period, grouping them based on mobile phone number and signaling identifier, and counting the number of calls and the number of positions specifically comprises:
obtaining the number of calls corresponding to the mobile phone number based on the calling data and the called data;
and obtaining the number of times the position of the mobile phone signaling changes within the signaling dwell time, to obtain the number of positions.
5. The method according to claim 1, wherein clustering the target difference value, the number of calls and the number of positions to obtain the clustering result specifically comprises:
performing data clustering in a preset clustering mode, wherein the clustering mode comprises k-means clustering;
and clustering the three types of data, namely the target difference value, the number of calls and the number of positions, to obtain the corresponding clustering result.
6. The method according to claim 5, wherein acquiring the crowd occupation type by matching the clustering result against the preset type constraint condition specifically comprises:
matching the clustering result against the preset type constraint conditions, wherein the type constraint conditions comprise three preset occupation types, namely courier, ride-hailing driver and bus driver;
the crowd occupation type of a clustering result that matches a first constraint condition is courier;
the crowd occupation type of a clustering result that matches a second constraint condition is ride-hailing driver;
and the crowd occupation type of a clustering result that matches a third constraint condition is bus driver.
7. A crowd occupation type obtaining system based on mobile phone signaling, characterized by comprising a memory and a processor, wherein the memory stores a program of the crowd occupation type obtaining method based on mobile phone signaling, and the program, when executed by the processor, implements the following steps:
acquiring base station data corresponding to each user based on user signaling data, wherein the base station data comprises a signaling dwell time and a base station longitude and latitude;
acquiring, based on the different signaling dwell times, a target difference value of the base station longitude and latitude of each user's mobile phone within a natural day;
acquiring calling data and called data of the user's mobile phone in a preset time period, grouping them based on mobile phone number and signaling identifier, and counting the number of calls and the number of positions;
clustering the target difference value, the number of calls and the number of positions to obtain a clustering result;
and acquiring the crowd occupation type by matching the clustering result against a preset type constraint condition.
8. The system according to claim 7, wherein acquiring the base station data corresponding to each user based on the user signaling data specifically comprises:
acquiring the signaling data through the user's mobile phone operator;
acquiring information on the base station sectors between which the user's IMSI switches, to obtain a mobile phone signaling stay time and a base station list;
dividing the mobile phone signaling stay time into time-sliced groups to obtain the signaling dwell time;
and obtaining the base station longitude and latitude by associating the base station list with preset reference values, wherein the reference values comprise a Lac value and a Ci value, the Lac value represents the network type of the user's mobile phone, and the Ci value represents a base station number.
9. The system according to claim 8, wherein acquiring the target difference value of the base station longitude and latitude of each user's mobile phone within a natural day based on the different signaling dwell times specifically comprises:
extracting the user's mobile phone signaling corresponding to each group of signaling dwell times;
acquiring the corresponding base station longitude and latitude based on the mobile phone signaling;
and extracting the maximum base station longitude and latitude and the minimum base station longitude and latitude corresponding to the user within the natural day, and taking their difference to obtain the target difference value.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a program of a crowd occupation type obtaining method based on mobile phone signaling, and when the program is executed by a processor, the steps of the crowd occupation type obtaining method based on mobile phone signaling according to any one of claims 1 to 6 are implemented.
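The feature-extraction steps recited in claims 1, 3 and 4 can be illustrated with a short Python sketch. It is a simplified reading under stated assumptions rather than the claimed implementation: the record layouts, the `BASE_STATIONS` table, the sample values and the function names are hypothetical, the base station longitude and latitude are looked up directly by the (Lac, Ci) pair, and the time-sliced grouping of dwell times from claim 2 is omitted.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical base station table: (Lac, Ci) -> (longitude, latitude).
BASE_STATIONS = {
    ("4301", "1001"): (116.30, 39.90),
    ("4301", "1002"): (116.35, 39.95),
    ("4301", "1003"): (116.42, 39.93),
}

# Hypothetical signaling records: (user, ISO timestamp, Lac, Ci).
SIGNALING = [
    ("u1", "2022-07-01T08:00:00", "4301", "1001"),
    ("u1", "2022-07-01T09:00:00", "4301", "1002"),
    ("u1", "2022-07-01T10:00:00", "4301", "1002"),
    ("u1", "2022-07-01T11:00:00", "4301", "1003"),
]

# Hypothetical call records: (user, direction).
CALLS = [("u1", "calling"), ("u1", "called"), ("u1", "calling")]


def daily_target_difference(signaling, stations):
    """Per (user, natural day): (max lon - min lon, max lat - min lat)."""
    points = defaultdict(list)
    for user, timestamp, lac, ci in signaling:
        day = datetime.fromisoformat(timestamp).date()
        points[(user, day)].append(stations[(lac, ci)])
    return {
        key: (
            max(lon for lon, _ in pts) - min(lon for lon, _ in pts),
            max(lat for _, lat in pts) - min(lat for _, lat in pts),
        )
        for key, pts in points.items()
    }


def position_count(signaling):
    """Per user: how often the base station changes between consecutive records."""
    cells = defaultdict(list)
    ordered = sorted(signaling, key=lambda r: (r[0], datetime.fromisoformat(r[1])))
    for user, _timestamp, lac, ci in ordered:
        cells[user].append((lac, ci))
    return {
        user: sum(1 for a, b in zip(seq, seq[1:]) if a != b)
        for user, seq in cells.items()
    }


def call_count(calls):
    """Per user: total of calling and called records in the period."""
    counts = defaultdict(int)
    for user, _direction in calls:
        counts[user] += 1
    return dict(counts)


differences = daily_target_difference(SIGNALING, BASE_STATIONS)
positions = position_count(SIGNALING)
call_totals = call_count(CALLS)
```

Each user then contributes one row of three features (target difference, number of calls, number of positions) to the clustering step. Since these features sit on very different scales, rescaling them before clustering may be worth considering, although the claims do not state whether this is done.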
CN202210895704.4A 2022-07-28 2022-07-28 Crowd occupation type obtaining method and system based on mobile phone signaling and storage medium Active CN115002680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210895704.4A CN115002680B (en) 2022-07-28 2022-07-28 Crowd occupation type obtaining method and system based on mobile phone signaling and storage medium

Publications (2)

Publication Number Publication Date
CN115002680A (en) 2022-09-02
CN115002680B (en) 2022-12-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant