CN112637781B - User traffic mode distinguishing method based on base station track - Google Patents

User traffic mode distinguishing method based on base station track

Info

Publication number
CN112637781B
Authority
CN
China
Prior art keywords
base station
time
user
base stations
diff
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011318016.9A
Other languages
Chinese (zh)
Other versions
CN112637781A (en)
Inventor
顾钊铨
王乐
陈小龙
汤蕓嶷
王新刚
方滨兴
贾焰
韩伟红
李树栋
仇晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Guangzhou University
Original Assignee
National University of Defense Technology
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology, Guangzhou University filed Critical National University of Defense Technology
Priority to CN202011318016.9A priority Critical patent/CN112637781B/en
Publication of CN112637781A publication Critical patent/CN112637781A/en
Application granted granted Critical
Publication of CN112637781B publication Critical patent/CN112637781B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02Services making use of location information
    • H04W4/029Location-based management or tracking services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02Services making use of location information
    • H04W4/025Services making use of location information using location based information parameters
    • H04W4/027Services making use of location information using location based information parameters using movement velocity, acceleration information

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a method for discriminating a user's traffic mode based on a base station track, comprising the following steps: acquiring track information of the base stations passed by the user to be discriminated; extracting features from the track information, the features comprising the time consumed passing through a base station, the time interval range of entering and exiting the base station, the number of people passing through the base station, sliding-window-based base station oscillation of the user, base station clustering features, statistical features over continuous time at the base station, base station density, and statistical features of the base station's own traffic modes; and inputting the extracted features into a pre-trained LightGBM model, which outputs the traffic mode of the user to be discriminated. The invention extracts a large number of new, important features from a small amount of basic base station information and adopts the LightGBM algorithm from ensemble learning, so that the movement patterns of users at base stations can be learned accurately from a large amount of track data according to the extracted features, and the movement mode of a user can be judged accurately and efficiently.

Description

User traffic mode distinguishing method based on base station track
Technical Field
The invention relates to the technical field of base station big data processing, in particular to a method for discriminating a user passing mode based on a base station track.
Background
With the popularization of mobile phone terminals, new technologies such as mobile phone signaling are increasingly being used to collect users' travel information. In 2018 the number of mobile phone users in China reached 1.57 billion, and this huge user group provides a large source of travel data. In addition, travel information acquisition based on mobile phone signaling has the advantages of low cost and wide coverage. Mobile phone signaling data can therefore serve as an important supplement to existing traffic data acquisition technology: it can help identify the movement mode of a user during a trip, and in turn provides good technical support for urban traffic flow prediction, analysis of residents' travel behavior, mining of the spatio-temporal distribution of residents' travel, and the like.
Current mobile phone signaling data mainly comprises GPS data, base station information and the like. Methods based on GPS data can obtain the user's position, travel speed and so on accurately and in real time, and can determine the user's movement mode by using this information as movement features. Compared with GPS data, base station signaling information is easier to obtain and does not depend on the user's permission settings. There are generally two strategies for determining a user's traffic mode from base station information: rule-based methods and methods based on machine learning models. Rule-based methods use travel characteristics of a single trip, such as time, distance, average speed and maximum speed, as the basis for determining the movement mode of that trip. Methods based on machine learning models automatically learn the characteristics of track data from a large amount of track data labeled with the traffic mode, and use what is learned to help judge the movement mode of an unknown track.
Although existing methods can judge a user's travel mode to some extent, each has shortcomings. For example, GPS data must be collected by GPS-equipped devices, which leads to high cost and large equipment investment; GPS also depends on whether the user has granted GPS permission; and GPS data suffers from data loss and similar problems in urban canyons, under occlusion and the like, so it is unsuitable for some scenarios.
Compared with GPS data, base station track information is more convenient and cheaper to acquire, is less affected by user permission settings, and reduces the leakage of user privacy to a greater extent, so it is an effective supplement to the user's track information. However, base station track data directly contains little information: the longitude and latitude of the base station, the time the user enters the base station, the time the user leaves the base station, signal strength, the mobile phone identifier and the like. Rule-based methods therefore have relatively little information to work with: there is no information such as travel distance or speed, and distances and speeds derived from longitude and latitude or from multi-point positioning carry large errors, so the conventional rule-based quantities cannot serve as a direct basis for judging the movement mode of a user passing through a base station. A machine learning model that uses only this small number of track features cannot adequately learn the movement modes of base station tracks; more, and more effective, rule features are needed as learning inputs for the model. Existing machine learning methods, however, train on data that combines several sources such as GPS, mobile phone sensors, base station information, GIS and WiFi; for example, Troped et al. [6] reached a recognition accuracy of 93 percent by combining GPS and acceleration sensor data features. Both the rule-based methods and the machine-learning-based methods therefore need corresponding improvement.
Disclosure of Invention
Based on this, it is necessary to provide a method for discriminating a user's traffic mode based on a base station track, in order to solve the problem that relatively little usable information is available.
A method for discriminating a user's traffic mode based on a base station track comprises the following steps. Reading data: acquiring track information of the base stations passed by the user to be discriminated. Feature extraction: extracting features from the track information, the features comprising the time consumed passing through a base station, the time interval range of entering and exiting the base station, the number of people passing through the base station, sliding-window-based base station oscillation of the user, base station clustering features, statistical features over continuous time at the base station, base station density, and statistical features of the base station's own traffic modes. Judging the traffic mode: inputting the extracted features into a pre-trained LightGBM model, which outputs the traffic mode of the user to be discriminated.
Compared with the prior art, the invention has the following beneficial effects:
by using the user's base station track information, the invention obtains the user's track data more easily and minimizes the exposure of the user's location privacy. At the same time it mines a large number of new, important features from a small amount of basic base station information, including the time spent in the base station, the time interval of entering and exiting the base station, the number of people passing through the base station, base station oscillation, base station clustering, statistical features over continuous time at the base station, base station density, and the proportions of movement modes at the base station. With these features and a large amount of track data, the movement mode (traffic mode) of a user passing through a base station can be identified quickly and accurately.
Drawings
Fig. 1 is a schematic flowchart of a user traffic pattern discriminating method based on a base station track in the present embodiment.
Fig. 2 (a) is a first schematic diagram of the extraction process of "base station oscillation" based on the sliding window in the present embodiment.
Fig. 2 (b) is a second schematic diagram of the extraction process of "base station oscillation" based on the sliding window in the present embodiment.
Fig. 2 (c) is a third schematic diagram of the extraction process of "base station oscillation" based on the sliding window in the present embodiment.
Fig. 2 (d) is a fourth schematic diagram of the extraction process of "base station oscillation" based on the sliding window of the present embodiment.
Fig. 3 is a schematic flow chart of model training of the present embodiment.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a method for discriminating a user traffic mode based on a base station track includes:
Reading data: acquiring track information of the base stations passed by the user to be discriminated; the track information includes: base station number, base station longitude, base station latitude, base station entry time, base station exit time, and user equipment identifier;
wherein the base station number is denoted number; the base station longitude is denoted longitude; the base station latitude is denoted latitude; the base station entry time is denoted start; the base station exit time is denoted end; and the user equipment identifier is denoted device.
Feature extraction: extracting features from the track information, the features comprising the time consumed passing through a base station, the time interval range of entering and exiting the base station, the number of people passing through the base station, sliding-window-based base station oscillation of the user, base station clustering features, statistical features over continuous time at the base station, base station density, and statistical features of the base station's own traffic modes; the number-of-people feature and the base station clustering features are computed over a period of time that includes the user to be discriminated.
Judging the traffic mode: inputting the extracted features into a pre-trained LightGBM model, which outputs the traffic mode of the user to be discriminated.
In this embodiment, the time consumed passing through a base station is the length of time from entering the base station to leaving it. In general, passing through a base station by vehicle takes a short time, while passing through by bicycle or on foot takes longer, so this feature is an important basis for judgment. However, the coverage range of a base station varies with the local environment and its power: in a city, because of building occlusion, the large number of users and similar factors, the range is generally about 100 m to 500 m, whereas in open countryside, at the seaside and the like it can be 1 km to 10 km. The transit time of the same movement mode therefore differs considerably between base stations, so this feature cannot serve as a decisive basis for judgment, but it can serve as one of the important features in the model. The time consumed passing through the base station is expressed as follows:
diff = end - start
where start is the time of entering the base station and end is the time of leaving the base station. The most probable movement mode is judged from diff as follows:
if diff ≥ α, the user is most probably resident; if diff ∈ [β, α), the user is most probably walking; if diff ∈ (γ, β), the user is most probably cycling; if diff ≤ γ, the user is most probably riding in a vehicle. Here α is the residence time-difference threshold, that is, a time difference of at least α is judged to be residence; β is the lower limit and α the upper limit of the walking time-difference range, that is, a time difference in [β, α) is judged to be walking; (γ, β) is the range of time differences between entering and exiting the base station when cycling; and γ is the time-difference threshold for riding in a vehicle.
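The threshold rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `most_likely_mode`, the mode labels, and the numeric thresholds are assumptions made for the example; the source gives no concrete values for α, β or γ.

```python
def most_likely_mode(diff, alpha, beta, gamma):
    """Map a dwell time diff = end - start (seconds) to the most probable mode.

    alpha, beta, gamma are the thresholds described in the text and must
    satisfy gamma < beta < alpha. The string labels are hypothetical.
    """
    if not gamma < beta < alpha:
        raise ValueError("expected gamma < beta < alpha")
    if diff >= alpha:
        return "stay"      # residence
    if diff >= beta:
        return "walk"      # diff in [beta, alpha)
    if diff > gamma:
        return "cycle"     # diff in (gamma, beta)
    return "vehicle"       # diff <= gamma


# Hypothetical thresholds in seconds, chosen only for illustration.
ALPHA, BETA, GAMMA = 1800, 300, 60
start, end = 1000, 1200          # entry and exit times as epoch seconds
print(most_likely_mode(end - start, ALPHA, BETA, GAMMA))
```

As the text notes, this rule is not decisive on its own because base station ranges vary widely; it only supplies one feature for the model.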
In this embodiment, sliding-window-based "base station oscillation" refers to the user to be discriminated moving back and forth between consecutive base stations within a certain time, and it helps to distinguish several movement modes effectively. For example, when a user is at the boundary between the signal ranges of two base stations, the user may frequently enter and exit the two base stations, so that each single stay at a base station is very short; this can cause the movement mode to be misjudged and lead to an erroneous recognition result. Alternatively, the user may "oscillate" back and forth among several fixed adjacent base stations; since a vehicle typically does not move back and forth among a few adjacent base stations within a short period, such a user is more likely to be on foot. This feature is therefore important for distinguishing walking from riding in a vehicle and cycling in such cases. For a user's continuous base station track over a period of time, the key is to mine the oscillation between adjacent base stations, so the invention applies a sliding window to the base station track sequence. The sliding-window-based base station oscillation of the user is extracted as follows:
S11, if the user's single dwell time at the base station is greater than α, the oscillation mark of the base station is 0, indicating no oscillation;
S12, if the user's single dwell time at the base station is less than α, a sliding window of size n is taken with this base station as the starting base station, covering n adjacent base stations ordered by the time at which the user entered them; if the dwell time at each of the n base stations is less than γ, the time difference between consecutive base stations is less than ε, and the user does not move back and forth among the n base stations, the oscillation mark of the base stations in the window is 0, indicating no oscillation;
S13, if the user's single dwell time at the base station is less than α, a sliding window of size n is taken with this base station as the starting base station, covering n adjacent base stations; if the dwell time at each of the n base stations is less than γ, the time difference between consecutive base stations is less than ε, and the user moves back and forth among the n base stations, the oscillation mark of the base stations in the window is 1, indicating oscillation;
S14, if the time difference between two consecutive base stations is greater than ε, the sequence of n base stations is cut between those two base stations, the oscillation mark of the base stations before the cut is 0, and steps S11-S13 are repeated starting from the first base station after the cut; if the user's dwell time at one of the n base stations is greater than γ, the oscillation marks of that base station and of the base stations preceding it within the n base stations are 0, and steps S11-S13 are repeated starting from the base station that follows it.
The judgment of back-and-forth movement in steps S12 and S13 is based mainly on the number of identical base stations in the window: if the window size is n and the number of repeated base stations among the n base station tracks is m, the user is considered to move back and forth between base stations when m/n is greater than 1/2. The window size in the invention is generally set to 3 or 4 depending on circumstances. The sliding-window process is illustrated by the following figures:
referring to fig. 2 (a), the dwell time of base station track 1 is less than γ and the sliding window size is 3; the dwell times of base station tracks 1, 2 and 3 are all less than γ and there is no back-and-forth movement, so no oscillation is determined;
referring to fig. 2 (b), the dwell time of base station track 4 is greater than γ, so the oscillation of track 4 is marked 0 and the movement track is segmented at track 4; the dwell time of track 5 is less than γ, so the judgment continues with a sliding window of size 3;
referring to fig. 2 (c), the dwell time of base station track 6 is greater than γ, or the time interval between tracks 6 and 7 is greater than the threshold, so the track is segmented at track 6 and the oscillation of tracks 5 and 6 is marked 0; the dwell time of track 7 is less than γ, so the judgment continues with a sliding window of size 3;
referring to fig. 2 (d), the dwell times of base station tracks 7, 8 and 9 are all less than γ and there is no back-and-forth movement, so the oscillation mark is 0; the dwell times of the last base station tracks 10, 11 and 10 are all less than γ but there is back-and-forth movement between base stations, so the oscillation mark is 1.
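The sliding-window procedure can be approximated with the following Python sketch. It is a simplification under stated assumptions, not the patented procedure itself: instead of explicitly cutting the track as in S14, it slides the window one record at a time and marks a window as oscillating only when every dwell time is below γ, every gap is below ε, and m/n > 1/2. Because γ < α, the α check of S11 is subsumed by the γ check. Here m is interpreted as the number of records in the window whose base station appears more than once; that interpretation, the tuple layout, and all numeric values are assumptions.

```python
from collections import Counter


def moves_back_and_forth(station_ids):
    """Return True when m/n > 1/2, where m counts records whose station
    occurs more than once in the window (one reading of the text)."""
    counts = Counter(station_ids)
    m = sum(1 for s in station_ids if counts[s] > 1)
    return m / len(station_ids) > 0.5


def oscillation_flags(track, gamma, eps, n=3):
    """track: list of (station_id, start, end), sorted by entry time.
    Returns one 0/1 oscillation mark per record."""
    flags = [0] * len(track)
    for i in range(len(track) - n + 1):
        window = track[i:i + n]
        short_dwell = all(end - start < gamma for _, start, end in window)
        small_gaps = all(b[1] - a[2] < eps for a, b in zip(window, window[1:]))
        ids = [station for station, _, _ in window]
        if short_dwell and small_gaps and moves_back_and_forth(ids):
            for j in range(i, i + n):
                flags[j] = 1
    return flags


# Tracks 7, 8, 9 followed by 10, 11, 10, loosely mirroring fig. 2 (d).
track = [(7, 0, 30), (8, 35, 60), (9, 65, 90),
         (10, 95, 120), (11, 125, 150), (10, 155, 180)]
print(oscillation_flags(track, gamma=60, eps=30))
```

A window containing a long dwell or a large gap never qualifies, which has the same effect on the marks as cutting the track at that point.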
In this embodiment, the time interval range feature of entering and exiting the base station refers to the interval into which the user's entry and exit times fall, such as between eight and nine a.m., in the small hours around two a.m., or at some time in the evening. Because these intervals carry information about the morning peak, the evening peak, or periods of low pedestrian flow, the feature helps to separate cases in which riding in a vehicle would otherwise be confused with walking or cycling.
The statistics of the number of people passing through the base station comprise two parts: the average number of people per day over a period of time, and the total number of people in each time slice of the day.
The base station clustering feature is obtained by cluster analysis of geographic position according to the longitude and latitude of the base stations. The purpose is to find the clusters of base stations that are closest to one another, and the centroid of a cluster is not needed, so the clustering algorithm adopted in the invention is DBSCAN.
The statistical feature over continuous time at the base station describes how the movement mode changes before and after a given base station within a user's base station track. For example, if the user's movement mode at several base stations before a given base station is riding and the movement mode at several base stations after it is also riding, the user is more likely to be riding at that base station as well.
In this embodiment, the base station density is defined as the number of base stations distributed within a given range. The density reflects, to some extent, the geography of the area where the base station is located: a high density suggests a city commercial district or an area with heavy pedestrian flow, such as a station, whereas a low density suggests an area with a small and relatively stable flow of people, such as a rural road or a village. The invention defines the distance range as w and the number of base stations counted within the range w as n, so the formula for the base station density is: ρ = w/n. The statistical feature of the base station's own traffic modes is the proportion of each movement mode among all users passing through the base station.
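As a rough sketch of the density computation, the following Python counts the base stations within a range w of a given base station and evaluates ρ = w/n as written in the text. The haversine great-circle distance, the helper names, and the coordinates are assumptions for illustration; the source does not say how distance is measured.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def stations_within(center, stations, w):
    """Count base stations within w metres of center.
    The center itself is counted when it appears in stations."""
    return sum(1 for lat, lon in stations
               if haversine_m(center[0], center[1], lat, lon) <= w)


# Hypothetical (latitude, longitude) pairs.
stations = [(23.0, 113.0), (23.001, 113.0), (23.1, 113.0)]
w = 500.0                                  # range in metres (assumed value)
n = stations_within(stations[0], stations, w)
rho = w / n                                # formula as given in the text
print(n, rho)
```

The pairwise scan is quadratic in the number of base stations; a spatial index would be the natural replacement for larger data sets.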
In this embodiment, the feature extraction further includes: if diff ≥ α, the diff feature is assigned the value 1; if diff ∈ [β, α), the diff feature is assigned the value 2; if diff ∈ (γ, β), the diff feature is assigned the value 3; if diff ≤ γ, the diff feature is assigned the value 4; thus, the basic features include [longitude, latitude, start, end, diff];
whether the user "oscillates" at the base station is judged according to whether the user to be discriminated moves back and forth within the sliding window; this feature is denoted oscillation, and its value is 1 if oscillation exists and 2 otherwise; thus the first feature set includes [longitude, latitude, start, end, diff, oscillation];
for the time interval range feature of entering and exiting the base station, the timestamps of entering and exiting the base station are converted into the format "xxxx-xx-xx xx:xx:xx", and the "xx:xx:xx" part is converted into the number of seconds elapsed since 00:00:00; the feature value is 1 if this number of seconds falls in the morning peak interval, 2 if it falls in the noon peak interval, 3 if it falls in the evening peak interval, and 4 if it falls in a non-peak interval; thus the second feature set includes [longitude, latitude, start, end, diff, oscillation, starts, ends], where the values of starts and ends are each one of 1, 2, 3 and 4;
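The time-interval coding can be illustrated with a short Python sketch. The peak boundaries below are purely hypothetical, since the source does not state where the morning, noon and evening peaks begin and end; the function names are likewise assumptions.

```python
from datetime import datetime

# Hypothetical peak windows (code, first second, last second exclusive),
# in seconds since 00:00:00. The source does not give these boundaries.
PEAKS = [
    (1, 7 * 3600, 9 * 3600),     # morning peak
    (2, 11 * 3600, 13 * 3600),   # noon peak
    (3, 17 * 3600, 19 * 3600),   # evening peak
]


def seconds_since_midnight(ts):
    """Convert 'YYYY-MM-DD HH:MM:SS' to seconds elapsed since 00:00:00."""
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second


def interval_code(ts):
    """Return 1, 2 or 3 for the three peak intervals and 4 for non-peak time."""
    s = seconds_since_midnight(ts)
    for code, lo, hi in PEAKS:
        if lo <= s < hi:
            return code
    return 4


print(interval_code("2020-05-08 12:32:09"))
```

Applying `interval_code` to the entry time and to the exit time yields the two interval features added in the second feature set.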
the number-of-people feature is divided into the total number of people in one day and the total number of people in a given time slice; the total number of people in one day is denoted day_total_number, and the average number of people per day is obtained by dividing the number of users passing through the base station within a first preset time by the number of days; for the number of people per time slice, the total number of people passing through in each time slice within a second preset time is divided by the number of days to obtain the number of people in each time slice; the number of people per time slice is denoted [cut1, cut2, …, cutn], where n is the number of time slices obtained from the chosen granularity for dividing the day, and the corresponding number of people is the value of the feature; the third feature set is [longitude, latitude, start, end, diff, oscillation, starts, ends, day_total_number, cut1, cut2, …, cutn]; if there are outliers, they are listed separately in the feature set; the time slices are divided as follows:
S21, taking the timestamp of 00:00 on 1 January 1970 as the timing origin, the entry and exit times of the base station track to be sliced are each divided into two parts, "xxxx-xx-xx" and "xx:xx:xx"; for example, "2020-05-08 12:32:09" is divided into "2020-05-08" and "12:32:09";
S22, "xxxx-xx-xx" is converted into the corresponding timestamp relative to the preset origin, and "xx:xx:xx" is converted into seconds relative to 00:00:00;
S23, the day is divided into time periods expressed in seconds according to the slicing granularity, for example 2 hours, giving [0 s, 7200 s], [7200 s, 14400 s], and so on;
S24, the seconds value of the "xx:xx:xx" part of the entry and exit times obtained in step S22 is compared with each time period of the day obtained in step S23, to find the time period corresponding to entering the base station and the time period corresponding to leaving the base station;
S25, according to the time slices of the entry and exit times obtained in step S24, the timestamp of the "xxxx-xx-xx" part is added in increments of the time granularity parameter to obtain the time-slice division of a given base station track; all base station tracks are divided in this way, so that the base station track times are unified onto the same standard time periods;
S26, the number of people in each time slice of the base station is counted according to the unified base station track time periods obtained in step S25.
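Steps S21-S26 can be sketched as follows for a single base station. This is a minimal illustration under several assumptions: records are `(device, start, end)` tuples with times already formatted as "YYYY-MM-DD HH:MM:SS", a visit does not cross midnight, each device is counted once per slice per day, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def slice_index(ts, granularity_s=7200):
    """Index of the time slice of the day that timestamp ts falls into."""
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return (t.hour * 3600 + t.minute * 60 + t.second) // granularity_s


def people_per_slice(records, granularity_s=7200):
    """Average number of distinct devices per day in each time slice.

    records: iterable of (device, start_ts, end_ts) for one base station.
    Returns {slice_index: average count}; slices never visited are omitted.
    """
    seen = defaultdict(set)          # (date, slice) -> devices present
    days = set()
    for device, start_ts, end_ts in records:
        day = start_ts[:10]          # the 'xxxx-xx-xx' part
        days.add(day)
        first = slice_index(start_ts, granularity_s)
        last = slice_index(end_ts, granularity_s)
        for k in range(first, last + 1):
            seen[(day, k)].add(device)
    totals = defaultdict(int)
    for (_, k), devices in seen.items():
        totals[k] += len(devices)
    return {k: totals[k] / len(days) for k in sorted(totals)}


records = [
    ("a", "2020-05-08 12:32:09", "2020-05-08 12:40:00"),
    ("b", "2020-05-08 13:50:00", "2020-05-08 14:10:00"),
    ("a", "2020-05-09 12:05:00", "2020-05-09 12:06:00"),
]
print(people_per_slice(records))
```

With a 2-hour granularity the day has 12 slices, so the values for slice indices 0 to 11 would populate cut1 to cut12.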
base station clusters are obtained with DBSCAN, and the base station clustering features are obtained by counting, over a third preset time, the proportions of users in each movement mode at all base stations in the cluster other than the current base station; the base station clustering features are denoted cluster, and the feature values are the counted proportions of each movement mode, so the base station clustering features are represented by 4 sub-features [cluster1, cluster2, cluster3, cluster4], corresponding respectively to the four movement modes of residence, walking, cycling and riding in a vehicle; the fourth feature set thus includes [longitude, latitude, start, end, diff, starts, ends, day_total_number, cut1, cut2, …, cutn, oscillation, cluster1, cluster2, cluster3, cluster4];
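Given cluster labels that have already been produced by DBSCAN, the four cluster sub-features can be computed as in the sketch below. The input layout (a station-to-label mapping and per-station mode counts), the mode names, and the use of `None` for a base station that belongs to no cluster are assumptions made for the example; the clustering step itself is not shown.

```python
MODES = ("stay", "walk", "cycle", "vehicle")   # hypothetical mode labels


def cluster_features(station, cluster_of, mode_counts):
    """Proportion of each mode over the other base stations in the same cluster.

    cluster_of:  {station: cluster label, or None when unclustered}
    mode_counts: {station: {mode: number of labeled passes}}
    Returns [cluster1, cluster2, cluster3, cluster4] in the order of MODES.
    """
    label = cluster_of[station]
    if label is None:
        return [0.0] * len(MODES)
    totals = dict.fromkeys(MODES, 0)
    for other, other_label in cluster_of.items():
        if other != station and other_label == label:
            for mode in MODES:
                totals[mode] += mode_counts.get(other, {}).get(mode, 0)
    n = sum(totals.values())
    return [totals[mode] / n if n else 0.0 for mode in MODES]


cluster_of = {"A": 0, "B": 0, "C": 0, "D": 1}
mode_counts = {
    "A": {"walk": 100},
    "B": {"stay": 1, "walk": 3},
    "C": {"cycle": 2, "vehicle": 4},
    "D": {"vehicle": 50},
}
print(cluster_features("A", cluster_of, mode_counts))
```

Excluding the current base station, as the text specifies, keeps its own counts (the 100 walking passes at "A" here) from influencing its cluster features.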
the change in the movement mode of the user to be discriminated across several consecutive base stations is counted with a sliding window; as with the "base station oscillation" feature, the track is segmented at resident base stations, so that no sliding window contains a resident base station; this continuous-time feature is denoted sequence, and its value is 1 if the movement mode at both the preceding and following base stations is walking, 2 if it is cycling at both, 3 if it is riding in a vehicle at both, and 0 if the movement mode changes between the preceding and following base stations; thus, the fifth feature set includes [longitude, latitude, start, end, diff, starts, ends, day_total_number, cut1, cut2, …, cutn, oscillation, cluster1, cluster2, cluster3, cluster4, sequence];
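One possible reading of this continuous-time feature is sketched below in Python. The assignment of codes 1, 2 and 3 to walking, cycling and riding in a vehicle, the neighbourhood size `k`, and the choice to return 0 at the ends of a segment are all assumptions; the input is a list of mode labels for one track segment that contains no resident base station.

```python
SEQ_CODE = {"walk": 1, "cycle": 2, "vehicle": 3}   # assumed code assignment


def sequence_feature(modes, i, k=2):
    """Code for position i from the modes of up to k base stations on each side.

    Returns the code of the shared mode when all neighbours before and after
    agree, and 0 when the mode changes or a side has no neighbours.
    """
    before = modes[max(0, i - k):i]
    after = modes[i + 1:i + 1 + k]
    if not before or not after:
        return 0
    neighbours = before + after
    if len(set(neighbours)) == 1:
        return SEQ_CODE.get(neighbours[0], 0)
    return 0


segment = ["vehicle", "vehicle", "walk", "vehicle", "vehicle"]
print(sequence_feature(segment, 2))
```

In this example the base station labeled as walking is surrounded by vehicle passes, so the feature points the model toward riding in a vehicle for that base station.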
the number of base stations within a preset range centered on the current base station is calculated to obtain the base station density of the area where the current base station is located; the base station density feature is denoted density, and its value is the number of base stations; thus, the sixth feature set is [longitude, latitude, start, end, diff, starts, ends, day_total_number, cut1, cut2, …, cutn, oscillation, cluster1, cluster2, cluster3, cluster4, sequence, density];
for each base station in the data set, the numbers of passes in each movement mode by all users within the first preset time are counted, and the count for each movement mode is divided by the total to obtain the proportion of each movement mode, giving the statistical feature of the base station's own traffic modes, which is denoted pass_percentage; these proportions serve as the value of the feature pass_percentage; thus the total feature set is [longitude, latitude, start, end, diff, starts, ends, day_total_number, cut1, cut2, …, cutn, oscillation, cluster1, cluster2, cluster3, cluster4, sequence, density, pass_percentage]. Finally, the extracted total feature set is input into the pre-trained LightGBM model.
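The per-base-station mode proportions can be computed with a few lines of Python. The sketch assumes that the labeled passes at one base station are available as a list of mode strings; the mode names and the function name are hypothetical.

```python
from collections import Counter

MODES = ("stay", "walk", "cycle", "vehicle")   # hypothetical mode labels


def pass_percentage(labels):
    """Proportion of each movement mode among all labeled passes at a base station."""
    counts = Counter(labels)
    total = sum(counts[mode] for mode in MODES)
    return {mode: (counts[mode] / total if total else 0.0) for mode in MODES}


print(pass_percentage(["walk", "walk", "vehicle", "stay"]))
```

Since the proportions are derived from labeled data, they would be computed from the training portion of the data set when the features are used to evaluate a model.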
In this embodiment, discriminating the movement mode of a base station track is essentially a multi-class classification problem, so the invention adopts LightGBM, an ensemble learning algorithm, as the model. LightGBM uses decision trees as its weak classifiers, is well suited to multi-class problems, and supports direct input of categorical features as well as parallel learning, including feature parallelism and data parallelism. Compared with a single machine learning model, ensemble learning combines the results of multiple weak classifiers to obtain classification predictions with higher accuracy. Referring to fig. 3, training the LightGBM model includes: reading a data set from a data file or a database, the data set comprising track information of users passing through base stations; extracting features from the user track information, the features comprising the time consumed passing through a base station, the time interval range of entering and leaving the base station, the number of people passing through the base station, the sliding-window-based base station oscillation of the user, base station clustering features, statistical features of the base station over continuous time, base station density, and statistical features of the base station's own traffic modes; dividing the extracted features into a training set and a test set at a ratio of 8:2; inputting the training set into the LightGBM model for 10-fold cross-validation training, with grid search used during training to tune the model parameters automatically and find the optimal ones; and verifying the trained model on the test set, the model being used for base station track movement mode recognition if its accuracy is greater than α and the area under the ROC curve (AUC) is greater than β.
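The data handling around training can be sketched in plain Python as below. The helper names, the fixed random seed and the interleaved fold layout are assumptions made for illustration; the model fitting itself is left out. In practice the folds and the parameter search would more likely come from a library, for example scikit-learn's `GridSearchCV` with `cv=10` wrapped around a LightGBM classifier, but the text does not name the tooling beyond "gridsearch".

```python
import random


def split_8_2(rows, seed=0):
    """Shuffle the rows and split them 8:2 into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def kfold_indices(n_rows, k=10):
    """Yield (train_idx, valid_idx) index lists for k-fold cross-validation."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, valid


def accept_model(accuracy, auc, min_accuracy, min_auc):
    """Acceptance rule from the text: accuracy > alpha and AUC > beta.

    `min_accuracy` and `min_auc` stand in for the thresholds alpha and beta.
    """
    return accuracy > min_accuracy and auc > min_auc
```

Note that the thresholds α and β used for model acceptance are distinct from the time-difference thresholds of the same names used in the diff feature.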
In this embodiment, because users' travel habits or the surroundings of a base station may change, a model trained long ago may no longer discriminate the movement mode of new base station tracks well. The method for discriminating the user traffic mode based on the base station track therefore further includes: tracking the accuracy of the discrimination results of the LightGBM model, and retraining the LightGBM model if the accuracy falls below a threshold ε.
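A minimal sketch of the retraining check, assuming that verified labels become available for a batch of recent predictions (the text does not say how the accuracy is measured; the function name and the batch-based evaluation are hypothetical):

```python
def should_retrain(predicted, actual, epsilon):
    """Return (retrain_flag, accuracy) for a batch of verified predictions.

    The flag is True when the observed accuracy is lower than the threshold epsilon.
    """
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equally long")
    correct = sum(p == a for p, a in zip(predicted, actual))
    accuracy = correct / len(actual)
    return accuracy < epsilon, accuracy
```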
It should be noted that the feature extraction step used when training the LightGBM model is the same as the feature extraction described above.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples illustrate only a few embodiments of the invention. They are described specifically and in detail, but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (6)

1. A method for discriminating a user traffic mode based on a base station track, characterized by comprising the following steps:
reading data: acquiring the track information of the base stations passed by the user to be discriminated;
the track information includes: base station number, base station longitude, base station latitude, base station entry time, base station exit time, and user equipment identification;
wherein the base station number is denoted number; the base station longitude is denoted longitude; the base station latitude is denoted latitude; the base station entry time is denoted start; the base station exit time is denoted end; the user equipment identification is denoted device;
feature extraction: extracting features from the track information, the features comprising the time consumed passing through a base station, the time interval range of entering and leaving the base station, the number of people passing through the base station, the sliding-window-based base station oscillation of the user, base station clustering features, statistical features of the base station over continuous time, base station density, and statistical features of the base station's own traffic modes;
the time consumed passing through a base station is expressed as follows:
diff = end − start
the probability of each movement mode is judged from diff as follows:
if diff ≥ α, the user is most likely staying; if diff ∈ [β, α), the user is most likely walking; if diff ∈ (γ, β), the user is most likely cycling; if diff ≤ γ, the user is most likely riding in a vehicle;
wherein start is the time of entering the base station and end is the time of leaving the base station; α is the residence time-difference threshold, i.e. a time difference of at least α is judged as residence; β is the lower limit and α the upper limit of the walking time difference, i.e. a time difference in [β, α) is judged as walking; (γ, β) is the range of the time difference between entering and leaving the base station when cycling; γ is the time-difference threshold for riding in a vehicle;
the sliding-window-based base station oscillation is extracted as follows:
S11, if a single residence time of the user at the base station is greater than α, the oscillation mark of the base station is 0, indicating no oscillation;
S12, if a single residence time of the user at the base station is less than α, a sliding window of size n is applied, taking the n adjacent base stations that start from this base station, the base stations being ordered by the time at which the user entered them; if the residence time at each of the n base stations is less than γ, the time difference between consecutive base stations is less than ε, and the user does not move back and forth among the n base stations, the oscillation mark of the base stations in the window is 0, indicating no oscillation;
S13, if a single residence time of the user at the base station is less than α, a sliding window of size n is applied, taking the n adjacent base stations that start from this base station; if the residence time at each of the n base stations is less than γ, the time difference between consecutive base stations is less than ε, and the user moves back and forth among the n base stations, the oscillation mark of the base stations in the window is 1, indicating oscillation;
S14, if the time difference between two consecutive base stations is greater than ε, the n base stations are cut between those two base stations, the oscillation mark of the base stations before the cut is 0, and steps S11-S13 are repeated from the first base station after the cut; if the residence time of the user at one of the n base stations is greater than γ, the oscillation marks of that base station and of the base stations preceding it among the n base stations are 0, and steps S11-S13 are then repeated from the base station following it;
traffic mode discrimination: inputting the extracted features into a pre-trained LightGBM model, the LightGBM model outputting the traffic mode of the user to be discriminated;
training the LightGBM model includes:
reading a data set from a data file or a database, the data set comprising track information of users passing through base stations;
extracting features from the user track information, the features comprising the time consumed passing through a base station, the time interval range of entering and leaving the base station, the number of people passing through the base station, the sliding-window-based base station oscillation of the user, base station clustering features, statistical features of the base station over continuous time, base station density, and statistical features of the base station's own traffic modes;
dividing the extracted features into a training set and a test set;
inputting the training set into the LightGBM model for training, with grid search used during training to tune the model parameters automatically and find the optimal ones;
and verifying the trained model on the test set, the model being used for base station track movement mode recognition if its accuracy is greater than α and the area under the ROC curve (AUC) is greater than β.
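The diff thresholds and the window test of steps S11-S13 in claim 1 can be sketched as follows. This is a simplified illustration under stated assumptions: "moving back and forth" is read as a base station identifier reappearing inside the window, the cutting procedure of step S14 is omitted, an incomplete window is marked 0, and the function names and mode labels are hypothetical.

```python
def dwell_bucket(start, end, alpha, beta, gamma):
    """Most likely movement mode from the dwell time, with gamma < beta < alpha."""
    diff = end - start
    if diff >= alpha:
        return "stay"
    if diff >= beta:
        return "walk"      # diff in [beta, alpha)
    if diff > gamma:
        return "cycle"     # diff in (gamma, beta)
    return "vehicle"       # diff <= gamma


def window_oscillates(visits, i, n, alpha, gamma, epsilon):
    """Oscillation mark (0 or 1) for the window of n visits starting at index i.

    `visits` is a time-ordered list of (station_id, start, end) tuples.
    """
    window = visits[i:i + n]
    if len(window) < n:
        return 0  # assumption: an incomplete window is treated as no oscillation
    _, first_start, first_end = window[0]
    if first_end - first_start > alpha:
        return 0  # S11: a long stay is not oscillation
    short_stays = all(end - start < gamma for _, start, end in window)
    close_in_time = all(nxt[1] - cur[2] < epsilon for cur, nxt in zip(window, window[1:]))
    # Assumption: "back and forth" means some station id repeats in the window.
    revisits = len({station for station, _, _ in window}) < len(window)
    return 1 if short_stays and close_in_time and revisits else 0
```

With thresholds α = 600, β = 120 and γ = 30 seconds, a dwell of 200 seconds falls in [β, α) and is bucketed as walking, while a window that visits stations A, B, A in quick succession is marked as oscillating.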
2. The method for discriminating a user traffic mode based on a base station track according to claim 1, wherein the base station density is given by: ρ = w/n
where w is the distance range, i.e. the statistical radius, and n is the number of base stations counted within the range w.
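Counting the base stations within the range w can be sketched as below. The great-circle (haversine) distance, the mean Earth radius constant, and the choice to exclude the current base station from its own count are assumptions; the claim does not specify how distance is measured. The ratio ρ of the claim can then be formed from w and the returned count n.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stations_in_range(stations, current_id, radius_m):
    """Count base stations within `radius_m` of the current base station.

    `stations` maps a station id to its (longitude, latitude).
    """
    lon0, lat0 = stations[current_id]
    return sum(
        1
        for station_id, (lon, lat) in stations.items()
        if station_id != current_id  # assumption: the centre station is not counted
        and haversine_m(lon0, lat0, lon, lat) <= radius_m
    )
```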
3. The base station trajectory-based user traffic pattern discrimination method according to claim 2, wherein the feature extraction further includes:
if diff ≥ α, the diff feature is assigned 1; if diff ∈ [β, α), the diff feature is assigned 2; if diff ∈ (γ, β), the diff feature is assigned 3; if diff ≤ γ, the diff feature is assigned 4; thus, the basic features include [longitude, latitude, start, end, diff];
whether the user "oscillates" at the base station is judged according to whether the user to be discriminated moves back and forth within the sliding window; this feature is denoted oscillation, and its value is 1 if oscillation exists and 2 otherwise; thus the first feature set includes [longitude, latitude, start, end, diff, oscillation];
the time interval range of entering and leaving the base station is obtained by converting the timestamps of entering and leaving the base station into the format "xxxx-xx-xx xx:xx:xx" and converting the "xx:xx:xx" part into a number of seconds with 00:00 as the starting time; the feature value is 1 if this number of seconds falls in the morning peak interval, 2 in the noon peak interval, 3 in the evening peak interval, and 4 in an off-peak interval; the second feature set thus includes [longitude, latitude, start, end, diff, oscillation, starts, ends], where starts and ends each take one of the values 1, 2, 3 and 4;
the people-count feature is divided into the total number of people per day and the number of people in a given time slice; the daily total is denoted day_total_number and is the average number of people per day, obtained by dividing the number of users passing through the base station within a first preset time by the number of days; the number of people per time slice is obtained by dividing the total number of people passing in each time slice within a second preset time by the number of days; the time-slice counts are denoted [cut1, cut2, …, cutn], where n is the number of time slices given by the granularity with which the day is divided, and the corresponding number of people is the feature value; the third feature set is then [longitude, latitude, start, end, diff, oscillation, starts, ends, day_total_number, cut1, cut2, …, cutn];
base station clusters are obtained with DBSCAN, and the base station cluster feature is obtained by counting, over a third preset time, the proportion of users in each movement mode at all base stations in the cluster other than the current base station; the feature is denoted cluster and its values are the counted proportions of the movement modes, so it is represented by four sub-features [cluster1, cluster2, cluster3, cluster4], corresponding respectively to the four movement modes of staying, walking, cycling and riding in a vehicle; the fourth feature set thus includes [longitude, latitude, start, end, diff, starts, ends, day_total_number, cut1, cut2, …, cutn, oscillation, cluster1, cluster2, cluster3, cluster4];
the change in the movement mode of the user to be discriminated across several consecutive base stations is counted with a sliding window; the track is segmented at the resident base stations, so that no sliding window contains a resident base station, and the "base station oscillation" feature is obtained; this feature is denoted sequence: its value is 1 if the preceding and following base stations are both passed on foot, 2 if both are passed by bicycle, 3 if both are passed in a vehicle, and 0 if the movement mode changes between them; thus, the fifth feature set includes [longitude, latitude, start, end, diff, starts, ends, day_total_number, cut1, cut2, …, cutn, oscillation, cluster1, cluster2, cluster3, cluster4, sequence];
the number of base stations within a preset range centered on the current base station is counted to obtain the base station density of the area in which the current base station is located; the base station density feature is denoted density, and its value is the number of base stations; thus, the sixth feature set is [longitude, latitude, start, end, diff, starts, ends, day_total_number, cut1, cut2, …, cutn, oscillation, cluster1, cluster2, cluster3, cluster4, sequence, density];
for each base station in the data set, the total number of passes by all users within the first preset time is counted across the different movement modes, and the count for each movement mode is divided by this total to obtain the proportion of each movement mode; this gives the statistical feature of the base station's own traffic modes, denoted pass_percentage, whose value is the proportion; thus the total feature set is [longitude, latitude, start, end, diff, starts, ends, day_total_number, cut1, cut2, …, cutn, oscillation, cluster1, cluster2, cluster3, cluster4, sequence, density, pass_percentage].
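The starts and ends encoding of claim 3 can be sketched as follows. The peak windows used here (07:00-09:00, 11:30-13:30, 17:00-19:00) are placeholder assumptions, since the claim does not state the peak intervals; the function names are likewise hypothetical.

```python
# Hypothetical peak windows as (code, first_second, last_second_exclusive).
PEAKS = (
    (1, 7 * 3600, 9 * 3600),                  # morning peak
    (2, 11 * 3600 + 1800, 13 * 3600 + 1800),  # noon peak
    (3, 17 * 3600, 19 * 3600),                # evening peak
)


def seconds_of_day(timestamp):
    """Convert the clock part of 'xxxx-xx-xx xx:xx:xx' to seconds since 00:00."""
    clock = timestamp.split(" ")[1]
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def peak_code(timestamp):
    """Return 1, 2 or 3 for the morning, noon or evening peak, and 4 otherwise."""
    sec = seconds_of_day(timestamp)
    for code, low, high in PEAKS:
        if low <= sec < high:
            return code
    return 4
```

Applying `peak_code` to the entry timestamp gives starts and applying it to the exit timestamp gives ends.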
4. The base station trajectory-based user traffic pattern discrimination method according to claim 3, wherein a time slice division method is as follows:
S21, splitting the entry and exit times of the base station track to be segmented into two parts, "xxxx-xx-xx" and "xx:xx:xx";
S22, converting "xxxx-xx-xx" into the corresponding timestamp relative to a preset starting time, and converting "xx:xx:xx" into seconds with 00:00 as the starting time;
S23, dividing the day into time periods, measured in seconds, according to the time granularity of the slices;
S24, comparing the seconds of the "xx:xx:xx" part of the entry and exit times obtained in step S22 with the time periods of the day obtained in step S23, and finding the time period corresponding to entering the base station and the time period corresponding to leaving it;
S25, starting from the time slices of the entry and exit times obtained in step S24, adding the timestamp of the "xxxx-xx-xx" part in increments of the time granularity parameter to obtain the time-period division of a given base station track; all base station tracks are divided in this way, so that the base station track times are unified to the same standard time periods;
S26, counting the number of people in each time slice of the base station according to the unified base station track time periods obtained in step S25.
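Steps S21-S26 can be sketched for a single base station as follows. Several simplifications are assumed and are not stated in the claim: a visit does not cross midnight, the granularity divides 86400 seconds evenly, a device is counted once per slice per day, and the number of days is taken as the number of distinct dates observed. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def slices_for_visit(start, end, granularity_s):
    """Return the (date, slice_index) pairs that one same-day visit overlaps."""
    t0 = datetime.strptime(start, FMT)
    t1 = datetime.strptime(end, FMT)
    s0 = t0.hour * 3600 + t0.minute * 60 + t0.second  # seconds since 00:00
    s1 = t1.hour * 3600 + t1.minute * 60 + t1.second
    first, last = s0 // granularity_s, s1 // granularity_s
    return [(t0.date().isoformat(), k) for k in range(first, last + 1)]


def people_per_slice(visits, granularity_s):
    """Average number of distinct devices per time slice per observed day.

    `visits` is an iterable of (device, start, end) for one base station.
    """
    seen = defaultdict(set)  # (date, slice_index) -> devices present
    days = set()
    for device, start, end in visits:
        for date, k in slices_for_visit(start, end, granularity_s):
            seen[(date, k)].add(device)
            days.add(date)
    totals = [0] * (86400 // granularity_s)
    for (_, k), devices in seen.items():
        totals[k] += len(devices)
    return [t / len(days) for t in totals] if days else totals
```

The returned list corresponds to [cut1, cut2, …, cutn] for that base station, with n determined by the chosen granularity.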
5. The method for discriminating a user traffic pattern based on a base station track according to claim 1 wherein the dividing ratio of the training set and the test set is 8:2.
6. The base station trajectory-based user traffic pattern discrimination method according to claim 1, further comprising: tracking the accuracy of the discrimination results of the LightGBM model, and retraining the LightGBM model if the accuracy falls below a threshold ε.
CN202011318016.9A 2020-11-23 2020-11-23 User traffic mode distinguishing method based on base station track Active CN112637781B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011318016.9A CN112637781B (en) 2020-11-23 2020-11-23 User traffic mode distinguishing method based on base station track

Publications (2)

Publication Number Publication Date
CN112637781A CN112637781A (en) 2021-04-09
CN112637781B true CN112637781B (en) 2023-10-03

Family

ID=75304039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011318016.9A Active CN112637781B (en) 2020-11-23 2020-11-23 User traffic mode distinguishing method based on base station track

Country Status (1)

Country Link
CN (1) CN112637781B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117539874A (en) * 2023-10-11 2024-02-09 湖北泰跃卫星技术发展股份有限公司 Method and device for acquiring agronomic activities based on movement tracks
CN117332376B (en) * 2023-12-01 2024-02-27 北京航空航天大学 Method and system for identifying commuter and mode based on mobile phone signaling data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109272032A (en) * 2018-09-05 2019-01-25 广州视源电子科技股份有限公司 Trip mode recognition methods, device, computer equipment and storage medium
CN111104468A (en) * 2019-09-25 2020-05-05 西安交通大学 Method for deducing user activity based on semantic track
CN111681421A (en) * 2020-06-10 2020-09-18 南京瑞栖智能交通技术产业研究院有限公司 Mobile phone signaling data-based external passenger transport hub centralized-sparse space distribution analysis method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8407213B2 (en) * 2006-08-31 2013-03-26 Ektimisi Semiotics Holdings, Llc System and method for identifying a location of interest to be named by a user
US11141205B2 (en) * 2017-06-27 2021-10-12 Xtreme Orthopedics Llc Device for and method of treating acromioclavicular joint dislocations

Also Published As

Publication number Publication date
CN112637781A (en) 2021-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220701
Address after: 510006 No. 230 West Ring Road, Panyu District University, Guangdong, Guangzhou
Applicant after: Guangzhou University; National University of Defense Technology
Address before: 510006 No. 230 West Ring Road, Panyu District University, Guangdong, Guangzhou
Applicant before: Guangzhou University
GR01 Patent grant