CN112104979B - User track extraction method based on WiFi scanning record - Google Patents
- Publication number
- CN112104979B CN202010856149.5A
- Authority
- CN
- China
- Prior art keywords
- user
- location
- mac addresses
- fingerprints
- wifi
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/02—Services making use of location information
- H04W4/029—Location-based management or tracking services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
The invention discloses a user track extraction method based on WiFi scanning records, comprising the following steps: filtering out, day by day, data that is insufficient to describe the user's movement behavior; merging data according to how densely it is distributed in time; filtering noise data in the WiFi scanning records by calculating the frequency of each MAC address combination, to obtain a plurality of location fingerprints composed of MAC addresses as access places; clustering the access places according to their location fingerprints to obtain activity places; mapping the user's WiFi scanning record at each moment to the corresponding activity place according to location fingerprint similarity and generating track segments; filtering out track segments with short stays; and generating a user track containing semantic information. The method extracts user tracks that contain semantic information and can support mining of user movement patterns, daily routines, and the like. Because the activity places extracted by the method are all logical places, the privacy of the user is effectively protected.
Description
Technical Field
The invention relates to a user track extraction method, in particular to a user track extraction method based on WiFi scanning record.
Background
The rapid development of intelligent terminals and positioning technologies has greatly promoted the popularization of location-based service applications. Users are now the core foundation on which many enterprises provide services, and analyzing changes in a user's position makes it possible to describe user behavior. This is of great significance for optimizing user recommendation systems, improving enterprise service quality, assisting smart city layout, and the like. Because a user's daily movement track contains temporal and spatial information about the user and is closely associated with the user's daily behavior, research on user tracks has long attracted the attention of scholars.
With the continuous progress of science and technology, user demand for location-based services is increasingly diversified, but most current research mines user tracks from GPS data.
Disclosure of Invention
The invention aims to provide a user track extraction method based on WiFi scanning records, which is in line with the trend that WiFi is widely deployed in smart cities, and extracts user information from a large amount of WiFi data to describe user behaviors. The method can help enterprises to discover potential requirements of users in time and provide personalized information services so as to improve user experience, and meanwhile, the method can also face city management requirements and provide support for layout planning and public resource allocation of smart cities.
The purpose of the invention is realized by the following technical scheme: a user track extraction method based on WiFi scanning record comprises the following steps:
step S10, filtering out, in units of days, data that is insufficient to describe the user's movement behavior;
step S20, merging data according to how densely it is distributed in time;
step S30, filtering noise data in the WiFi scanning records by calculating the frequency of each MAC address combination in the WiFi scanning records, and obtaining a plurality of location fingerprints composed of MAC addresses as access places;
step S40, clustering the access places according to their location fingerprints to obtain activity places;
step S50, mapping the user's WiFi scanning record at each moment to the corresponding activity place according to the similarity of the location fingerprints, and generating track segments;
step S60, filtering out track segments with short stays;
step S70, semanticizing the activity places and generating a user track containing semantic information.
Further, the step S10 further includes:
step S11, mapping the data of the user every day to the corresponding 24 hours;
step S12, filtering the days with the number of hours recorded on the day less than a certain threshold value;
and step S13, filtering the days when the data volume of the current day of the user is less than a certain threshold value.
Further, the step S20 further includes:
step S21, in order to avoid the influence of the excessively dense data in a certain period of time on the subsequent frequency calculation, a proper time threshold value is determined according to the requirement;
step S22, merging the data according to the time threshold, that is, if the time recording intervals of the multiple pieces of temporally adjacent data are smaller than the time threshold, merging the time and location fingerprints of the multiple pieces of data into one time and location fingerprint.
Further, the step S30 further includes:
Step S31, the WiFi scanning records are represented as the set $W = \{w_1, w_2, \ldots, w_{|W|}\}$, where $w_i = (A_i, t_i)$ denotes a record at time $t_i$ whose location fingerprint is $A_i$, and the location fingerprint $A_i = \{m_1, m_2, \ldots, m_k\}$ is a set of MAC addresses of APs. For any two MAC addresses $(a_j, a_k)$ in each location fingerprint $A_i$ of the user's WiFi scanning records, a support $s_{j,k}$ can be calculated, so that each $A_i$ yields a distribution formed by $|A_i|(|A_i|-1)/2$ supports, where $|A_i|$ denotes the number of MAC addresses in $A_i$. The size of the support indicates how frequently the two MAC addresses appear together in the set $W$:
$$s_{j,k} = \frac{c(a_j, a_k)}{\min\{c(a_j),\, c(a_k)\}}$$
where $c(a_j, a_k)$ is the number of location fingerprints in $W$ that contain both MAC addresses $a_j$ and $a_k$, $c(a_j)$ is the number of location fingerprints in $W$ that contain the MAC address $a_j$, $\min\{\cdot\}$ selects the minimum, and $s_{j,k} \in [0, 1]$;
Step S32, for each location fingerprint $A_i$ in the user's WiFi scanning records, calculating the frequency $f$:
$$f = \frac{\mu_i}{1 + \sigma_i^2}$$
where $\mu_i$ and $\sigma_i^2$ are, respectively, the mean and the variance of the support distribution obtained in step S31; the greater the frequency $f$, the higher the probability that the location fingerprint appears in the set $W$ and the more stable its MAC address combination;
Step S33, arranging all records in the set $W$ in descending order of frequency $f$, and establishing a new set $W_F$ for later storing the location fingerprints $A_i$ of the access places; in addition, $\Delta$ is defined as the set of all distinct MAC addresses contained in $W$, and $\Delta_F$ as the set of all distinct MAC addresses contained in $W_F$;
Step S34, scanning the location fingerprints $A_i$ in the set $W$ from top to bottom: if adding $A_i$ to $W_F$ would add elements to $\Delta_F$ (i.e., the current $A_i$ contains a MAC address not yet present in $W_F$), the current $A_i$ is added to $W_F$; once $W_F$ contains all the MAC addresses in $W$ (i.e., $\Delta_F = \Delta$), no further $A_i$ is added to $W_F$. At this point, the $A_i$ contained in $W_F$ are the location fingerprints of the extracted access places.
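The greedy selection described in steps S33 and S34 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `extract_access_places` and the input layout (a list of fingerprint and frequency pairs) are assumptions made for the example.

```python
from typing import FrozenSet, List, Set, Tuple

Fingerprint = FrozenSet[str]


def extract_access_places(scored: List[Tuple[Fingerprint, float]]) -> List[Fingerprint]:
    """Greedy selection of access-place fingerprints (steps S33 and S34).

    `scored` holds (location fingerprint, frequency f) pairs for the set W.
    The name and input layout are hypothetical, chosen for illustration.
    """
    all_macs: Set[str] = set()              # Delta: every MAC address in W
    for fingerprint, _ in scored:
        all_macs |= fingerprint

    covered: Set[str] = set()               # Delta_F: MAC addresses in W_F
    selected: List[Fingerprint] = []        # W_F

    # S33: scan fingerprints in descending order of frequency.
    for fingerprint, _ in sorted(scored, key=lambda item: item[1], reverse=True):
        if covered == all_macs:             # Delta_F == Delta, stop adding
            break
        if not fingerprint <= covered:      # contributes at least one new MAC
            selected.append(fingerprint)
            covered |= fingerprint
    return selected
```

Because high-frequency fingerprints are visited first, low-frequency combinations are kept only when they contribute a MAC address that no earlier fingerprint covers.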
Further, the step S40 further includes:
Step S41, calculating the similarity $\gamma_{p,q}$ between the location fingerprints of any two access places using the Jaccard similarity:
$$\gamma_{p,q} = \frac{N(p, q)}{N(p, q) + N(p, \bar{q}) + N(\bar{p}, q)}$$
where $N(p, q)$ is the number of MAC addresses common to both location fingerprints, $N(p, \bar{q})$ is the number of MAC addresses that belong to location fingerprint $p$ but not to location fingerprint $q$, and $N(\bar{p}, q)$ is the number of MAC addresses that do not belong to location fingerprint $p$ but belong to location fingerprint $q$;
Step S42, establishing an undirected weighted graph $G = (V, E)$ in which each access place is a vertex $v_p$ and the weight of the edge $e_{p,q}$ between any two vertices is the similarity $\gamma_{p,q}$ between them; defining a similarity threshold $\theta_\gamma$ and retaining only the edges whose similarity $\gamma_{p,q}$ is greater than or equal to $\theta_\gamma$ to obtain a thresholded graph; then building the adjacency matrix of the thresholded graph, in which the entry in row $p$, column $q$ is the weight $e_{p,q}$ of a retained edge;
Step S43, calculating the degree $d_p$ of each vertex in the thresholded graph and arranging all vertices in $V$ in descending order of degree $d_p$, then scanning the vertices from highest to lowest degree: for each vertex $v_p$, if the vertex has not yet been classified, it is taken as the center of a new class, which is expanded according to the adjacency matrix to the adjacent vertices connected by a weighted edge; this continues until all vertices are classified, giving the candidate cluster set $C_C$;
Step S44, merging the location fingerprints of the access places in the same candidate cluster obtained in step S43 to form a new location fingerprint, and calculating the similarity $\delta_{p,q}$ between every two merged location fingerprints from $N(p, q)$, the number of MAC addresses common to both location fingerprints, and $N(p)$, the number of MAC addresses in location fingerprint $p$;
Step S45, with reference to step S42, again establishing an undirected weighted graph, this time based on the similarity $\delta_{p,q}$, defining another similarity threshold $\theta_\delta$, and building the corresponding adjacency matrix; then, with reference to step S43, obtaining the final cluster set $C_F$;
Step S46, merging the location fingerprints that belong to the same class in the cluster set $C_F$; each merged location fingerprint represents an activity place.
Further, the step S50 further includes:
Step S51, calculating, from the location fingerprints, the Jaccard similarity between each WiFi scanning record of the user and every activity place, and taking the activity place with the highest similarity as the place where the WiFi scanning record occurred; that is, each WiFi scanning record is represented as $r_i = (t_i, p_i)$, where $t_i$ is the scan time and $p_i$ is the code of the activity place;
Step S52, sorting all WiFi scanning records of the user in chronological order; if any $m$ consecutive records satisfy $p_i = p_{i+1} = \cdots = p_{i+m-1}$, the $m$ records are merged into one track segment $tr = (t_{start}, t_{end}, p_i)$, where $t_{start}$ is $t_i$ and $t_{end}$ is $t_{i+m-1}$.
Further, the step S60 further includes:
Step S61, defining a suitable time threshold $\omega_t$;
Step S62, for each track segment $tr$ obtained in step S50, retaining only the track segments with $t_{end} - t_{start} > \omega_t$.
Further, the step S70 further includes:
Step S71, representing the user's stay periods at each activity place as a set $P_j = \{(t_{start}, t_{end})_1, (t_{start}, t_{end})_2, \ldots, (t_{start}, t_{end})_l\}$;
Step S72, representing each set $P_j$ by a 24-dimensional vector $T_j = [t_0, t_1, \ldots, t_{23}]$, where $t_h$ is the user's average stay time at the activity place during hour $h$, in minutes, with $t_h \in [0, 60]$;
Step S73, assigning a semantic label to each activity place from its 24-dimensional vector $T_j$ according to daily life experience; for example, if an activity place has some stay time in every hour of the day, with longer stays in the evening and shorter stays in the daytime, it can be considered the user's home; if the stays are long in the daytime and short in the evening, it may be the user's workplace; if there is some stay time around mealtimes, it may be a restaurant or similar place;
Step S74, generating the user track with semantic information, composed of track segments, $TR = \{tr_1, tr_2, \ldots, tr_n\}$.
Further, the location fingerprints are merged as: all non-duplicate MAC addresses it contains are extracted from the multiple location fingerprints that need to be merged and then combined into a new location fingerprint.
The invention has the following beneficial effects: merging the data reduces the influence of excessively dense data in a given period on the subsequent frequency calculation. Extracting the access places reduces the diversity of location fingerprints as far as possible while ensuring that no MAC address in the data is lost, which effectively reduces the amount of noise data and improves the quality of the location fingerprints in the data. Meanwhile, filtering out track segments with short stays markedly reduces temporary activity places in the track. The extracted user track contains semantic information and supports mining of user movement patterns, daily routines, and the like.
Drawings
FIG. 1 is a schematic diagram of a method for extracting a user trajectory based on WiFi scanning record according to the present invention;
FIG. 2 is a data set to which the present invention is applied;
FIG. 3 is a graph of a 24 hour dwell time profile for a user at an activity site with the present invention applied to an embodiment.
Detailed Description
The following detailed description of the embodiments and the working principles of the present invention is made with reference to the accompanying drawings:
examples
The data set used in this embodiment is campus data from the Yuquan Campus of Zhejiang University, as shown in FIG. 2. In the given data, all information recorded for one user at a given time is called a record or a piece of data. The data set consists of the information of 50000 users over the 7 days of one week (2018-08-14 to 2018-08-20). In this embodiment, only the records of a single user are used as the original data set. As shown in FIG. 1, the method is implemented in the following steps:
Step S10, filtering out, in units of days, data that is insufficient to describe the user's movement behavior. The specific implementation steps are as follows:
Step S11: mapping the user's daily data into the corresponding 24 hours. In this embodiment, after this mapping, the user's data for each day consists of at most 24 parts;
Step S12: filtering out the days on which the number of hours with records is less than a threshold. In this embodiment, the hour threshold is 12. The user's data for August 16, 2018 consists of only 5 parts, that is, the user has records in only 5 of the 24 hours of that day; this is less than the threshold, so the data for that day is filtered out;
Step S13: filtering out the days on which the user's data volume is less than a threshold. In this embodiment, the daily data volume threshold is 200. The user has only 152 pieces of data on August 14, 2018, so the data for that day is filtered out.
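The day-level filtering of steps S11 to S13 can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (a scan time paired with a set of MAC addresses) and the function name `filter_sparse_days` are hypothetical, and the default thresholds are simply the values used in this embodiment (12 hours, 200 records).

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, FrozenSet, List, Tuple

# Assumed record layout: (scan time, set of MAC addresses seen in the scan).
Record = Tuple[datetime, FrozenSet[str]]


def filter_sparse_days(records: List[Record],
                       min_hours: int = 12,
                       min_records: int = 200) -> List[Record]:
    """Keep only days with enough distinct hours and enough records."""
    by_day: Dict = defaultdict(list)
    for record in records:
        by_day[record[0].date()].append(record)

    kept: List[Record] = []
    for day in sorted(by_day):
        day_records = by_day[day]
        # S11: map the day's records onto the 24 clock hours.
        hours_with_data = {scan_time.hour for scan_time, _ in day_records}
        # S12 and S13: drop the day if either count is below its threshold.
        if len(hours_with_data) >= min_hours and len(day_records) >= min_records:
            kept.extend(day_records)
    return kept
```

With these defaults, a day with records in only 5 hours, or a day with only 152 records, is dropped, which matches the two cases described in the embodiment.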
Step S20, merging the data according to the dense situation of the data in time, and the specific implementation steps are as follows:
step S21: in order to avoid the influence of data which is too dense in a certain period of time on the subsequent frequency calculation, an appropriate time threshold value is determined according to the requirement. In this embodiment, the time interval of the user data is approximately 1-2 minutes, and since the data acquisition frequency is not fixed, the data time interval of the partial time period can be as small as 10 seconds, so the selected time threshold is 1 minute;
Step S22: merging the data according to the time threshold, that is, if the recording times of several temporally adjacent pieces of data are separated by less than the time threshold, their times and location fingerprints are merged into a single time and location fingerprint. In this embodiment, since the time threshold is 1 minute, times are rounded to the nearest minute: for data whose seconds value is between 0 and 29, the seconds are simply dropped; for data whose seconds value is between 30 and 59, the seconds are dropped and one is added to the minutes. The location fingerprints of all pieces of data that then share the same time are combined into a new location fingerprint.
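The minute-level merging of step S22 can be sketched as follows, assuming the 1 minute threshold of this embodiment. The function names `round_to_minute` and `merge_by_minute` and the record layout are hypothetical; merging fingerprints by set union follows the merge rule stated in the disclosure.

```python
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, Iterable, List, Tuple


def round_to_minute(scan_time: datetime) -> datetime:
    """Round to the nearest minute: seconds 0-29 round down, 30-59 round up."""
    base = scan_time.replace(second=0, microsecond=0)
    return base + timedelta(minutes=1) if scan_time.second >= 30 else base


def merge_by_minute(
    records: Iterable[Tuple[datetime, Iterable[str]]],
) -> List[Tuple[datetime, FrozenSet[str]]]:
    """Merge records that fall on the same rounded minute.

    The merged location fingerprint is the union of the MAC addresses
    of all records sharing that minute.
    """
    merged: Dict[datetime, FrozenSet[str]] = {}
    for scan_time, macs in records:
        key = round_to_minute(scan_time)
        merged[key] = merged.get(key, frozenset()) | frozenset(macs)
    return sorted(merged.items(), key=lambda item: item[0])
```

Rounding before grouping means that bursts of scans taken 10 seconds apart collapse into a single record per minute, so they no longer dominate the later frequency calculation.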
Step S30, filtering the noise data in the WiFi scanning records by calculating the frequency of each MAC address combination in the WiFi scanning records, and obtaining a plurality of location fingerprints composed of MAC addresses as access places. The specific implementation steps are as follows:
Step S31: the WiFi scanning records are represented as the set $W = \{w_1, w_2, \ldots, w_{|W|}\}$, where $w_i = (A_i, t_i)$ denotes a record at time $t_i$ whose location fingerprint is $A_i$, and the location fingerprint $A_i = \{m_1, m_2, \ldots, m_k\}$ is a set of MAC addresses of APs. For every pair of MAC addresses $(a_j, a_k)$ in each location fingerprint of the user's WiFi scanning records, a support $s_{j,k}$ is calculated, so that each location fingerprint yields a distribution formed by $|A_i|(|A_i|-1)/2$ supports, where $|A_i|$ denotes the number of MAC addresses in $A_i$. The size of the support indicates how frequently the two MAC addresses appear together in the set $W$:
$$s_{j,k} = \frac{c(a_j, a_k)}{\min\{c(a_j),\, c(a_k)\}}$$
where $c(a_j, a_k)$ is the number of location fingerprints in $W$ that contain both MAC addresses $a_j$ and $a_k$, $c(a_j)$ is the number of location fingerprints in $W$ that contain the MAC address $a_j$, $\min\{\cdot\}$ selects the minimum, and $s_{j,k} \in [0, 1]$;
In this embodiment, one of the location fingerprints, $A_1$, is ['286c075518ed', '6aa19517d6cb', 'bcd17766a338', 'd4ee07380092']; the calculated support distribution of this location fingerprint is [1.0, 1.0, 0.9954, 0.8, 0.56, 0.4674];
Step S32: for each location fingerprint $A_i$ in the user's WiFi scanning records, calculating the frequency $f$:
$$f = \frac{\mu_i}{1 + \sigma_i^2}$$
where $\mu_i$ and $\sigma_i^2$ are, respectively, the mean and the variance of the support distribution obtained in step S31; the greater the frequency $f$, the higher the probability that the location fingerprint appears in the set $W$ and the more stable its MAC address combination. In this example, the support distribution of location fingerprint $A_1$ has mean 0.8038 and variance 0.04772, so the frequency of $A_1$ is $0.8038 / (1 + 0.04772) \approx 0.7672$;
Step S33: arranging all records in the set $W$ in descending order of frequency $f$, and establishing a new set $W_F$ for later storing the location fingerprints $A_i$ of the access places; in addition, $\Delta$ is defined as the set of all distinct MAC addresses contained in $W$, and $\Delta_F$ as the set of all distinct MAC addresses contained in $W_F$. In this example, there are 679 different location fingerprints in the user's 3 days of activity records (926 records in total), and the frequency of the records in the user's set $W$ is at most 1.0 and at least 0.4092;
Step S34: scanning the location fingerprints $A_i$ in the set $W$ from top to bottom: if adding $A_i$ to $W_F$ would add elements to $\Delta_F$ (i.e., the current $A_i$ contains a MAC address not yet present in $W_F$), the current $A_i$ is added to $W_F$; once $W_F$ contains all the MAC addresses in $W$ (i.e., $\Delta_F = \Delta$), no further $A_i$ is added to $W_F$. At this point, the $A_i$ contained in $W_F$ are the location fingerprints of the extracted access places. In this example, the extraction reduces the number of location fingerprints in $W_F$ to 187; that is, these 187 MAC address combinations are the location fingerprints of the extracted access places. The diversity of location fingerprints is thus reduced as far as possible while ensuring that no MAC address in the set $W$ is lost, which improves the quality of the location fingerprints.
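The support and frequency calculations of steps S31 and S32 can be sketched as follows. Two forms are assumed here because the formula images did not survive in this text: the support is taken as $c(a_j, a_k) / \min\{c(a_j), c(a_k)\}$, which is one reading of the stated definitions, and the frequency as mean divided by one plus the population variance, which reproduces the figures quoted for $A_1$ (mean 0.8038, variance 0.04772, frequency 0.7672). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean, pvariance
from typing import FrozenSet, Iterable, List, Tuple


def count_occurrences(fingerprints: Iterable[FrozenSet[str]]) -> Tuple[Counter, Counter]:
    """Count how many fingerprints in W contain each MAC and each MAC pair."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for fingerprint in fingerprints:
        macs = sorted(fingerprint)          # sorted so pair keys are canonical
        single.update(macs)
        pair.update(combinations(macs, 2))
    return single, pair


def support_distribution(fingerprint: FrozenSet[str],
                         single: Counter, pair: Counter) -> List[float]:
    """Supports of every MAC pair in one fingerprint (assumed formula).

    The fingerprint must be one of those that were counted, so every
    single-MAC count is at least 1.
    """
    return [pair[(a, b)] / min(single[a], single[b])
            for a, b in combinations(sorted(fingerprint), 2)]


def frequency(supports: List[float]) -> float:
    """Assumed form f = mean / (1 + population variance).

    Raises statistics.StatisticsError for an empty list, i.e. for a
    fingerprint with a single MAC address; the text does not say how
    such fingerprints are handled.
    """
    return fmean(supports) / (1.0 + pvariance(supports))
```

Applied to the distribution [1.0, 1.0, 0.9954, 0.8, 0.56, 0.4674] from this embodiment, `frequency` gives a value that rounds to 0.7672. Dividing by one plus the variance penalizes fingerprints whose MAC pairs co-occur unevenly.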
Step S40, further clustering out the activity places from the access places according to the position fingerprints, and the specific implementation steps are as follows:
Step S41: calculating the similarity $\gamma_{p,q}$ between the location fingerprints of any two access places using the Jaccard similarity:
$$\gamma_{p,q} = \frac{N(p, q)}{N(p, q) + N(p, \bar{q}) + N(\bar{p}, q)}$$
where $N(p, q)$ is the number of MAC addresses common to both location fingerprints, $N(p, \bar{q})$ is the number of MAC addresses that belong to location fingerprint $p$ but not to location fingerprint $q$, and $N(\bar{p}, q)$ is the number of MAC addresses that do not belong to location fingerprint $p$ but belong to location fingerprint $q$;
In this example, the similarity between the location fingerprints of any two access places is at most 0.8182 and at least 0.0;
Step S42: establishing an undirected weighted graph $G = (V, E)$ in which each access place is a vertex $v_p$ and the weight of the edge $e_{p,q}$ between any two vertices is the similarity $\gamma_{p,q}$ between them; defining a similarity threshold $\theta_\gamma$ and retaining only the edges whose similarity $\gamma_{p,q}$ is greater than or equal to $\theta_\gamma$ to obtain a thresholded graph; then building the adjacency matrix of the thresholded graph, in which the entry in row $p$, column $q$ is the weight $e_{p,q}$ of a retained edge. In this example, the similarity threshold $\theta_\gamma$ is 0.05, and the generated adjacency matrix is a symmetric $187 \times 187$ matrix whose diagonal values are 0;
Step S43: calculating the degree $d_p$ of each vertex in the thresholded graph and arranging all vertices in $V$ in descending order of degree $d_p$, then scanning the vertices from highest to lowest degree: for each vertex $v_p$, if the vertex has not yet been classified, it is taken as the center of a new class, which is expanded according to the adjacency matrix to the adjacent vertices connected by a weighted edge; this continues until all vertices are classified, giving the candidate cluster set $C_C$. In this example, clustering the 187 access places yields 27 clusters;
Step S44: merging the location fingerprints of the access places in the same candidate cluster obtained in step S43 to form a new location fingerprint, and calculating the similarity $\delta_{p,q}$ between every two merged location fingerprints from $N(p, q)$, the number of MAC addresses common to both location fingerprints, and $N(p)$, the number of MAC addresses in location fingerprint $p$;
In this example, for the merged location fingerprints, the pairwise similarity is at most 1.0 and at least 0.0;
Step S45: with reference to step S42, again establishing an undirected weighted graph, this time based on the similarity $\delta_{p,q}$, defining another similarity threshold $\theta_\delta$, and building the corresponding adjacency matrix; then, with reference to step S43, obtaining the final cluster set $C_F$. In this example, the similarity threshold $\theta_\delta$ is 0.1, a $27 \times 27$ adjacency matrix is established, and clustering the 27 location fingerprints again yields 16 clusters;
Step S46: merging the location fingerprints that belong to the same class in the cluster set $C_F$; each merged location fingerprint represents an activity place. In this example, for each of the 16 clusters, all distinct MAC addresses are extracted from the location fingerprints in the cluster and combined into the location fingerprint of a new place, which is the extracted activity place.
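One round of the thresholded-graph clustering in steps S41 to S43, together with the fingerprint merge of steps S44 and S46, can be sketched as follows. Several details are assumptions made for the example: the degree of a vertex is taken as the number of retained edges, a new class is expanded only to the direct neighbours of its center, and all function names are hypothetical. The similarity is passed in as a parameter so that the same routine could be reused for the second round with $\delta_{p,q}$.

```python
from typing import Callable, FrozenSet, Iterable, List

Fingerprint = FrozenSet[str]


def jaccard(p: Fingerprint, q: Fingerprint) -> float:
    """Jaccard similarity: shared MACs over all MACs in either fingerprint."""
    union = len(p | q)
    return len(p & q) / union if union else 0.0


def cluster_by_degree(fingerprints: List[Fingerprint],
                      similarity: Callable[[Fingerprint, Fingerprint], float],
                      threshold: float) -> List[List[int]]:
    """Cluster vertex indices; assumes a positive threshold."""
    n = len(fingerprints)
    # S42: adjacency matrix holding only weights at or above the threshold.
    adjacency = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            weight = similarity(fingerprints[p], fingerprints[q])
            if weight > 0 and weight >= threshold:
                adjacency[p][q] = adjacency[q][p] = weight

    # S43: visit vertices in descending order of degree (assumed: edge count).
    degree = [sum(1 for weight in row if weight > 0) for row in adjacency]
    order = sorted(range(n), key=lambda v: degree[v], reverse=True)

    assigned = [False] * n
    clusters: List[List[int]] = []
    for center in order:
        if assigned[center]:
            continue
        members = [center]
        assigned[center] = True
        # Assumed: expand only to unclassified direct neighbours.
        for q in range(n):
            if adjacency[center][q] > 0 and not assigned[q]:
                members.append(q)
                assigned[q] = True
        clusters.append(members)
    return clusters


def merge_fingerprints(fingerprints: Iterable[Fingerprint]) -> Fingerprint:
    """Merge rule from the disclosure: union of all distinct MAC addresses."""
    return frozenset().union(*fingerprints)
```

A possible use is to call `cluster_by_degree(access_places, jaccard, 0.05)`, merge each resulting cluster with `merge_fingerprints`, and then repeat the call on the merged fingerprints with the second similarity and threshold.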
Step S50, mapping the WiFi scan record of each time of the user to the corresponding activity location according to the similarity of the location fingerprint and generating a track segment, the specific implementation steps are as follows:
Step S51: calculating, from the location fingerprints, the Jaccard similarity between each WiFi scanning record of the user and every activity place, and taking the activity place with the highest similarity as the place where the WiFi scanning record occurred; that is, each WiFi scanning record is represented as $r_i = (t_i, p_i)$, where $t_i$ is the scan time and $p_i$ is the ID of the activity place. In this example, the user's 926 records are labelled with the 16 activity places according to location fingerprint similarity;
Step S52: sorting all WiFi scanning records of the user in chronological order; if any $m$ consecutive records satisfy $p_i = p_{i+1} = \cdots = p_{i+m-1}$, the $m$ records are merged into one track segment $tr = (t_{start}, t_{end}, p_i)$, where $t_{start}$ is $t_i$ and $t_{end}$ is $t_{i+m-1}$. In this example, the user's 926 records over 3 days are merged into 48 track segments.
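The labelling and segment generation of steps S51 and S52 can be sketched as follows. The function names and the record layout are assumptions for the example; ties in similarity are resolved in favour of the lowest place index, which the text does not specify.

```python
from datetime import datetime
from itertools import groupby
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[str]
Segment = Tuple[datetime, datetime, int]    # (t_start, t_end, place ID)


def jaccard(p: Fingerprint, q: Fingerprint) -> float:
    """Jaccard similarity between two sets of MAC addresses."""
    union = len(p | q)
    return len(p & q) / union if union else 0.0


def assign_place(macs: Fingerprint, places: List[Fingerprint]) -> int:
    """S51: index of the activity place most similar to the scan."""
    return max(range(len(places)), key=lambda i: jaccard(macs, places[i]))


def build_segments(records: List[Tuple[datetime, Fingerprint]],
                   places: List[Fingerprint]) -> List[Segment]:
    """S52: merge runs of consecutive records with the same place ID."""
    labelled = sorted(
        ((scan_time, assign_place(macs, places)) for scan_time, macs in records),
        key=lambda record: record[0],
    )
    segments: List[Segment] = []
    for place_id, run in groupby(labelled, key=lambda record: record[1]):
        run_list = list(run)
        segments.append((run_list[0][0], run_list[-1][0], place_id))
    return segments
```

Each segment records only the first and last scan time of a run, so a run consisting of a single record produces a segment of zero duration, which the short-stay filter of step S60 then removes.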
Step S60, filtering the short staying track segments, the specific implementation steps are as follows:
Step S61: defining a suitable time threshold $\omega_t$. In this example, the time threshold $\omega_t$ is 10 minutes;
Step S62: for each track segment $tr$ obtained in step S52, retaining only the track segments with $t_{end} - t_{start} > \omega_t$. In this example, after filtering with the time threshold $\omega_t$, the number of track segments is reduced from 48 to 22; each remaining track segment belongs to one activity place, and the 22 track segments cover 4 activity places.
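The short-stay filter of steps S61 and S62 can be sketched as follows. The function name `filter_short_segments` and the segment layout are assumptions for the example, and the default threshold is the 10 minutes used in this embodiment.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Segment = Tuple[datetime, datetime, int]    # (t_start, t_end, place ID)


def filter_short_segments(segments: List[Segment],
                          min_stay: timedelta = timedelta(minutes=10)) -> List[Segment]:
    """Keep segments whose duration is strictly greater than the threshold."""
    return [segment for segment in segments
            if segment[1] - segment[0] > min_stay]
```

The comparison is strict, following the condition $t_{end} - t_{start} > \omega_t$, so a segment lasting exactly 10 minutes is discarded.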
Step S70, semantization of the activity location and generation of a user track containing semantic information, the specific implementation steps are as follows:
Step S71: representing the user's stay periods at each activity place as a set $P_j = \{(t_{start}, t_{end})_1, (t_{start}, t_{end})_2, \ldots, (t_{start}, t_{end})_l\}$. In this example, the 4 activity places are represented by the sets $P_0, P_1, P_2, P_3$;
Step S72: representing each set $P_j$ by a 24-dimensional vector $T_j = [t_0, t_1, \ldots, t_{23}]$, where $t_h$ is the user's average stay time at the activity place during hour $h$, in minutes, with $t_h \in [0, 60]$. In this example, the 24-dimensional vector $T_0$ for place 0 is given as [46.2, 54.6, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 50.2, 24.0, 16.8, 26.2, 22.6, 12.0, 12.0, 19.4, 57.4, 48.0, 48.0, 47.2, 40.0, 48.0, 50.6];
Step S73: assigning a semantic label to each activity place from its 24-dimensional vector $T_j$ according to daily life experience. In this example, the four activity places are represented by 24-dimensional vectors, and FIG. 3 shows the distribution of the user's stay time over 24 hours at the four activity places. As the figure shows, place 0 has some stay time throughout the day, with longer stays in the early morning and evening, so it is very likely the user's home; place 1 has longer stays in the daytime and is more likely a workplace, and the user also has some stay time there between 18:00 and 0:00, probably because the user worked overtime on some day; place 3 has some stay time around 12:00 and around 20:00 and may be a restaurant; place 2 has a short stay around 16:00, and its semantic features are less obvious than those of a home, workplace, or canteen, so it may be a place where the user shops, relaxes, or does other activities;
Step S74: generating the user track with semantic information, composed of track segments, $TR = \{tr_1, tr_2, \ldots, tr_n\}$. In this embodiment, Table 1 shows the user's track for one day.
TABLE 1 example of a user's one day trace
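The 24-dimensional stay-time vector of step S72 can be sketched as follows. The function name `dwell_vector` is hypothetical, and averaging over an explicitly supplied number of observed days is an assumption; the text states only that each component is an average stay time between 0 and 60 minutes.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def dwell_vector(stays: List[Tuple[datetime, datetime]], n_days: int) -> List[float]:
    """Average minutes spent at one activity place in each clock hour.

    `stays` is the set P_j of (t_start, t_end) periods for the place.
    `n_days` is the number of days averaged over (assumed input).
    """
    totals = [0.0] * 24
    for start, end in stays:
        cursor = start
        while cursor < end:
            # End of the clock hour that contains `cursor`.
            next_hour = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            stop = min(end, next_hour)
            totals[cursor.hour] += (stop - cursor).total_seconds() / 60.0
            cursor = stop
    return [total / n_days for total in totals]
```

Splitting each stay at clock-hour boundaries lets a stay that crosses midnight contribute to hours 23 and 0 of the vector. A place whose vector is high at night and low during the day would then match the "home" pattern described in step S73.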
The method is based on WiFi scanning records, the user track is extracted through the steps of merging data, extracting access places, activity places, filtering track fragments, semantization activity places and the like so as to describe user behaviors, and support is provided in the aspects of optimizing a user recommendation system, improving the service quality of enterprises, assisting smart city layout and the like.
The foregoing is only a preferred embodiment of the present invention, and although the present invention has been disclosed in the preferred embodiments, it is not intended to limit the present invention. Those skilled in the art can make numerous possible variations and modifications to the present teachings, or modify equivalent embodiments to equivalent variations, without departing from the scope of the present teachings, using the methods and techniques disclosed above. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present invention are still within the scope of the protection of the technical solution of the present invention, unless the contents of the technical solution of the present invention are departed.
Claims (9)
1. A user track extraction method based on WiFi scanning records, characterized by comprising the following steps:
step S10, filtering out, on a per-day basis, data that is insufficient to depict the user's movement behavior;
step S20, merging data according to how densely the data is distributed in time;
step S30, filtering noise data out of the WiFi scanning records by calculating the frequency of each MAC address combination in the WiFi scanning records, and obtaining a plurality of location fingerprints, each composed of MAC addresses, as access places; the frequency calculation specifically includes: for each WiFi scanning record of a user, calculating how often any two of its MAC addresses occur together in all records so as to form a support distribution, and calculating the frequency of the WiFi scanning record from the mean and the variance of the support distribution;
step S40, further clustering the access places according to their location fingerprints to obtain activity places;
step S50, mapping the user's WiFi scanning record at each moment to the corresponding activity place according to the similarity of the location fingerprints, and generating track segments;
step S60, filtering out the track segments with a short stay time;
step S70, semanticizing the activity places and generating a user track containing semantic information.
2. The user track extraction method based on WiFi scanning records according to claim 1, wherein the step S10 further includes:
step S11, mapping the user's data for each day onto the corresponding 24 hours;
step S12, filtering out the days on which the number of hours containing records is less than a certain threshold;
and step S13, filtering out the days on which the user's data volume for that day is less than a certain threshold.
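For illustration only, and not as part of the claims, steps S11 to S13 might be sketched in Python as follows. The record layout (a timestamp paired with a set of MAC addresses), the function name `filter_sparse_days`, and the default thresholds are assumptions made for this sketch; the claim leaves both thresholds open.

```python
from collections import defaultdict
from datetime import datetime


def filter_sparse_days(records, min_hours=8, min_records=50):
    """Keep only records from days that are dense enough to describe movement.

    records: iterable of (datetime, frozenset_of_mac_addresses).
    min_hours / min_records: hypothetical thresholds, not taken from the patent.
    """
    by_day = defaultdict(list)
    for ts, fingerprint in records:
        by_day[ts.date()].append((ts, fingerprint))

    kept = []
    for day in sorted(by_day):
        day_records = by_day[day]
        # S11: map the day's data onto its 24 hourly slots.
        hours_covered = {ts.hour for ts, _ in day_records}
        # S12: drop days with too few hours that contain any record.
        if len(hours_covered) < min_hours:
            continue
        # S13: drop days with too few records overall.
        if len(day_records) < min_records:
            continue
        kept.extend(day_records)
    return kept
```

Both tests are applied per day, so a single long burst of scans in one hour does not rescue a day that is otherwise empty.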
3. The user track extraction method based on WiFi scanning records according to claim 1, wherein the step S20 further includes:
step S21, determining a time threshold as required, in order to prevent excessively dense data in a certain period of time from affecting the subsequent frequency calculation;
step S22, merging the data according to the time threshold, that is, if the recording intervals between multiple temporally adjacent pieces of data are smaller than the time threshold, merging the times and location fingerprints of those pieces of data into one time and one location fingerprint.
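For illustration only, step S22 might be sketched in Python as follows. The function name `merge_dense_records` is hypothetical. The sketch assumes that the merged record keeps the earliest timestamp of its run (the claim does not say which time is kept) and that fingerprints are merged by taking the union of their MAC addresses.

```python
from datetime import datetime, timedelta


def merge_dense_records(records, threshold):
    """Merge runs of records whose consecutive gaps are below `threshold`.

    records: iterable of (datetime, set_of_mac_addresses).
    threshold: a timedelta chosen as required (step S21).
    """
    merged = []
    previous_ts = None
    for ts, fingerprint in sorted(records, key=lambda r: r[0]):
        if previous_ts is not None and ts - previous_ts < threshold:
            # Same run: keep the run's first timestamp (assumption) and
            # take the union of the MAC addresses.
            start, accumulated = merged[-1]
            merged[-1] = (start, accumulated | fingerprint)
        else:
            merged.append((ts, frozenset(fingerprint)))
        previous_ts = ts
    return merged
```

The gap is measured between neighbouring records, so a long chain of closely spaced scans collapses into a single record even if the chain as a whole spans more than the threshold.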
4. The user track extraction method based on WiFi scanning records according to claim 1, wherein the step S30 further includes:
step S31, the WiFi scanning records are represented as the set $W = \{w_1, w_2, \ldots, w_{|W|}\}$, where $w_i = (A_i, t_i)$ denotes a record whose location fingerprint at time $t_i$ is $A_i$, and the location fingerprint $A_i = \{m_1, m_2, \ldots, m_k\}$ is a set of MAC addresses of APs; for any two MAC addresses $(a_j, a_k)$ of each location fingerprint $A_i$ in the user's WiFi scanning records, a support $s_{j,k}$ can be calculated, which yields a distribution formed by $\binom{|A_i|}{2}$ supports, where $|A_i|$ denotes the number of MAC addresses in $A_i$; the size of the support indicates how frequently the two MAC addresses appear together in the set $W$:
$$s_{j,k} = \frac{c(a_j, a_k)}{\min\{c(a_j),\, c(a_k)\}}$$
where $c(a_j, a_k)$ denotes the number of location fingerprints in the set $W$ that contain both MAC addresses $a_j$ and $a_k$, $c(a_j)$ denotes the number of location fingerprints in the set $W$ that contain the MAC address $a_j$, $\min\{\cdot\}$ denotes taking the minimum, and $s_{j,k} \in [0, 1]$;
step S32, for each location fingerprint $A_i$ in the user's WiFi scanning records, calculating the frequency $f$ from the mean and the variance of the distribution in step S31; the greater the frequency $f$, the higher the probability that such a location fingerprint appears in the set $W$ and the more stable the MAC address combination in it;
step S33, arranging all records in the set $W$ in descending order of the frequency $f$, and establishing a new set $W_F$ for later storing the location fingerprints $A_i$ of the access places; in addition, $\Delta$ is defined as the set of all distinct MAC addresses contained in the set $W$, and $\Delta_F$ as the set of all distinct MAC addresses contained in the set $W_F$;
step S34, scanning the location fingerprints $A_i$ in the set $W$ from top to bottom; if adding $A_i$ to the set $W_F$ would enlarge $\Delta_F$, that is, the current $A_i$ contains a MAC address not yet present in the set $W_F$, then $A_i$ is added to the set $W_F$; when the set $W_F$ contains all MAC addresses in the set $W$, that is, $\Delta_F = \Delta$, no further $A_i$ is added to the set $W_F$; at this point, all $A_i$ contained in the set $W_F$ are the extracted location fingerprints of the access places.
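For illustration only, steps S31 to S34 might be sketched in Python as follows. The function names are hypothetical. The support is computed as the co-occurrence count divided by the smaller of the two individual counts, which is one reading of the definitions in step S31. The exact expression for the frequency $f$ is not reproduced in this text, so `frequency` uses mean minus variance purely as a placeholder that grows with the mean and shrinks with the variance; it should not be read as the claimed formula.

```python
from itertools import combinations
from statistics import mean, pvariance


def pair_support(fingerprints):
    """S31: support of every MAC pair, c(a_j, a_k) / min(c(a_j), c(a_k))."""
    single, pair = {}, {}
    for fp in fingerprints:
        for mac in fp:
            single[mac] = single.get(mac, 0) + 1
        for a, b in combinations(sorted(fp), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return {(a, b): n / min(single[a], single[b]) for (a, b), n in pair.items()}


def frequency(fp, support):
    """S32 placeholder: ASSUMED to be mean minus variance of the supports."""
    values = [support[p] for p in combinations(sorted(fp), 2)]
    if not values:  # a fingerprint with one MAC has no pairs (assumption: f = 0)
        return 0.0
    return mean(values) - pvariance(values)


def extract_access_places(fingerprints):
    """S33 and S34: rank by frequency, then greedily cover all MAC addresses."""
    support = pair_support(fingerprints)
    ranked = sorted(fingerprints, key=lambda fp: frequency(fp, support), reverse=True)
    all_macs = set().union(*fingerprints)  # Delta
    covered = set()                        # Delta_F
    selected = []                          # W_F
    for fp in ranked:
        if covered == all_macs:
            break
        if not fp <= covered:  # fp contributes at least one new MAC address
            selected.append(fp)
            covered |= fp
    return selected


scans = [frozenset("ab"), frozenset("ab"), frozenset("ac"), frozenset("cd")]
places = extract_access_places(scans)
```

In this small example the fingerprint `{a, c}` has the lowest score because its only pair co-occurs in half of the scans that contain `c`, so it is ranked last and never selected; the two stable fingerprints already cover every MAC address.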
5. The user track extraction method based on WiFi scanning records according to claim 1, wherein the step S40 further includes:
step S41, calculating the similarity $\gamma_{p,q}$ between the location fingerprints of any two access places according to the Jaccard similarity:
$$\gamma_{p,q} = \frac{N(p,q)}{N(p,q) + N(p,\bar{q}) + N(\bar{p},q)}$$
where $N(p,q)$ denotes the number of MAC addresses common to both location fingerprints, $N(p,\bar{q})$ denotes the number of MAC addresses that belong to location fingerprint $p$ but not to location fingerprint $q$, and $N(\bar{p},q)$ denotes the number of MAC addresses that do not belong to location fingerprint $p$ but belong to location fingerprint $q$;
step S42, creating an undirected weighted graph $G = (V, E)$, in which each access place is a vertex $v_p$ and the edge $e_{p,q}$ between any two vertices carries the similarity $\gamma_{p,q}$ between them; defining a similarity threshold $\theta_\gamma$ and retaining only the edges whose similarity $\gamma_{p,q}$ is greater than or equal to $\theta_\gamma$ to obtain a pruned graph; then building the adjacency matrix of the pruned graph, in which the value at row $p$, column $q$ is the weight $e_{p,q}$ that is greater than or equal to the threshold;
step S43, calculating the degree $d_p$ of each vertex in the pruned graph, arranging all vertices $v_p$ in descending order of degree $d_p$, and then scanning the vertices from high to low: for each vertex $v_p$, if the vertex has not been classified, it is taken as the center of a new class, which is expanded according to the adjacency matrix to its adjacent vertices that have a weight; once all vertices are classified, the candidate clusters $C_C$ are obtained;
step S44, merging the location fingerprints of the access places in the same candidate cluster obtained in step S43 to form a new location fingerprint, and calculating the similarity $\delta_{p,q}$ between every two of the new location fingerprints from $N(p,q)$ and $N(p)$, where $N(p,q)$ denotes the number of MAC addresses common to both location fingerprints and $N(p)$ denotes the number of MAC addresses in location fingerprint $p$;
step S45, establishing again, with reference to step S42, an undirected weighted graph based on the similarity $\delta_{p,q}$, defining another similarity threshold $\theta_\delta$, and building its adjacency matrix; then obtaining the final cluster set $C_F$ with reference to step S43;
step S46, merging the multiple location fingerprints of the same class in the cluster set $C_F$, each merged location fingerprint representing one activity place.
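For illustration only, the two clustering rounds of steps S41 to S46 might be sketched in Python as follows. The function names and the default thresholds are hypothetical. The expression for the second similarity $\delta_{p,q}$ is not reproduced in this text, so `overlap` (common MAC addresses divided by the size of the smaller fingerprint) is an assumed stand-in built from $N(p,q)$ and $N(p)$. The expansion in step S43 is read here as one hop: a new center absorbs its still-unclassified neighbours.

```python
def jaccard(p, q):
    """S41: Jaccard similarity of two location fingerprints."""
    union = len(p | q)
    return len(p & q) / union if union else 0.0


def overlap(p, q):
    """S44 stand-in (ASSUMPTION): N(p,q) / min(N(p), N(q))."""
    smaller = min(len(p), len(q))
    return len(p & q) / smaller if smaller else 0.0


def cluster(fingerprints, similarity, threshold):
    """S42/S43: threshold graph, degree-ordered scan, merge each cluster."""
    n = len(fingerprints)
    adjacency = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            w = similarity(fingerprints[p], fingerprints[q])
            if w >= threshold:  # keep only edges at or above the threshold
                adjacency[p][q] = adjacency[q][p] = w
    degree = [sum(1 for w in row if w > 0) for row in adjacency]
    order = sorted(range(n), key=lambda v: degree[v], reverse=True)

    assigned = [False] * n
    merged = []
    for center in order:
        if assigned[center]:
            continue
        members = [center] + [
            v for v in range(n) if adjacency[center][v] > 0 and not assigned[v]
        ]
        for v in members:
            assigned[v] = True
        # Merge by taking all distinct MAC addresses of the cluster members.
        merged.append(frozenset().union(*(fingerprints[v] for v in members)))
    return merged


def extract_activity_places(access_places, theta_gamma=0.5, theta_delta=0.5):
    candidates = cluster(access_places, jaccard, theta_gamma)  # S41 to S44
    return cluster(candidates, overlap, theta_delta)           # S45 and S46


access = [frozenset("abc"), frozenset("abcd"), frozenset("xyz"),
          frozenset("xy"), frozenset("de")]
activity = extract_activity_places(access)
```

In the example, the first round groups the near-identical fingerprints, and the second round attaches `{d, e}` to the merged `{a, b, c, d}` because half of its MAC addresses are shared, which the Jaccard measure alone would have rated too low.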
6. The user track extraction method based on WiFi scanning records according to claim 1, wherein the step S50 further includes:
step S51, calculating, from the location fingerprints, the Jaccard similarity between each WiFi scanning record and all activity places of the user, and taking the activity place with the highest similarity as the place where the WiFi scanning record occurred, so that each WiFi scanning record is represented as $r_i = (t_i, p_i)$, where $t_i$ denotes the scan time and $p_i$ denotes the code of the activity place;
step S52, sorting all WiFi scanning records of the user in chronological order, and if any $m$ consecutive records satisfy $p_i = p_{i+1} = \cdots = p_{i+m-1}$, merging the $m$ records into one track segment $tr = (t_{start}, t_{end}, p_i)$, where $t_{start}$ denotes $t_i$ and $t_{end}$ denotes $t_{i+m-1}$.
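For illustration only, steps S51 and S52 might be sketched in Python as follows. The function names are hypothetical, activity places are identified by their index in a list, and timestamps may be any sortable values. As a simplification of this sketch, a record that shares no MAC address with any activity place is still assigned to the first place, since the claim does not state how such records are handled.

```python
from itertools import groupby


def jaccard(p, q):
    """Jaccard similarity of two sets of MAC addresses."""
    union = len(p | q)
    return len(p & q) / union if union else 0.0


def assign_place(fingerprint, activity_places):
    """S51: index of the activity place most similar to the record."""
    return max(
        range(len(activity_places)),
        key=lambda j: jaccard(fingerprint, activity_places[j]),
    )


def build_segments(records, activity_places):
    """S52: merge consecutive records at the same place into segments."""
    labelled = sorted(
        (ts, assign_place(fp, activity_places)) for ts, fp in records
    )
    segments = []
    for place, run in groupby(labelled, key=lambda r: r[1]):
        times = [ts for ts, _ in run]
        segments.append((times[0], times[-1], place))  # (t_start, t_end, p_i)
    return segments


places = [frozenset("ab"), frozenset("xy")]
records = [(0, {"a"}), (5, {"a", "b"}), (10, {"x"}), (15, {"x", "y"}), (20, {"a", "b"})]
segments = build_segments(records, places)
```

A place visited twice with another place in between yields two separate segments, because only consecutive records are merged.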
7. The user track extraction method based on WiFi scanning records according to claim 6, wherein the step S60 further includes:
step S61, defining a time threshold $\omega_t$;
step S62, for each track segment $tr$ obtained in step S50, retaining the track segments that satisfy $t_{end} - t_{start} > \omega_t$.
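For illustration only, step S62 reduces to a single filter over the track segments. The sketch below assumes segments are `(t_start, t_end, place)` tuples with `datetime` endpoints; the function name and the ten-minute threshold in the example are hypothetical.

```python
from datetime import datetime, timedelta


def filter_short_segments(segments, omega_t):
    """Keep segments whose stay time strictly exceeds the threshold omega_t."""
    return [
        (start, end, place)
        for start, end, place in segments
        if end - start > omega_t
    ]


t0 = datetime(2020, 8, 24, 9, 0)
segments = [
    (t0, t0 + timedelta(minutes=2), 0),                       # passing by
    (t0 + timedelta(minutes=3), t0 + timedelta(hours=1), 1),  # a real stay
]
stays = filter_short_segments(segments, timedelta(minutes=10))
```

The comparison is strict, matching the condition in step S62, so a segment lasting exactly the threshold is dropped.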
8. The user track extraction method based on WiFi scanning records according to claim 6, wherein the step S70 further includes:
step S71, representing the stay times of the user at each activity place as a set $P_j = \{(t_{start}, t_{end})_1, (t_{start}, t_{end})_2, \ldots, (t_{start}, t_{end})_l\}$;
step S72, representing each set $P_j$ by a 24-dimensional vector $T_j = [t_0, t_1, \ldots, t_{23}]$, where $t_h$ denotes the average stay time of the user at the activity place during hour $h$, and $t_h$ lies in the range $[0, 60]$;
step S73, according to the 24-dimensional vector $T_j$ of an activity place, giving the activity place a semantic label based on daily life experience;
step S74, generating the user track with semantic information, $TR = \{tr_1, tr_2, \ldots, tr_n\}$, composed of track segments $tr$.
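For illustration only, steps S72 and S73 might be sketched in Python as follows. The function names are hypothetical. The sketch assumes that stay time is measured in minutes, that stays at one place do not overlap, and that the average is taken over a caller-supplied number of observed days. The labelling rule in `label_place` is an invented example of "daily life experience", not a rule stated in the patent.

```python
from datetime import datetime, timedelta


def dwell_vector(stays, num_days):
    """S72: average minutes spent at a place in each hour of the day.

    stays: list of (t_start, t_end) datetimes for one activity place.
    num_days: number of observed days to average over (assumption).
    """
    minutes = [0.0] * 24
    for start, end in stays:
        cursor = start
        while cursor < end:
            # Split the stay at hour boundaries, including across midnight.
            next_hour = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            chunk_end = min(next_hour, end)
            minutes[cursor.hour] += (chunk_end - cursor).total_seconds() / 60.0
            cursor = chunk_end
    return [m / num_days for m in minutes]


def label_place(vector):
    """S73: hypothetical rule of thumb comparing night and office hours."""
    night = sum(vector[h] for h in (0, 1, 2, 3, 4, 5))
    office = sum(vector[h] for h in (9, 10, 11, 14, 15, 16))
    if night == 0 and office == 0:
        return "other"
    return "home" if night >= office else "workplace"


stays = [
    (datetime(2020, 8, 24, 23, 30), datetime(2020, 8, 25, 1, 15)),
    (datetime(2020, 8, 25, 23, 0), datetime(2020, 8, 26, 0, 30)),
]
vector = dwell_vector(stays, num_days=2)
label = label_place(vector)
```

With non-overlapping stays and `num_days` equal to the number of days observed, each component stays within the range of 0 to 60 required by step S72.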
9. The user track extraction method based on WiFi scanning records according to claim 3 or 5, wherein the location fingerprints are merged as follows: all distinct MAC addresses are extracted from the multiple location fingerprints that need to be merged and are then combined into a new location fingerprint.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010856149.5A CN112104979B (en) | 2020-08-24 | 2020-08-24 | User track extraction method based on WiFi scanning record |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010856149.5A CN112104979B (en) | 2020-08-24 | 2020-08-24 | User track extraction method based on WiFi scanning record |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112104979A (en) | 2020-12-18 |
CN112104979B (en) | 2022-05-03 |
Family
ID=73754318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010856149.5A Active CN112104979B (en) | 2020-08-24 | 2020-08-24 | User track extraction method based on WiFi scanning record |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112104979B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105117424A (en) * | 2015-07-31 | 2015-12-02 | 中国科学院软件研究所 | Dwell-time-based moving object semantic behavior pattern mining method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106792523B (en) * | 2016-12-10 | 2019-12-03 | 武汉白虹软件科技有限公司 | A kind of anomaly detection method based on extensive WiFi activity trajectory |
CN106790468B (en) * | 2016-12-10 | 2020-06-02 | 武汉白虹软件科技有限公司 | Distributed implementation method for analyzing WiFi (Wireless Fidelity) activity track rule of user |
CN110892760B (en) * | 2017-08-21 | 2021-11-23 | 北京嘀嘀无限科技发展有限公司 | Positioning terminal equipment based on deep learning |
CN108173978A (en) * | 2017-11-23 | 2018-06-15 | 浙江大学 | Unmanned plane detection method based on smart machine parsing Wi-Fi MAC Address |
US11562168B2 (en) * | 2018-07-16 | 2023-01-24 | Here Global B.V. | Clustering for K-anonymity in location trajectory data |
CN109413587A (en) * | 2018-09-20 | 2019-03-01 | 广州纳斯威尔信息技术有限公司 | User trajectory prediction technique based on WiFi log |
Also Published As
Publication number | Publication date |
---|---|
CN112104979A (en) | 2020-12-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Uncovering inconspicuous places using social media check-ins and street view images | |
Jiang et al. | Activity-based human mobility patterns inferred from mobile phone data: A case study of Singapore | |
Soto et al. | Automated land use identification using cell-phone records | |
Liu et al. | Functional zone based hierarchical demand prediction for bike system expansion | |
Ahas et al. | Using mobile positioning data to model locations meaningful to users of mobile phones | |
CN103870485B (en) | Method and device for achieving augmented reality application | |
CN106991142A (en) | A kind of method that urban function region is recognized based on wechat data and interest point data | |
US20160205510A1 (en) | Systems and methods to identify home addresses of mobile devices | |
EP2805531A1 (en) | A method for the automatic detection and labelling of user point of interest | |
CN107801202A (en) | A kind of user's portrait method based on WiFi accesses | |
Qian et al. | Quantify city-level dynamic functions across China using social media and POIs data | |
CN101557582B (en) | Method and device for mobile communication user information statistics | |
CN111432417A (en) | Sports center site selection method based on mobile phone signaling data | |
CN108733692A (en) | A kind of social information recommendation method and apparatus | |
CN113593020A (en) | Large-scale three-dimensional city scene generation method based on ArcGIS | |
Cao et al. | Understanding metropolitan crowd mobility via mobile cellular accessing data | |
Bordogna et al. | An interoperable open data framework for discovering popular tours based on geo-tagged tweets | |
Yang et al. | Fusing mobile phone and travel survey data to model urban activity dynamics | |
CN112104979B (en) | User track extraction method based on WiFi scanning record | |
Yu et al. | Multi-scale cross-city community detection of urban agglomeration using signaling big data | |
Li et al. | Delineation of the Shanghai megacity region of China from a commuting perspective: Study based on cell phone network data in the Yangtze River Delta | |
CN107577727A (en) | A kind of One-male unit behavioral trait analysis method | |
Girardin et al. | Uncovering the presence and movements of tourists from user-generated content | |
Pearce | The spatial structure of coastal tourism: a behavioural approach | |
CN115588086A (en) | Map dividing method, map dividing device, computer readable storage medium and processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20231012; Address after: No.26, Fengnan Road, Ouhai District, Wenzhou City, Zhejiang Province 325000; Patentee after: Wenzhou Research Institute of Zhejiang University; Address before: Room 501, 5th floor, building 12, Wenzhou National University Science Park, No.26, Fengnan Road, Ouhai Economic Development Zone, Wenzhou City, Zhejiang Province 325000; Patentee before: Zhejiang Yunhe Data Technology Co.,Ltd. |