CN112104979B - User track extraction method based on WiFi scanning record - Google Patents

User track extraction method based on WiFi scanning record

Info

Publication number
CN112104979B
CN112104979B CN202010856149.5A CN202010856149A CN112104979B CN 112104979 B CN112104979 B CN 112104979B CN 202010856149 A CN202010856149 A CN 202010856149A CN 112104979 B CN112104979 B CN 112104979B
Authority
CN
China
Prior art keywords
user
location
mac addresses
fingerprints
wifi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010856149.5A
Other languages
Chinese (zh)
Other versions
CN112104979A (en)
Inventor
陈积明
郑行言
李超
贺诗波
方毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou Research Institute Of Zhejiang University
Original Assignee
Zhejiang Yunhe Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Yunhe Data Technology Co ltd
Priority to CN202010856149.5A
Publication of CN112104979A
Application granted
Publication of CN112104979B
Current legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/029 Location-based management or tracking services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user track extraction method based on WiFi scanning records, comprising the following steps: filtering out, day by day, data that is insufficient to depict the user's movement behavior; merging data according to how densely it is distributed in time; filtering noise data from the WiFi scanning records by calculating the frequency of each MAC address combination in the records, thereby obtaining a plurality of location fingerprints, each composed of MAC addresses, as access places; clustering the access places by location fingerprint to obtain activity places; mapping the user's WiFi scanning record at each moment to the corresponding activity place according to location fingerprint similarity and generating track segments; filtering out track segments with short stays; and generating a user track containing semantic information. The method extracts user tracks that contain semantic information and supports the mining of user movement patterns, daily routines, and the like. Because the activity places extracted by the method are all logical places, the user's privacy is also effectively protected.

Description

User track extraction method based on WiFi scanning record
Technical Field
The invention relates to a user track extraction method, in particular to a user track extraction method based on WiFi scanning record.
Background
The rapid development of intelligent terminals and positioning technologies has greatly promoted the spread of location-based service applications. Users are now the core foundation on which many enterprises provide services, and a user's behavior can be described by analyzing changes in the user's location. This is of great significance for optimizing user recommendation systems, improving enterprise service quality, and assisting smart city planning. Because a user's daily movement track carries both temporal and spatial information and is closely tied to the user's daily behavior, research on user tracks has long attracted scholarly attention.
With the continuous progress of science and technology, user demands for location-based services are increasingly diverse, yet most current research mines user tracks from GPS data.
Disclosure of Invention
The invention aims to provide a user track extraction method based on WiFi scanning records, which is in line with the trend that WiFi is widely deployed in smart cities, and extracts user information from a large amount of WiFi data to describe user behaviors. The method can help enterprises to discover potential requirements of users in time and provide personalized information services so as to improve user experience, and meanwhile, the method can also face city management requirements and provide support for layout planning and public resource allocation of smart cities.
The purpose of the invention is realized by the following technical scheme: a user track extraction method based on WiFi scanning record comprises the following steps:
step S10, filtering data which is not enough to depict the user movement behavior by taking the day as a unit;
step S20, merging data according to the intensive situation of the data in time;
step S30, filtering noise data in the WiFi scanning records by calculating the frequency of each MAC address combination in the records, and obtaining a plurality of location fingerprints composed of MAC addresses as access places;
step S40, further clustering to obtain an activity place from the access places according to the position fingerprint;
step S50, mapping the WiFi scanning records of each moment of the user to corresponding activity places according to the similarity of the position fingerprints and generating track fragments;
step S60, filtering the track segment staying for a short time;
step S70, semanticizing the activity place and generating a user track containing semantic information.
Further, the step S10 further includes:
step S11, mapping the data of the user every day to the corresponding 24 hours;
step S12, filtering the days with the number of hours recorded on the day less than a certain threshold value;
and step S13, filtering the days when the data volume of the current day of the user is less than a certain threshold value.
Further, the step S20 further includes:
step S21, in order to avoid the influence of the excessively dense data in a certain period of time on the subsequent frequency calculation, a proper time threshold value is determined according to the requirement;
step S22, merging the data according to the time threshold, that is, if the time recording intervals of the multiple pieces of temporally adjacent data are smaller than the time threshold, merging the time and location fingerprints of the multiple pieces of data into one time and location fingerprint.
Further, the step S30 further includes:
in step S31, the WiFi scanning records are represented as the set $W = \{w_1, w_2, \ldots, w_{|W|}\}$, where $w_i = (A_i, t_i)$ denotes a record at time $t_i$ whose location fingerprint is $A_i$, and the location fingerprint $A_i = \{m_1, m_2, \ldots, m_k\}$ is a set of MAC addresses of APs; for each location fingerprint $A_i$ in the user's WiFi scanning records, a support $s_{j,k}$ can be calculated for any two of its MAC addresses $(a_j, a_k)$, which yields a distribution formed by
$$\binom{|A_i|}{2} = \frac{|A_i|\,(|A_i| - 1)}{2}$$
support values, where $|A_i|$ denotes the number of MAC addresses in $A_i$; the support indicates how frequently the two MAC addresses appear together in the set $W$:
$$s_{j,k} = \frac{c(a_j, a_k)}{\min\{c(a_j),\, c(a_k)\}}$$
where $c(a_j, a_k)$ is the number of location fingerprints in $W$ that contain both MAC addresses $a_j$ and $a_k$, $c(a_j)$ is the number of location fingerprints in $W$ that contain the MAC address $a_j$, $\min\{\cdot\}$ selects the minimum, and $s_{j,k} \in [0, 1]$;
Step S32, for each location fingerprint $A_i$ in the user's WiFi scanning records, calculating the frequency $f$:
$$f = \frac{\mu}{1 + \sigma^2}$$
where $\mu$ and $\sigma^2$ are, respectively, the mean and the variance of the support distribution obtained in step S31; the greater the frequency $f$, the higher the probability that such a location fingerprint appears in the set $W$ and the more stable the MAC address combination it contains;
step S33, sorting all records in the set $W$ in descending order of frequency $f$, and creating a new set $W_F$ to hold the location fingerprints $A_i$ of the access places; in addition, $\Delta$ is defined as the set of all distinct MAC addresses contained in $W$, and $\Delta_F$ as the set of all distinct MAC addresses contained in $W_F$;
step S34, scanning the location fingerprints $A_i$ in the set $W$ from top to bottom: if adding $A_i$ to $W_F$ would add elements to $\Delta_F$ (that is, the current $A_i$ contains a MAC address not yet present in $W_F$), the current $A_i$ is added to $W_F$; once $W_F$ contains all MAC addresses in $W$ (that is, $\Delta_F = \Delta$), no further $A_i$ is added, and all $A_i$ contained in $W_F$ are the location fingerprints of the extracted access places.
Further, the step S40 further includes:
step S41, calculating the similarity $\gamma_{p,q}$ between the location fingerprints of any two access places using the Jaccard similarity:
$$\gamma_{p,q} = \frac{N(p, q)}{N(p, q) + N(p, \bar{q}) + N(\bar{p}, q)}$$
where $N(p, q)$ is the number of MAC addresses common to both location fingerprints, $N(p, \bar{q})$ is the number of MAC addresses that belong to location fingerprint $p$ but not to location fingerprint $q$, and $N(\bar{p}, q)$ is the number of MAC addresses that do not belong to location fingerprint $p$ but belong to location fingerprint $q$;
step S42, building an undirected weighted graph $G = (V, E)$ in which each access place is a vertex $v_p$ and the weight of the edge $e_{p,q}$ between any two vertices is the similarity $\gamma_{p,q}$ between them; defining a similarity threshold $\theta_\gamma$ and retaining only the edges whose similarity $\gamma_{p,q}$ is greater than or equal to $\theta_\gamma$, which gives the graph $G_\gamma$; then building the adjacency matrix $M_\gamma$ of the graph $G_\gamma$, in which the value at row $p$, column $q$ is the weight $e_{p,q}$ that is greater than or equal to the threshold;
Step S43, calculating the degree $d_p$ of each vertex in the graph $G_\gamma$ and sorting all vertices in $V$ in descending order of degree $d_p$, then scanning the vertices from highest to lowest degree: for each vertex $v_p$, if the vertex has not yet been classified, it is taken as the center of a new class and, according to the adjacency matrix $M_\gamma$, the class is expanded to the adjacent vertices connected by weighted edges; this continues until all vertices have been classified, which gives the candidate cluster set $C_C$;
Step S44, merging the location fingerprints of the access places within each candidate cluster obtained in step S43 into a new location fingerprint, and calculating the similarity $\delta_{p,q}$ between every two merged location fingerprints:
$$\delta_{p,q} = \frac{N(p, q)}{\min\{N(p),\, N(q)\}}$$
where $N(p, q)$ is the number of MAC addresses common to both location fingerprints and $N(p)$ is the number of MAC addresses in location fingerprint $p$;
step S45, building, as in step S42, an undirected weighted graph $G_\delta$ based on the similarity $\delta_{p,q}$, defining another similarity threshold $\theta_\delta$ and building the adjacency matrix $M_\delta$, and then obtaining the final cluster set $C_F$ as in step S43;
Step S46, merging the location fingerprints that fall in the same class of the cluster set $C_F$; each merged location fingerprint represents an activity place.
Further, the step S50 further includes:
step S51, calculating, from the location fingerprints, the Jaccard similarity between each WiFi scanning record of the user and every activity place, and taking the activity place with the highest similarity as the place where the record occurred; that is, each WiFi scanning record is represented as $r_i = (t_i, p_i)$, where $t_i$ is the scan time and $p_i$ is the code of the activity place;
step S52, sorting all WiFi scanning records of the user in chronological order; if any $m$ consecutive records satisfy $p_i = p_{i+1} = \cdots = p_{i+m-1}$, the $m$ records are merged into one track segment $tr = (t_{start}, t_{end}, p_i)$, where $t_{start}$ is $t_i$ and $t_{end}$ is $t_{i+m-1}$.
Further, the step S60 further includes:
step S61, defining a suitable time threshold $\omega_t$;
Step S62, for each track segment $tr$ from step S50, retaining those that satisfy $t_{end} - t_{start} > \omega_t$.
Further, the step S70 further includes:
step S71, representing the periods during which the user stays at each activity place as a set $P_j = \{(t_{start}, t_{end})_1, (t_{start}, t_{end})_2, \ldots, (t_{start}, t_{end})_l\}$;
Step S72, representing each set $P_j$ by a 24-dimensional vector $T_j = [t_0, t_1, \ldots, t_{23}]$, where $t_h$ is the user's average dwell time at the activity place during hour $h$, in minutes, and lies in the range $[0, 60]$;
Step S73, assigning a semantic label to each activity place from its 24-dimensional vector $T_j$ on the basis of everyday experience; for example, if an activity place has some dwell time throughout the 24 hours of the day, with longer stays in the evening and shorter stays in the daytime, it can be considered the user's home; if stays are long in the daytime and short in the evening, it may be the user's workplace; if there is some dwell time around mealtimes, it may be a restaurant or a similar place;
step S74, generating the user track with semantic information, composed of track segments: $TR = \{tr_1, tr_2, \ldots, tr_n\}$.
Further, location fingerprints are merged as follows: all distinct MAC addresses contained in the location fingerprints to be merged are extracted and combined into a new location fingerprint.
The invention has the following beneficial effects: merging data reduces the influence of overly dense data in a given period on the subsequent frequency calculation. Extracting access places reduces the diversity of location fingerprints as far as possible while ensuring that no MAC address in the data is lost, which effectively reduces the amount of noise data and improves the quality of the location fingerprints. In addition, filtering out track segments with short stays markedly reduces the number of temporary activity places in the track; the extracted user track contains semantic information and supports the mining of user movement patterns, daily routines, and the like.
Drawings
FIG. 1 is a schematic diagram of a method for extracting a user trajectory based on WiFi scanning record according to the present invention;
FIG. 2 is a data set to which the present invention is applied;
FIG. 3 is a graph of a 24 hour dwell time profile for a user at an activity site with the present invention applied to an embodiment.
Detailed Description
The following detailed description of the embodiments and the working principles of the present invention is made with reference to the accompanying drawings:
examples
The data set used in this embodiment is campus data from the Yuquan Campus of Zhejiang University, as shown in fig. 2. In the given data, all the information recorded for one user at one moment is called a record, or a piece of data. The data covers 50000 users over the 7 days of one week (2018-08-14 to 2018-08-20). In this embodiment, only the records of a single user are used as the original data set. As shown in fig. 1, the method is implemented in the following steps:
step S10, filtering out, in units of days, data that is insufficient to depict the user's movement behavior; the specific implementation steps are as follows:
step S11: the user's daily data is mapped into the corresponding 24 hours. In this embodiment, after mapping the data of the user each day to the corresponding 24 hours, the data of the user each day may be composed of 24 parts at most;
step S12: the days on which the number of hours with recorded data is less than a certain threshold are filtered out. In this embodiment, the hour-count threshold is 12; the user's data for August 16, 2018 consists of only 5 parts, that is, the user has recorded data in only 5 of the 24 hours of that day, which is less than the threshold, so that day's data is filtered out;
step S13: the days on which the user's amount of data is less than a certain threshold are filtered out. In this embodiment, the daily data amount threshold is 200, and the user has only 152 pieces of data on August 14, 2018, so that day's data is filtered out.
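The two day-level filters of step S10 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the record layout (a list of `(timestamp, fingerprint)` pairs) and the function name `filter_sparse_days` are assumptions, while the thresholds 12 and 200 are the values quoted in this embodiment.

```python
from collections import defaultdict
from datetime import datetime

HOUR_THRESHOLD = 12    # minimum number of distinct hours with data (embodiment value)
COUNT_THRESHOLD = 200  # minimum number of records in a day (embodiment value)

def filter_sparse_days(records, hour_threshold=HOUR_THRESHOLD,
                       count_threshold=COUNT_THRESHOLD):
    """Keep only records from days with enough covered hours and enough records.

    `records` is an iterable of (timestamp, fingerprint) pairs; this layout is
    an assumption made for the sketch.
    """
    by_day = defaultdict(list)
    for ts, fingerprint in records:
        by_day[ts.date()].append((ts, fingerprint))

    kept = []
    for day in sorted(by_day):
        day_records = by_day[day]
        hours_covered = {ts.hour for ts, _ in day_records}  # step S11/S12
        if len(hours_covered) >= hour_threshold and len(day_records) >= count_threshold:
            kept.extend(day_records)  # day passes both S12 and S13
    return kept
```

A day is dropped as soon as it fails either test, which matches the two examples above: 5 covered hours fails the first test, and 152 records fails the second.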
Step S20, merging the data according to the dense situation of the data in time, and the specific implementation steps are as follows:
step S21: in order to avoid the influence of overly dense data in a given period on the subsequent frequency calculation, an appropriate time threshold is determined as required. In this embodiment, the interval between the user's records is roughly 1-2 minutes; because the data acquisition frequency is not fixed, the interval in some periods can be as small as 10 seconds, so the time threshold is set to 1 minute;
step S22: the data is merged according to the time threshold, that is, if the recording intervals of several temporally adjacent pieces of data are smaller than the time threshold, their times and location fingerprints are merged into one time and one location fingerprint. In this embodiment, since the time threshold is 1 minute, the seconds are simply dropped for data whose seconds value lies in [0, 29], and the seconds are dropped and the minute is incremented by one for data whose seconds value lies in [30, 59]. The location fingerprints of all pieces of data that then share the same time are combined into a new location fingerprint.
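The rounding-and-merging rule of step S22 can be illustrated with the short Python sketch below. The record layout and the names `round_to_minute` and `merge_by_minute` are assumptions; the sketch treats seconds 0 to 29 as rounding down and 30 to 59 as rounding up, and merges fingerprints by set union.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def round_to_minute(ts):
    """Round to the nearest minute: seconds 0-29 round down, 30-59 round up."""
    floored = ts.replace(second=0, microsecond=0)
    return floored + timedelta(minutes=1) if ts.second >= 30 else floored

def merge_by_minute(records):
    """Union the fingerprints of all records that round to the same minute.

    `records` is an iterable of (timestamp, set_of_mac_addresses) pairs
    (an assumed layout). Returns a time-ordered list of merged records.
    """
    merged = defaultdict(set)
    for ts, fingerprint in records:
        merged[round_to_minute(ts)] |= set(fingerprint)
    return [(ts, frozenset(macs)) for ts, macs in sorted(merged.items())]
```

Three scans at 09:00:10, 09:00:25 and 09:00:40 would thus collapse into one record at 09:00 and one at 09:01, so a burst of scans no longer inflates the pair counts used in step S30.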
Step S30, filtering the noise data in the WiFi scanning records by calculating the frequency of each MAC address combination in the records, and obtaining a plurality of location fingerprints composed of MAC addresses as access places; the specific implementation steps are as follows:
step S31: the WiFi scanning records are represented as the set $W = \{w_1, w_2, \ldots, w_{|W|}\}$, where $w_i = (A_i, t_i)$ denotes a record at time $t_i$ whose location fingerprint is $A_i$, and the location fingerprint $A_i = \{m_1, m_2, \ldots, m_k\}$ is a set of MAC addresses of APs; for each location fingerprint in the user's WiFi scanning records, a support $s_{j,k}$ is calculated for every pair of its MAC addresses $(a_j, a_k)$, so that each location fingerprint yields a distribution formed by
$$\binom{|A_i|}{2} = \frac{|A_i|\,(|A_i| - 1)}{2}$$
support values, where $|A_i|$ denotes the number of MAC addresses in $A_i$; the support indicates how frequently the two MAC addresses appear together in the set $W$:
$$s_{j,k} = \frac{c(a_j, a_k)}{\min\{c(a_j),\, c(a_k)\}}$$
where $c(a_j, a_k)$ is the number of location fingerprints in $W$ that contain both MAC addresses $a_j$ and $a_k$, $c(a_j)$ is the number of location fingerprints in $W$ that contain the MAC address $a_j$, $\min\{\cdot\}$ selects the minimum, and $s_{j,k} \in [0, 1]$;
In this embodiment, one of the location fingerprints, $A_1$, is ['286c075518ed', '6aa19517d6cb', 'bcd17766a338', 'd4ee07380092'], and its calculated support distribution is [1.0, 1.0, 0.9954, 0.8, 0.56, 0.4674];
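As a rough illustration of step S31, the Python sketch below counts single and pairwise occurrences of MAC addresses over all fingerprints and divides each pair count by the smaller of the two single counts, following the definitions of $c(\cdot)$ and $\min\{\cdot\}$ given in the text. The names `pair_supports` and `supports` are hypothetical, and the sketch assumes that every fingerprint queried was also part of the counted input.

```python
from collections import Counter
from itertools import combinations

def pair_supports(fingerprints):
    """Build a function that maps a fingerprint to its pairwise supports.

    `fingerprints` is a list of sets of MAC addresses (the set W, without
    timestamps). The returned function assumes its argument only contains
    MAC addresses that were seen in `fingerprints`.
    """
    single = Counter()  # c(a_j): fingerprints containing a_j
    pair = Counter()    # c(a_j, a_k): fingerprints containing both
    for fp in fingerprints:
        macs = sorted(set(fp))
        single.update(macs)
        pair.update(combinations(macs, 2))

    def supports(fp):
        macs = sorted(set(fp))
        return [pair[(a, b)] / min(single[a], single[b])
                for a, b in combinations(macs, 2)]

    return supports
```

For a fingerprint with four MAC addresses the returned list has six entries, which is the size of the distribution shown for $A_1$ above.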
Step S32: for each location fingerprint $A_i$ in the user's WiFi scanning records, the frequency $f$ is calculated:
$$f = \frac{\mu}{1 + \sigma^2}$$
where $\mu$ and $\sigma^2$ are, respectively, the mean and the variance of the support distribution from step S31; the greater the frequency $f$, the higher the probability that such a location fingerprint appears in the set $W$ and the more stable the MAC address combination it contains. In this example, the support distribution of location fingerprint $A_1$ has mean 0.8038 and variance 0.04772, and the frequency of $A_1$ is 0.7672;
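The numbers quoted for $A_1$ can be checked with a few lines of Python. The closed form used here, mean divided by one plus the population variance, is an assumption: it reproduces the reported frequency 0.7672 from the reported mean and variance, but it should not be read as the authoritative formula. The function name `frequency` is likewise hypothetical.

```python
from statistics import fmean, pvariance

def frequency(supports):
    """Frequency of a fingerprint from its support distribution.

    Assumed form: f = mean / (1 + population variance). A fingerprint with a
    single MAC address has no pairs, so `supports` must be non-empty.
    """
    mu = fmean(supports)
    sigma2 = pvariance(supports, mu)
    return mu / (1.0 + sigma2)

# Support distribution of A_1 as given in the embodiment.
a1_supports = [1.0, 1.0, 0.9954, 0.8, 0.56, 0.4674]
print(round(fmean(a1_supports), 4),      # 0.8038
      round(pvariance(a1_supports), 5),  # 0.04772
      round(frequency(a1_supports), 4))  # 0.7672
```

Under this form, a high mean support raises the frequency and a large spread among the pair supports lowers it, which agrees with the qualitative description in step S32.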
step S33: all records in the set $W$ are sorted in descending order of frequency $f$, and a new set $W_F$ is created to hold the location fingerprints $A_i$ of the access places; in addition, $\Delta$ is defined as the set of all distinct MAC addresses contained in $W$, and $\Delta_F$ as the set of all distinct MAC addresses contained in $W_F$. In this example, the user's 3 days of activity records (926 records in total) contain 679 different location fingerprints, and the frequencies of the records in the user's set $W$ range from a maximum of 1.0 to a minimum of 0.4092;
step S34: the location fingerprints $A_i$ in the set $W$ are scanned from top to bottom: if adding $A_i$ to $W_F$ would add elements to $\Delta_F$ (that is, the current $A_i$ contains a MAC address not yet present in $W_F$), the current $A_i$ is added to $W_F$; once $W_F$ contains all MAC addresses in $W$ (that is, $\Delta_F = \Delta$), no further $A_i$ is added, and all $A_i$ contained in $W_F$ are the location fingerprints of the extracted access places. In this example, extraction reduces the number of location fingerprints in $W_F$ to 187; that is, these 187 MAC address combinations are the location fingerprints of the extracted access places. The diversity of location fingerprints is thus reduced as far as possible while ensuring that no MAC address in the set $W$ is lost, which improves the quality of the location fingerprints.
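Steps S33 and S34 amount to a greedy cover over MAC addresses, which the following Python sketch illustrates. The input layout (a list of `(fingerprint, frequency)` pairs) and the name `extract_access_places` are assumptions made for the example.

```python
def extract_access_places(fingerprints_with_freq):
    """Greedy scan: keep a fingerprint only if it adds an unseen MAC address.

    `fingerprints_with_freq` is a list of (set_of_macs, frequency) pairs
    (an assumed layout). Returns the selected fingerprints (the set W_F)
    in the order they were picked.
    """
    all_macs = set().union(*(fp for fp, _ in fingerprints_with_freq))  # Delta
    covered = set()   # Delta_F
    selected = []     # W_F
    # sorted() is stable, so fingerprints with equal frequency keep input order.
    ranked = sorted(fingerprints_with_freq, key=lambda item: item[1], reverse=True)
    for fp, _ in ranked:
        if covered == all_macs:   # Delta_F = Delta: stop adding
            break
        if not set(fp) <= covered:  # fp contributes at least one new MAC
            selected.append(frozenset(fp))
            covered |= set(fp)
    return selected
```

Because high-frequency fingerprints are visited first, low-frequency combinations survive only when they are the sole carriers of some MAC address, which is how the scan trims noisy fingerprints without losing any address.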
Step S40, further clustering out the activity places from the access places according to the position fingerprints, and the specific implementation steps are as follows:
step S41: the similarity $\gamma_{p,q}$ between the location fingerprints of any two access places is calculated using the Jaccard similarity:
$$\gamma_{p,q} = \frac{N(p, q)}{N(p, q) + N(p, \bar{q}) + N(\bar{p}, q)}$$
where $N(p, q)$ is the number of MAC addresses common to both location fingerprints, $N(p, \bar{q})$ is the number of MAC addresses that belong to location fingerprint $p$ but not to location fingerprint $q$, and $N(\bar{p}, q)$ is the number of MAC addresses that do not belong to location fingerprint $p$ but belong to location fingerprint $q$;
in the present example, the similarity between the location fingerprints of any two access places is 0.8182 at the maximum and 0.0 at the minimum;
step S42: an undirected weighted graph $G = (V, E)$ is built in which each access place is a vertex $v_p$ and the weight of the edge $e_{p,q}$ between any two vertices is the similarity $\gamma_{p,q}$ between them; a similarity threshold $\theta_\gamma$ is defined and only the edges whose similarity $\gamma_{p,q}$ is greater than or equal to $\theta_\gamma$ are retained, which gives the graph $G_\gamma$; the adjacency matrix $M_\gamma$ of the graph $G_\gamma$ is then built, in which the value at row $p$, column $q$ is the weight $e_{p,q}$ that is greater than or equal to the threshold. In this example, the similarity threshold $\theta_\gamma$ is 0.05, and the resulting adjacency matrix $M_\gamma$ is a symmetric 187 × 187 matrix whose diagonal values are 0;
step S43: the degree $d_p$ of each vertex in the graph $G_\gamma$ is calculated and all vertices in $V$ are sorted in descending order of degree $d_p$; the vertices are then scanned from highest to lowest degree: for each vertex $v_p$, if the vertex has not yet been classified, it is taken as the center of a new class and, according to the adjacency matrix $M_\gamma$, the class is expanded to the adjacent vertices connected by weighted edges; this continues until all vertices have been classified, which gives the candidate cluster set $C_C$. In this example, clustering the 187 access places yields 27 clusters;
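Steps S41 to S43 can be sketched in Python as below. Two points are assumptions rather than statements of the patented procedure: the degree of a vertex is taken to be the number of retained edges, and a new class is expanded only to the direct, still unclassified neighbours of its center. The function names are hypothetical.

```python
def jaccard(p, q):
    """Jaccard similarity |p & q| / |p | q| of two sets of MAC addresses."""
    union = len(p | q)
    return len(p & q) / union if union else 0.0

def cluster_by_similarity(fingerprints, similarity, threshold):
    """Degree-ordered clustering on a thresholded similarity graph.

    Returns clusters as lists of indices into `fingerprints`.
    """
    n = len(fingerprints)
    # Adjacency matrix: keep a weight only if it reaches the threshold.
    adj = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            w = similarity(fingerprints[p], fingerprints[q])
            if w >= threshold:
                adj[p][q] = adj[q][p] = w

    # Assumption: degree = number of retained (positive-weight) edges.
    degree = [sum(1 for w in row if w > 0) for row in adj]
    order = sorted(range(n), key=lambda v: degree[v], reverse=True)

    label = [None] * n
    clusters = []
    for center in order:
        if label[center] is not None:
            continue  # already absorbed by an earlier class
        # Assumption: expand to direct neighbours that are still unclassified.
        members = [center] + [v for v in range(n)
                              if adj[center][v] > 0 and label[v] is None]
        for v in members:
            label[v] = len(clusters)
        clusters.append(members)
    return clusters
```

With the embodiment's threshold, a call would look like `cluster_by_similarity(access_places, jaccard, 0.05)`, where `access_places` is a list of MAC address sets.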
step S44: the location fingerprints of the access places within each candidate cluster obtained in step S43 are merged into a new location fingerprint, and the similarity $\delta_{p,q}$ between every two merged location fingerprints is calculated:
$$\delta_{p,q} = \frac{N(p, q)}{\min\{N(p),\, N(q)\}}$$
where $N(p, q)$ is the number of MAC addresses common to both location fingerprints and $N(p)$ is the number of MAC addresses in location fingerprint $p$;
in this example, for the merged location fingerprints, the maximum pairwise similarity is 1.0 and the minimum is 0.0;
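A small Python sketch of step S44 follows. Merging is done by set union, as the text describes for merged location fingerprints. The similarity is written here as the shared count divided by the smaller fingerprint size; this min-normalized form is an assumption based on the quantities $N(p, q)$ and $N(p)$ named in the text, and the function names are hypothetical.

```python
def merge_fingerprints(fingerprints):
    """Merge location fingerprints by taking the union of their MAC addresses."""
    return frozenset().union(*fingerprints)

def overlap(p, q):
    """Assumed second-stage similarity: N(p, q) / min(N(p), N(q))."""
    smaller = min(len(p), len(q))
    return len(p & q) / smaller if smaller else 0.0
```

Under this assumed form, a fingerprint that is a subset of another scores 1.0 even when the two differ greatly in size, whereas their Jaccard similarity could be much lower. That would be consistent with the maximum of 1.0 reported above.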
step S45: as in step S42, an undirected weighted graph $G_\delta$ based on the similarity $\delta_{p,q}$ is built, another similarity threshold $\theta_\delta$ is defined and the adjacency matrix $M_\delta$ is built, and the final cluster set $C_F$ is then obtained as in step S43. In this example, the similarity threshold $\theta_\delta$ is 0.1, a 27 × 27 adjacency matrix $M_\delta$ is built, and clustering the 27 location fingerprints again yields 16 clusters;
step S46: the location fingerprints that fall in the same class of the cluster set $C_F$ are merged, and each merged location fingerprint represents an activity place. In this example, for each of the 16 clusters, all distinct MAC addresses contained in the location fingerprints of the cluster are extracted and combined into the location fingerprint of a new place, which is an extracted activity place.
Step S50, mapping the WiFi scan record of each time of the user to the corresponding activity location according to the similarity of the location fingerprint and generating a track segment, the specific implementation steps are as follows:
step S51: the Jaccard similarity between each WiFi scanning record of the user and every activity place is calculated from the location fingerprints, and the activity place with the highest similarity is taken as the place where the record occurred; that is, each WiFi scanning record is represented as $r_i = (t_i, p_i)$, where $t_i$ is the scan time and $p_i$ is the ID of the activity place. In this example, the user's 926 records are labelled with the 16 activity places by location fingerprint similarity;
step S52: all WiFi scanning records of the user are sorted in chronological order; if any $m$ consecutive records satisfy $p_i = p_{i+1} = \cdots = p_{i+m-1}$, the $m$ records are merged into one track segment $tr = (t_{start}, t_{end}, p_i)$, where $t_{start}$ is $t_i$ and $t_{end}$ is $t_{i+m-1}$. In this example, the user's 926 records over 3 days are merged into 48 track segments.
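The labelling and run-merging of steps S51 and S52 can be sketched in Python as follows. The record layout, the use of list indices as activity place IDs, and the function names are assumptions; ties in similarity are resolved in favour of the lowest index, which the source does not specify.

```python
from itertools import groupby

def jaccard(p, q):
    """Jaccard similarity of two sets of MAC addresses."""
    union = len(p | q)
    return len(p & q) / union if union else 0.0

def label_records(records, places):
    """Assign each (time, fingerprint) record the index of its most similar place."""
    labelled = []
    for t, fp in sorted(records, key=lambda r: r[0]):
        best = max(range(len(places)), key=lambda i: jaccard(set(fp), places[i]))
        labelled.append((t, best))
    return labelled

def to_segments(labelled):
    """Merge runs of consecutive records at the same place into (t_start, t_end, place)."""
    segments = []
    for place, run in groupby(labelled, key=lambda r: r[1]):
        run = list(run)
        segments.append((run[0][0], run[-1][0], place))
    return segments
```

`groupby` only groups adjacent items, so a place visited twice in a day produces two separate segments, as the definition of a track segment requires.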
Step S60, filtering the short staying track segments, the specific implementation steps are as follows:
step S61: a suitable time threshold $\omega_t$ is defined. In the present example, the time threshold $\omega_t$ is 10 minutes;
step S62: for each track segment $tr$ from step S52, those satisfying $t_{end} - t_{start} > \omega_t$ are retained. In the present example, filtering with the time threshold $\omega_t$ reduces the number of track segments from 48 to 22; each remaining track segment belongs to one activity place, and the 22 track segments involve 4 activity places.
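The duration filter of step S62 is a one-line comparison; a Python sketch is given below for completeness. The tuple layout `(t_start, t_end, place)` and the name `filter_short_segments` are assumptions, and the default of 10 minutes is the value used in this example.

```python
from datetime import datetime, timedelta

def filter_short_segments(segments, min_stay=timedelta(minutes=10)):
    """Keep segments (t_start, t_end, place) whose duration strictly exceeds min_stay."""
    return [seg for seg in segments if seg[1] - seg[0] > min_stay]
```

The comparison is strict, mirroring the condition $t_{end} - t_{start} > \omega_t$, so a stay of exactly 10 minutes is discarded.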
Step S70, semantization of the activity location and generation of a user track containing semantic information, the specific implementation steps are as follows:
step S71: the periods during which the user stays at each activity place are represented as a set $P_j = \{(t_{start}, t_{end})_1, (t_{start}, t_{end})_2, \ldots, (t_{start}, t_{end})_l\}$. In this example, the 4 activity places are represented by the sets $P_0, P_1, P_2, P_3$;
step S72: each set $P_j$ is represented by a 24-dimensional vector $T_j = [t_0, t_1, \ldots, t_{23}]$, where $t_h$ is the user's average dwell time at the activity place during hour $h$, in minutes, and lies in the range $[0, 60]$. In this example, the 24-dimensional vector $T_0$ for place 0 is given as [46.2, 54.6, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 50.2, 24.0, 16.8, 26.2, 22.6, 12.0, 12.0, 19.4, 57.4, 48.0, 48.0, 47.2, 40.0, 48.0, 50.6];
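One way to turn a set of stays into the 24-dimensional vector of step S72 is sketched below in Python. The sketch splits each stay at clock-hour boundaries, accumulates minutes per hour of day, and divides by the number of observed days; averaging over the number of days, and the name `dwell_vector`, are assumptions, since the source does not spell out the averaging rule.

```python
from datetime import datetime, timedelta

def dwell_vector(stays, n_days):
    """Average minutes per hour of day spent at one activity place.

    `stays` is a list of (t_start, t_end) datetime pairs for the place;
    `n_days` is the number of observed days (an assumed denominator).
    """
    minutes = [0.0] * 24
    for start, end in stays:
        cursor = start
        while cursor < end:
            # End of the current clock hour, or end of the stay if earlier.
            next_hour = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            stop = min(next_hour, end)
            minutes[cursor.hour] += (stop - cursor).total_seconds() / 60.0
            cursor = stop
    return [m / n_days for m in minutes]
```

As long as stays at the same place do not overlap in time, no hour can accumulate more than 60 minutes per day, so each component stays within $[0, 60]$.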
Step S73: a semantic label is assigned to each activity place from its 24-dimensional vector $T_j$ on the basis of everyday experience. In this example, the four activity places are each represented by a 24-dimensional vector, and fig. 3 shows the distribution of the user's dwell time over the 24 hours at the four places. As the figure shows, place 0 has some dwell time throughout the day, with longer stays in the early morning and evening, so it is very likely the user's home; place 1 has longer dwell times in the daytime and is more likely a workplace, and the user also has some dwell time there between 18:00 and 0:00, probably because the user worked overtime on some day; place 3 has some dwell time around 12:00 and around 20:00 and may be a restaurant; place 2 has a short dwell time around 16:00, and its semantic features are less distinct than those of a home, workplace, or restaurant; it may be where the user shops, relaxes, or engages in other activities;
step S74: the user track with semantic information, composed of track segments, is generated: $TR = \{tr_1, tr_2, \ldots, tr_n\}$. In this embodiment, table 1 shows the user's track for one day.
TABLE 1 Example of a user's one-day track
[Table 1 appears only as an image in the original publication.]
The method is based on WiFi scanning records. It extracts the user track through the steps of merging data, extracting access places, extracting activity places, filtering track segments, and semanticizing activity places, so as to describe user behavior, and it provides support for optimizing user recommendation systems, improving enterprise service quality, assisting smart city planning, and the like.
The foregoing is only a preferred embodiment of the present invention; although the invention has been disclosed by way of the preferred embodiment, this is not intended to limit it. Using the methods and technical content disclosed above, those skilled in the art can make many possible variations and modifications to the technical solution of the invention, or modify it into equivalent embodiments, without departing from its scope. Therefore, any simple modification, equivalent change, or refinement made to the above embodiment according to the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of protection of that technical solution.

Claims (9)

1. A user track extraction method based on WiFi scanning records, characterized by comprising the following steps:
step S10, filtering out, on a per-day basis, data insufficient to characterize the user's movement behavior;
step S20, merging data according to how densely the data is distributed in time;
step S30, filtering noise data in the WiFi scanning records by calculating the frequency of each MAC address combination in the WiFi scanning records, to obtain a number of location fingerprints, each composed of MAC addresses, as access places; the frequency calculation specifically comprises: for each WiFi scanning record of a user, calculating how often any two of its MAC addresses occur together across all records, so as to form a support distribution, and calculating the frequency of the WiFi scanning record from the mean and variance of the support distribution;
step S40, further clustering the access places according to their location fingerprints to obtain activity places;
step S50, mapping the user's WiFi scanning record at each moment to the corresponding activity place according to location fingerprint similarity, and generating track segments;
step S60, filtering out track segments with a short stay time;
step S70, semanticizing the activity places and generating a user track containing semantic information.
2. The user track extraction method based on WiFi scanning records according to claim 1, wherein the step S10 further comprises:
step S11, mapping the user's data for each day onto the corresponding 24 hours;
step S12, filtering out days on which the number of hours with records is less than a certain threshold;
step S13, filtering out days on which the user's data volume is less than a certain threshold.
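For illustration only, steps S11 to S13 can be sketched as follows. This is a minimal sketch, not the patented implementation: the record layout `(timestamp, fingerprint)` and the threshold values `min_hours` and `min_records` are assumptions, since the claim leaves both thresholds open.

```python
from collections import defaultdict
from datetime import datetime


def filter_sparse_days(records, min_hours=8, min_records=50):
    """Keep only records from days with enough hourly coverage and volume.

    records: iterable of (timestamp: datetime, fingerprint: frozenset of MACs).
    min_hours, min_records: hypothetical thresholds (not given in the source).
    """
    by_day = defaultdict(list)
    for ts, fp in records:
        by_day[ts.date()].append((ts, fp))

    kept = []
    for day in sorted(by_day):
        day_records = by_day[day]
        # S11: map the day's records onto the 24 hour slots they fall into.
        hours_covered = {ts.hour for ts, _ in day_records}
        # S12: drop days with records in too few distinct hours.
        if len(hours_covered) < min_hours:
            continue
        # S13: drop days with too few records overall.
        if len(day_records) < min_records:
            continue
        kept.extend(sorted(day_records, key=lambda r: r[0]))
    return kept
```

The function returns the surviving records in chronological order, so its output can be passed directly to a later merging step.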
3. The user track extraction method based on WiFi scanning records according to claim 1, wherein the step S20 further comprises:
step S21, determining a time threshold as required, to prevent excessively dense data within some period from distorting the subsequent frequency calculation;
step S22, merging the data according to the time threshold: if the recording intervals between several temporally adjacent records are smaller than the time threshold, merging their times and location fingerprints into a single time and location fingerprint.
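As a rough illustration of step S22, the sketch below merges runs of temporally adjacent records. It assumes timestamps in seconds, takes the union of MAC addresses as the merged fingerprint (as claim 9 describes), and keeps the first timestamp of each run as the merged time; that last choice and the name `gap_seconds` are assumptions, because the claim does not say which time is retained.

```python
def merge_dense_records(records, gap_seconds=60):
    """Merge temporally adjacent records whose spacing is below a threshold.

    records: list of (timestamp_seconds, fingerprint) where fingerprint is a
    set of MAC address strings. gap_seconds is a hypothetical threshold.
    """
    merged = []
    prev_ts = None
    for ts, fp in sorted(records, key=lambda r: r[0]):
        if merged and ts - prev_ts < gap_seconds:
            # Interval to the previous record is below the threshold:
            # fold this record into the current merged record.
            start_ts, acc = merged[-1]
            merged[-1] = (start_ts, acc | set(fp))
        else:
            # Gap is large enough: start a new merged record.
            merged.append((ts, set(fp)))
        prev_ts = ts
    return merged
```

Each record is compared with the one immediately before it, so a long burst of closely spaced scans collapses into one record even when the burst as a whole lasts longer than the threshold.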
4. The user track extraction method based on WiFi scanning records according to claim 1, wherein the step S30 further comprises:
step S31, the WiFi scanning records are represented as the set $W = \{w_1, w_2, \ldots, w_{|W|}\}$, where $w_i = (A_i, t_i)$ denotes a record with location fingerprint $A_i$ at time $t_i$, and the location fingerprint $A_i = \{m_1, m_2, \ldots, m_k\}$ is a set of MAC addresses of APs; for each location fingerprint $A_i$ in the user's WiFi scanning records, a support $s_{j,k}$ can be calculated for any two of its MAC addresses $(a_j, a_k)$, which yields a distribution formed by
Figure FDA0003555934840000021
supports, where $|A_i|$ denotes the number of MAC addresses in $A_i$; the support indicates how frequently the two MAC addresses appear together in the set $W$:
Figure FDA0003555934840000022
where $c(a_j, a_k)$ is the number of location fingerprints in $W$ that contain both MAC addresses $a_j$ and $a_k$, $c(a_j)$ is the number of location fingerprints in $W$ that contain the MAC address $a_j$, $\min\{\cdot\}$ takes the minimum, and $s_{j,k} \in [0, 1]$;
step S32, for each location fingerprint $A_i$ in the user's WiFi scanning records, calculating the frequency $f$:
Figure FDA0003555934840000023
where
Figure FDA0003555934840000024
and
Figure FDA0003555934840000025
are respectively the mean and the variance of the distribution in step S31; the greater the frequency $f$, the more likely such a location fingerprint is to appear in the set $W$ and the more stable its MAC address combination;
step S33, sorting all records in the set $W$ in descending order of frequency $f$, and creating a new set $W_F$ to later hold the location fingerprints $A_i$ of access places; in addition, $\Delta$ is defined as the set of all distinct MAC addresses contained in $W$, and $\Delta_F$ as the set of all distinct MAC addresses contained in $W_F$;
step S34, scanning the location fingerprints $A_i$ in the set $W$ from top to bottom: if adding $A_i$ to $W_F$ would enlarge $\Delta_F$, that is, the current $A_i$ contains a MAC address not yet present in $W_F$, then $A_i$ is added to $W_F$; once $W_F$ covers all MAC addresses in $W$, that is, $\Delta_F = \Delta$, no further $A_i$ is added to $W_F$; at this point, all $A_i$ contained in $W_F$ are the extracted location fingerprints of the access places.
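The selection in steps S33 and S34 amounts to a greedy cover over MAC addresses, which the following sketch illustrates. It is a sketch under assumptions: the frequency $f$ of each fingerprint is taken as a precomputed input, because the formula for $f$ appears only as an image in the source, and the function name `select_access_fingerprints` is hypothetical.

```python
def select_access_fingerprints(scored_fingerprints):
    """Greedy selection of access-place fingerprints (steps S33 and S34).

    scored_fingerprints: list of (f, fingerprint) pairs, where f is the
    precomputed frequency and fingerprint is a frozenset of MAC addresses.
    """
    # Delta: every distinct MAC address that occurs in W.
    all_macs = set().union(*(fp for _, fp in scored_fingerprints))
    selected = []    # W_F
    covered = set()  # Delta_F
    # S33: scan the records in descending order of frequency.
    ranked = sorted(scored_fingerprints, key=lambda x: x[0], reverse=True)
    for _, fp in ranked:
        if covered == all_macs:
            break  # Delta_F equals Delta: stop adding fingerprints
        if fp - covered:
            # S34: the fingerprint contributes at least one new MAC address.
            selected.append(fp)
            covered |= fp
    return selected
```

Because high-frequency fingerprints are scanned first, low-frequency records are kept only when they contribute a MAC address that no earlier fingerprint has covered.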
5. The user track extraction method based on WiFi scanning records according to claim 1, wherein the step S40 further comprises:
step S41, calculating the similarity $\gamma_{p,q}$ between the location fingerprints of any two access places according to the Jaccard similarity:
Figure FDA0003555934840000026
where $N(p, q)$ is the number of MAC addresses common to both location fingerprints,
Figure FDA0003555934840000027
is the number of MAC addresses that belong to location fingerprint $p$ but not to location fingerprint $q$, and
Figure FDA0003555934840000028
is the number of MAC addresses that belong to location fingerprint $q$ but not to location fingerprint $p$;
step S42, building an undirected weighted graph $G = (V, E)$, in which each access place is a vertex $v_p$ and the weight of the edge $e_{p,q}$ between any two vertices is the similarity $\gamma_{p,q}$ between them; defining a similarity threshold $\theta_\gamma$ and retaining only the edges whose similarity $\gamma_{p,q}$ is greater than or equal to $\theta_\gamma$, to obtain the graph
Figure FDA0003555934840000029
then building, for the graph
Figure FDA00035559348400000210
the adjacency matrix
Figure FDA00035559348400000211
whose entry in row $p$, column $q$ is the weight $e_{p,q}$ that is greater than or equal to the threshold;
step S43, calculating, in the graph
Figure FDA00035559348400000212
the degree $d_p$ of each vertex, sorting all vertices $v_p$ in descending order of degree $d_p$, and then scanning the vertices from highest to lowest degree: for each vertex $v_p$, if it has not yet been classified, it is taken as the center of a new class and, according to the adjacency matrix
Figure FDA0003555934840000031
expanded to its adjacent points with weights, until all vertices are classified, obtaining the candidate cluster set $C_C$;
step S44, merging the location fingerprints of the access places in each candidate cluster obtained in step S43 to form a new location fingerprint, and calculating the similarity $\delta_{p,q}$ between every two of the new location fingerprints:
Figure FDA0003555934840000032
where $N(p, q)$ is the number of MAC addresses common to both location fingerprints and $N(p)$ is the number of MAC addresses in location fingerprint $p$;
step S45, with reference to step S42, building another undirected weighted graph
Figure FDA0003555934840000033
based on the similarity $\delta_{p,q}$, defining another similarity threshold $\theta_\delta$, and building the adjacency matrix
Figure FDA0003555934840000034
then, with reference to step S43, obtaining the final cluster set $C_F$;
step S46, merging the location fingerprints that fall into the same class in the cluster set $C_F$; each merged location fingerprint represents one activity place.
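The first clustering pass (steps S41 to S43, followed by a union merge of each cluster) might look like the sketch below. Several points are assumptions rather than statements of the claimed method: the Jaccard similarity is computed in its standard intersection-over-union form, a vertex's degree is taken as its number of retained edges, a new class consists of the center and its not yet classified direct neighbors, and the threshold value `theta` is arbitrary. The second pass (steps S44 to S45) is omitted because the formula for $\delta_{p,q}$ appears only as an image in the source.

```python
from itertools import combinations


def jaccard(p, q):
    """Standard Jaccard similarity: |p & q| / |p | q|."""
    union = len(p | q)
    return len(p & q) / union if union else 0.0


def cluster_fingerprints(fingerprints, theta=0.5):
    """Degree-ordered clustering of fingerprints; returns merged fingerprints.

    fingerprints: list of frozensets of MAC addresses (one per access place).
    theta: hypothetical similarity threshold.
    """
    n = len(fingerprints)
    # S42: keep only edges whose similarity reaches the threshold.
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if jaccard(fingerprints[i], fingerprints[j]) >= theta:
            neighbors[i].add(j)
            neighbors[j].add(i)
    # S43: scan vertices from highest to lowest degree.
    order = sorted(range(n), key=lambda v: len(neighbors[v]), reverse=True)
    assigned, clusters = set(), []
    for v in order:
        if v in assigned:
            continue
        # Unclassified vertex: new class made of it and its free neighbors.
        members = {v} | (neighbors[v] - assigned)
        assigned |= members
        clusters.append(members)
    # Merge each cluster by taking the union of its MAC addresses.
    return [frozenset().union(*(fingerprints[m] for m in c)) for c in clusters]
```

Scanning high-degree vertices first means that fingerprints similar to many others become class centers, which tends to absorb partial scans of the same place into one cluster.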
6. The user track extraction method based on WiFi scanning records according to claim 1, wherein the step S50 further comprises:
step S51, calculating, from the location fingerprints, the Jaccard similarity between each WiFi scanning record of the user and every activity place, and taking the activity place with the highest similarity as the place where the record occurred; that is, each WiFi scanning record is represented as $r_i = (t_i, p_i)$, where $t_i$ is the scan time and $p_i$ is the code of the activity place;
step S52, sorting all WiFi scanning records of the user in chronological order; if any $m$ consecutive records satisfy $p_i = p_{i+1} = \cdots = p_{i+m-1}$, the $m$ records are merged into a track segment $tr = (t_{start}, t_{end}, p_i)$, where $t_{start}$ is $t_i$ and $t_{end}$ is $t_{i+m-1}$.
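Steps S51 and S52 can be illustrated with the following sketch. It assumes a non-empty list of activity-place fingerprints whose list index serves as the place code $p_i$, and it breaks similarity ties in favor of the lowest index; both are assumptions, as is the helper name `build_segments`.

```python
from itertools import groupby


def jaccard(p, q):
    """Standard Jaccard similarity: |p & q| / |p | q|."""
    union = len(p | q)
    return len(p & q) / union if union else 0.0


def build_segments(records, places):
    """Label each record with its best-matching place, then merge runs.

    records: list of (t, fingerprint) with fingerprint a frozenset of MACs.
    places: non-empty list of activity-place fingerprints; index = place code.
    Returns a list of (t_start, t_end, place_code) track segments.
    """
    labeled = []
    for t, fp in sorted(records, key=lambda r: r[0]):
        # S51: pick the activity place with the highest Jaccard similarity.
        best = max(range(len(places)), key=lambda j: jaccard(fp, places[j]))
        labeled.append((t, best))
    segments = []
    # S52: consecutive records at the same place form one track segment.
    for place, run in groupby(labeled, key=lambda r: r[1]):
        run = list(run)
        segments.append((run[0][0], run[-1][0], place))
    return segments
```

A segment built from a single record has equal start and end times, which is why the later duration filter of step S60 removes such isolated matches.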
7. The user track extraction method based on WiFi scanning records according to claim 6, wherein the step S60 further comprises:
step S61, defining a time threshold $\omega_t$;
step S62, among the track segments $tr$ from step S50, retaining those with $t_{end} - t_{start} > \omega_t$.
8. The user track extraction method based on WiFi scanning records according to claim 6, wherein the step S70 further comprises:
step S71, representing the user's stay periods at each activity place as a set $P_j = \{(t_{start}, t_{end})_1, (t_{start}, t_{end})_2, \ldots, (t_{start}, t_{end})_l\}$;
step S72, representing each set $P_j$ by a 24-dimensional vector $T_j = [t_0, t_1, \ldots, t_{23}]$, where $t_h$ is the user's average stay time at the activity place during hour $h$, with $t_h \in [0, 60]$;
step S73, for the 24-dimensional vector $T_j$ of an activity place, assigning a semantic label to the activity place according to everyday experience;
step S74, generating the user track $TR = \{tr_1, tr_2, \ldots, tr_n\}$ with semantic information, composed of track segments.
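One way to compute the 24-dimensional vector of step S72 is sketched below. It assumes that $t_h$ is measured in minutes (consistent with the stated range of 0 to 60) and that the average is taken over a number of observed days supplied by the caller as `n_days`; the source does not state how the averaging denominator is chosen, so that parameter is an assumption.

```python
from datetime import datetime, timedelta


def hourly_profile(stays, n_days):
    """Average minutes per hour-of-day spent at one activity place.

    stays: list of (t_start, t_end) datetime pairs for a single place (P_j).
    n_days: number of observed days used for averaging (assumed input).
    Returns a list of 24 floats, the vector T_j.
    """
    totals = [0.0] * 24
    for start, end in stays:
        cursor = start
        while cursor < end:
            # End of the hour slot that contains the cursor.
            next_hour = cursor.replace(minute=0, second=0, microsecond=0)
            next_hour += timedelta(hours=1)
            chunk_end = min(next_hour, end)
            # Credit the minutes of this stay that fall inside the slot.
            totals[cursor.hour] += (chunk_end - cursor).total_seconds() / 60.0
            cursor = chunk_end
    return [t / n_days for t in totals]
```

Stays that cross hour boundaries, including midnight, are split so that each hour slot receives only the minutes that fall inside it.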
9. The user track extraction method based on WiFi scanning records according to claim 3 or 5, wherein location fingerprints are merged as follows: all distinct MAC addresses contained in the location fingerprints to be merged are extracted and then combined into a new location fingerprint.
CN202010856149.5A 2020-08-24 2020-08-24 User track extraction method based on WiFi scanning record Active CN112104979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010856149.5A CN112104979B (en) 2020-08-24 2020-08-24 User track extraction method based on WiFi scanning record

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010856149.5A CN112104979B (en) 2020-08-24 2020-08-24 User track extraction method based on WiFi scanning record

Publications (2)

Publication Number Publication Date
CN112104979A CN112104979A (en) 2020-12-18
CN112104979B true CN112104979B (en) 2022-05-03

Family

ID=73754318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010856149.5A Active CN112104979B (en) 2020-08-24 2020-08-24 User track extraction method based on WiFi scanning record

Country Status (1)

Country Link
CN (1) CN112104979B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117424A (en) * 2015-07-31 2015-12-02 中国科学院软件研究所 Dwell-time-based moving object semantic behavior pattern mining method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106792523B (en) * 2016-12-10 2019-12-03 武汉白虹软件科技有限公司 A kind of anomaly detection method based on extensive WiFi activity trajectory
CN106790468B (en) * 2016-12-10 2020-06-02 武汉白虹软件科技有限公司 Distributed implementation method for analyzing WiFi (Wireless Fidelity) activity track rule of user
CN110892760B (en) * 2017-08-21 2021-11-23 北京嘀嘀无限科技发展有限公司 Positioning terminal equipment based on deep learning
CN108173978A (en) * 2017-11-23 2018-06-15 浙江大学 Unmanned plane detection method based on smart machine parsing Wi-Fi MAC Address
US11562168B2 (en) * 2018-07-16 2023-01-24 Here Global B.V. Clustering for K-anonymity in location trajectory data
CN109413587A (en) * 2018-09-20 2019-03-01 广州纳斯威尔信息技术有限公司 User trajectory prediction technique based on WiFi log

Also Published As

Publication number Publication date
CN112104979A (en) 2020-12-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231012

Address after: No.26, Fengnan Road, Ouhai District, Wenzhou City, Zhejiang Province 325000

Patentee after: Wenzhou Research Institute of Zhejiang University

Address before: Room 501, 5th floor, building 12, Wenzhou National University Science Park, No.26, Fengnan Road, Ouhai Economic Development Zone, Wenzhou City, Zhejiang Province 325000

Patentee before: Zhejiang Yunhe Data Technology Co.,Ltd.
