CN109068272B - Similar user identification method, device, equipment and readable storage medium - Google Patents

Info

Publication number
CN109068272B
Authority
CN
China
Prior art keywords
wireless access
access point
fingerprint
point fingerprint
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811005730.5A
Other languages
Chinese (zh)
Other versions
CN109068272A (en
Inventor
秦博
段航
孙翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201811005730.5A
Publication of CN109068272A
Application granted
Publication of CN109068272B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W64/00 Locating users or terminals or network equipment for network management purposes, e.g. mobility management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 Network topologies
    • H04W84/02 Hierarchically pre-organised networks, e.g. paging networks, cellular networks, WLAN [Wireless Local Area Network] or WLL [Wireless Local Loop]
    • H04W84/10 Small scale networks; Flat hierarchical networks
    • H04W84/12 WLAN [Wireless Local Area Networks]

Abstract

The invention provides a similar user identification method, device, equipment and readable storage medium, wherein the wireless access point fingerprint data in a preset wireless access point fingerprint database are clustered into a plurality of wireless access point fingerprint clusters; the fingerprint clusters are sorted according to the number of wireless access point fingerprint data in each cluster; from the top preset number of clusters in the sorting result, the clusters whose number of wireless access point fingerprint data is greater than a first preset number are extracted as the wireless access point fingerprint feature vector of a user; the similarity between wireless access point fingerprint feature vectors is calculated; and the similarity between the corresponding users is determined according to the similarity between the wireless access point fingerprint feature vectors. This solves the prior-art problem that similar users cannot be identified more accurately because a user in different consumption scenes is assigned the same position.

Description

Similar user identification method, device, equipment and readable storage medium
Technical Field
The embodiment of the invention relates to the technical field of electronics, in particular to a method, a device, equipment and a readable storage medium for identifying similar users.
Background
With the widespread use of social and shopping applications, a large amount of user data is generated, and user similarity analysis based on this data is an important aspect of user behavior analysis.
In the prior art, a common-use index of a user with respect to each base station is calculated mainly by collecting the position and time at which the user's communication behavior occurs; feature vectors of the user's commonly used base stations are then extracted according to this index, and a similarity index between different users is calculated from them.
However, although this method is general, a base station covers a wide area, so base station feature vectors do not characterize user behavior accurately; as a result, different consumption scenes in the same shopping mall yield the same position.
Disclosure of Invention
The invention provides a similar user identification method to solve the prior-art problem that similar users cannot be identified more accurately because a user in different consumption scenes is assigned the same position.
According to a first aspect of the present invention, there is provided a similar user identification method, the method comprising:
clustering the fingerprint data of each wireless access point in a preset wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters;
sorting the fingerprint clusters according to the number of the wireless access point fingerprint data in each wireless access point fingerprint cluster;
extracting, from the top preset number of clusters in the sorting result, the wireless access point fingerprint clusters whose number of wireless access point fingerprint data is greater than a first preset number, as the wireless access point fingerprint feature vector of a user;
calculating the similarity between the fingerprint feature vectors of the wireless access points;
and determining the similarity between corresponding users according to the similarity between the fingerprint feature vectors of the wireless access points.
According to a second aspect of the present invention, there is provided a similar user identification apparatus, the apparatus comprising:
the clustering module is used for clustering the fingerprint data of each wireless access point in a preset wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters;
the sorting module is used for sorting the fingerprint clusters according to the number of the wireless access point fingerprint data in each wireless access point fingerprint cluster;
the feature vector determining module is used for extracting, from the top preset number of clusters in the sorting result, the wireless access point fingerprint clusters whose number of wireless access point fingerprint data is greater than a first preset number, as the wireless access point fingerprint feature vector of a user;
the similarity calculation module is used for calculating the similarity between the fingerprint feature vectors of the wireless access points;
and the similar user determining module is used for determining the similarity between corresponding users according to the similarity between the fingerprint feature vectors of the wireless access points.
According to a third aspect of the invention, there is provided an apparatus comprising:
a processor, a memory and a computer program stored on the memory and executable on the processor, characterized in that the processor implements a similar user identification method as described before when executing the program.
According to a fourth aspect of the present invention, there is provided a readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the aforementioned similar user identification method.
The embodiment of the invention provides a similar user identification method, device, equipment and readable storage medium, wherein the wireless access point fingerprint data in a preset wireless access point fingerprint database are clustered into a plurality of wireless access point fingerprint clusters; the fingerprint clusters are sorted according to the number of wireless access point fingerprint data in each cluster; from the top preset number of clusters in the sorting result, the clusters whose number of wireless access point fingerprint data is greater than a first preset number are extracted as the wireless access point fingerprint feature vector of a user; the similarity between wireless access point fingerprint feature vectors is calculated; and the similarity between the corresponding users is determined according to the similarity between the wireless access point fingerprint feature vectors. This solves the prior-art problem that user behavior is characterized too coarsely, so that a user in different consumption scenes is assigned the same position. The method is better suited to indoor scenes, distinguishes user scenes at a finer granularity, and improves the accuracy of user similarity judgment.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.
Fig. 1 is a flowchart of the steps of a similar user identification method according to an embodiment of the present invention;
Fig. 2 is a flowchart of the steps of a similar user identification method according to an embodiment of the present invention;
Fig. 3 is a structural diagram of a similar user identification device according to an embodiment of the present invention;
Fig. 4 is a structural diagram of a similar user identification device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following first introduces terms involved in embodiments of the present invention:
Wi-Fi fingerprint: GPS struggles with positioning in indoor environments, whereas Wi-Fi is present in most of them, so positioning with Wi-Fi requires no additional hardware deployment and is very cost-effective. General outdoor positioning facilities (such as GPS) do not work effectively inside buildings because of severe signal attenuation and multipath effects. Positioning accuracy is also a problem: GPS may indicate which building a mobile device is in, but indoors a more accurate location is desired, which requires more precise map information and higher positioning accuracy. The first option considered for wireless-signal-based positioning is therefore Wi-Fi (WLAN based on the IEEE 802.11 standard) as the positioning infrastructure.
However, Wi-Fi signals were not designed for positioning: devices usually have a single antenna and a small bandwidth, and the complex indoor signal propagation environment makes traditional ranging methods based on time of arrival or time difference of arrival (TOA/TDOA) difficult to implement. Methods based on the angle of arrival of the signal are also difficult to implement, and installing directional antennas in a Wi-Fi network incurs additional cost. For these reasons, location fingerprinting methods have been studied in detail in recent years.
Wi-Fi is widely used in buildings large and small, such as homes, hotels, cafes, airports and shopping malls, which makes it one of the most attractive wireless technologies for positioning. Typically, a Wi-Fi system consists of fixed access points (APs) deployed at convenient indoor locations, whose positions are generally known to the system or network administrator. Wi-Fi-capable mobile devices (e.g., laptops, mobile phones) can communicate with each other directly or indirectly (via APs), so it is natural to consider implementing a positioning function in addition to the communication function.
A "location fingerprint" associates a location in the actual environment with a certain "fingerprint", with each location corresponding to a unique fingerprint. A Wi-Fi hotspot may correspond to a unique fingerprint that maps to a particular location. The fingerprint may have one or more dimensions; for example, when the device to be located is receiving or sending information, the fingerprint may be one or more characteristics of that information or signal (most commonly signal strength). If the device to be positioned is transmitting signals, fixed receiving devices sense its signals or information and then position it, which is often called remote positioning or network positioning. If the device to be positioned receives signals or information from fixed sending devices and then estimates its own position from the detected characteristics, this is called self-positioning. A mobile device to be positioned may also pass the features it detects to a server node in the network, which may use all of the information it can obtain to estimate the location of the mobile device; this is known as hybrid positioning. In all of these approaches, the perceived signal features need to be matched against signal features in a database, a process that can be viewed as a pattern recognition problem.
Location fingerprints can be of various types; any feature that is "location-unique" (helps distinguish locations) can be used as a location fingerprint. For example, the multipath structure of the communication signal at a location, whether a given access point or base station can be detected at a location, the RSS (received signal strength) of the base station signal detected at a location, and the round-trip time or delay of a signal when communicating at a location can each be used as a location fingerprint, or can be combined to form one. The two most common signal characteristics are multipath structure and RSS.
Example one
Referring to fig. 1, a flowchart of steps of a similar user identification method is shown, which includes the following specific steps:
Step 101, clustering the fingerprint data of each wireless access point in a preset wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters.
In the embodiment of the invention, when a user carries a mobile terminal while conducting business in an application scene, the mobile terminal can periodically acquire the user's positioning data. GPS, together with the longitude and latitude labels it produces, is the accepted standard for geographic position data and the basic way most smartphones obtain the user's geographic position. The phone can obtain this data as long as the user turns on the GPS positioning function. When the GPS chip of the mobile device cannot receive GPS signals, the device must communicate with the cell tower it is connected to and estimate its distance from the tower in order to keep reporting its geographic position; however, geographic position data obtained this way is less accurate than pure GPS data. The embodiment of the application takes Wi-Fi connection as an example. Acquiring user positioning data through Wi-Fi connections can yield accurate geographic position data, but it requires valid Wi-Fi hotspots. Wi-Fi addresses correspond one-to-one with GPS coordinates, so the user's position can be marked accurately. Many retailers provide free Wi-Fi hotspots in places where users consume. Wi-Fi fingerprint data observed from the Wi-Fi hotspots at the user's current position are randomly extracted and clustered to generate Wi-Fi fingerprint categories.
In the present application, clustering uses the cosine similarity between Wi-Fi fingerprint data; in practical applications, however, the clustering method is not limited, and the embodiment of the present invention imposes no limitation on it.
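As an illustration of the cosine similarity mentioned above, the following minimal sketch compares two Wi-Fi fingerprints. The text does not specify how a fingerprint is encoded, so the representation used here (a dictionary mapping an AP's MAC address to a non-negative signal weight, for example an RSS value shifted to be positive) and the function name `cosine_similarity` are assumptions made only for this example.

```python
import math

def cosine_similarity(fp_a, fp_b):
    """Cosine similarity between two sparse fingerprints.

    Assumption: each fingerprint is a dict {AP MAC address: non-negative weight}.
    APs missing from a fingerprint are treated as weight 0.
    """
    shared = set(fp_a) & set(fp_b)
    dot = sum(fp_a[mac] * fp_b[mac] for mac in shared)
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty fingerprint is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical fingerprints observed at two moments
scan_1 = {"aa:aa": 60.0, "bb:bb": 40.0}
scan_2 = {"aa:aa": 55.0, "bb:bb": 45.0, "cc:cc": 5.0}
print(round(cosine_similarity(scan_1, scan_2), 3))
```

Fingerprints that see the same APs at similar strengths score close to 1, while fingerprints with no AP in common score 0.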
Step 102, sorting the fingerprint clusters according to the number of the wireless access point fingerprint data in each wireless access point fingerprint cluster.
In the embodiment of the invention, after the clustering result of the Wi-Fi fingerprint data is obtained, the categories are sorted from high to low by the number of Wi-Fi fingerprint data in each category.
Step 103, extracting, from the top preset number of clusters in the sorting result, the wireless access point fingerprint clusters whose number of wireless access point fingerprint data is greater than a first preset number, as the wireless access point fingerprint feature vector of the user.
In the embodiment of the invention, the top M categories are selected from the sorting result obtained in step 102, and among them the categories whose number of Wi-Fi fingerprint data exceeds the first preset number PrintVal are used as the Wi-Fi fingerprint class features of the user.
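Steps 102 and 103 can be sketched as a sort followed by a filter. In the sketch below, `m` and `print_val` stand for the M and PrintVal of the text; the `Cluster` dataclass and the function name `select_feature_clusters` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    feature: dict  # representative fingerprint (assumed: MAC address -> weight)
    count: int     # number of fingerprints assigned to this cluster

def select_feature_clusters(clusters, m, print_val):
    """Sort clusters by fingerprint count (descending), keep the top m,
    then keep only those whose count exceeds print_val."""
    ranked = sorted(clusters, key=lambda c: c.count, reverse=True)
    return [c for c in ranked[:m] if c.count > print_val]

clusters = [
    Cluster({"aa:aa": 60.0}, 5),
    Cluster({"bb:bb": 50.0}, 12),
    Cluster({"cc:cc": 40.0}, 1),
    Cluster({"dd:dd": 30.0}, 8),
]
selected = select_feature_clusters(clusters, m=3, print_val=4)
print([c.count for c in selected])  # [12, 8, 5]
```

The two parameters play different roles: `m` caps how many clusters can describe a user, while `print_val` discards clusters that are too sparsely populated to represent habitual behavior.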
Step 104, calculating the similarity between the fingerprint feature vectors of the wireless access points.
In the embodiment of the invention, the similarity between any two fingerprint features is calculated according to the following formula:
[Formula image BDA0001783953210000061; the formula is not reproduced in the text]
wherein S is the similarity between the Wi-Fi fingerprint features of users a and b, W_a and W_b are the Wi-Fi fingerprint classes of users a and b, W_a·W_b is the cosine similarity of the fingerprints, and x_a and y_b are the numbers of fingerprints in W_a and W_b, respectively.
Step 105, determining the similarity between corresponding users according to the similarity between the fingerprint feature vectors of the wireless access points.
In the embodiment of the invention, calculating the similarity between two fingerprint features according to the above steps yields the similarity between the corresponding users; if the similarity S is greater than a preset value, the two users are considered to have similar behavior features.
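The formula for S is given only as an image that is not reproduced in the text, so the sketch below does not implement it. As a purely illustrative stand-in, it assumes one plausible way to combine the quantities the text defines (the cosine similarity W_a·W_b and the fingerprint counts x_a and y_b): each of user a's feature clusters is paired with its most similar cluster of user b, and the cosine similarities are averaged with weights min(x_a, y_b). The weighting scheme, the fingerprint encoding (MAC address to non-negative weight) and all function names are assumptions.

```python
import math

def cosine_similarity(fp_a, fp_b):
    """Cosine similarity of two sparse fingerprints (assumed: MAC -> weight)."""
    dot = sum(fp_a[mac] * fp_b[mac] for mac in set(fp_a) & set(fp_b))
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def user_similarity(features_a, features_b):
    """Hypothetical stand-in for S, NOT the formula of the source.

    Each argument is a list of (fingerprint, count) pairs, with count >= 1.
    """
    if not features_a or not features_b:
        return 0.0
    weighted, total = 0.0, 0
    for fp_a, x_a in features_a:
        # Best-matching cluster of user b for this cluster of user a
        best_sim, y_b = max(
            ((cosine_similarity(fp_a, fp_b), y) for fp_b, y in features_b),
            key=lambda pair: pair[0],
        )
        weight = min(x_a, y_b)
        weighted += weight * best_sim
        total += weight
    return weighted / total if total else 0.0

def are_similar(features_a, features_b, preset_value):
    """Step 105: users are similar when S exceeds the preset value."""
    return user_similarity(features_a, features_b) > preset_value

user_a = [({"aa:aa": 60.0, "bb:bb": 40.0}, 12), ({"cc:cc": 50.0}, 6)]
user_b = [({"aa:aa": 58.0, "bb:bb": 42.0}, 9), ({"dd:dd": 50.0}, 7)]
print(are_similar(user_a, user_b, preset_value=0.5))
```

Note that this stand-in is not symmetric in general; an implementation following the source would use the published formula instead.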
It should be understood that the wireless access point in the embodiment of the present invention is illustrated by taking Wi-Fi as an example; in practical applications, wireless access may also be performed through Bluetooth, a mobile phone hotspot, and the like, which is not limited by the embodiment of the present invention.
In summary, in the similar user identification method provided in the embodiments of the present invention, the wireless access point fingerprint data in a preset wireless access point fingerprint database are clustered into a plurality of wireless access point fingerprint clusters; the fingerprint clusters are sorted according to the number of wireless access point fingerprint data in each cluster; from the top preset number of clusters in the sorting result, the clusters whose number of wireless access point fingerprint data is greater than a first preset number are extracted as the wireless access point fingerprint feature vector of a user; the similarity between wireless access point fingerprint feature vectors is calculated; and the similarity between the corresponding users is determined according to the similarity between the wireless access point fingerprint feature vectors. This solves the prior-art problem that user behavior is characterized too coarsely, so that a user in different consumption scenes is assigned the same position. The method is better suited to indoor scenes, distinguishes user scenes at a finer granularity, and improves the accuracy of user similarity judgment.
Example two
Referring to fig. 2, a flow chart of steps of a similar user identification method is shown, which includes the following specific steps:
Step 201, acquiring network positioning data of a user.
In the embodiment of the invention, when a user moves in an indoor scene, the network positioning data of the user is obtained through the Wi-Fi hotspot, the Wi-Fi fingerprint data in the network positioning data are extracted, and invalid Wi-Fi, such as mobile Wi-Fi, large Wi-Fi and Wi-Fi with weak signal strength, is removed.
Step 202, extracting corresponding wireless access point fingerprint data and reporting time from the network positioning data.
In the embodiment of the invention, when the corresponding Wi-Fi fingerprint data is extracted from the network positioning data, the reporting time of the network positioning data is extracted at the same time.
Specifically, each wireless AP (router) has a globally unique MAC address, and a wireless AP generally does not move for a period of time. When Wi-Fi is turned on, the device can scan and collect surrounding AP signals, whether or not they are encrypted and whether or not the device is connected to them; even if a signal is too weak to be shown in the wireless network list, the MAC address broadcast by the AP can still be obtained. The device sends the data identifying the APs to a location server; the server retrieves the geographical location of each AP, calculates the geographical position of the device by combining the strength of each signal, and returns it to the user equipment. The location service provider needs to continuously update and supplement its own database to guarantee the accuracy of the data. When the position data of the current user equipment is extracted, the current network time is acquired at the same time.
Step 203, calculating the time interval of each wireless access point fingerprint data according to the reporting time.
In the embodiment of the invention, the time interval is calculated from each piece of extracted Wi-Fi fingerprint data and its reporting time. Data whose time interval is smaller than the preset time interval is not retained; for example, if two positioning fixes are less than one second apart, the later one is discarded, because in practice one second is not enough for the user's position to change significantly.
Of course, in practical applications, the preset time interval is set by the relevant technician according to actual requirements, and the embodiment of the invention is not limited thereto.
Step 204, extracting the wireless access point fingerprint data whose time interval is greater than the preset time interval and whose signal strength is greater than the preset strength, and generating a preset wireless access point fingerprint database.
In the embodiment of the invention, Wi-Fi fingerprint data of a plurality of different Wi-Fi hotspots acquired at the current user's position are stored, provided that their time interval is greater than the preset time interval and their signal strength is sufficient, thereby generating the preset Wi-Fi fingerprint database.
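Steps 203 and 204 amount to a two-part filter over the reported scans. The following is a minimal sketch under stated assumptions: each record is a `(timestamp_seconds, scan)` pair, where `scan` maps an AP MAC address to an RSS value in dBm; the interval is measured against the previously retained record; and the default thresholds (1 second, taken from the example in the text, and -85 dBm, an arbitrary value) as well as the name `build_fingerprint_db` are hypothetical.

```python
def build_fingerprint_db(records, min_interval_s=1.0, min_rss_dbm=-85.0):
    """Keep scans spaced more than min_interval_s apart, and within each
    kept scan keep only APs whose RSS exceeds min_rss_dbm.

    Assumption: records is an iterable of (timestamp_seconds, {mac: rss_dbm}).
    """
    db = []
    last_kept_time = None
    for timestamp, scan in sorted(records, key=lambda r: r[0]):
        # Discard the later of two fixes that are too close in time
        if last_kept_time is not None and timestamp - last_kept_time <= min_interval_s:
            continue
        # Drop APs whose signal is too weak to be a reliable fingerprint
        strong = {mac: rss for mac, rss in scan.items() if rss > min_rss_dbm}
        if not strong:
            continue
        db.append((timestamp, strong))
        last_kept_time = timestamp
    return db

records = [
    (0.0, {"aa:aa": -50.0, "bb:bb": -90.0}),
    (0.5, {"aa:aa": -52.0}),   # too soon after the previous fix
    (2.0, {"cc:cc": -60.0}),
    (4.0, {"dd:dd": -95.0}),   # only a weak AP, nothing left to store
]
for timestamp, scan in build_fingerprint_db(records):
    print(timestamp, scan)
```

Measuring the interval against the last retained record, rather than the last reported one, is a design choice of this sketch; the text does not state which is intended.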
Step 205, randomly selecting a second preset number of wireless access point fingerprint data from the preset wireless access point fingerprint database.
Step 206, generating an initial wireless access point fingerprint cluster according to the relation between the first cosine similarity among the second preset number of wireless access point fingerprint data and a preset threshold value.
In the embodiment of the invention, N fingerprint data are randomly selected from the generated preset Wi-Fi fingerprint database to serve as N fingerprint clusters, each containing one fingerprint. The similarity between the N fingerprint clusters is calculated pairwise; if a similarity exceeds the preset threshold, the two clusters are deemed similar and merged into one. This continues until all N clusters have been compared, finally yielding Nf initial fingerprint clusters, where Nf is less than or equal to N because of the merging.
Preferably, step 206 specifically includes substeps A1-A4:
Substep A1, setting the second preset number of wireless access point fingerprint data as a second preset number of initial wireless access point fingerprint clusters, each initial wireless access point fingerprint cluster containing one wireless access point fingerprint data;
Substep A2, calculating a first cosine similarity between the initial wireless access point fingerprint clusters;
Substep A3, merging the initial wireless access point fingerprint clusters whose cosine similarity is greater than a preset threshold into one initial wireless access point fingerprint cluster;
Substep A4, keeping the initial wireless access point fingerprint clusters whose cosine similarity is smaller than the preset threshold unchanged.
Specifically, N Wi-Fi fingerprint data are randomly selected from the preset Wi-Fi fingerprint database to serve as N fingerprint clusters, so that each cluster contains one Wi-Fi fingerprint. The cosine similarity between the N fingerprint clusters is then calculated. For example, if the cosine similarity SKnn between the fingerprints of fingerprint cluster A and fingerprint cluster B exceeds the preset threshold VWEight, cluster A and cluster B are merged into one fingerprint cluster, the Wi-Fi fingerprint data of cluster A is taken as the feature of the merged cluster, and the fingerprint count of cluster A is increased by 1. The cosine similarity between fingerprint cluster C and the merged cluster A is then calculated; if it exceeds VWEight, cluster C is merged into cluster A, the Wi-Fi fingerprint data of cluster A is still taken as the feature of the cluster, and the fingerprint count of cluster A is again increased by 1. If the cosine similarity SKnn between the fingerprints of cluster A and cluster B does not exceed VWEight, both clusters and their fingerprint counts are kept unchanged, and each is then compared with the other fingerprint clusters, repeating the merging steps described above.
By analogy, the cosine similarity among the N fingerprint clusters is calculated, the clusters whose cosine similarity exceeds the preset threshold are merged, and the clusters that do not exceed it are kept unchanged, finally yielding Nf initial fingerprint clusters, where Nf is less than or equal to N.
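The merging procedure of steps 205 and 206 can be sketched as follows. This is one reading of the description, not a definitive implementation: the N sampled fingerprints are visited in turn, each is merged into the first existing cluster whose feature it matches above the threshold (the cluster keeps its original feature and its count grows by 1), and otherwise it starts a new cluster. The fingerprint encoding (MAC address to non-negative weight) and the names `initial_clusters`, `n`, `threshold` and `seed` are assumptions.

```python
import math
import random

def cosine_similarity(fp_a, fp_b):
    """Cosine similarity of two sparse fingerprints (assumed: MAC -> weight)."""
    dot = sum(fp_a[mac] * fp_b[mac] for mac in set(fp_a) & set(fp_b))
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def initial_clusters(fingerprint_db, n, threshold, seed=None):
    """Sample n fingerprints and merge similar ones into Nf <= n clusters.

    Returns (clusters, remaining), where each cluster is a dict with the
    keys "feature" and "count", and remaining holds the unsampled fingerprints.
    """
    rng = random.Random(seed)
    sampled = rng.sample(range(len(fingerprint_db)), min(n, len(fingerprint_db)))
    clusters = []
    for index in sampled:
        fingerprint = fingerprint_db[index]
        for cluster in clusters:
            if cosine_similarity(cluster["feature"], fingerprint) > threshold:
                cluster["count"] += 1  # merge: the feature stays unchanged
                break
        else:
            clusters.append({"feature": fingerprint, "count": 1})
    sampled_set = set(sampled)
    remaining = [fp for i, fp in enumerate(fingerprint_db) if i not in sampled_set]
    return clusters, remaining

db = [
    {"aa:aa": 1.0, "bb:bb": 1.0},
    {"aa:aa": 1.0, "bb:bb": 0.9},  # nearly identical to the first
    {"cc:cc": 1.0},
    {"dd:dd": 1.0},
]
clusters, remaining = initial_clusters(db, n=4, threshold=0.8, seed=0)
print(len(clusters), sum(c["count"] for c in clusters), len(remaining))
```

Because the first two fingerprints always merge regardless of sampling order, sampling all four yields three initial clusters, consistent with Nf being at most N.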
Step 207, respectively calculating a second cosine similarity between the remaining wireless access point fingerprint data in the preset wireless access point fingerprint database and the initial wireless access point fingerprint data in each initial wireless access point fingerprint cluster.
Step 208, clustering the wireless access point fingerprint data into a plurality of wireless access point fingerprint clusters according to the relationship between the second cosine similarity and the preset threshold.
In the embodiment of the invention, after the Nf initial Wi-Fi fingerprint clusters are obtained, the cosine similarity (the second cosine similarity) between each remaining Wi-Fi fingerprint in the preset Wi-Fi fingerprint database and each of the Nf initial fingerprint clusters is calculated one by one and compared with the preset threshold, and each Wi-Fi fingerprint is clustered according to the comparison result, until all Wi-Fi fingerprint data have been clustered.
Preferably, step 208 specifically includes substeps B1-B2:
Substep B1, if the second cosine similarity is greater than the preset threshold, adding the wireless access point fingerprint data to the corresponding initial wireless access point fingerprint cluster to generate a wireless access point fingerprint cluster;
Substep B2, if the second cosine similarity is smaller than the preset threshold, setting the wireless access point fingerprint data as a new wireless access point fingerprint cluster.
Specifically, for example, the cosine similarity SKnn between another Wi-Fi fingerprint printA and each of the Nf Wi-Fi fingerprint classes is calculated, and the Wi-Fi fingerprint class ClusterA with the largest similarity is selected. If the cosine similarity between printA and ClusterA exceeds the threshold VWEight, the number of fingerprints contained in ClusterA is increased by 1; if it is smaller than the threshold VWEight, a new fingerprint class ClusterB with printA as its fingerprint feature is added, and the number of fingerprints contained in ClusterB is 1. The clustering step is repeated until all the Wi-Fi fingerprint data in the preset Wi-Fi fingerprint database have been assigned to a fingerprint class.
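The assignment of the remaining fingerprints in steps 207 and 208 can be sketched as a nearest-cluster rule with a threshold. The sketch assumes the same hypothetical encoding as before (a fingerprint is a dict from MAC address to non-negative weight, a cluster is a dict with "feature" and "count"), and the function name `assign_remaining` is introduced only for illustration.

```python
import math

def cosine_similarity(fp_a, fp_b):
    """Cosine similarity of two sparse fingerprints (assumed: MAC -> weight)."""
    dot = sum(fp_a[mac] * fp_b[mac] for mac in set(fp_a) & set(fp_b))
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def assign_remaining(clusters, remaining, threshold):
    """Assign each remaining fingerprint to its most similar cluster if the
    similarity exceeds the threshold; otherwise start a new cluster with
    that fingerprint as its feature. The list of clusters is updated in place."""
    for fingerprint in remaining:
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine_similarity(cluster["feature"], fingerprint)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim > threshold:
            best["count"] += 1
        else:
            clusters.append({"feature": fingerprint, "count": 1})
    return clusters

clusters = [{"feature": {"aa:aa": 1.0}, "count": 1}]
remaining = [{"aa:aa": 2.0}, {"bb:bb": 1.0}, {"bb:bb": 3.0}]
assign_remaining(clusters, remaining, threshold=0.8)
print([(sorted(c["feature"]), c["count"]) for c in clusters])
# [(['aa:aa'], 2), (['bb:bb'], 2)]
```

Because new clusters can be created during this pass, the final number of clusters may exceed Nf; later fingerprints are compared against these new clusters as well.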
Step 209, sorting the fingerprint clusters according to the number of the wireless access point fingerprint data in each wireless access point fingerprint cluster.
This step is the same as step 102 and will not be described in detail here.
Step 210, extracting, from the top preset number of clusters in the sorting result, the wireless access point fingerprint clusters whose number of wireless access point fingerprint data is greater than a first preset number, as the wireless access point fingerprint feature vector of the user.
In the embodiment of the invention, after all Wi-Fi fingerprints are clustered, the top MCluster fingerprint classes are extracted in descending order of the number of fingerprints contained in each Wi-Fi fingerprint class, and those Wi-Fi fingerprint classes whose fingerprint count NPrint exceeds PrintVal are used as the Wi-Fi fingerprint class features of the user.
Step 211, calculating the similarity between the fingerprint feature vectors of the wireless access points.
This step is the same as step 104 and will not be described in detail here.
Step 212, determining the similarity between corresponding users according to the similarity between the fingerprint feature vectors of the wireless access points.
This step is the same as step 105 and will not be described in detail.
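Steps 211 and 212 defer to steps 104 and 105 for the exact computation, which is not restated here. The sketch below is therefore only one possible reading and rests on loud assumptions: each user's feature vector is taken to be a list of representative fingerprints (one per selected cluster, each a mapping from access point identifier to signal level), and the user similarity is taken to be the symmetric mean of best-match cosine similarities. Both the scoring rule and the names `cosine_similarity` and `user_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse fingerprints (dict: AP id -> signal level)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_similarity(features_a, features_b):
    """Assumed scoring rule: symmetric mean of best-match cosine similarities."""
    if not features_a or not features_b:
        return 0.0

    def directed(source, target):
        # For each cluster of one user, take its best match among the other user's clusters.
        return sum(max(cosine_similarity(s, t) for t in target) for s in source) / len(source)

    return 0.5 * (directed(features_a, features_b) + directed(features_b, features_a))
```

Under this assumed rule, two users whose selected clusters share no access points score 0, and two users with identical clusters score 1.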
In summary, the similar user identification method provided in the embodiment of the present invention obtains the network positioning data of a user; extracts the corresponding wireless access point fingerprint data and reporting time from the network positioning data; calculates the time interval of each wireless access point fingerprint data according to the reporting time; and extracts the wireless access point fingerprint data whose time interval is greater than the preset time interval and whose signal strength is greater than the preset strength, to generate a preset wireless access point fingerprint database. The method then clusters each wireless access point fingerprint data in the preset wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters; sorts the fingerprint clusters according to the number of wireless access point fingerprint data in each cluster; extracts, from the sorting result, a preset number of clusters in which the number of wireless access point fingerprint data is greater than a first preset number, as the wireless access point fingerprint feature vector of the user; calculates the similarity between the wireless access point fingerprint feature vectors; and determines the similarity between the corresponding users according to the similarity between the feature vectors. This addresses the problem in the prior art that user behavior is not characterized accurately enough because a user may be in different consumption scenarios at the same position. The method is better suited to indoor scenes, subdivides user scenes, and improves the precision of user similarity judgment.
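The pre-filtering that builds the preset fingerprint database can be sketched as below. This is an illustration under stated assumptions: the time interval of a report is taken to be the gap until the next report (so the last report has no interval and is dropped), and each report is a dictionary with the hypothetical keys "time", "strength" and "fingerprint". The text does not fix these details.

```python
def build_fingerprint_database(reports, min_interval, min_strength):
    """Keep fingerprints whose time interval and signal strength exceed the presets."""
    ordered = sorted(reports, key=lambda report: report["time"])
    database = []
    for current, following in zip(ordered, ordered[1:]):
        # Assumed definition: interval = time until the next reported fingerprint.
        interval = following["time"] - current["time"]
        if interval > min_interval and current["strength"] > min_strength:
            database.append(current["fingerprint"])
    return database
```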
EXAMPLE III
Referring to fig. 3, a block diagram of a similar user identification device is shown, which is as follows:
a clustering module 301, configured to cluster each wireless access point fingerprint data in a preset wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters;
a sorting module 302, configured to sort the fingerprint clusters according to the number of the wireless access point fingerprint data in each of the wireless access point fingerprint clusters;
a feature vector determining module 303, configured to extract, from the sorting result, a preset number of wireless access point fingerprint clusters in which the number of the wireless access point fingerprint data is greater than a first preset number, and use them as the wireless access point fingerprint feature vector of the user;
a similarity calculation module 304, configured to calculate a similarity between the fingerprint feature vectors of the wireless access points;
a similar user determining module 305, configured to determine similarity between corresponding users according to similarity between the fingerprint feature vectors of the wireless access points.
Referring to fig. 4, it shows a block diagram of another similar user identification apparatus based on the embodiment of fig. 3, which is as follows:
a positioning data obtaining module 306, configured to obtain network positioning data of a user;
a wireless access point fingerprint data and reporting time extracting module 307, configured to extract corresponding wireless access point fingerprint data and reporting time from the network positioning data;
a time interval calculation module 308, configured to calculate a time interval of each wireless access point fingerprint data according to the reporting time;
the preset wireless access point fingerprint database generating module 309 is configured to extract the wireless access point fingerprint data of which the time interval is greater than the preset time interval and the signal strength is greater than the preset strength, and generate the preset wireless access point fingerprint database.
A clustering module 301, configured to cluster each wireless access point fingerprint data in a preset wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters;
preferably, the clustering module 301 includes:
a wireless access point fingerprint data selecting sub-module 3011, configured to randomly select a second preset number of wireless access point fingerprint data from the preset wireless access point fingerprint database;
an initial wireless access point fingerprint cluster generating sub-module 3012, configured to generate an initial wireless access point fingerprint cluster according to the relationship between a preset threshold and the first cosine similarity among the second preset number of wireless access point fingerprint data;
preferably, the initial wireless access point fingerprint cluster generating sub-module 3012 includes:
an initial wireless access point fingerprint cluster setting unit, configured to set the second preset number of wireless access point fingerprint data as a second preset number of initial wireless access point fingerprint clusters, where each initial wireless access point fingerprint cluster includes one wireless access point fingerprint data;
the first cosine similarity calculation unit is used for calculating first cosine similarity between the initial wireless access point fingerprint clusters;
the merging unit is used for merging the initial wireless access point fingerprint clusters with cosine similarity larger than a preset threshold into an initial wireless access point fingerprint cluster;
and the initial wireless access point fingerprint cluster retaining unit is used for keeping the initial wireless access point fingerprint cluster with cosine similarity smaller than the preset threshold unchanged.
A second cosine similarity calculation sub-module 3013, configured to respectively calculate second cosine similarities between the remaining wireless access point fingerprint data in the preset wireless access point fingerprint database and the original wireless access point fingerprint data in each initial wireless access point fingerprint cluster;
and the clustering submodule 3014 is configured to cluster the fingerprint data of the wireless access point into a plurality of fingerprint clusters of the wireless access point according to a relationship between the second cosine similarity and the preset threshold.
Preferably, the clustering sub-module 3014 includes:
a wireless access point fingerprint category generating unit, configured to add the wireless access point fingerprint data to the corresponding initial wireless access point fingerprint cluster to generate a wireless access point fingerprint cluster if the second cosine similarity is greater than the preset threshold;
and a wireless access point fingerprint category setting unit, configured to set the wireless access point fingerprint data as a new wireless access point fingerprint cluster if the second cosine similarity is smaller than the preset threshold.
A sorting module 302, configured to sort the fingerprint clusters according to the number of the wireless access point fingerprint data in each of the wireless access point fingerprint clusters;
a feature vector determining module 303, configured to extract, from the sorting result, a preset number of wireless access point fingerprint clusters in which the number of the wireless access point fingerprint data is greater than a first preset number, and use them as the wireless access point fingerprint feature vector of the user;
a similarity calculation module 304, configured to calculate a similarity between the fingerprint feature vectors of the wireless access points;
a similar user determining module 305, configured to determine similarity between corresponding users according to similarity between the fingerprint feature vectors of the wireless access points.
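The initial cluster generation performed by sub-modules 3011 and 3012 can be sketched as follows. This is a minimal illustration with assumptions: fingerprints are mappings from access point identifier to signal level, the merge is done greedily (a seed joins the first existing initial cluster it is similar enough to), and the first member of a cluster serves as its representative. The function names and the `rng` parameter are hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two sparse fingerprints (dict: AP id -> signal level)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def initial_clusters(database, n_seed, threshold, rng=random):
    """Randomly pick seeds, then merge seeds whose cosine similarity exceeds the threshold."""
    seeds = rng.sample(database, min(n_seed, len(database)))
    clusters = []
    for seed in seeds:
        for cluster in clusters:
            if cosine_similarity(seed, cluster["feature"]) > threshold:
                cluster["count"] += 1  # merged into an existing initial cluster
                break
        else:
            # No sufficiently similar cluster: the seed stays a cluster of its own.
            clusters.append({"feature": seed, "count": 1})
    return clusters
```

Passing a seeded `random.Random` instance as `rng` makes the random selection reproducible.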
An embodiment of the present invention further provides an apparatus, including: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor implements a similar user identification method as described in one or more of the above when executing the program.
Embodiments of the present invention also provide a readable storage medium, and when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the similar user identification method.
In summary, the similar user identification apparatus provided in the embodiment of the present invention obtains the network positioning data of a user through the positioning data obtaining module; extracts the corresponding wireless access point fingerprint data and reporting time from the network positioning data through the wireless access point fingerprint data and reporting time extracting module; calculates the time interval of each wireless access point fingerprint data according to the reporting time through the time interval calculation module; and, through the preset wireless access point fingerprint database generating module, extracts the wireless access point fingerprint data whose time interval is greater than the preset time interval and whose signal strength is greater than the preset strength, to generate the preset wireless access point fingerprint database. The clustering module clusters each wireless access point fingerprint data in the preset wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters; the sorting module sorts the fingerprint clusters according to the number of wireless access point fingerprint data in each cluster; the feature vector determining module extracts, from the sorting result, a preset number of clusters in which the number of wireless access point fingerprint data is greater than the first preset number, as the wireless access point fingerprint feature vector of the user; the similarity calculation module calculates the similarity between the wireless access point fingerprint feature vectors; and finally the similar user determining module determines the similarity between corresponding users according to the similarity between the wireless access point fingerprint feature vectors. This addresses the problem in the prior art that user behavior is not characterized accurately enough because a user may be in different consumption scenarios at the same position. The apparatus is better suited to indoor scenes, subdivides user scenes, and improves the precision of user similarity judgment.
It has the following advantages:
First, Wi-Fi fingerprints, rather than user positions, are clustered and used as the user's feature points, which is better suited to indoor scenes.
Second, user similarity is calculated from the similarity of the Wi-Fi fingerprint feature vectors of different users, which subdivides user scenes and improves the precision of user similarity judgment.
Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a similar user identification apparatus according to embodiments of the present invention. The invention may also be embodied as an apparatus or device program (e.g., a computer program and computer program product) for carrying out a part or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A method for identifying similar users, the method comprising:
clustering the fingerprint data of each wireless access point in a preset wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters;
sorting the fingerprint clusters according to the number of the wireless access point fingerprint data in each wireless access point fingerprint cluster;
extracting, from the sorting result, a preset number of wireless access point fingerprint clusters in which the number of the wireless access point fingerprint data is greater than a first preset number, as the wireless access point fingerprint feature vector of a user;
calculating the similarity between the fingerprint feature vectors of the wireless access points;
and determining the similarity between corresponding users according to the similarity between the fingerprint feature vectors of the wireless access points.
2. The method of claim 1, further comprising, prior to the step of clustering each of the wireless access point fingerprint data in a preset wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters:
acquiring network positioning data of a user;
extracting corresponding wireless access point fingerprint data and reporting time from the network positioning data;
calculating the time interval of the fingerprint data of each wireless access point according to the reporting time;
and extracting the wireless access point fingerprint data of which the time interval is greater than the preset time interval and the signal intensity is greater than the preset intensity, and generating a preset wireless access point fingerprint database.
3. The method of claim 1, wherein the step of clustering each of the wireless access point fingerprint data in a predetermined wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters comprises:
randomly selecting a second preset number of wireless access point fingerprint data from the preset wireless access point fingerprint database;
generating an initial wireless access point fingerprint cluster according to the relation between the first cosine similarity among the second preset number of wireless access point fingerprint data and a preset threshold;
respectively calculating second cosine similarity between the remaining wireless access point fingerprint data in the preset wireless access point fingerprint database and original wireless access point fingerprint data, wherein the original wireless access point fingerprint data are fingerprint data in the initial wireless access point fingerprint cluster;
and clustering the wireless access point fingerprint data into a plurality of wireless access point fingerprint clusters according to the relationship between the second cosine similarity and the preset threshold.
4. The method according to claim 3, wherein the step of generating an initial wireless access point fingerprint cluster according to the relationship between the first cosine similarity between the second preset number of wireless access point fingerprint data and a preset threshold comprises:
setting the second preset number of wireless access point fingerprint data as a second preset number of initial wireless access point fingerprint clusters, wherein each initial wireless access point fingerprint cluster comprises one wireless access point fingerprint data;
calculating first cosine similarity between the initial wireless access point fingerprint clusters;
combining the initial wireless access point fingerprint clusters with cosine similarity larger than a preset threshold value into an initial wireless access point fingerprint cluster;
keeping the initial wireless access point fingerprint cluster with cosine similarity smaller than the preset threshold unchanged.
5. The method according to claim 3, wherein the step of clustering the wireless access point fingerprint data into a plurality of wireless access point fingerprint clusters according to the relationship between the second cosine similarity and the preset threshold comprises:
if the second cosine similarity is larger than the preset threshold, adding the wireless access point fingerprint data to the corresponding initial wireless access point fingerprint cluster to generate a wireless access point fingerprint cluster;
and if the second cosine similarity is smaller than the preset threshold, setting the wireless access point fingerprint data as a new wireless access point fingerprint cluster.
6. A similar user identification device, the device comprising:
the clustering module is used for clustering the fingerprint data of each wireless access point in a preset wireless access point fingerprint database into a plurality of wireless access point fingerprint clusters;
the sorting module is used for sorting the fingerprint clusters according to the number of the wireless access point fingerprint data in each wireless access point fingerprint cluster;
the feature vector determining module is used for extracting, from the sorting result, a preset number of wireless access point fingerprint clusters in which the number of the wireless access point fingerprint data is greater than a first preset number, as the wireless access point fingerprint feature vector of the user;
the similarity calculation module is used for calculating the similarity between the fingerprint feature vectors of the wireless access points;
and the similar user determining module is used for determining the similarity between corresponding users according to the similarity between the fingerprint feature vectors of the wireless access points.
7. The apparatus of claim 6, further comprising:
the positioning data acquisition module is used for acquiring network positioning data of a user;
the wireless access point fingerprint data and reporting time extracting module is used for extracting corresponding wireless access point fingerprint data and reporting time from the network positioning data;
the time interval calculation module is used for calculating the time interval of the fingerprint data of each wireless access point according to the reporting time;
and the preset wireless access point fingerprint database generating module is used for extracting the wireless access point fingerprint data of which the time interval is greater than the preset time interval and the signal intensity is greater than the preset intensity, and generating the preset wireless access point fingerprint database.
8. The apparatus of claim 6, wherein the clustering module comprises:
the wireless access point fingerprint data selection sub-module is used for randomly selecting a second preset number of wireless access point fingerprint data from the preset wireless access point fingerprint database;
the initial wireless access point fingerprint cluster generating sub-module is used for generating an initial wireless access point fingerprint cluster according to the relation between the first cosine similarity between the second preset number of wireless access point fingerprint data and a preset threshold value;
the second cosine similarity calculation sub-module is used for respectively calculating second cosine similarity between the remaining wireless access point fingerprint data in the preset wireless access point fingerprint database and original wireless access point fingerprint data, and the original wireless access point fingerprint data are fingerprint data in the initial wireless access point fingerprint cluster;
and the clustering submodule is used for clustering the wireless access point fingerprint data into a plurality of wireless access point fingerprint clusters according to the relation between the second cosine similarity and the preset threshold value.
9. The apparatus of claim 8, wherein the initial wireless access point fingerprint cluster generating sub-module comprises:
an initial wireless access point fingerprint cluster setting unit, configured to set the second preset number of wireless access point fingerprint data as a second preset number of initial wireless access point fingerprint clusters, where each initial wireless access point fingerprint cluster includes one wireless access point fingerprint data;
the first cosine similarity calculation unit is used for calculating first cosine similarity between the initial wireless access point fingerprint clusters;
the merging unit is used for merging the initial wireless access point fingerprint clusters with cosine similarity larger than a preset threshold into an initial wireless access point fingerprint cluster;
and the initial wireless access point fingerprint cluster retaining unit is used for keeping the initial wireless access point fingerprint cluster with cosine similarity smaller than the preset threshold unchanged.
10. The apparatus of claim 8, wherein the clustering sub-module comprises:
a wireless access point fingerprint category generating unit, configured to add the wireless access point fingerprint data to the corresponding initial wireless access point fingerprint cluster to generate a wireless access point fingerprint cluster if the second cosine similarity is greater than the preset threshold;
and the wireless access point fingerprint category setting unit is used for setting the wireless access point fingerprint data as a new wireless access point fingerprint cluster if the cosine similarity is smaller than a preset threshold.
11. An apparatus, comprising:
processor, memory and computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements a similar user identification method as claimed in one or more of claims 1-5.
12. A readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform a similar user identification method as described in one or more of method claims 1-5.
CN201811005730.5A 2018-08-30 2018-08-30 Similar user identification method, device, equipment and readable storage medium Active CN109068272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811005730.5A CN109068272B (en) 2018-08-30 2018-08-30 Similar user identification method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN109068272A CN109068272A (en) 2018-12-21
CN109068272B true CN109068272B (en) 2021-01-08

Family

ID=64758782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811005730.5A Active CN109068272B (en) 2018-08-30 2018-08-30 Similar user identification method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN109068272B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245132B (en) * 2019-06-12 2023-10-31 腾讯科技(深圳)有限公司 Data anomaly detection method, device, computer readable storage medium and computer equipment
CN110730432B (en) * 2019-10-21 2021-01-08 深圳市名通科技股份有限公司 Proximity user identification method, terminal and readable storage medium
CN112738724B (en) * 2020-12-17 2022-09-23 福建新大陆软件工程有限公司 Method, device, equipment and medium for accurately identifying regional target crowd
CN113840392B (en) * 2021-09-17 2023-09-22 杭州云深科技有限公司 User intimacy determination method, device, computer equipment and storage medium
CN115766204A (en) * 2022-11-14 2023-03-07 电子科技大学 Dynamic IP equipment identification system and method for encrypted flow
CN117237804B (en) * 2023-09-15 2024-02-13 江苏三棱智慧物联发展股份有限公司 Pyrotechnical recognition system and method based on federal learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102638888A (en) * 2012-03-19 2012-08-15 华中科技大学 Indoor positioning fingerprint grouping method based on signal statistics
CN103476115A (en) * 2013-09-22 2013-12-25 中国地质大学(武汉) Method for WiFi fingerprint positioning based on AP set similarity
CN104684083A (en) * 2015-03-26 2015-06-03 哈尔滨工业大学 AP (access point) selecting method based on clustering idea
CN106060779A (en) * 2016-07-18 2016-10-26 北京方位捷讯科技有限公司 Fingerprint feature matching method and device
CN206272854U (en) * 2016-11-14 2017-06-20 成都信息工程大学 A kind of social networks construction device based on WiFi network linkage record
CN107835498A (en) * 2017-10-18 2018-03-23 上海掌门科技有限公司 A kind of method and apparatus for being used to manage user
CN108234686A (en) * 2017-12-20 2018-06-29 中国联合网络通信集团有限公司 A kind of method and apparatus of indoor and outdoor judgement

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9846801B2 (en) * 2015-11-16 2017-12-19 MorphoTrak, LLC Minutiae grouping for distorted fingerprint matching

Also Published As

Publication number Publication date
CN109068272A (en) 2018-12-21

Similar Documents

Publication Publication Date Title
CN109068272B (en) Similar user identification method, device, equipment and readable storage medium
US9918297B2 (en) Location measuring method and apparatus using access point for wireless local area network service and method for estimating location coordinate of access point
EP2676501B1 (en) Methods, apparatuses and computer program products for providing a private and efficient geolocation system
Pei et al. Using inquiry-based Bluetooth RSSI probability distributions for indoor positioning
CN104010364B (en) For determining the method and system in the geographical location of the estimation of base station
KR101627544B1 (en) Device and method for making wi-fi radio map
US9077548B2 (en) Method and apparatus for providing differential location-based service using access point
US7389114B2 (en) Estimating the location of inexpensive wireless terminals by using signal strength measurements
CN108450060B (en) Positioning method and device based on WI-FI access point
US8862154B2 (en) Location measuring method and apparatus using access point for wireless local area network service
US20140211691A1 (en) System and method for choosing suitable access points in quips autarkic deployment
US9380472B2 (en) Method and apparatus for updating access point information for location measurement
CN109275090B (en) Information processing method, device, terminal and storage medium
WO2014180219A1 (en) Locating method, device and terminal and computer storage medium
Schmidt et al. A performance study of a fast-rate WLAN fingerprint measurement collection method
US20140228058A1 (en) System for estimating position of base station and method of estimating position of base station by the same
CN110049434A (en) Localization method, device, equipment and storage medium
CN107071708B (en) Passive wireless signal acquisition and positioning method for intelligent mobile terminal
CN103404177A (en) Nodes and methods for positioning
CN113905438B (en) Scene identification generation method, positioning method and device and electronic equipment
US20160174147A1 (en) Access point selection for mobile device positioning
Krishnamurthy Technologies for positioning in indoor Areas
KR101202194B1 (en) Position estimating system and method of portable terminal
CN111757284B (en) Indoor entrance positioning method and electronic equipment
Abhishek et al. Performance analysis of received signal strength based Wi-Fi indoor positioning algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant