CN104185275B - A WLAN-based indoor positioning method

Info

Publication number: CN104185275B
Application number: CN201410458932.0A
Authority: CN
Inventors: 诸彤宇; 刘帅; 宋志新
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2014-09-10
Filing date: 2014-09-10
Publication date: 2017-11-17
Anticipated expiration: 2034-09-10
Also published as: CN104185275A

Abstract

The invention discloses a WLAN-based indoor positioning method, belonging to the technical field of indoor wireless communication and networking. The method comprises: preprocessing the RSSI data of each AP collected at the sampling points and extracting one-dimensional and two-dimensional vectors as feature vectors; performing cluster analysis on the feature vectors to divide the area to be positioned into multiple positioning sub-regions; training a corresponding classification model for each group of feature vectors; using the classification models together with a "voting" mechanism to select, from all sub-regions, the set of sub-regions with the highest number of votes; and using two rounds of positioning to narrow the sub-region set and improve positioning accuracy. The invention fully exploits the spatial distribution characteristics of RSSI and addresses the large search-and-matching space and high computational complexity of large-scale indoor positioning. It also establishes a new positioning model that addresses the inability of existing WLAN indoor positioning methods to learn and adapt to the nonlinear, non-Gaussian statistical characteristics that RSSI signals exhibit because of non-line-of-sight transmission effects and abnormal RSSI attenuation.

Description

A WLAN-based Indoor Positioning Method

Technical field

The invention relates to a positioning method in the field of WLAN indoor positioning, and belongs to the technical field of indoor wireless communication and networking.

Background

In recent years, as material living standards have risen, demand for location-based services has grown steadily in areas such as personnel dispatching, asset management, emergency rescue, security monitoring, security dispatching, intelligent transportation, map navigation, and travel guidance. Positioning information is especially important in emergency scenarios such as emergency rescue and disaster-relief command and dispatch.

With in-depth research on ubiquitous computing and distributed communication technology, indoor wireless communication and network technology has developed rapidly, giving rise to indoor positioning approaches based on WLAN (Wireless Local Area Network), Bluetooth, and WSN (Wireless Sensor Network), as well as indoor positioning methods based on fingerprints and on probabilistic models.

Positioning technologies based on WLAN, Bluetooth, WSN and the like divide the indoor space into a grid and deploy a large number of APs (Access Points) indoors. The terminal measures the RSSI (Received Signal Strength Indication) of the multiple APs received in each grid cell. Because the signal strength received from each signal node differs from location to location, the RSSI of each node received in a grid cell is used as the feature of that grid cell to complete positioning.

Fingerprint-based indoor positioning collects the RSSI of different APs across the indoor area and stores the addresses and coordinates of the corresponding wireless access points in a database. The end user measures the strength of the surrounding wireless signals and matches it against the RSSI vectors pre-stored in the database, which yields the coordinates of the end user being positioned.

The probabilistic method uses the existing training samples at the reference points to derive the probability distribution of the RSSI signal at each reference point. A Gaussian function is generally used to fit the distribution, giving the mean and bandwidth of the Gaussian probability distribution at each reference point. Because the probabilistic method makes full use of the statistical characteristics of the signal distribution, its positioning accuracy is generally higher than that of the weighted nearest-neighbor method.

However, each approach has its own problems. In practice, fingerprint-based indoor positioning over a large indoor area suffers from a large spatial matching search range, high computational complexity, and large storage requirements. For the probabilistic method, the probability distribution of the RSSI signal at a fixed reference point is in practice non-Gaussian, nonlinear, and multimodal, so the fitted probability distribution function differs considerably from the actual distribution, which leads to large matching errors during positioning.

Summary of the invention

The technical problem to be solved by the present invention is to overcome the deficiencies of the prior art and provide a WLAN-based indoor positioning method that both narrows the matching search range and yields a prediction model consistent with actual conditions, reducing computational complexity and time complexity to some extent.

To address this problem, that is, to narrow the matching search range and build a prediction model consistent with actual conditions, the invention provides a WLAN-based indoor positioning method comprising the following steps:

Step 1: Preprocess the RSSI data of each AP collected at the sampling points, and extract one-dimensional and two-dimensional vectors from it as feature vectors.

The necessary preprocessing of the scanned RSSI data includes deleting data whose RSSI is below -100 dB and deleting the data of non-positioning APs. Deleting the data of non-positioning APs means deleting the RSSI of APs that are unsuitable for positioning. An AP is unsuitable for positioning if its signal is too weak (RSSI below -95 dB) or too unstable (variance greater than 20); using such APs increases computational complexity and reduces positioning accuracy, so they are excluded.
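The preprocessing rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `preprocess`, the input layout (a dictionary from AP MAC address to a list of readings), the use of the mean reading to judge strength, and the use of population variance are all assumptions, since the text gives only the three thresholds.

```python
from statistics import pvariance

MIN_VALID_RSSI = -100   # readings below this value are deleted
WEAK_AP_RSSI = -95      # an AP weaker than this is unsuitable for positioning
MAX_AP_VARIANCE = 20    # an AP with larger RSSI variance is considered unstable


def preprocess(scans):
    """Filter readings and drop non-positioning APs.

    scans: dict mapping an AP MAC address to a list of RSSI readings
           (hypothetical layout chosen for this sketch).
    """
    kept = {}
    for mac, readings in scans.items():
        # Delete individual readings below -100.
        valid = [r for r in readings if r >= MIN_VALID_RSSI]
        if not valid:
            continue
        # Assumption: "strength too low" is judged on the mean reading.
        if sum(valid) / len(valid) < WEAK_AP_RSSI:
            continue
        # Assumption: stability is judged on the population variance.
        if pvariance(valid) > MAX_AP_VARIANCE:
            continue
        kept[mac] = valid
    return kept
```

In this sketch an AP is judged over all of its readings; judging it per sampling point would be an equally plausible reading of the text.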

Different extraction methods are used to extract from the raw data several kinds of feature vectors that accurately quantify the RSSI distribution. This includes the following steps:

(1) Sort all scanned APs in ascending order of MAC address, and label all raw data scanned during offline collection with the number of the sampling point at which it was collected;

(2) Extract the respective feature vectors by the following two methods:

a. Combine the sorted APs in pairs, that is, divide the APs by MAC address into C(m,2) = m(m-1)/2 groups, each denoted (AP_i, AP_j) (where 0<i<j≤m and m is the total number of APs), and extract from the labelled raw data the RSSI vectors of each AP combination together with the corresponding sampling points;

b. Treat each AP as its own group, that is, divide all offline data by AP MAC address into m groups, each denoted AP_i (where 0<i≤m and m is the total number of APs), and extract from the labelled raw data the one-dimensional RSSI vector of each AP together with the corresponding sampling points.
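The two extraction methods can be sketched as below. The function name `extract_feature_groups` and the record layout are hypothetical, and the sketch assumes every record already holds a value for every AP (missing readings having been filled beforehand) and that MAC addresses are strings in one consistent format so that string sorting gives the intended order.

```python
from itertools import combinations


def extract_feature_groups(records, macs):
    """Build the one-dimensional and two-dimensional feature groups.

    records: list of (sampling_point, {mac: rssi}) tuples (assumed layout).
    macs:    iterable of AP MAC addresses.
    Returns (single, pairs): one group per AP and one group per AP pair,
    each a list of (sampling_point, feature_vector).
    """
    ordered = sorted(macs)                      # ascending MAC order
    single = {mac: [] for mac in ordered}       # m groups
    pairs = {p: [] for p in combinations(ordered, 2)}  # m(m-1)/2 groups
    for point, rssi in records:
        for mac in ordered:
            single[mac].append((point, (rssi[mac],)))
        for a, b in pairs:
            pairs[(a, b)].append((point, (rssi[a], rssi[b])))
    return single, pairs
```

With three APs this yields three single-AP groups and three pair groups; the pair keys always list the lower MAC first, matching the condition i<j.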

Step 2: Perform cluster analysis on the feature vectors to divide the area to be positioned into multiple positioning sub-regions, each of which reflects one RSSI distribution feature.

The feature vectors constructed in Step 1 are the input, and the distance between feature vectors is the similarity measure for the cluster analysis. Optionally, the cluster analysis uses the X-means algorithm, which discovers the number of clusters automatically. X-means improves on K-means: the number of clusters K need not be specified in advance; only a range [K1,K2] (K1<K2) is specified, and the algorithm finds an optimal K within that range and partitions the data accordingly. Guided by the Bayesian information criterion, X-means iterates over candidate partitions. The cluster centers of the different clusters represent different signal features, and each signal feature reflects how the signal distribution aggregates within a certain area.

Step 3: For each group of feature vectors, train a corresponding classification model using the clustering results; then use the classification models together with the "voting" mechanism to select from all sub-regions the set of sub-regions with the most votes. This includes:

In the offline stage, for the feature vectors built by the two construction methods proposed in Step 2, train a Support Vector Machine (SVM) classification model for each feature vector group of each construction method. SVM is built on the VC dimension theory of statistical learning and the principle of structural risk minimization. It trades off classification accuracy (correctly classifying specific samples) against classification ability (classifying arbitrary samples without error) so that the classifier generalizes as well as possible. The feature values are the input to the SVM classifier and an abstract description of the data, so their selection is very important: how accurately they reflect the characteristics of the data to be classified directly affects the final classification result.

In the online stage, extract the classification feature vectors from real-time data, load the corresponding SVM classification models trained in the offline stage, compute from the support-vector polynomial expansion term values the probability that the vector to be classified belongs to each region, and use the "voting" mechanism to select from all regions the region set R with the most votes.
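To make the role of the polynomial kernel concrete, the sketch below evaluates a generic polynomial kernel and the decision value of a binary SVM from its support vectors. It is illustrative only: the text does not state the kernel degree or coefficients, so `degree`, `gamma` and `coef0` are assumed parameters, the function names are hypothetical, and the conversion from decision values to per-region probabilities is not shown because the source does not describe it.

```python
def poly_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def decision_value(x, support_vectors, dual_coefs, bias, **kernel_params):
    """Binary SVM decision value: sum_i coef_i * K(sv_i, x) + bias.

    dual_coefs holds the signed dual coefficients of a trained model
    (assumed to be available from the offline stage).
    """
    return sum(
        c * poly_kernel(sv, x, **kernel_params)
        for sv, c in zip(support_vectors, dual_coefs)
    ) + bias
```

The kernel lets the classifier work in a high-dimensional feature space without building that space explicitly, which is why only the support vectors and their coefficients need to be loaded online.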

The specific operations of the online positioning stage include:

(1) Load the trained SVM classification models and compute the support-vector polynomial expansion term values;

(2) Read the currently collected RSSI and extract the classification feature vectors;

(3) Map the classification feature vectors to a high-dimensional space with a polynomial kernel function, and compute from the support-vector polynomial expansion term values the probability that the vector to be classified belongs to each region;

(4) For each AP group (AP_i, AP_j), judge whether each of its sub-regions is eligible. If several sub-regions are eligible, the SVM model considers that the current device may lie within the union of those sub-regions;

A sub-region is eligible when the predicted probability of the AP group (AP_i, AP_j) for that sub-region is greater than a threshold ε (0<ε<1);

(5) Use the "voting" mechanism to select from all regions the region set R with the most votes. Specifically:

If the sample data of AP group (AP_i, AP_j) is judged by SVM prediction to lie in a region, that region's vote count is increased by 1. Geometrically, this selects the region covered the most times as the coarse-grained positioning region. The vote count of each region lies between 0 and C(m,2) = m(m-1)/2.

Step 4: Use two rounds of positioning to narrow the region set and improve positioning accuracy. Specifically:

(2) Read the currently collected RSSI, extract the classification feature vectors, and standardize the classification features;

(3) Map the classification feature vectors to a high-dimensional space with a polynomial kernel function, compute from the support-vector polynomial expansion term values the probability that the vector to be classified belongs to each region, and keep the probabilities of the regions that lie within the coarse-grained positioning region R obtained in Step 3;

(4) For each AP_i, judge whether each of its sub-regions is eligible and is a subset of the coarse-grained positioning region R obtained in Step 3. If several sub-regions are eligible, the SVM model considers that the current device may lie within the union of those sub-regions;

A region is eligible when the predicted probability of AP_i for that region is greater than a threshold ε (0<ε<1);

(5) Use the "voting" mechanism to select from R the region set R' with the most votes. Specifically: if the sample data of AP_i is judged by SVM prediction to lie in a region, that region's vote count is increased by 1. Geometrically, this selects the region covered the most times as the fine-grained positioning region. The vote count of each region lies between 0 and m.
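The second round can be sketched as follows. Because each AP partitions the area differently, this sketch represents every sub-region as a set of calibration point ids so that "coverage" can be counted per point; that representation, the function name `refine`, and the input layout are assumptions made for illustration.

```python
from collections import Counter


def refine(eligible, coarse):
    """Second-round vote restricted to the coarse region R.

    eligible: dict mapping an AP to a list of its eligible sub-regions,
              each a set of calibration point ids (assumed representation).
    coarse:   the coarse-grained region R as a set of calibration point ids.
    Returns R', the points covered by the most single-AP models.
    """
    tally = Counter()
    for sub_regions in eligible.values():
        covered = set()
        for sub in sub_regions:
            if sub <= coarse:          # only sub-regions contained in R count
                covered |= sub         # union of this AP's eligible sub-regions
        tally.update(covered)          # one vote per AP per covered point
    if not tally:
        return set()
    best = max(tally.values())
    return {point for point, votes in tally.items() if votes == best}
```

Since each of the m single-AP models contributes at most one vote to a point, every tally stays between 0 and m, as stated above.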

The beneficial effects of the technical solution provided by the present invention are as follows. The invention fully exploits the spatial distribution characteristics of RSSI and reduces the positioning-region deviation caused by improper region division. It establishes a new positioning model that addresses two problems of existing WLAN indoor positioning methods: their inability to learn and adapt to the nonlinear, non-Gaussian statistical characteristics of RSSI signals caused by non-line-of-sight transmission effects, multipath propagation effects, and abnormal RSSI attenuation; and the excessively large search-and-matching space and high computational complexity of large-scale indoor positioning.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is the implementation flowchart of the method of the present invention;

Fig. 2 is the clustering flowchart of the method of the present invention;

Fig. 3 is another clustering flowchart of the method of the present invention;

Fig. 4 is the training flowchart of the method of the present invention;

Fig. 5 is a coarse-grained positioning flowchart of the method of the present invention;

Fig. 6 is a fine-grained positioning flowchart of the method of the present invention.

Detailed description

Specific embodiments of the present invention are further described below with reference to the flowcharts and specific examples.

Fig. 2 is the clustering flowchart of the method of the present invention; this flow is part of the offline stage. It may include the following steps:

201. At each calibration point, use a smartphone to scan the surrounding AP signals at high frequency. The format of the scanned data is shown in Table 1. Note that the number of records collected at each calibration point is not fixed and depends on how long collection lasts. If the RSSI of an AP cannot be collected at the current position, it is filled in as -100 dB.

Table 1 Scan data format

Calibration point number   AP1    AP2    AP3    AP4    AP5    AP6    AP7
1                          -85    -97    -63    -100   -100   -90    -72
1                          -83    -92    -65    -100   -98    -85    -69
2                          -70    -73    -95    -82    -63    -100   -100
...                        ...    ...    ...    ...    ...    ...    ...
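A row of Table 1 can be produced from one scan as sketched below. The helper name `to_row` and the dictionary input are hypothetical; the fill value of -100 comes from step 201.

```python
MISSING_RSSI = -100  # fill value when an AP is not heard at the current position


def to_row(point_id, scan, macs):
    """Return [point_id, rssi_1, ..., rssi_m] with APs in ascending MAC order.

    scan: dict mapping AP MAC address to the RSSI measured in this scan.
    macs: all AP MAC addresses that make up the table columns.
    """
    return [point_id] + [scan.get(mac, MISSING_RSSI) for mac in sorted(macs)]
```

Fixing the column order here keeps every row aligned with the same AP, which later steps rely on when they slice out one or two columns per group.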

202. Extract all APs from the collected data and sort them in ascending order of MAC address. The reason is that the SVM algorithm used in the positioning stage depends on the order of the vector components, so a component order must be fixed explicitly. In this embodiment, ascending order of AP MAC address is used.

203. Combine the sorted APs in pairs as (AP_i, AP_j) (where 0<i<j≤m and m is the total number of APs), that is, divide the APs by MAC address into C(m,2) = m(m-1)/2 groups. From the RSSI data labelled with sampling points, extract the two-dimensional RSSI vectors of each AP combination as the raw classification data, as shown in Table 2 and Table 3.

Table 2 Extracted data format

Calibration point number   AP1    AP2
1                          -85    -97
1                          -83    -92
2                          -70    -73
...                        ...    ...

Table 3 Extracted data format

Calibration point number   AP2    AP3
1                          -97    -63
1                          -92    -65
2                          -73    -95
...                        ...    ...

204. With the vectors constructed in step 203 as input and the distance between vectors as the similarity measure, perform cluster analysis using the X-means algorithm, which discovers the number of clusters automatically. Record separately how each two-dimensional AP combination partitions the whole positioning area.

The X-means cluster analysis is implemented as follows:

Step1. Specify the range [k_min,k_max] for the number of clusters k, and initialize k＝k_min. The range of k is chosen according to the size of the actual area under test: each sub-region should cover between 200 m² and 700 m², and [k_min,k_max] is computed on that basis;
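One way to derive [k_min, k_max] from the floor area is sketched below. The text only says each sub-region should cover 200 m² to 700 m², so the rounding rules and the function name `cluster_count_range` are assumptions.

```python
import math


def cluster_count_range(area_m2, min_sub=200.0, max_sub=700.0):
    """Return (k_min, k_max) so that sub-regions average min_sub..max_sub m²."""
    # Fewest clusters: every sub-region as large as allowed.
    k_min = max(1, math.ceil(area_m2 / max_sub))
    # Most clusters: every sub-region as small as allowed.
    k_max = max(k_min, math.floor(area_m2 / min_sub))
    return k_min, k_max
```

For a 3500 m² floor this gives the range 5 to 17; an area smaller than 200 m² collapses to a single cluster.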

Step2. From the feature vector set EV extracted in step 202, randomly select k AP combination data points u₁, u₂, u₃ ... u_k as the initial cluster centers. The feature vector set EV is as shown in Table 2 and Table 3; k feature vectors are picked from it as the initial centers;

Step3. For each AP combination data point xⁱ in the feature vector set EV, determine the cluster c⁽ⁱ⁾ it belongs to as the cluster whose center is most similar to it, where s(arg₁,arg₂) is the similarity function;

Step4. Repeat the above process until every data point has been assigned to its most similar cluster, so that all AP group data points are provisionally divided among the clusters;

Step5. For each cluster j, recompute its cluster center as u_j = Σᵢ (c⁽ⁱ⁾＝j)·xⁱ / Σᵢ (c⁽ⁱ⁾＝j), where c⁽ⁱ⁾ is the cluster to which data point xⁱ has provisionally been assigned, and (c⁽ⁱ⁾＝j) is an indicator: if data point xⁱ belongs to cluster j then (c⁽ⁱ⁾＝j)＝1, otherwise (c⁽ⁱ⁾＝j)＝0. This center is the weighted average center position of the cluster;

Step6. Compute the criterion function from the data points and the cluster centers, where xⁱ is a data point in the data set, u_j is the cluster center of cluster j, and k is the number of cluster centers;

Step7. If the criterion function no longer changes, the clustering result is stable: go to Step8. Otherwise go back to Step3 and cluster again;
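Steps 2 to 7 amount to a K-means pass for a fixed k, which can be sketched as below. The similarity function and the criterion function are not reproduced in the text, so this sketch assumes squared Euclidean distance for both; the function names, the `seed` and `max_iter` parameters are likewise illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (assumed similarity measure)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Run Steps 2-7 for a fixed k; points is a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)               # Step 2: random initial centers
    labels, cost, prev_cost = [], None, None
    for _ in range(max_iter):
        # Steps 3-4: assign every point to its most similar center.
        labels = [min(range(k), key=lambda j: sq_dist(x, centers[j]))
                  for x in points]
        # Step 5: move each center to the mean of its members.
        for j in range(k):
            members = [x for x, c in zip(points, labels) if c == j]
            if members:
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
        # Step 6: criterion = total squared distance to the assigned center.
        cost = sum(sq_dist(x, centers[c]) for x, c in zip(points, labels))
        # Step 7: stop once the criterion no longer changes.
        if cost == prev_cost:
            break
        prev_cost = cost
    return centers, labels, cost
```

Run on the pair vectors of Table 2, the two groups of readings taken at different calibration points end up in different clusters regardless of which points are drawn as initial centers.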

Step8. Split each existing cluster further and compute the Bayesian information criterion before and after the split, BIC_pre and BIC_post. The Bayesian Information Criterion (BIC) is an important part of Bayesian theory. It evaluates different models on the same data set on the basis of posterior probability, and is suitable as a reference for choosing a model that has low complexity yet describes the data set well.

For the clustering model M_k corresponding to cluster number k, the BIC is computed from the following quantities. EV is the set of feature vectors extracted in step 202. R is the number of feature vectors in EV, which here equals the number of RSSI combinations collected for the AP group over all positions. p is the number of parameters, computed in the present invention as p＝k+k·d, where d is the dimension of the feature vectors in EV, i.e. d＝2. The term involving p and R, known as the Schwarz criterion, can be regarded as a penalty on the complexity of the clustering model. The remaining term is the maximum a posteriori log-likelihood estimate of the clustering model M_k on the feature vector set EV,

where u_(i) is the cluster center of cluster i;
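The scoring can be illustrated with the sketch below. The patent's own likelihood expression is not reproduced in this text, so the sketch substitutes a commonly used form: a spherical Gaussian per cluster with hard assignments and a pooled variance estimate, with BIC taken as the log-likelihood minus (p/2)·log R. Only p＝k+k·d comes from the text; the function name `bic_score` and everything else are assumptions.

```python
import math


def bic_score(points, centers, labels):
    """BIC = log-likelihood - (p / 2) * log(R), with p = k + k * d.

    Assumes a spherical Gaussian per cluster and needs R > k and at least
    one point that does not coincide with its center.
    """
    R, k, d = len(points), len(centers), len(points[0])
    sq = [sum((a - b) ** 2 for a, b in zip(x, centers[c]))
          for x, c in zip(points, labels)]
    variance = sum(sq) / (d * (R - k))          # pooled per-dimension variance
    sizes = [labels.count(j) for j in range(k)]
    log_lik = sum(
        math.log(sizes[c] / R)                  # cluster prior R_j / R
        - 0.5 * d * math.log(2 * math.pi * variance)
        - s / (2 * variance)
        for s, c in zip(sq, labels)
    )
    p = k + k * d                               # parameter count from the text
    return log_lik - 0.5 * p * math.log(R)
```

With this convention a larger score is better, so a split is worth keeping when the score after the split exceeds the score before it.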

Step9. If BIC_pre＞BIC_post, check whether the resulting model scores higher than the original one; if it does, accept the split and go to Step10, otherwise set k＝k+1 and go to Step8;

Step10. If k＞k_max, clustering must be redone: go to Step7. Otherwise set k＝k+1 and go to Step2 to compute the clustering with one more cluster;

Step11. Select the partition with the largest BIC as the clustering result;

Let M be the set of models corresponding to the different cluster numbers k; the model in M with the largest BIC is the optimal clustering model. Each cluster represents one signal feature, and a signal feature reflects how the signal distribution aggregates within a certain area.

Fig. 3 is another clustering flowchart of the method of the present invention; this flow is part of the offline sampling stage. It may include the following steps:

301. At each calibration point, use a smartphone to scan the surrounding AP signals at high frequency. The format of the scanned data is shown in Table 1. Note that the number of records collected at each calibration point is not fixed and depends on how long collection lasts. If the RSSI of an AP cannot be collected at the current position, it is filled in as -100 dB.

302. Divide the APs by MAC address into m groups (m is the total number of APs). From the RSSI data labelled with sampling points, extract the one-dimensional RSSI vector of each AP as the raw classification data, as shown in Table 4 and Table 5.

Table 4 Extracted data format

Calibration point number   AP1
1                          -85
1                          -83
2                          -70
...                        ...

Table 5 Extracted data format

Calibration point number   AP2
1                          -97
1                          -92
2                          -73
...                        ...

303. With the vectors constructed in step 302 as input and the distance between vectors as the similarity measure, perform cluster analysis using the X-means algorithm, which discovers the number of clusters automatically. The clustering method is similar to that of step 203; the difference is that the feature vector changes from the two-dimensional vector of an AP combination to the one-dimensional vector of a single AP. The signal pattern reflects how the signal distribution aggregates within a certain area; record separately how each AP partitions the whole positioning area. Note that different APs partition the same positioning area differently. The reason is that these APs are deployed far apart in space and are affected to different degrees by non-line-of-sight transmission effects, multipath propagation effects, and abnormal RSSI attenuation, so their partitions may differ.

Fig. 4 is the training flowchart of the method of the present invention; this flow is part of the offline sampling stage. It may include the following steps:

401. At each calibration point, extract the two-dimensional RSSI vectors of the pairwise AP combinations, as shown in Table 2 and Table 3, and replace the calibration point number with the corresponding cluster number obtained from clustering.

402. At each calibration point, extract the one-dimensional RSSI vector of each single AP, as shown in Table 4 and Table 5, and replace the calibration point number with the corresponding cluster number obtained from clustering.

403. Train SVMs separately on the vectors obtained in steps 401 and 402, and compute the classification feature values of the support vector machines. These classification feature values provide the data needed later to determine the initial range and to narrow the positioning range. The classification features chosen in this embodiment are the RSSI vectors obtained in steps 401 and 402.

图5为本发明方法的一种粗粒度定位流程图，该流程属于在线阶段的一部分。具体可以包括如下步骤：Fig. 5 is a coarse-grained positioning flow chart of the method of the present invention, which is part of the online phase. Specifically, the following steps may be included:

501、加载训练好的每个二维AP组(AP_i,AP_j)的SVM分类模型，读取当前采集到的RSSI，根据AP的MAC地址升序排序，再提取每组AP组合作为分类向量。需要注意的是，SVM算法与向量顺序相关，因此必须人为地确定一种向量排列顺序。在本例中，使用AP的MAC升序排列作为排序方法。501. Load the trained SVM classification model of each two-dimensional AP group (AP _i , AP _j ), read the currently collected RSSI, sort in ascending order according to the MAC addresses of the APs, and then extract each group of AP combinations as a classification vector. It should be noted that the SVM algorithm is related to the vector order, so a vector arrangement order must be artificially determined. In this example, use AP's MAC ascending order as the sorting method.

502、将步骤501提取出的AP组合形成的分类向量，使用相应的SVM模型对其进行预测，分别求出每组AP在其对应区域划分模式下在各个区域的概率。由于当前位置可能处于多个区域的边缘处，或者由于该二维AP组合中某个或某些AP由于受到非视距传输效应、多径传播效应和RSSI衰减规律异常等原因导致RSSI波动，可能会出现多个区域都符合要求的情况，可以按照以下方法进行选取：502. Use the corresponding SVM model to predict the classification vector formed by combining the APs extracted in step 501, and calculate the probability of each group of APs in each region under its corresponding region division mode. Because the current location may be at the edge of multiple areas, or because one or some APs in the two-dimensional AP combination are subject to non-line-of-sight transmission effects, multipath propagation effects, and RSSI attenuation abnormalities, etc., RSSI fluctuations may occur. There may be situations where multiple areas meet the requirements, and you can select them in the following ways:

$$\mathrm{Area}(AP_i, AP_j) = \bigcup_{k=1}^{s} area_k$$

After the feature vector extracted from each AP sample is predicted by the corresponding SVM model, several prediction results may meet the requirement; each prediction result corresponds to one sub-region into which that AP group divides the region to be positioned. In the formula above, s is the number of qualifying prediction results, that is, the number of sub-regions in which the SVM model considers the current device may be located; area_k is the k-th qualifying region, that is, one of the sub-regions in which the SVM model considers the device may be located. Area(AP_i, AP_j) is the region of the current location as determined by AP group (AP_i, AP_j), that is, the union of those sub-regions. A sub-region qualifies if the predicted probability that the current feature vector lies in it is not less than a threshold ε (0 < ε < 1). In this example, ε = 1/n, where n is the number of sub-regions produced by that AP group.
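The threshold rule can be sketched as follows. The function name `qualifying_regions` and the dictionary of per-region probabilities are illustrative assumptions; in practice the probabilities would come from the trained SVM model of the AP group.

```python
def qualifying_regions(probs, eps=None):
    """Return the set of sub-regions whose predicted probability is >= eps.

    probs: dict mapping sub-region id -> predicted probability.
    eps: threshold in (0, 1); defaults to 1/n as in the example embodiment.
    """
    if not probs:
        return set()
    n = len(probs)
    if eps is None:
        eps = 1.0 / n
    return {region for region, p in probs.items() if p >= eps}
```

Because the n probabilities sum to 1, at least one of them is no smaller than 1/n, so with ε = 1/n the result is non-empty (up to floating-point rounding). A larger ε can return an empty set.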

503. Compute the positioning result by "voting" over Area(AP_i, AP_j) of all AP groups (AP_i, AP_j) obtained in step 502. If step 502 predicts that the sample data of an AP group may lie in a region, that region's vote count is increased by 1. Traverse Area(AP_i, AP_j) of all AP groups and vote, then select the region with the most votes as the coarse-grained positioning region; the vote count of each region lies between 0 and the number of AP groups. Geometrically, this selects the region covered by the largest number of Area(AP_i, AP_j) sets. If the vote counts of all regions are below a threshold ξ, positioning is considered to have failed and ends; if several regions tie for the most votes with a count greater than ξ, their union is taken as the coarse-grained positioning region. Because normal positioning requires at least 3 APs to take part in the calculation, ξ is set to 4 in this example.
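A sketch of the voting step, under the assumption that every sub-region is represented as a set of grid-cell identifiers so that regions produced by different AP groups can be compared. That representation and the name `vote` are not taken from the patent.

```python
from collections import Counter

def vote(areas, xi):
    """Select the cells covered by the most candidate areas.

    areas: iterable of sets of cell ids, one set per AP group
           (the Area(AP_i, AP_j) of that group).
    xi: minimum vote count; if the best count is below it, positioning fails.
    Returns the set of top-voted cells, or None on failure.
    """
    votes = Counter()
    for area in areas:
        votes.update(set(area))  # each AP group votes at most once per cell
    if not votes:
        return None
    top = max(votes.values())
    if top < xi:
        return None
    # Cells tied for the top count are returned together (their union).
    return {cell for cell, count in votes.items() if count == top}
```

Returning all cells that share the top count corresponds to taking the union of tied regions, which is how the step above resolves ties.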

Fig. 6 is a flow chart of fine-grained positioning in the method of the present invention; this flow is part of the online phase. It may include the following steps:

601. Load the previously trained SVM classification model of each AP, read the currently collected RSSI, form it into a one-dimensional classification vector, and predict on it with that model. Compute, for each AP, the probability of each region under that AP's region division, and keep the probabilities of the regions inside the coarse-grained positioning region R obtained in the previous step. Because the current location may lie at the edge of several regions, or because the AP's RSSI fluctuates owing to non-line-of-sight transmission, multipath propagation, or abnormal RSSI attenuation, several regions may meet the requirement. They can be selected as follows:

$$\mathrm{Area}(AP_i) = \bigcup_{k=1}^{s} area_k, \quad area_k \subseteq R$$

After the feature vector extracted from each AP sample is predicted by the corresponding SVM model, several prediction results may meet the requirement; each prediction result corresponds to one sub-region into which that AP divides the region to be positioned. In the formula above, s is the number of qualifying prediction results, that is, the number of sub-regions in which the SVM model considers the current device may be located; area_k is the k-th qualifying region, that is, one of the sub-regions in which the SVM model considers the device may be located, and it must be a subset of the coarse-grained positioning region R obtained in step four. Area(AP_i) is the region of the current location as determined by AP_i, that is, the union of those sub-regions. A sub-region qualifies if the predicted probability that the current feature vector lies in it is not less than a threshold ε (0 < ε < 1). In this example, ε = 1/n, where n is the number of sub-regions produced by that AP.
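One possible reading of the subset constraint is to discard any qualifying sub-region that is not wholly inside R. The sketch below assumes that sub-regions and R are sets of grid-cell identifiers; the names `fine_area`, `probs`, and `cells_of` are hypothetical.

```python
def fine_area(probs, cells_of, coarse_region):
    """Compute Area(AP_i) restricted to the coarse-grained region R.

    probs: dict mapping sub-region id -> predicted probability for one AP.
    cells_of: dict mapping sub-region id -> set of cell ids it covers.
    coarse_region: set of cell ids forming the coarse-grained region R.
    """
    if not probs:
        return set()
    eps = 1.0 / len(probs)  # threshold used in the example embodiment
    area = set()
    for region, p in probs.items():
        cells = cells_of[region]
        # Keep a sub-region only if it qualifies and lies entirely inside R.
        if p >= eps and cells <= coarse_region:
            area |= cells
    return area
```

Intersecting each qualifying sub-region with R instead of dropping it would be an equally plausible interpretation; the text does not settle which is intended.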

602. Compute the result by "voting" over Area(AP_i) of all AP_i from step 601: if step 601 predicts that the sample data of an AP lies in a region, that region's vote count is increased by 1. Traverse Area(AP_i) of all APs and vote, then select the region with the most votes as the fine-grained positioning region; the vote count of each region lies between 0 and m. Geometrically, this selects the region covered by the largest number of Area(AP_i) sets as the fine-grained positioning region. If the vote counts of all regions are below a threshold ξ, positioning is considered to have failed and ends; if several regions tie for the most votes with a count greater than ξ, their union is taken, and its center point coordinates and radius are computed as the final positioning region. Because normal positioning requires at least 3 APs to take part in the calculation, ξ is set to 4 in this example.
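Step 602 does not state how the center point and radius of the union are computed. One simple choice, assumed here purely for illustration, is the centroid of the cell coordinates in the union together with the largest distance from that centroid to any cell. The name `center_and_radius` and the coordinate list are hypothetical.

```python
import math

def center_and_radius(points):
    """Return ((cx, cy), r) for a non-empty list of (x, y) cell coordinates.

    The center is the arithmetic mean of the points; the radius is the
    distance from the center to the farthest point.
    """
    if not points:
        raise ValueError("points must be non-empty")
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    r = max(math.hypot(x - cx, y - cy) for x, y in points)
    return (cx, cy), r
```

The resulting circle encloses every cell of the union, but it is not the smallest such circle in general; a minimum enclosing circle would be a tighter alternative.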

Parts of the present invention not described in detail belong to techniques well known in the art.

The above are only some specific embodiments of the present invention, and the scope of protection of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the scope of protection of the present invention.

Claims

1. A WLAN-based indoor positioning method, characterized by comprising the following steps:

step one: preprocessing the RSSI data of each AP collected at the sampling points, and extracting one-dimensional vectors and two-dimensional vectors from the preprocessed RSSI data as feature vectors;

step two: performing cluster analysis on the feature vectors, and dividing the region to be positioned into a plurality of positioning sub-regions;

step three: training a corresponding classification model for each group of feature vectors in combination with the clustering results; and selecting the set of sub-regions with the highest vote count from all sub-regions based on the classification models and a "voting" mechanism;

step four: narrowing the range of the sub-region set by two rounds of positioning to improve positioning accuracy;

wherein, in step one, extracting one-dimensional and two-dimensional vectors from the preprocessed data as feature vectors comprises:

(1) sorting all scanned APs in ascending order of MAC address;

(2) extracting one-dimensional and two-dimensional vectors as feature vectors by the following two methods:

a. combining the sorted APs pairwise, so that the APs are divided by MAC address into m(m-1)/2 groups, each AP group being denoted (AP_i, AP_j), where 0 < i < j ≤ m and m is the total number of APs, and extracting from the preprocessed data the vector formed by each AP pair as a feature vector;

b. treating each AP as its own group, that is, dividing all offline collected data into m groups by AP MAC address, each group being denoted AP_i, where 0 < i ≤ m and m is the total number of APs, and extracting from the preprocessed data the vector formed by that AP as a feature vector;

wherein the implementation of step three comprises an offline stage and an online stage;

in the offline stage, for the feature vectors constructed by each of the two construction methods of step one, a support vector machine (SVM) classification model is trained for each feature vector;

in the online stage, classification feature vectors are extracted from real-time data, the SVM classification models trained in the offline stage are read, the probability that the vector to be classified belongs to each region is calculated from the support vector polynomial expansion term values, and the region set R with the highest vote count is selected from all regions by means of the voting mechanism;

the voting mechanism means that if an AP group (AP_i, AP_j) is determined by SVM prediction to be in a certain region, the vote count of that region is increased by 1; EV(AP_i, AP_j) of all AP groups is traversed and voted on, and the region with the most votes is selected as the coarse-grained positioning region, the vote count of each region lying between 0 and the number of AP groups, where EV is the feature vector set;

wherein step four narrows the range of the region set by two rounds of positioning, implemented as follows:

(1) reading the trained SVM classification model and calculating the support vector polynomial expansion term values;

(2) reading the currently collected RSSI, extracting the classification feature vector, and standardizing the classification features;

(3) mapping the classification feature vector to a high-dimensional space through a polynomial kernel function, calculating the probability that the vector to be classified belongs to each region from the support vector polynomial expansion term values, and selecting the probabilities of the regions within the coarse-grained positioning region R obtained in step three;

(4) for each AP_i, judging whether each of its sub-regions meets the condition, each such sub-region being a subset of the coarse-grained positioning region R obtained in step three, wherein if several sub-regions meet the condition, the SVM model considers that the current device may be in the union of those sub-regions;

(5) selecting, by means of the "voting" mechanism, the region set R' with the highest vote count from R, specifically: if the sample data of AP_i is determined by SVM prediction to be in a certain region, the vote count of that region is increased by 1, and the region with the most votes across the positioning-region votes of all APs is selected as the fine-grained positioning region, the vote count of each region lying between 0 and m.

2. The WLAN-based indoor positioning method of claim 1, wherein: in step one, preprocessing the RSSI data of each AP collected at the sampling points comprises: deleting data with excessively low RSSI, deleting data of non-positioning APs, and filling in RSSI data that was not scanned;

deleting data with excessively low RSSI means deleting data whose RSSI is below a certain threshold; deleting data of non-positioning APs means deleting the RSSI of APs unsuitable for positioning, an AP being unsuitable when its RSSI is too low, that is, less than -95 dB, or its stability is poor, that is, its variance is greater than 20.

3. The WLAN-based indoor positioning method of claim 1, wherein: in step two, performing cluster analysis on the feature vectors and dividing the region to be positioned into a plurality of positioning sub-regions specifically comprises: taking the feature vectors constructed in step one as input and performing cluster analysis with the distance between feature vectors as the similarity measure, the cluster analysis using the X-means algorithm, which determines the number of clusters automatically.

4. The WLAN based indoor positioning method of claim 1, wherein: the specific operation of the online positioning stage comprises:

(2) reading the currently acquired RSSI and extracting a classification feature vector;

(3) mapping the classified feature vectors to a high-dimensional space through a polynomial kernel function, and calculating the probability of the vectors to be classified corresponding to different regions according to the support vector polynomial expansion term values;

(4) for each AP group (AP_i, AP_j), judging whether each of its sub-regions meets the condition, wherein if several sub-regions meet the condition, the SVM model considers that the current device may be in the union of those sub-regions.