CN109600751B - A pseudo base station detection method based on network side user data

Info

Publication number: CN109600751B
Application number: CN201811376023.7A
Authority: CN
Inventors: 戴彬; 毛世奇
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2018-11-19
Filing date: 2018-11-19
Publication date: 2020-09-18
Anticipated expiration: 2038-11-19
Also published as: CN109600751A

Abstract

The invention discloses a pseudo base station detection method based on network-side user data, comprising: parsing URLs and the corresponding access information from the HTTP request data of multiple mobile Internet users, and filtering the parsed URLs against the domain name whitelist of the current network to obtain abnormal URLs; for any abnormal URL, obtaining its access volume threshold and determining each base station whose access volume for that URL within the target time interval is greater than the threshold to be an abnormal base station, thereby obtaining the abnormal base station set corresponding to that URL; and, for any abnormal URL whose abnormal base station set is not empty, determining the URL to be a malicious URL and dividing the set so that geographically adjacent abnormal base stations belong to the same abnormal base station subset, thereby obtaining the active areas of the pseudo base stations. The invention can accurately obtain multiple active areas of pseudo base stations without relying on mobile terminal users.

Description

A pseudo base station detection method based on network side user data

Technical Field

The invention belongs to the field of mobile Internet network security and, more particularly, relates to a pseudo base station detection method based on network-side user data.

Background Art

A pseudo base station can exploit the one-way authentication mechanism, under which a mobile phone cannot authenticate the base station, to obtain SIM card information and force bulk spam text messages onto mobile phones. In the mobile Internet, pseudo base stations are one of the main channels for spreading malicious URLs (Uniform Resource Locators). Because the sender number of the spam messages sent by a pseudo base station can be changed at will, phishing URLs of a financial-fraud nature are often spread under official service numbers such as 95555, 95588 and 10086, toward which users have little suspicion, so users are easily deceived and suffer financial loss. Moreover, pseudo base stations are mobile, send short messages efficiently, and offer low investment and high return. Detecting pseudo base stations at the data pipeline can effectively reduce the harm that malicious URL links in their bulk messages cause to users.

Communication operators cannot obtain data from mobile terminals and can only use network-side user data to monitor pseudo base stations. Current methods for detecting pseudo base stations are mainly based either on signaling interaction data or on terminal APP data. The signaling-based method observes the location update signaling on the mobile network side: a mobile phone that has been forcibly registered initiates a location update from a source LAC (Location Area Code) to a LAC of the live network. The source LAC is the LAC of the pseudo base station, so it can be used to make a preliminary judgment as to whether a pseudo base station is involved. This method yields only the approximate location of a single pseudo base station and cannot track multiple pseudo base stations in real time. The terminal-APP-based method locates the pseudo base station through mobile terminal information: it parses SMS-related data such as the sender number and LAC location information, compares the processed information with data in a database to detect whether a pseudo base station is involved, and then estimates the location of the pseudo base station from the reported locations of the users who received the spam messages. This approach requires phone positioning to be enabled, which costs the phone data traffic and battery power, and it still cannot perceive the geographic movement track of pseudo base stations within a large area.

In general, existing methods for detecting pseudo base stations cannot accurately obtain the active areas of pseudo base stations without relying on mobile terminal users, which hinders communication operators from tracking pseudo base stations effectively in real time.

Summary of the Invention

In view of the defects of the prior art and the need for improvement, the invention provides a pseudo base station detection method based on network-side user data, the purpose of which is to accurately obtain the active areas of multiple pseudo base stations without relying on mobile terminal users.

To achieve the above purpose, according to one aspect of the invention, a pseudo base station detection method based on network-side user data is provided, comprising the following steps:

(1) parsing URLs and the corresponding access information from the HTTP request data of multiple mobile Internet users, and filtering the parsed URLs according to the domain name whitelist of the current network, thereby obtaining abnormal URLs;

(2) for any abnormal URL, obtaining its access volume threshold, and determining each base station whose access volume for the abnormal URL within the target time interval is greater than the access volume threshold to be an abnormal base station, thereby obtaining the abnormal base station set corresponding to the abnormal URL;

(3) for any abnormal URL, if the corresponding abnormal base station set is not empty, determining the abnormal URL to be a malicious URL, and dividing the abnormal base station set so that geographically adjacent abnormal base stations belong to the same abnormal base station subset, thereby obtaining the active areas of the pseudo base stations;

wherein the access volume threshold is the dividing point between abnormal and non-abnormal base stations in terms of URL access volume, and the start of the target time interval is the moment at which the corresponding abnormal URL is first detected.

Extensive statistics show that, after receiving spam messages carrying malicious URL links sent in bulk by a pseudo base station, mobile terminal users access the malicious URL link through HTTP requests within a certain period of time. The HTTP requests that the many affected users issue for that URL link from within the coverage areas of the operator's base stations are geographically characterized by dense multi-user access in a local area and sparse or even no access in the slightly more distant surroundings. The invention exploits this characteristic: it counts each base station's access volume for an abnormal URL within the target time interval and compares it with the access volume threshold, thereby identifying malicious URLs and the corresponding abnormal base station sets. By placing the geographically adjacent abnormal base stations of a set into the same abnormal base station subset, it can obtain multiple active areas of pseudo base stations accurately and effectively.

The target time interval is set according to the behavior of pseudo base stations in sending spam messages and the behavior of users in accessing the links in those messages, so that the collected data accurately reflect whether a base station shows the dense-access characteristic. This allows abnormal base stations to be identified accurately and improves the accuracy of pseudo base station detection.

Further, in step (3), the method for dividing the abnormal base station set into abnormal base station subsets comprises:

obtaining the longitude and latitude of each abnormal base station in the abnormal base station set, and from these calculating the distance between any two abnormal base stations;

if the distance between a first abnormal base station and a second abnormal base station is less than the distance threshold, determining that the first and second abnormal base stations are geographically adjacent;

if the first abnormal base station and the second abnormal base station are each geographically adjacent to a third abnormal base station, determining that the first and second abnormal base stations are geographically adjacent;

placing geographically adjacent abnormal base stations into the same abnormal base station subset;

wherein the first, second and third abnormal base stations are different abnormal base stations corresponding to the same malicious URL.

Further, the method for calculating the access volume threshold comprises:

for any abnormal URL, obtaining the identification code of each abnormal base station in the corresponding abnormal base station set and its access volume for the abnormal URL within the target time interval, and from these calculating the interquartile range;

calculating the access volume threshold from the first quartile and the third quartile that define the interquartile range.

Still further, the formula for calculating the access volume threshold is:

Th_u = Q3 + 1.5*(Q3 - Q1);

where Th_u denotes the access volume threshold, and Q1 and Q3 denote the first quartile and the third quartile, respectively.

In general, the above technical solutions conceived by the invention achieve the following beneficial effects:

(1) The pseudo base station detection method based on network-side user data provided by the invention obtains URLs and the corresponding access information by parsing network-side HTTP request data, counts each base station's access volume for an abnormal URL within the target time interval, and compares it with the access volume threshold, thereby identifying malicious URLs and abnormal base stations. On the one hand, the method completes pseudo base station detection using only network-side user data and does not depend on mobile terminal users; on the other hand, the method matches the behavioral characteristics of pseudo base stations, which helps to detect the active areas of multiple pseudo base stations accurately and effectively.

(2) The pseudo base station detection method based on network-side user data provided by the invention places geographically adjacent pseudo base stations into the same pseudo base station set, so that the active areas of pseudo base stations can be obtained accurately, which helps communication operators detect mobile pseudo base stations in real time.

Description of the Drawings

FIG. 1 is a schematic diagram of an existing pseudo base station sending a malicious URL to a terminal user through a short message;

FIG. 2 is a flowchart of the pseudo base station detection method based on network-side user data provided by an embodiment of the invention.

Detailed Description

To make the purpose, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.

FIG. 1 is a schematic diagram of a pseudo base station sending a malicious URL to a terminal user through a short message. In brief, the pseudo base station sends the user a spam message carrying a malicious URL link, and the user accesses that link through an operator base station. The pseudo base station itself cannot be detected; its active area can only be determined from the operator base stations that are detected. To obtain the active areas of pseudo base stations, the pseudo base station detection method based on network-side user data provided by the invention parses URLs directly from the HTTP request data of network-side users accessing the network. If these data show a high access volume in a local area (base station) and the base stations involved are adjacent to one another in a band-like pattern, they are judged to be pseudo base station data.

Specifically, the pseudo base station detection method based on network-side user data provided by the invention is shown in FIG. 2 and comprises the following steps:

The access information corresponding to a URL includes the user identifier, the base station identifier of the accessed base station, and the access timestamp;

Because a URL contains domain name information, once the URLs have been extracted they can be filtered against the whitelist of the current network: normal URLs are filtered out and the abnormal URLs remain;
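The whitelist filtering described above can be sketched as follows. This is only an illustrative sketch: the record keys (url, user_id, bs_id, ts), the whitelist contents and the subdomain-matching rule are assumptions, since the source does not specify a data layout or how a domain is matched against the whitelist.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist of trusted domains for the current network.
DOMAIN_WHITELIST = {"example.com", "operator.example"}


def is_whitelisted(host, whitelist):
    """True if host equals a whitelisted domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in whitelist)


def filter_abnormal(records, whitelist=DOMAIN_WHITELIST):
    """Keep only the records whose URL domain is not on the whitelist.

    Each record is a dict with the assumed keys url, user_id, bs_id, ts.
    URLs are assumed to include a scheme; a URL without one has no
    parseable host here and is therefore kept as abnormal.
    """
    abnormal = []
    for rec in records:
        host = urlsplit(rec["url"]).hostname or ""
        if not is_whitelisted(host, whitelist):
            abnormal.append(rec)
    return abnormal
```

Matching on whole domain labels (an exact match or a "." boundary) avoids treating a look-alike such as notexample.com as whitelisted.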

(2) for any abnormal URL, obtaining its access volume threshold, and determining each base station whose access volume for the abnormal URL within the target time interval is greater than the access volume threshold to be an abnormal base station, thereby obtaining the abnormal base station set corresponding to the abnormal URL;

The start of the target time interval is the moment at which the corresponding abnormal URL is first detected. The target time interval is set according to the behavior of pseudo base stations in sending spam messages and the behavior of users in accessing the links in those messages, so that the collected data accurately reflect whether a base station shows the dense-access characteristic; this allows abnormal base stations to be identified accurately and improves the accuracy of pseudo base station detection. Related research shows that a pseudo base station generally stays at one location for about an hour and may move through the coverage areas of several base stations in a day, while users mostly read a received short message within 2 hours. Taking these two indicators together with the specific application scenario, the concrete value of the target time interval can be determined through simple comparative verification;
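Counting accesses per base station inside the target time interval might look like the sketch below. The record keys and the choice to pass the interval length in seconds are assumptions; the source leaves the concrete interval length to comparative verification.

```python
from collections import Counter


def count_visits_in_window(records, url, window_seconds):
    """Count accesses to `url` per base station inside the target interval.

    The interval starts at the earliest timestamp at which the URL is seen
    and lasts `window_seconds`; both ends are included in this sketch.
    Records are dicts with the assumed keys url, bs_id, ts (seconds).
    """
    times = [r["ts"] for r in records if r["url"] == url]
    if not times:
        return Counter()
    start = min(times)
    end = start + window_seconds
    return Counter(
        r["bs_id"]
        for r in records
        if r["url"] == url and start <= r["ts"] <= end
    )
```

For example, a call with window_seconds=7200 would use a two-hour interval, in line with the observation that most users read a message within 2 hours; that value is only one plausible starting point.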

The access volume threshold is the dividing point between abnormal and non-abnormal base stations in terms of URL access volume. In an optional embodiment, the method for calculating the access volume threshold comprises:

for any abnormal URL, obtaining the identification code of each abnormal base station in the corresponding abnormal base station set and its access volume for the abnormal URL within the target time interval, and from these calculating the interquartile range; calculating the access volume threshold from the first quartile and the third quartile that define the interquartile range;

Specifically, from the identifier of each base station that accessed the abnormal URL within the target time interval and the corresponding access volume, a set data of the form {bsid1:34, bsid2:89, bsid3:283, …} is obtained. Each element of data gives the identifier of a base station that accessed the abnormal URL and the corresponding access volume; for example, the first element "bsid1:34" indicates that the base station identifier is bsid1 and the corresponding access volume is 34. Sorting the elements of data by access volume and treating each element as one sample yields the sample count, mean, standard deviation, minimum, maximum and the three quartiles of the data set, that is, the values at the 25%, 50% and 75% positions of the data. When calculating the interquartile range, the relevant functions of the pandas library can be used to perform the calculation;
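The summary statistics listed above can be sketched as follows. The source names pandas; this sketch uses the Python standard library statistics module instead so that it runs without third-party packages, which is a substitution and not the source's stated tooling. The "inclusive" method interpolates linearly between data points, which should correspond to the default quantile behavior in pandas. The function name describe_counts is hypothetical.

```python
import statistics


def describe_counts(data):
    """Summarise per-base-station access volumes for one abnormal URL.

    data maps base station id -> access volume, e.g. {"bsid1": 34, ...}.
    At least two base stations are needed for the standard deviation
    and the quartiles.
    """
    values = sorted(data.values())
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": values[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": values[-1],
    }
```

With the three sample values from the text (34, 89, 283), the quartiles come out as 61.5, 89 and 186.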

In this embodiment, the formula for calculating the access volume threshold is specifically:

Th_u = Q3 + 1.5*(Q3 - Q1);

where Th_u denotes the access volume threshold, and Q1 and Q3 denote the first quartile and the third quartile, respectively;

According to statistical theory, calculating the access volume threshold in this way makes it possible to effectively detect the abnormal base stations that show abnormal access to any given abnormal URL;
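The threshold formula is the upper fence commonly used for outlier detection in box plots. A minimal sketch of applying it is given below; the function names and the sample access volumes are hypothetical, and the quartiles use linear interpolation, which the source does not specify.

```python
import statistics


def visit_threshold(counts):
    """Th_u = Q3 + 1.5 * (Q3 - Q1) over per-base-station access volumes."""
    values = sorted(counts.values())
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def abnormal_base_stations(counts):
    """Base stations whose access volume is strictly greater than Th_u."""
    th = visit_threshold(counts)
    return {bs for bs, n in counts.items() if n > th}


# Hypothetical access volumes for one abnormal URL.
counts = {"bsid1": 3, "bsid2": 4, "bsid3": 5, "bsid4": 6, "bsid5": 7, "bsid6": 283}
abnormal = abnormal_base_stations(counts)
# A non-empty set marks the URL as malicious, per step (3).
is_malicious = bool(abnormal)
```

For these sample values Q1 = 4.25 and Q3 = 6.75, so Th_u = 6.75 + 1.5 * 2.5 = 10.5, and only bsid6 exceeds it.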

In an optional embodiment, the method for dividing the abnormal base station set into abnormal base station subsets comprises:

wherein the first, second and third abnormal base stations are different abnormal base stations corresponding to the same malicious URL;

In the above method of judging whether abnormal base stations are geographically adjacent, for two abnormal base stations whose latitudes are φ1 and φ2 and whose longitude difference is Δλ, the distance d between the two base stations can be calculated with the following formula:

[Distance formula: equation missing from the source text.]

In the above formula, R denotes the radius of the Earth;
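The distance formula itself does not survive in this text. One standard great-circle formula that uses exactly the named quantities (two latitudes, a longitude difference Δλ and the Earth radius R) is the spherical law of cosines, d = R * arccos(sin φ1 * sin φ2 + cos φ1 * cos φ2 * cos Δλ). The sketch below uses it as an assumption; it may differ from the formula in the original publication, and the value chosen for R is likewise assumed.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed value for R)


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Distance in metres between two points given in decimal degrees.

    Uses the spherical law of cosines (an assumed choice of formula).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlam))
    # Clamp against rounding error so acos never sees a value outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)
```

At base station spacings of a few hundred metres the rounding error of this formula in double precision is far below a metre; the haversine formula is a common alternative when better small-distance behavior is wanted.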

Under the above determination method, even if the distance between abnormal base stations A and C is greater than the distance threshold, as long as abnormal base station A is adjacent to abnormal base station B and abnormal base station C is adjacent to abnormal base station B, abnormal base stations A and C are still determined to be adjacent, and abnormal base stations A, B and C belong to the same abnormal base station subset;
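Because adjacency is transitive under this rule, the subsets are the connected components of the graph whose edges join base stations closer than the distance threshold. The sketch below computes them with a union-find structure, which is one possible implementation and not one named by the source. For brevity it uses planar coordinates in metres with math.dist as the default distance; both are assumptions, and a latitude/longitude distance function can be passed in through the distance parameter.

```python
import math
from itertools import combinations


def split_into_subsets(positions, threshold, distance=math.dist):
    """Group base stations so that transitively adjacent ones share a subset.

    positions maps base station id -> coordinates.
    Two stations are directly adjacent when distance(...) < threshold.
    """
    parent = {bs: bs for bs in positions}

    def find(bs):
        # Follow parent links to the representative, halving the path as we go.
        while parent[bs] != bs:
            parent[bs] = parent[parent[bs]]
            bs = parent[bs]
        return bs

    # Merge the groups of every directly adjacent pair.
    for a, b in combinations(positions, 2):
        if distance(positions[a], positions[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for bs in positions:
        groups.setdefault(find(bs), set()).add(bs)
    return list(groups.values())
```

With A, B and C spaced 400 m apart in a line and a 500 m threshold, A and C are 800 m apart yet end up in the same subset through B, matching the A, B, C example in the text.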

The distance threshold is set according to the distances between base stations in the actual environment, so that the active areas of pseudo base stations can be obtained accurately. From basic knowledge of base stations, the distance between base stations generally lies in the range (300, 1000) meters. A single-step tuning method is used: the values in (30, 1000) are traversed, the number of malicious URL detections obtained with each candidate distance threshold is compared with the manual judgment, and the candidate whose result is closest to the manually verified value is the optimal distance threshold.
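The single-step tuning can be sketched as a simple search over candidate thresholds. The callable detect_count, the candidate values and the tie-breaking rule are all hypothetical; the source states only that the candidate closest to the manually verified value is chosen.

```python
def tune_distance_threshold(candidates, detect_count, manual_count):
    """Return the candidate whose detection count is closest to manual_count.

    detect_count is a hypothetical callable: threshold -> detection count
    obtained by running the method with that distance threshold.
    Ties go to the smallest candidate because min() keeps the first minimum.
    """
    return min(sorted(candidates),
               key=lambda th: abs(detect_count(th) - manual_count))


# Hypothetical detection counts per candidate threshold (metres).
table = {300: 12, 400: 9, 500: 7, 600: 7}
best = tune_distance_threshold(table, lambda th: table[th], manual_count=7)
```

Here the candidates 500 and 600 both match the manual count of 7, and the smaller one, 500, is returned.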

Extensive statistics show that, after receiving spam messages carrying malicious URL links sent in bulk by a pseudo base station, mobile terminal users access the malicious URL link through HTTP requests within a certain period of time. The HTTP requests that the many affected users issue for that URL link from within the coverage areas of the operator's base stations are geographically characterized by dense multi-user access in a local area and sparse or even no access in the slightly more distant surroundings. The invention exploits this characteristic: it counts each base station's access volume for an abnormal URL within the target time interval and compares it with the access volume threshold, thereby identifying malicious URLs and, further, pseudo base stations, so that multiple active areas of pseudo base stations can be determined accurately and effectively.

Those skilled in the art will readily understand that the above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A pseudo base station detection method based on network-side user data, characterized by comprising the following steps:

(1) parsing URLs and the corresponding access information from the HTTP request data of multiple mobile Internet users, and filtering the parsed URLs according to the domain name whitelist of the current network, thereby obtaining abnormal URLs;

(2) for any abnormal URL, obtaining its access volume threshold, and determining each base station whose access volume for the abnormal URL within the target time interval is greater than the access volume threshold to be an abnormal base station, thereby obtaining the abnormal base station set corresponding to the abnormal URL;

the method for calculating the access volume threshold comprising: for any abnormal URL, obtaining the identification code of each abnormal base station in the corresponding abnormal base station set and its access volume for the abnormal URL within the target time interval, and from these calculating the interquartile range; and calculating the access volume threshold from the first quartile and the third quartile that define the interquartile range;

(3) for any abnormal URL, if the corresponding abnormal base station set is not empty, determining the abnormal URL to be a malicious URL, and dividing the abnormal base station set so that geographically adjacent abnormal base stations belong to the same abnormal base station subset, thereby obtaining the active areas of the pseudo base stations;

in step (3), the method for dividing the abnormal base station set into abnormal base station subsets comprising:

obtaining the longitude and latitude of each abnormal base station in the abnormal base station set, and from these calculating the distance between any two abnormal base stations;

if the distance between a first abnormal base station and a second abnormal base station is less than the distance threshold, determining that the first and second abnormal base stations are geographically adjacent;

if the first abnormal base station and the second abnormal base station are each geographically adjacent to a third abnormal base station, determining that the first and second abnormal base stations are geographically adjacent;

placing geographically adjacent abnormal base stations into the same abnormal base station subset; wherein the access volume threshold is the dividing point between abnormal and non-abnormal base stations in terms of URL access volume, and the start of the target time interval is the moment at which the corresponding abnormal URL is first detected; and the first, second and third abnormal base stations are different abnormal base stations corresponding to the same malicious URL.

2. The pseudo base station detection method based on network-side user data according to claim 1, wherein the formula for calculating the access volume threshold is:

Th_u = Q3 + 1.5*(Q3 - Q1);

where Th_u denotes the access volume threshold, and Q1 and Q3 denote the first quartile and the third quartile, respectively.