WO2020258673A1 - Method, device, server, and storage medium for determining abnormality of network access - Google Patents

Method, device, server, and storage medium for determining abnormality of network access

Info

Publication number
WO2020258673A1
WO2020258673A1 PCT/CN2019/118551 CN2019118551W WO2020258673A1 WO 2020258673 A1 WO2020258673 A1 WO 2020258673A1 CN 2019118551 W CN2019118551 W CN 2019118551W WO 2020258673 A1 WO2020258673 A1 WO 2020258673A1
Authority
WO
WIPO (PCT)
Prior art keywords
network access
missing
feature set
data
combined
Prior art date
Application number
PCT/CN2019/118551
Other languages
English (en)
French (fr)
Inventor
黎立桂
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020258673A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Definitions

  • This application relates to the technical field of security detection. Specifically, this application relates to a method, device, server, and storage medium for determining abnormality of network access.
  • At present, the main means of endangering the security of web services include web crawlers, which simulate real users visiting a website. Under the interference of web crawlers, it is not easy for the website's server to distinguish web crawlers from normal users, so it is prone to wrong judgments and wrong responses.
  • The existing method identifies the type of user based on the mouse click and drag data recorded when mobile phone users log on to the website.
  • However, the proportion of user types misidentified by this method is still relatively high, and its results still cannot accurately distinguish normal users from web crawlers.
  • this application provides a method for determining abnormality of network access, which includes the following steps:
  • Collecting, at a preset time interval, the relevant features generated by the terminal device according to the network access request, and forming a combined feature set about the terminal device according to the features, where the features include the feature values related to the device type and the feature values related to the system information, and the combined feature set and the feature values are in a non-linear relationship with each other;
  • The feature list includes the necessary features generated by the terminal device initiating a network access request.
  • The step of comparing the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set, and composing missing data from each missing item to obtain the validity of the corresponding network access request, includes:
  • The step of performing data stratification on the missing data combinations of the combined feature set, according to the missing data composed of the type and quantity of the missing items, includes:
  • Data stratification is performed on the combined feature set according to the combined missing data.
  • The step of using the data stratification to obtain the validity of the corresponding network access request includes:
  • The combined feature set of the sample to be determined is input into the lightgbm model, the abnormal probability of that combined feature set is obtained, and from it the validity of the corresponding network access request is obtained.
  • The method further includes:
  • The parameters num_leaves, min_data_in_leaf, and max_depth of the lightgbm model are automatically tuned through GridSearchCV grid search, so that the lightgbm model is adjusted and optimized.
  • the step of determining abnormal access to the network access request by using the validity includes:
  • the network access is an abnormal access.
  • this application also provides an abnormality determination device for network access, which includes:
  • The feature acquisition module is configured to collect, at a preset time interval, the relevant features generated by the terminal device according to the network access request, and to form a combined feature set about the terminal device according to the features, where the features include the feature values related to the device type and the feature values related to the system information, and the combined feature set and the feature values are in a non-linear relationship with each other;
  • The comparison module is used to compare the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set, where the feature list includes the necessary features generated by the terminal device initiating a network access request;
  • The validity acquisition module is used to compose missing data from each missing item to obtain the validity of the corresponding network access request;
  • The judgment module is used to determine, by means of the validity, whether the network access request is an abnormal access;
  • The comparison module is also used to compare the features of the feature set with the set feature list to obtain the type and number of missing items in the corresponding combined feature set;
  • The validity acquisition module is also used to perform data stratification on the missing data combinations of the combined feature set according to the missing data composed of the type and quantity of the missing items, and to use the data stratification to obtain the validity of the corresponding network access request.
  • The present application also provides a server, which includes: one or more processors, a memory, and one or more computer-readable instructions, wherein the one or more computer-readable instructions are stored in the memory and are configured to be executed by the one or more processors, and the one or more computer-readable instructions are configured to execute the method for determining abnormality of network access according to the embodiment of the first aspect.
  • The present application also provides a computer-readable storage medium having computer-readable instructions stored thereon. When the computer-readable instructions are executed by a processor, the method for determining abnormality of network access described in the first aspect is implemented.
  • The method and device for determining abnormality of network access compare the features of the combined feature set of the network access request sent by the terminal device with the set feature list containing the necessary features, obtain the missing items of the combined feature set from the comparison result, judge the validity of the network access request from those missing items, and finally obtain a judgment of whether the corresponding network access request is abnormal.
  • Another technical solution is also provided, which trains a lightgbm model on the data stratification produced by combining the different missing categories of the combined feature set, and uses the lightgbm model as the judgment model to judge whether the network access is abnormal.
  • This solution can identify diverse abnormal scenarios, and as the sample size grows, it can cover more and more complex situations.
  • The technical solution provided by this application compares the features actually obtained from the network access request with the feature list containing the necessary features, and uses the necessary features that can reflect abnormal access as the basis for judgment, so as to get the best judgment result with as little data processing as possible.
  • FIG. 1 is a diagram of an application environment for executing the abnormality determination solution for network access in an embodiment of the present application
  • Fig. 2 is a flowchart of a method for determining abnormality of network access according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of an abnormality determination device for network access according to an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of a server according to an embodiment of the application.
  • The terms "terminal" and "terminal equipment" used herein include both devices that have only a wireless signal receiver without transmitting capability and devices with receiving and transmitting hardware.
  • Such equipment may include: cellular or other communication equipment with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service), which can combine voice, data processing, fax and/or data communication capabilities; PDA (Personal Digital Assistant), which can include a radio frequency receiver, pager, Internet/Intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device that has and/or includes a radio frequency receiver.
  • The "terminal" and "terminal equipment" used here may be portable, transportable, installed in vehicles (aviation, sea and/or land), or suitable and/or configured to operate locally and/or in a distributed form at any location on the earth and/or in space.
  • The "terminal" and "terminal equipment" used here can also be communication terminals, Internet terminals, or music/video playback terminals, such as a PDA, MID (Mobile Internet Device) and/or mobile phone with music/video playback function, or devices such as a smart TV or set-top box.
  • the remote network device used here includes but is not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers.
  • Here, the cloud is based on cloud computing and consists of a large number of computers or network servers.
  • Cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • The remote network equipment, terminal equipment and WNS server can communicate through any communication method, including but not limited to mobile communication based on 3GPP, LTE, and WIMAX, computer network communication based on the TCP/IP and UDP protocols, and short-range wireless transmission based on Bluetooth and infrared transmission standards.
  • Figure 1 is an application environment diagram of the embodiment of the application; in this embodiment, the technical solution of the application can be implemented on a server.
  • The terminal devices 110 and 120 can access the server 130 through the Internet.
  • the terminal device 110 and/or 120 sends a network request to the server 130, and the server 130 performs data interaction according to the network request.
  • the server 130 obtains the access data and attribute data of the terminal device 110 and/or 120 according to the request information of the terminal device 110 and/or 120, and determines abnormality of the terminal device according to the data.
  • FIG. 2 is a flowchart of a method for determining abnormality of network access according to an embodiment. The method includes the following steps:
  • S210 Collect each relevant feature generated by the terminal device according to the network access request at a preset time interval, and form a combined feature set about the terminal device according to the feature.
  • When the server interacts with the terminal device, it collects the relevant features of the terminal device at intervals.
  • Interval collection means collecting the related features according to the network requests of the terminal device within a preset time interval and forming a combined feature set.
  • In this process, the relevant parameters of the terminal device are obtained.
  • For example, the user sends registration and verification requests, and the front end uses JavaScript to obtain the relevant features of the terminal device, including the device type (iPhone, Mac, Android), system information (OS type, version, resolution), IP, etc.
  • From these, the related feature values are obtained and a combined feature set for the terminal device is formed; the feature values in the combined feature set may have a non-linear relationship with each other.
  • The features acquired through the front end may specifically include the browser language, pixel ratio, color depth, audio stack fingerprint, parameter information of the audio stack fingerprint, the total number of logical processors available to the user agent, whether the CPU class is unknown, whether the browser plug-ins are missing, whether the font list determined by JS/CSS is missing, whether the operating system is unknown, and whether the WebGL vendor is missing.
  • The device type, brand, model, and operating system version number are obtained by this analysis, and the brand and model of the terminal device currently issuing the network access request are associated with the same device brand and model in the basic library to obtain the corresponding feature information.
  • The basic library holds the real feature information of all device models, obtained from authoritative websites.
  • The feature information values in each feature set are standardized.
  • The feature set of each access record may include, for example, one variable on a 100-point scale and another on a 5-point scale. Only when all the data are standardized can they be compared against the same standard.
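  • The standardization step can be illustrated with a minimal sketch. The source does not say which standardization method is used; min-max scaling is assumed here, and the feature names `score_100` and `rating_5` and their bounds are hypothetical.

```python
# Hypothetical feature names and scale bounds; the source does not list them.
SCALE_BOUNDS = {
    "score_100": (0.0, 100.0),  # a variable on a 100-point scale
    "rating_5": (0.0, 5.0),     # a variable on a 5-point scale
}


def min_max_scale(value, low, high):
    """Map a value from [low, high] onto [0, 1] (assumed method)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)


def standardize(record):
    """Rescale every known feature of one access record to [0, 1]."""
    return {
        name: min_max_scale(value, *SCALE_BOUNDS[name])
        for name, value in record.items()
    }


record = {"score_100": 80, "rating_5": 4}
standardized = standardize(record)
```

  • After scaling, both example values land on the same [0, 1] scale, so they can be compared directly.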
  • S220 Compare the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set.
  • The features of network access requests initiated by terminal devices are collected to form a corresponding feature list.
  • The feature list includes at least the necessary features generated by the terminal device initiating a network access request.
  • A necessary feature is one whose corresponding real information can be found in the basic library for subsequent reference, such as the browser language, the pixel ratio, the total number of logical processors available to the user agent, the CPU type, the operating system, and the WebGL vendor.
  • The corresponding features are extracted from the combined feature set formed when the terminal device initiates a network access request and are compared with the feature information in the feature list. Because the features in the feature list are necessary features, a normal network access request initiated by the terminal device generally includes the feature information of the feature list in its combined feature set.
  • the missing items of the corresponding combined feature set can be obtained after comparison.
  • S230 Compose missing data according to each missing item, and obtain the validity of the corresponding network access request.
  • the missing items constitute missing data on the corresponding combined feature set, and the missing data corresponds to the initiating network access request.
  • According to the missing data, the validity of the corresponding network access request is obtained. If the missing data is 0, the server can obtain all the necessary feature information from the terminal device that initiated the network access request, and the corresponding validity is the highest. Any increase in the missing data directly affects the validity of the corresponding network access request.
  • The validity reflects how likely it is that the network access request initiated by the terminal device comes from a user's normal use, and thereby whether the network access request was initiated by a web crawler.
  • The validity obtained in the above steps can be used directly to determine whether the network access request was issued by a web crawler or another abnormal user, and thus whether the network access request is an abnormal access.
  • The method for determining abnormality of network access obtains a combined feature set of the terminal equipment according to a network access request, compares the combined feature set with a preset feature list containing the necessary features for initiating the network access request, obtains the missing items of the corresponding combined feature set, obtains the validity of the network access request from the missing items, and obtains from the validity the judgment of whether the request is an abnormal access.
  • Compared with the prior art, which can only identify the type of user from the superficial phenomenon of the click and drag data generated when the user initiates a network access request, the technical solution of the present application compares against the set feature list and uses the missing items of the combined feature set to determine whether the corresponding network access request is an abnormal access.
  • Step S220 may further include:
  • The features of the feature set are compared with the set feature list to obtain the type and number of missing items in the corresponding combined feature set.
  • The features in the feature set are compared with the set feature list.
  • The comparison lists and summarizes the types of the features in the combined feature set, and matches the summarized types one-to-one against the features in the feature list. If, after this matching, some features in the feature list still have no counterpart in the combined feature set, those feature items are the missing items of the combined feature set, which gives the type and number of the missing items.
  • For example, if the two items corresponding to the operating system type and the WebGL vendor in the feature list have no counterpart in the combined feature set, then the missing items of the combined feature set are of the types operating system type and WebGL vendor, and their number is 2.
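  • The comparison in step S220 can be sketched as a lookup of each necessary feature in the collected feature set. This is a minimal illustration only: the feature names below are hypothetical labels for the necessary features named in the text, and treating `None`, an empty string, or `"unknown"` as "missing" is an assumption.

```python
# Hypothetical names for the necessary features named in the text.
REQUIRED_FEATURES = [
    "browser_language",
    "pixel_ratio",
    "logical_processors",
    "cpu_class",
    "operating_system",
    "webgl_vendor",
]

# Values treated as "not provided"; this convention is an assumption.
EMPTY_VALUES = (None, "", "unknown")


def find_missing_items(feature_set):
    """Return the necessary features that have no usable counterpart."""
    return [
        name for name in REQUIRED_FEATURES
        if feature_set.get(name) in EMPTY_VALUES
    ]


# A request whose operating system and WebGL vendor could not be collected.
collected = {
    "browser_language": "zh-CN",
    "pixel_ratio": 2.0,
    "logical_processors": 8,
    "cpu_class": "x86",
    "webgl_vendor": "",
}
missing = find_missing_items(collected)
missing_count = len(missing)
```

  • For this request, the missing types are the operating system and the WebGL vendor, and the number of missing items is 2, matching the example in the text.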
  • Step S230 may include the following steps:
  • A1. Perform data stratification on the missing data combinations of the combined feature set according to the missing data composed of the type and quantity of the missing items;
  • A2. Use the data stratification to obtain the validity of the corresponding network access request.
  • For example, suppose the missing data is mainly hardware data of the terminal device. Since hardware data plays a basic role in the operation of the terminal device when the network access request is initiated, if the number of missing hardware items reaches 3, the corresponding missing degree can be rated as high and the validity of the corresponding network access request is low.
  • If instead the number of missing items reaches 3 for other feature data whose necessity for initiating the network access request is lower than that of the hardware data, the missing degree does not reach a high level even with three missing items, so the validity of the corresponding network access request is medium.
  • The features can thus be classified by how necessary they are for the terminal device to initiate a network access request.
  • Corresponding level division rules can be set, and the different missing data combinations are stratified according to these rules.
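  • One possible level division rule is sketched below. The source only gives two data points (three missing hardware items give low validity, three missing lower-necessity items give medium validity) and states that zero missing items gives the highest validity; the weights, thresholds, and the membership of `HARDWARE_FEATURES` are assumptions chosen to reproduce those cases.

```python
# Hypothetical set of features counted as hardware data.
HARDWARE_FEATURES = {"cpu_class", "logical_processors", "webgl_vendor"}

# Assumed weights: a missing hardware item counts twice as much as any other.
HARDWARE_WEIGHT = 1.0
OTHER_WEIGHT = 0.5


def missing_degree(missing_items):
    """Weighted count of missing items, by type and quantity."""
    return sum(
        HARDWARE_WEIGHT if item in HARDWARE_FEATURES else OTHER_WEIGHT
        for item in missing_items
    )


def validity_level(missing_items):
    """Map a missing degree to a validity level using assumed thresholds."""
    degree = missing_degree(missing_items)
    if degree >= 3.0:
        return "low"
    if degree >= 1.5:
        return "medium"
    return "high"


hardware_case = validity_level(["cpu_class", "logical_processors", "webgl_vendor"])
other_case = validity_level(["browser_language", "pixel_ratio", "color_depth"])
complete_case = validity_level([])
```

  • Weighting by type is one simple way to let three missing hardware items outweigh three missing items of lower necessity; a table of explicit rules would serve equally well.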
  • Step A1 may also include the following steps:
  • A11. Combine the missing data composed of the type and quantity of the missing items;
  • A12. Perform data stratification on the combined feature set according to the combined missing data.
  • In steps A11-A12, the missing data is combined according to the type and number of missing items in the corresponding combined feature set, and data stratification with different arrangements is performed on the combined feature set according to the combined missing data.
  • The missing items of the combined feature set can be classified in a tree structure.
  • Different root nodes represent different categories; each root node can be divided into two sub-nodes, and each sub-node is a sub-category of the category of its root node.
  • Each category can be placed at the root node in different positions to form different data layers.
  • For example, the types of missing items may include at least whether browser information is missing and whether there are more than two missing items. If the first level is set to whether browser information is missing and the second level to whether there are more than 2 missing items, the resulting data layers differ from those obtained when the first level is set to whether there are more than 2 missing items and the second level to whether browser information is missing; that is, the two arrangements yield different tree structures.
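  • The effect of the level order on the tree structure can be shown with a small sketch. It stratifies a handful of samples, each represented by its list of missing items, using the two criteria from the text in both orders. The sample data and the `browser_` naming convention used to detect browser information are hypothetical.

```python
def browser_info_missing(missing_items):
    """Assumed convention: browser features are named with a browser_ prefix."""
    return any(item.startswith("browser_") for item in missing_items)


def more_than_two_missing(missing_items):
    return len(missing_items) > 2


def stratify(samples, levels):
    """Split samples level by level into a nested dict (a binary tree)."""
    if not levels:
        return list(samples)
    (label, predicate), remaining = levels[0], levels[1:]
    branches = {}
    for sample in samples:
        branches.setdefault((label, predicate(sample)), []).append(sample)
    return {key: stratify(group, remaining) for key, group in branches.items()}


samples = [
    ["browser_language"],
    ["browser_plugins", "font_list", "webgl_vendor"],
    ["cpu_class", "operating_system", "webgl_vendor"],
    [],
]
order_a = [
    ("browser_info_missing", browser_info_missing),
    ("more_than_two_missing", more_than_two_missing),
]
order_b = list(reversed(order_a))

tree_a = stratify(samples, order_a)
tree_b = stratify(samples, order_b)
```

  • Both trees contain the same samples, but `tree_a` splits first on browser information and `tree_b` splits first on the number of missing items, so their structures differ.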
  • Step A2 may further include the following steps:
  • A21. Train and obtain a lightgbm model according to the data stratification performed on the combined missing data of the combined feature set;
  • A22. Input the combined feature set of the sample to be determined into the lightgbm model, obtain the abnormal probability of that combined feature set, and obtain the validity of the corresponding network access request.
  • The different data layers obtained in step A12 are fed into the lightgbm model, and the model is trained to obtain its parameters, such as num_leaves, min_data_in_leaf, and max_depth.
  • num_leaves represents the maximum number of leaves of the tree structure.
  • the data information of the feature set of the sample to be determined is input into the lightgbm model, and abnormality determination is performed corresponding to the network access request initiated by the terminal device.
  • the abnormal probability of the combined feature set of the sample to be determined is obtained.
  • the abnormal probability is used to characterize the probability that the network access request initiated by the sample to be determined is an abnormal user access, that is, it can directly reflect the effectiveness of normal user network access. When the abnormal probability is higher, the effectiveness of the corresponding network access request is lower.
  • the sample to be determined is a network access request initiated by the terminal device to be determined.
  • Using the lightgbm model as the determination model for whether the network access is an abnormal access makes it possible to identify diverse abnormal scenarios, and as the sample size grows, more and more complex situations can be covered.
  • For step A21, GridSearchCV grid search is used to automatically tune the parameters of the lightgbm model, including the aforementioned parameters num_leaves, min_data_in_leaf, and max_depth.
  • The lightgbm model is thereby adjusted and optimized, which improves the accuracy of the abnormality determination for the network access request initiated by the corresponding terminal device.
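  • A minimal sketch of steps A21 and A22 with grid search follows, assuming the `lightgbm` and `scikit-learn` packages are installed. The candidate values in the grid, the ROC AUC scoring, the 3-fold cross-validation, and the label convention (1 for abnormal access) are assumptions, not details from the source. The scikit-learn wrapper exposes `min_data_in_leaf` under the name `min_child_samples`.

```python
# Candidate values are illustrative assumptions, not values from the source.
PARAM_GRID = {
    "num_leaves": [15, 31, 63],
    # min_child_samples is the scikit-learn wrapper's name for min_data_in_leaf.
    "min_child_samples": [10, 20, 50],
    "max_depth": [4, 6, 8],
}


def tune_lightgbm(features, labels):
    """Grid-search a LightGBM classifier; labels use 1 for abnormal access."""
    # Imported here so the grid above can be inspected without the libraries.
    from lightgbm import LGBMClassifier
    from sklearn.model_selection import GridSearchCV

    search = GridSearchCV(
        estimator=LGBMClassifier(objective="binary"),
        param_grid=PARAM_GRID,
        scoring="roc_auc",  # assumed metric
        cv=3,               # assumed number of folds
    )
    search.fit(features, labels)
    return search.best_estimator_


def abnormal_probability(model, features):
    """Return the predicted probability of the abnormal class (label 1)."""
    return model.predict_proba(features)[:, 1]
```

  • With a trained model, `abnormal_probability(model, X_new)` gives one probability per sample to be determined; a higher value corresponds to a lower validity of the network access request.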
  • Step S240 may include:
  • When the abnormal probability exceeds the preset judgment threshold, the network access is determined to be an abnormal access.
  • The lightgbm model is used according to step A22 to obtain the abnormal probability of the network access request initiated by the corresponding terminal device.
  • The judgment threshold is a critical point for the probability that the terminal device is initiating a normal network access request. When the abnormal probability exceeds the range defined by the preset threshold, the network access is determined to be an abnormal access, which gives the determination result that the network access initiated by the corresponding terminal device is abnormal.
  • If the network request currently initiated by the terminal device is determined to be an abnormal access, the server directly rejects the request or requires the terminal device to perform access verification again; if it is determined to be a normal access request, the server responds to the request directly.
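  • The threshold decision and the server's response can be sketched as follows. The threshold value 0.5 and the returned action names are hypothetical; the source only states that a preset judgment threshold exists and describes the two kinds of response.

```python
JUDGMENT_THRESHOLD = 0.5  # hypothetical preset threshold


def is_abnormal(abnormal_probability, threshold=JUDGMENT_THRESHOLD):
    """An access is abnormal when its probability exceeds the threshold."""
    return abnormal_probability > threshold


def handle_request(abnormal_probability, threshold=JUDGMENT_THRESHOLD):
    """Choose the server's action for one network access request."""
    if is_abnormal(abnormal_probability, threshold):
        # Reject outright or ask the terminal device to verify again.
        return "reject_or_reverify"
    return "respond"


crawler_like = handle_request(0.92)
normal_like = handle_request(0.08)
```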
  • the combined feature set may also include:
  • Effective derivative feature information for identifying outliers, obtained by performing data-distribution measurement calculations on the feature values of the combined feature set.
  • In this way, effective derivative features for identifying outliers can be obtained.
  • The corresponding effective derivative features are added to the feature list, so that they can be compared with the effective derivative features of the combined feature set to obtain the effective derivative missing items.
  • The data-distribution measurement calculation includes calculating the range, the quartiles, the interquartile range, and the five-number summary of the feature information data; the five-number summary is, in order, the minimum, the upper quartile, the median, the lower quartile, and the maximum.
  • With these features, the combined feature set of the sample to be determined can be compared more comprehensively, which further improves the judging ability of the method for determining abnormality of network access.
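  • The distribution measures named above can be computed with the Python standard library, as in the sketch below. The source does not specify how quartiles are interpolated; the `inclusive` method of `statistics.quantiles` is an assumption, and the input values are made up for illustration. The quartiles are returned by name (`q1`, `q3`) rather than by position.

```python
import statistics


def distribution_features(values):
    """Range, quartiles, interquartile range, and five-number summary."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three cut points q1, median, q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
    }


derived = distribution_features([1, 2, 3, 4, 5])
```

  • Values far outside the interval between `q1` and `q3` are candidates for outliers, which is what makes these derived features useful for the comparison.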
  • an embodiment of the present application also provides a device for determining abnormality of network access, as shown in FIG. 3, including:
  • the feature acquisition module 310 is configured to collect various related features generated by a terminal device according to a network access request at a preset time interval, and form a combined feature set about the terminal device according to the features;
  • the comparison module 320 is configured to compare the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set;
  • the validity obtaining module 330 is used to compose missing data according to each missing item to obtain the validity of the corresponding network access request;
  • the determination module 340 is configured to use the validity to determine abnormal access to the network access request.
  • FIG. 4 is a schematic diagram of the internal structure of the server in an embodiment.
  • the server includes a processor 410, a storage medium 420, a memory 430, and a network interface 440 connected through a system bus.
  • the storage medium 420 of the server stores an operating system, a database, and computer-readable instructions.
  • the database may store control information sequences.
  • the processor 410 can implement the functions of the feature acquisition module 310, the comparison module 320, the validity acquisition module 330, and the determination module 340 in a network access abnormality determination device in the embodiment shown in FIG.
  • the processor 410 of the server is used to provide computing and control capabilities to support the operation of the entire server.
  • the memory 430 of the server may store computer-readable instructions. When the computer-readable instructions are executed by the processor 410, the processor 410 can execute a method for determining an abnormality of network access.
  • the network interface 440 of the server is used to connect and communicate with the terminal.
  • this application also proposes a storage medium storing computer-readable instructions.
  • the storage medium of the computer-readable instructions may be a non-volatile readable storage medium.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: collect, at a preset time interval, each relevant feature generated by the terminal device according to the network access request, and form a combined feature set about the terminal device according to the features; compare the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set; compose missing data from each missing item to obtain the validity of the corresponding network access request; and use the validity to determine whether the network access request is an abnormal access.
  • The method, device, server and storage medium for determining abnormality of network access provided by this application compare the features of the combined feature set of the network access request sent by the terminal device with the set feature list containing the necessary features, obtain the missing items of the combined feature set from the comparison result, judge the validity of the network access request from them, and finally obtain the judgment of whether the corresponding network access request is abnormal.
  • Another technical solution is also provided, which trains a lightgbm model on the data stratification of the combined missing data of the combined feature set, and uses the lightgbm model as the judgment model to judge whether the network access is abnormal.
  • This solution can identify diverse abnormal scenarios, and as the sample size grows, it can cover more and more complex situations.
  • The technical solution provided by this application compares the features actually obtained from the network access request with the feature list containing the necessary features, and uses the necessary features that can reflect abnormal access as the basis for judgment, so as to get the best judgment result with as little data processing as possible.
  • The method, device, server, and storage medium of this application use a feature list that reflects normal network access, so that the abnormal access determination result is easily obtained after comparison. This solves the existing technical problem that abnormal access can only be determined from the click and drag data produced when the user initiates a network access request, which has a high error rate, and improves the ability to determine abnormal access by terminal equipment.
  • The aforementioned storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or another storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

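This application belongs to the technical field of security detection and provides a method, device, server, and storage medium for determining abnormality of network access. The method includes: collecting, at a preset time interval, the relevant features generated by a terminal device according to a network access request, and forming a combined feature set for the terminal device from the features; comparing the features of the combined feature set with a set feature list to obtain the missing items of the corresponding combined feature set; composing missing data from the missing items to obtain the validity of the corresponding network access request; and using the validity to determine whether the network access request is an abnormal access. The method helps improve the ability to determine whether the current network access of a terminal device is abnormal.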

Description

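Method, Device, Server, and Storage Medium for Determining Abnormality of Network Access
This application claims priority to Chinese patent application No. 201910578452.0, filed with the Chinese Patent Office on June 28, 2019 and entitled "Method, Device, Server, and Storage Medium for Determining Abnormality of Network Access", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of security detection. Specifically, this application relates to a method, device, server, and storage medium for determining abnormality of network access.
Background
As more and more services are provided over the network, network security has received increasingly wide attention. At present, one of the main threats to the security of network services is the web crawler, which simulates a real user accessing a website. Under interference from web crawlers, a website's server cannot easily tell crawlers from normal users, is prone to misjudgment, and consequently responds incorrectly. To address this security problem, the existing approach collects the user's mouse click and drag data when the user logs in to the website and uses it to identify the type of user. However, the proportion of users misclassified by this approach remains high, and its results still cannot accurately distinguish normal users from web crawlers.
Summary
To overcome the above technical problems, in particular the problem in the prior art that usage-trace data collected when a terminal device logs in to the network easily causes real users to be identified as abnormal users, the following technical solutions are proposed: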
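In a first aspect, this application provides a method for determining abnormality of network access, which includes the following steps:
collecting, at a preset time interval, the relevant features generated by a terminal device according to a network access request, and forming a combined feature set for the terminal device from the features, where the features include feature values related to the device type and feature values related to system information, and the combined feature set and the feature values are in a nonlinear relationship with each other;
comparing the features of the combined feature set with a set feature list to obtain the missing items of the corresponding combined feature set;
composing missing data from the missing items to obtain the validity of the corresponding network access request;
using the validity to determine whether the network access request is an abnormal access;
where the feature list includes the necessary features generated when the terminal device initiates a network access request;
the comparing of the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set, and the composing of missing data from the missing items to obtain the validity of the corresponding network access request, include:
comparing the features of the feature set with the set feature list to obtain the types and number of the missing items of the corresponding combined feature set;
performing data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items;
using the data stratification to obtain the validity of the corresponding network access request.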
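In one embodiment, the performing of data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items includes:
combining the missing data composed of the types and number of the missing items;
performing data stratification on the combined feature set according to the combined missing data.
In one embodiment, the step of using the data stratification to obtain the validity of the corresponding network access request includes:
training and obtaining a lightgbm model according to the data stratification performed on the combined missing data of the combined feature set;
inputting the combined feature set of a sample to be determined into the lightgbm model for determination, obtaining the abnormality probability of the combined feature set of the sample to be determined, and thereby obtaining the validity of the corresponding network access request.
In one embodiment, after the step of training and obtaining the lightgbm model according to the data stratification performed on the combined missing data of the combined feature set, the method further includes:
automatically tuning the parameters num_leaves, min_data_in_leaf, and max_depth of the lightgbm model through a GridSearchCV grid search, so as to adjust and optimize the lightgbm model.
In one embodiment, the step of using the validity to determine whether the network access request is an abnormal access includes:
when the abnormality probability on which the validity is based is greater than a preset threshold, determining that the network access is an abnormal access.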
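In a second aspect, this application further provides a device for determining abnormality of network access, which includes:
a feature acquisition module, configured to collect, at a preset time interval, the relevant features generated by a terminal device according to a network access request, and to form a combined feature set for the terminal device from the features, where the features include feature values related to the device type and feature values related to system information, and the combined feature set and the feature values are in a nonlinear relationship with each other;
a comparison module, configured to compare the features of the combined feature set with a set feature list to obtain the missing items of the corresponding combined feature set, where the feature list includes the necessary features generated when the terminal device initiates a network access request;
a validity acquisition module, configured to compose missing data from the missing items to obtain the validity of the corresponding network access request;
a determination module, configured to use the validity to determine whether the network access request is an abnormal access;
where the comparison module is further configured to compare the features of the feature set with the set feature list to obtain the types and number of the missing items of the corresponding combined feature set;
the validity acquisition module is further configured to perform data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items, and to use the data stratification to obtain the validity of the corresponding network access request.
In a third aspect, this application further provides a server, which includes: one or more processors, a memory, and one or more computer-readable instructions, where the one or more computer-readable instructions are stored in the memory and configured to be executed by the one or more processors, and the one or more computer-readable instructions are configured to execute the method for determining abnormality of network access described in the embodiments of the first aspect.
In a fourth aspect, this application further provides a computer-readable storage medium on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions implement the method for determining abnormality of network access described in the embodiments of the first aspect.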
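The method and device for determining abnormality of network access provided by this application compare the features of the combined feature set of the network access request sent by the terminal device with a set feature list that includes the necessary features, obtain the missing items of the combined feature set from the comparison result, judge the validity of the network access request from those missing items, and finally obtain the determination of whether the corresponding network access request is abnormal.
On this basis, another technical solution is also provided: a lightgbm model is trained according to the data stratification performed on the combinations of different missing categories of the combined feature set, and the lightgbm model is used as the judgment model to determine whether the network access is an abnormal access. This solution can identify diverse abnormal scenarios and, as the sample size grows, can cover more numerous and more complex situations.
The technical solution provided by this application compares the features actually obtained from the network access request with the feature list containing the necessary features, and uses the necessary features that can reflect abnormal access as the basis for judgment, so as to obtain the best determination result with as little data processing as possible.
Additional aspects and advantages of this application will be given in part in the following description; they will become apparent from the description or be learned through practice of this application.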
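Brief Description of the Drawings
The above and/or additional aspects and advantages of this application will become apparent and easy to understand from the following description of the embodiments in conjunction with the drawings, in which:
FIG. 1 is a diagram of the application environment in which an embodiment of this application executes the solution for determining abnormality of network access;
FIG. 2 is a flowchart of a method for determining abnormality of network access according to an embodiment of this application;
FIG. 3 is a schematic diagram of a device for determining abnormality of network access according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of a server according to an embodiment of this application.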
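Detailed Description
The embodiments of this application are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain this application and cannot be construed as limiting it.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of this application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general-purpose dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, are not to be interpreted in an idealized or overly formal sense.
Those skilled in the art will understand that the "terminal" and "terminal device" used herein include both devices that have only a wireless signal receiver without transmitting capability and devices with receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such devices may include: cellular or other communication devices with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax, and/or data communication capabilities; PDAs (Personal Digital Assistant), which may include a radio-frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio-frequency receiver. The "terminal" and "terminal device" used herein may be portable, transportable, installed in a vehicle (air, sea, and/or land), or suitable and/or configured to operate locally and/or in distributed form at any other location on the earth and/or in space. The "terminal" and "terminal device" used herein may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device), and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.
Those skilled in the art will understand that the remote network device used herein includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Here, the cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers. In the embodiments of this application, the remote network device, the terminal device, and the WNS server may communicate through any communication method, including but not limited to mobile communication based on 3GPP, LTE, and WIMAX, computer network communication based on the TCP/IP and UDP protocols, and short-range wireless transmission based on the Bluetooth and infrared transmission standards.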
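Refer to FIG. 1, which is a diagram of the application environment of the solution of an embodiment of this application. In this embodiment, the technical solution can be implemented on a server. As shown in FIG. 1, terminal devices 110 and 120 can access server 130 through the Internet; terminal devices 110 and/or 120 send network requests to server 130, and server 130 performs data interaction according to the network requests. During data interaction, server 130 obtains the access data and attribute data of terminal devices 110 and/or 120 from their request information, and determines from these data whether the terminal device is abnormal.
To solve the problem that current abnormal-data determination easily misidentifies real users as abnormal users, this application provides a method for determining abnormality of network access. Referring to FIG. 2, which is a flowchart of the method according to an embodiment, the method includes the following steps:
S210. Collect, at a preset time interval, the relevant features generated by the terminal device according to the network access request, and form a combined feature set for the terminal device from the features.
When the server interacts with the terminal device, it collects the relevant features of the terminal device at intervals. That is, within each preset time interval, it collects the relevant features according to the terminal device's network request and forms a combined feature set.
The relevant parameters of the terminal device are obtained from the network request it sends. In this step, when the user sends a registration or verification request, the front end uses a JavaScript script to obtain the relevant features of the terminal device, including multiple feature values such as the device type (iPhone, Mac, Android), system information (OS type, version, resolution), and IP address. A combined feature set for the terminal device is formed from these feature values, and the feature values in the combined feature set may be in a nonlinear relationship with one another.
In this embodiment, the features obtained through the front end may specifically include: the browser language, the pixel ratio, the color depth, whether an audio stack fingerprint is provided, the parameter information of the audio stack fingerprint, the total number of logical processors available to the user agent, whether the CPU class is unknown, whether the browser plug-ins are missing, whether the font list detected using JS/CSS is missing, whether the operating system is unknown, and whether the WebGL vendor is missing. By parsing the string information in user_agent, the device's type, brand, model, and operating system version number are obtained. The brand and model parsed for the terminal device that issued the current network access request are then matched against the same brand and model in a base library to obtain the corresponding feature information. The base library holds the true feature information of all device models, obtained from authoritative websites.
Further, to eliminate the dimensional (unit) differences between variables so that the data are comparable, the feature information values in each feature set are standardized before the feature values are labeled. For example, the feature set of a single access record may include one variable on a 100-point scale and another on a 5-point scale; only after all the data are standardized can they be compared on the same scale.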
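The standardization step can be sketched as follows. This is a minimal illustration that assumes min-max scaling to the range [0, 1]; the source does not say which standardization method is used, and the function name `min_max_scale` and the sample scores are hypothetical.

```python
def min_max_scale(values):
    """Rescale numbers to [0, 1] (assumed method; the source only says 'standardize')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical data: one variable on a 100-point scale, one on a 5-point scale.
percent_scores = [20, 60, 100]
five_point_scores = [1, 3, 5]

scaled_percent = min_max_scale(percent_scores)
scaled_five_point = min_max_scale(five_point_scores)
```

After scaling, both variables lie on the same [0, 1] scale and can be compared directly, which is the purpose the description gives for this step.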
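S220. Compare the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set.
In this step, based on the historical feature information of network access requests initiated by terminal devices, the features relating to a terminal device initiating a network access request are collected to form the corresponding feature list. The feature list includes at least the necessary features generated when the terminal device initiates a network access request. A necessary feature is one whose true information can be found in the base library, so that it can later serve as a reference; examples are the browser language, the pixel ratio, the total number of logical processors available to the user agent, the CPU type, the operating system, and the WebGL vendor.
The features are extracted from the combined feature set formed when the terminal device initiates the network access request and compared with the feature information in the feature list. Because the feature list enumerates necessary features, the feature information in the feature list is generally contained in the combined feature set whenever the network access request initiated by the terminal device is a normal one.
Therefore, with the feature list set up in this way, the comparison yields the missing items of the corresponding combined feature set.
S230. Compose missing data from the missing items to obtain the validity of the corresponding network access request.
On the basis of the missing items obtained in step S220, the missing items make up the missing data of the corresponding combined feature set, and this missing data corresponds to the network access request that was initiated. The validity of the corresponding network access request is obtained from the missing data. If the missing data is 0, the server was able to obtain all the necessary feature information from the terminal device that initiated the network access request, and the corresponding validity is the highest. As the missing data increases, the validity of the corresponding network access request is directly affected.
In this embodiment, the validity reflects the likelihood that the network access request initiated by the terminal device was issued through normal use by a user, and it is used to determine whether the network access request was initiated by a web crawler.
S240. Use the validity to determine whether the network access request is an abnormal access.
The validity obtained in the above steps can be used directly to judge whether the network access request was issued by a web crawler or another abnormal user, and thereby to determine whether the network access request is an abnormal access.
In the method for determining abnormality of network access provided by this application, a combined feature set for the terminal device is obtained from the network access request and compared with a preset feature list that includes the necessary features for initiating a network access request, giving the missing items of the combined feature set; the validity of the network access request is obtained from the missing items, and the determination of whether the access is abnormal is obtained from the validity. The prior art can only distinguish user types, and hence determine abnormal access, from surface phenomena: the click and drag data generated when the user initiates a network access request. By contrast, this solution determines whether a network access request is abnormal from the missing items found by comparison with the set feature list. It starts from the underlying phenomena caused by abnormal access, processes the feature data those phenomena produce, and bases the determination on the processing result, so that a good determination result is obtained with as little data comparison as possible and accuracy is improved.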
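Step S220 may further include:
comparing the features of the feature set with the set feature list to obtain the types and number of the missing items of the corresponding combined feature set.
In this step, the features in the feature set are compared with the set feature list as follows: the types of the features in the combined feature set are enumerated and summarized, and the summarized types are matched one-to-one against the features in the feature list. If, after this matching, some features in the feature list still have no corresponding feature in the combined feature set, those features are the missing items of the combined feature set, which gives the types and number of the missing items.
For example, if after the comparison the two entries in the feature list for operating system type and WebGL vendor have no corresponding feature in the combined feature set, the missing-item types of that combined feature set are operating system type and WebGL vendor, and the number of missing items is 2.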
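The comparison against the feature list can be sketched as below. This is an illustration only: the key names in `REQUIRED_FEATURES`, the function `find_missing_items`, and the rule that an absent key, `None`, or an empty string counts as missing are all assumptions, not taken from the source.

```python
# Hypothetical feature list of necessary features; the key names are illustrative only.
REQUIRED_FEATURES = [
    "browser_language",
    "pixel_ratio",
    "logical_processors",
    "cpu_class",
    "os_type",
    "webgl_vendor",
]


def find_missing_items(combined_feature_set, required=REQUIRED_FEATURES):
    """Return (missing feature names, count) for one request's combined feature set."""
    missing = [
        name for name in required
        # Treat an absent key, None, or an empty string as a missing item (an assumption).
        if combined_feature_set.get(name) in (None, "")
    ]
    return missing, len(missing)


# A request that reports neither the operating system type nor the WebGL vendor.
request_features = {
    "browser_language": "zh-CN",
    "pixel_ratio": 2.0,
    "logical_processors": 8,
    "cpu_class": "x86",
}
missing_types, missing_count = find_missing_items(request_features)
```

With this input, the missing types are the operating system type and the WebGL vendor and the count is 2, matching the example in the description.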
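On the basis of the above further limitation of step S220, step S230 may include the following steps:
A1. Perform data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items;
A2. Use the data stratification to obtain the validity of the corresponding network access request.
For steps A1-A2, consider the case where the missing data is concentrated in one category: the missing-item types are the CPU class, the display's pixel ratio, and the hard disk type, so the missing data is mainly hardware data of the terminal device. Hardware data plays a fundamental role in the operation of the terminal device when it initiates a network access request, and three hardware items are missing, so the degree of missingness can be rated high and the validity of the corresponding network access request is low.
If instead the missing-item types are scattered across information such as the browser language, browser plug-ins, and audio stack fingerprint, the number of missing items is again three, but these items are less necessary than hardware data when a terminal device initiates a network access request. Even with the same three missing items, the degree of missingness does not reach the high level, so the validity of the corresponding network access request is medium.
The feature items in the feature list can be graded by how necessary they are for a terminal device initiating a network access request. Likewise, grade-division rules can be set for the necessity grade and number of the necessary features included in the missing data. Different missing-data combinations are then stratified according to these rules.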
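One possible grade-division rule is sketched below. The necessity grades, the score cut-offs, and the names `NECESSITY_GRADE` and `validity_level` are hypothetical; they are chosen only so that the two cases described above (three missing hardware items, three missing browser items) come out as low and medium validity respectively.

```python
# Hypothetical necessity grades: hardware features weigh more than browser features.
NECESSITY_GRADE = {
    "cpu_class": 2, "pixel_ratio": 2, "disk_type": 2,
    "browser_language": 1, "browser_plugins": 1, "audio_fingerprint": 1,
}


def validity_level(missing_types):
    """Map a list of missing feature names to a coarse validity level."""
    # Names not in the table default to grade 1 (an assumption of this sketch).
    score = sum(NECESSITY_GRADE.get(name, 1) for name in missing_types)
    if score >= 6:      # e.g. three missing hardware items: missingness rated high
        return "low"
    if score >= 3:      # e.g. three missing browser items: missingness rated medium
        return "medium"
    return "high"       # few or no missing items


hardware_case = validity_level(["cpu_class", "pixel_ratio", "disk_type"])
browser_case = validity_level(["browser_language", "browser_plugins", "audio_fingerprint"])
```

A weighted score is only one way to encode the rule; a lookup table keyed by (grade, count) would serve equally well.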
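Step A1 may further include the following steps:
A11. Combine the missing data composed of the types and number of the missing items;
A12. Perform data stratification on the combined feature set according to the combined missing data.
In steps A11-A12, the types and number of the missing items of the corresponding combined feature set are combined. According to the combined missing data, the combined feature set is stratified with different arrangements.
Specifically, the missing items of the combined feature set can be classified in a tree structure. In this tree structure, different root nodes represent different categories, each root node can be divided into two child nodes, and each child node is a sub-category of its root node's category.
For a given combined feature set, the categories can be placed at root nodes in different positions, forming different data stratifications.
Take again the embodiment in which the missing-item types are scattered across the browser language, browser plug-ins, and audio stack fingerprint:
In this embodiment, the criteria for the missing items can include at least whether browser information is missing and whether there are more than 2 missing items. Setting the first level to "is browser information missing" and the second level to "are there more than 2 missing items" gives a different data stratification of the combined feature set than setting the first level to "are there more than 2 missing items" and the second level to "is browser information missing"; that is, the resulting tree structures differ.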
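The effect of ordering the two criteria differently can be sketched as follows. The set `BROWSER_FEATURES` and the helper names are hypothetical, and representing a stratum as the ordered tuple of answers along the tree path is an assumption of this sketch rather than the source's data structure.

```python
# Hypothetical grouping of features regarded as browser information.
BROWSER_FEATURES = {"browser_language", "browser_plugins", "audio_fingerprint"}


def browser_info_missing(missing_types):
    """Criterion 1: is any browser-related feature missing?"""
    return any(name in BROWSER_FEATURES for name in missing_types)


def more_than_two_missing(missing_types):
    """Criterion 2: are more than 2 items missing?"""
    return len(missing_types) > 2


def stratum_path(missing_types, levels):
    """Walk the criteria in the given order; each answer selects one of two child nodes."""
    return tuple((test.__name__, test(missing_types)) for test in levels)


missing = ["browser_language", "browser_plugins", "audio_fingerprint"]
path_a = stratum_path(missing, [browser_info_missing, more_than_two_missing])
path_b = stratum_path(missing, [more_than_two_missing, browser_info_missing])
```

Both paths contain the same two answers, but in a different order, so the same sample lands at different positions in the two trees.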
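On the basis of steps A11-A12, step A2 may further include the following steps:
A21. Train and obtain a lightgbm model according to the data stratification performed on the combined missing data of the combined feature set;
A22. Input the combined feature set of the sample to be determined into the lightgbm model for determination, obtain the abnormality probability of the combined feature set of the sample to be determined, and thereby obtain the validity of the corresponding network access request.
In steps A21-A22, the different data stratifications obtained in step A12 are fed into the lightgbm model and the model is trained, giving the parameters of the lightgbm model such as num_leaves, min_data_in_leaf, and max_depth. Specifically, these parameters can first be set to preset values, the different data stratifications from the examples above fed in, and the parameters then readjusted, giving the trained lightgbm model.
The meaning of these lightgbm parameters is as follows:
num_leaves is the maximum number of leaves of a tree and controls the tree's complexity; it is usually set so that num_leaves <= 2^(max_depth). min_data_in_leaf depends on the number of training samples and on num_leaves; setting it to a larger value avoids growing an overly deep tree. max_depth is the maximum depth of a tree.
After the trained lightgbm model is obtained in step A21, the data of the feature set of the sample to be determined is input into the model to determine whether the network access request initiated by the corresponding terminal device is abnormal. The lightgbm model gives the abnormality probability of the combined feature set of the sample to be determined. The abnormality probability represents the probability that the network access request of the sample was initiated by an abnormal user, and so directly reflects the validity of the request as a normal user's network access: the higher the abnormality probability, the lower the validity of the corresponding network access request.
The sample to be determined is the network access request initiated by the terminal device to be determined.
The technical solution that uses the lightgbm model as the judgment model to determine whether the network access is abnormal can identify diverse abnormal scenarios and, as the sample size grows, can cover more numerous and more complex situations.
After step A21, the parameters of the lightgbm model are tuned automatically with a GridSearchCV grid search; the parameters involved include the num_leaves, min_data_in_leaf, and max_depth mentioned above. Adjusting these parameters tunes and optimizes the lightgbm model and improves the accuracy of the abnormality determination for the network access request initiated by the corresponding terminal device.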
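Before running a grid search, the candidate combinations can be pruned with the constraint num_leaves <= 2^(max_depth) stated above. The sketch below uses only the standard library and does not call lightgbm or scikit-learn; the candidate values and the names `CANDIDATES` and `valid_param_grid` are hypothetical.

```python
from itertools import product

# Hypothetical candidate values; suitable ranges depend on the training data.
CANDIDATES = {
    "max_depth": [3, 5],
    "num_leaves": [7, 15, 31, 63],
    "min_data_in_leaf": [20, 50],
}


def valid_param_grid(candidates):
    """List the parameter combinations that respect num_leaves <= 2 ** max_depth."""
    names = sorted(candidates)
    grid = []
    for values in product(*(candidates[name] for name in names)):
        params = dict(zip(names, values))
        if params["num_leaves"] <= 2 ** params["max_depth"]:
            grid.append(params)
    return grid


grid = valid_param_grid(CANDIDATES)
```

Of the 16 raw combinations, 8 survive the constraint. scikit-learn's `GridSearchCV` accepts a list of dictionaries as `param_grid`, so each surviving combination could be supplied as a one-point grid by wrapping its values in lists; whether that is how the authors ran the search is not stated in the source.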
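Step S240 may include:
when the abnormality probability on which the validity is based is greater than a preset threshold, determining that the network access is an abnormal access.
In this step, the abnormality probability of the network access request initiated by the corresponding terminal device is obtained from the lightgbm model in step A22. The threshold is the critical point for the probability that the terminal device is initiating a normal network access request. When the abnormality probability exceeds the range bounded by the preset threshold, the network access is likely to be abnormal, which gives the determination that the network access initiated by the corresponding terminal device is an abnormal access.
If the network request currently initiated by the terminal device is determined to be an abnormal access request, the server directly rejects the request or requires the terminal device to perform access verification again; if it is determined to be a normal access request, the server responds to the request directly.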
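The threshold decision and the server's response can be sketched as below. The threshold value 0.5, the action labels, and the `allow_reverify` switch are hypothetical; the source only states that an abnormal request is either rejected or sent back for verification, and that a normal request is answered directly.

```python
ANOMALY_THRESHOLD = 0.5  # hypothetical preset threshold


def decide(anomaly_probability, threshold=ANOMALY_THRESHOLD, allow_reverify=True):
    """Return the server action for one request given the model's abnormality probability."""
    if not 0.0 <= anomaly_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if anomaly_probability > threshold:
        # Abnormal access: ask for verification again, or reject outright.
        return "reverify" if allow_reverify else "reject"
    # Normal access: respond to the request directly.
    return "respond"
```

Note that a probability exactly equal to the threshold is treated as normal here, because the description requires the probability to be greater than the threshold.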
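In addition, the combined feature set may further include:
effective derived feature information for identifying outliers, obtained by computing measures of data dispersion over the values of the features in the combined feature set.
The data-dispersion measures yield effective derived features for identifying outliers. Correspondingly, the feature list is extended with the same effective derived features so that they can be compared with those of the combined feature set to obtain the corresponding missing derived items.
The data-dispersion measures computed for the feature information data include the range, the quartiles, the interquartile range, and the five-number summary; the five-number summary consists, in order, of the minimum, the lower quartile, the median, the upper quartile, and the maximum.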
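The dispersion measures can be computed with the Python standard library, as in the sketch below. The sample data and the function name `dispersion_features` are hypothetical, the quartiles use the "inclusive" method of `statistics.quantiles`, and the 1.5 × IQR outlier rule is a common convention assumed here; the source does not say how outliers are identified from these measures.

```python
import statistics


def dispersion_features(values):
    """Compute range, quartiles, IQR, and five-number summary for one numeric feature."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "range": data[-1] - data[0],
        "quartiles": (q1, median, q3),
        "iqr": iqr,
        "five_number": (data[0], q1, median, q3, data[-1]),
        # Assumed outlier rule (not from the source): beyond 1.5 * IQR from the quartiles.
        "outliers": [v for v in data if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr],
    }


# Hypothetical logical-processor counts reported across several requests.
stats = dispersion_features([2, 4, 4, 8, 16])
```

For this sample the quartiles are 4.0, 4.0, and 8.0, the interquartile range is 4.0, and the value 16 falls outside the upper fence of 14.0, so it is flagged as an outlier.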
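Adding features to be compared to the feature list allows the combined feature set of the sample to be determined to be compared more comprehensively, further improving the determination ability of the method for determining abnormality of network access.
Based on the same inventive concept as the above method for determining abnormality of network access, an embodiment of this application further provides a device for determining abnormality of network access, which, as shown in FIG. 3, includes:
a feature acquisition module 310, configured to collect, at a preset time interval, the relevant features generated by the terminal device according to the network access request, and to form a combined feature set for the terminal device from the features;
a comparison module 320, configured to compare the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set;
a validity acquisition module 330, configured to compose missing data from the missing items to obtain the validity of the corresponding network access request;
a determination module 340, configured to use the validity to determine whether the network access request is an abnormal access.
Refer to FIG. 4, which is a schematic diagram of the internal structure of a server in an embodiment. As shown in FIG. 4, the server includes a processor 410, a storage medium 420, a memory 430, and a network interface 440 connected through a system bus. The storage medium 420 of the server stores an operating system, a database, and computer-readable instructions; the database may store a control information sequence. When executed by the processor 410, the computer-readable instructions cause the processor 410 to implement a method for determining abnormality of network access, and the processor 410 can implement the functions of the feature acquisition module 310, the comparison module 320, the validity acquisition module 330, and the determination module 340 of the device in the embodiment shown in FIG. 3. The processor 410 of the server provides computing and control capabilities and supports the operation of the entire server. The memory 430 of the server may store computer-readable instructions which, when executed by the processor 410, cause the processor 410 to execute a method for determining abnormality of network access. The network interface 440 of the server is used to connect to and communicate with terminals. Those skilled in the art will understand that the structure shown in FIG. 4 is only a block diagram of the part of the structure related to the solution of this application and does not limit the server to which the solution is applied; a specific server may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, this application further proposes a storage medium storing computer-readable instructions, which may be a non-volatile readable storage medium. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps: collecting, at a preset time interval, the relevant features generated by the terminal device according to the network access request, and forming a combined feature set for the terminal device from the features; comparing the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set; composing missing data from the missing items to obtain the validity of the corresponding network access request; and using the validity to determine whether the network access request is an abnormal access.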
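From the above embodiments, it can be seen that:
The method, device, server, and storage medium for determining abnormality of network access provided by this application compare the features of the combined feature set of the network access request sent by the terminal device with a set feature list that includes the necessary features, obtain the missing items of the combined feature set from the comparison result, judge the validity of the network access request from those missing items, and finally obtain the determination of whether the corresponding network access request is abnormal.
On this basis, another technical solution is also provided: a lightgbm model is trained according to the data stratification performed on the combined missing data of the combined feature set, and the lightgbm model is used as the judgment model to determine whether the network access is an abnormal access. This solution can identify diverse abnormal scenarios and, as the sample size grows, can cover more numerous and more complex situations.
The technical solution provided by this application compares the features actually obtained from the network access request with the feature list containing the necessary features, and uses the necessary features that can reflect abnormal access as the basis for judgment, so as to obtain the best determination result with as little data processing as possible.
In summary, the method, device, server, and storage medium for determining abnormality of network access in this application use a feature list that reflects normal network access, so that the abnormal-access determination result is easily obtained after comparison. This solves the technical problem in the prior art that abnormal access can only be determined from the click and drag data generated when the user initiates a network access request, which gives a high error rate, and it improves the ability to determine abnormal access by terminal devices.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the above methods. The aforementioned storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or another storage medium.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be understood as limiting the patent scope of this application. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.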

Claims (20)

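  1. A method for determining abnormality of network access, comprising the following steps:
    collecting, at a preset time interval, the relevant features generated by a terminal device according to a network access request, and forming a combined feature set for the terminal device from the features, wherein the features include feature values related to the device type and feature values related to system information, and the combined feature set and the feature values are in a nonlinear relationship with each other;
    comparing the features of the combined feature set with a set feature list to obtain the missing items of the corresponding combined feature set;
    composing missing data from the missing items to obtain the validity of the corresponding network access request;
    using the validity to determine whether the network access request is an abnormal access;
    wherein the feature list includes the necessary features generated when the terminal device initiates a network access request;
    the comparing of the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set, and the composing of missing data from the missing items to obtain the validity of the corresponding network access request, include:
    comparing the features of the feature set with the set feature list to obtain the types and number of the missing items of the corresponding combined feature set;
    performing data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items;
    using the data stratification to obtain the validity of the corresponding network access request.
  2. The method according to claim 1, wherein the performing of data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items includes:
    combining the missing data composed of the types and number of the missing items;
    performing data stratification on the combined feature set according to the combined missing data.
  3. The method according to claim 1, wherein the step of using the data stratification to obtain the validity of the corresponding network access request includes:
    training and obtaining a lightgbm model according to the data stratification performed on the combined missing data of the combined feature set;
    inputting the combined feature set of a sample to be determined into the lightgbm model for determination, obtaining the abnormality probability of the combined feature set of the sample to be determined, and thereby obtaining the validity of the corresponding network access request.
  4. The method according to claim 3, wherein, after the step of training and obtaining the lightgbm model according to the data stratification performed on the combined missing data of the combined feature set, the method further includes:
    automatically tuning the parameters num_leaves, min_data_in_leaf, and max_depth of the lightgbm model through a GridSearchCV grid search, so as to adjust and optimize the lightgbm model.
  5. The method according to claim 3, wherein the step of using the validity to determine whether the network access request is an abnormal access includes:
    when the abnormality probability on which the validity is based is greater than a preset threshold, determining that the network access is an abnormal access.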
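  6. A device for determining abnormality of network access, comprising:
    a feature acquisition module, configured to collect, at a preset time interval, the relevant features generated by a terminal device according to a network access request, and to form a combined feature set for the terminal device from the features, wherein the features include feature values related to the device type and feature values related to system information, and the combined feature set and the feature values are in a nonlinear relationship with each other;
    a comparison module, configured to compare the features of the combined feature set with a set feature list to obtain the missing items of the corresponding combined feature set, wherein the feature list includes the necessary features generated when the terminal device initiates a network access request;
    a validity acquisition module, configured to compose missing data from the missing items to obtain the validity of the corresponding network access request;
    a determination module, configured to use the validity to determine whether the network access request is an abnormal access;
    wherein the comparison module is further configured to compare the features of the feature set with the set feature list to obtain the types and number of the missing items of the corresponding combined feature set;
    the validity acquisition module is further configured to perform data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items, and to use the data stratification to obtain the validity of the corresponding network access request.
  7. The abnormality determination device according to claim 6, wherein the validity acquisition module is further configured to:
    combine the missing data composed of the types and number of the missing items;
    perform data stratification on the combined feature set according to the combined missing data.
  8. The abnormality determination device according to claim 6, wherein the validity acquisition module is further configured to:
    train and obtain a lightgbm model according to the data stratification performed on the combined missing data of the combined feature set;
    input the combined feature set of a sample to be determined into the lightgbm model for determination, obtain the abnormality probability of the combined feature set of the sample to be determined, and thereby obtain the validity of the corresponding network access request.
  9. The abnormality determination device according to claim 8, wherein the validity acquisition module is further configured to:
    automatically tune the parameters num_leaves, min_data_in_leaf, and max_depth of the lightgbm model through a GridSearchCV grid search, so as to adjust and optimize the lightgbm model.
  10. The abnormality determination device according to claim 8, wherein the determination module is further configured to:
    determine that the network access is an abnormal access when the abnormality probability on which the validity is based is greater than a preset threshold.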
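  11. A server, comprising: one or more processors, a memory, and one or more computer-readable instructions, wherein the one or more computer-readable instructions are stored in the memory and configured to be executed by the one or more processors, and the one or more computer-readable instructions are configured to perform the following steps:
    collecting, at a preset time interval, the relevant features generated by a terminal device according to a network access request, and forming a combined feature set for the terminal device from the features, wherein the features include feature values related to the device type and feature values related to system information, and the combined feature set and the feature values are in a nonlinear relationship with each other;
    comparing the features of the combined feature set with a set feature list to obtain the missing items of the corresponding combined feature set;
    composing missing data from the missing items to obtain the validity of the corresponding network access request;
    using the validity to determine whether the network access request is an abnormal access;
    wherein the feature list includes the necessary features generated when the terminal device initiates a network access request;
    the comparing of the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set, and the composing of missing data from the missing items to obtain the validity of the corresponding network access request, include:
    comparing the features of the feature set with the set feature list to obtain the types and number of the missing items of the corresponding combined feature set;
    performing data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items;
    using the data stratification to obtain the validity of the corresponding network access request.
  12. The server according to claim 11, wherein the performing of data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items includes:
    combining the missing data composed of the types and number of the missing items;
    performing data stratification on the combined feature set according to the combined missing data.
  13. The server according to claim 11, wherein the step of using the data stratification to obtain the validity of the corresponding network access request includes:
    training and obtaining a lightgbm model according to the data stratification performed on the combined missing data of the combined feature set;
    inputting the combined feature set of a sample to be determined into the lightgbm model for determination, obtaining the abnormality probability of the combined feature set of the sample to be determined, and thereby obtaining the validity of the corresponding network access request.
  14. The server according to claim 13, wherein, after the step of training and obtaining the lightgbm model according to the data stratification performed on the combined missing data of the combined feature set, the steps further include:
    automatically tuning the parameters num_leaves, min_data_in_leaf, and max_depth of the lightgbm model through a GridSearchCV grid search, so as to adjust and optimize the lightgbm model.
  15. The server according to claim 13, wherein the step of using the validity to determine whether the network access request is an abnormal access includes:
    when the abnormality probability on which the validity is based is greater than a preset threshold, determining that the network access is an abnormal access.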
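  16. A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and the computer-readable instructions, when executed by a processor, implement the following steps:
    collecting, at a preset time interval, the relevant features generated by a terminal device according to a network access request, and forming a combined feature set for the terminal device from the features, wherein the features include feature values related to the device type and feature values related to system information, and the combined feature set and the feature values are in a nonlinear relationship with each other;
    comparing the features of the combined feature set with a set feature list to obtain the missing items of the corresponding combined feature set;
    composing missing data from the missing items to obtain the validity of the corresponding network access request;
    using the validity to determine whether the network access request is an abnormal access;
    wherein the feature list includes the necessary features generated when the terminal device initiates a network access request;
    the comparing of the features of the combined feature set with the set feature list to obtain the missing items of the corresponding combined feature set, and the composing of missing data from the missing items to obtain the validity of the corresponding network access request, include:
    comparing the features of the feature set with the set feature list to obtain the types and number of the missing items of the corresponding combined feature set;
    performing data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items;
    using the data stratification to obtain the validity of the corresponding network access request.
  17. The computer-readable storage medium according to claim 16, wherein the performing of data stratification on the missing-data combinations of the combined feature set according to the missing data composed of the types and number of the missing items includes:
    combining the missing data composed of the types and number of the missing items;
    performing data stratification on the combined feature set according to the combined missing data.
  18. The computer-readable storage medium according to claim 16, wherein the step of using the data stratification to obtain the validity of the corresponding network access request includes:
    training and obtaining a lightgbm model according to the data stratification performed on the combined missing data of the combined feature set;
    inputting the combined feature set of a sample to be determined into the lightgbm model for determination, obtaining the abnormality probability of the combined feature set of the sample to be determined, and thereby obtaining the validity of the corresponding network access request.
  19. The computer-readable storage medium according to claim 18, wherein, after the step of training and obtaining the lightgbm model according to the data stratification performed on the combined missing data of the combined feature set, the steps further include:
    automatically tuning the parameters num_leaves, min_data_in_leaf, and max_depth of the lightgbm model through a GridSearchCV grid search, so as to adjust and optimize the lightgbm model.
  20. The computer-readable storage medium according to claim 18, wherein the step of using the validity to determine whether the network access request is an abnormal access includes:
    when the abnormality probability on which the validity is based is greater than a preset threshold, determining that the network access is an abnormal access.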
PCT/CN2019/118551 2019-06-28 2019-11-14 Method, device, server, and storage medium for determining abnormality of network access WO2020258673A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910578452.0A CN110401639B (zh) 2019-06-28 2019-06-28 Method, device, server, and storage medium for determining abnormality of network access
CN201910578452.0 2019-06-28

Publications (1)

Publication Number Publication Date
WO2020258673A1 true WO2020258673A1 (zh) 2020-12-30

Family

ID=68323571

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118551 WO2020258673A1 (zh) 2019-06-28 2019-11-14 Method, device, server, and storage medium for determining abnormality of network access

Country Status (2)

Country Link
CN (1) CN110401639B (zh)
WO (1) WO2020258673A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110401639B (zh) * 2019-06-28 2021-12-24 平安科技(深圳)有限公司 Method, device, server, and storage medium for determining abnormality of network access

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
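KR20110111863A (ko) * 2010-04-06 2011-10-12 국방과학연구소 Web robot detection system and method
US20130104230A1 (en) * 2011-10-21 2013-04-25 Mcafee, Inc. System and Method for Detection of Denial of Service Attacks
CN104391979A (zh) * 2014-12-05 2015-03-04 北京国双科技有限公司 Method and device for identifying malicious web crawlers
CN108985048A (zh) * 2017-05-31 2018-12-11 腾讯科技(深圳)有限公司 Emulator identification method and related device
CN109766104A (zh) * 2018-12-07 2019-05-17 北京数字联盟网络科技有限公司 Application download system, method for determining installation type, and storage medium
CN109886290A (zh) * 2019-01-08 2019-06-14 平安科技(深圳)有限公司 User request detection method and apparatus, computer device, and storage medium
CN110401639A (zh) * 2019-06-28 2019-11-01 平安科技(深圳)有限公司 Method, device, server, and storage medium for determining abnormality of network access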

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7868898B2 (en) * 2005-08-23 2011-01-11 Seiko Epson Corporation Methods and apparatus for efficiently accessing reduced color-resolution image data
US9727723B1 (en) * 2014-06-18 2017-08-08 EMC IP Holding Co. LLC Recommendation system based approach in reducing false positives in anomaly detection
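CN108156166A (zh) * 2017-12-29 2018-06-12 百度在线网络技术(北京)有限公司 Abnormal access identification and access control method and device
CN108259482B (zh) * 2018-01-04 2019-05-28 平安科技(深圳)有限公司 Network abnormal data detection method and apparatus, computer device, and storage medium
CN108763274B (zh) * 2018-04-09 2021-06-11 北京三快在线科技有限公司 Access request identification method and apparatus, electronic device, and storage medium
CN109150875A (zh) * 2018-08-20 2019-01-04 广东优世联合控股集团股份有限公司 Anti-crawler method and apparatus, electronic device, and computer-readable storage medium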

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, JIAN ET AL.: "Non-official translation: Identification of User's Role and Discovery Method of Its Malicious Access Behavior in Web Logs", COMPUTER SCIENCE, vol. 45, no. 10, 31 October 2018 (2018-10-31), DOI: 20200301180547A *

Also Published As

Publication number Publication date
CN110401639A (zh), published 2019-11-01
CN110401639B (zh), published 2021-12-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19935670

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 18/02/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19935670

Country of ref document: EP

Kind code of ref document: A1