WO2017143934A1 - Network access behavior identification method and apparatus, server, and storage medium - Google Patents


Info

Publication number
WO2017143934A1
Authority
WO
WIPO (PCT)
Prior art keywords
behavior
user
entropy
category
preset
Prior art date
Application number
PCT/CN2017/073615
Other languages
English (en)
French (fr)
Inventor
沈雄
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Priority to EP17755763.4A (EP3370169A4)
Priority to JP2018513718A (JP6422617B2)
Priority to KR1020187015207A (KR20180118597A)
Priority to SG11201708944VA (SG11201708944VA)
Priority to US15/578,695 (US20180359268A1)
Priority to AU2017221945A (AU2017221945B2)
Publication of WO2017143934A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • the present application relates to the field of computer network technologies, and in particular, to a network access behavior identification method and apparatus, a server, and a storage medium.
  • a network access behavior identification method and apparatus, a server, and a storage medium are provided.
  • a network access behavior identification method includes:
  • a network access behavior identification device includes:
  • a network access information obtaining module configured to acquire network access information of the user within a preset time period
  • a behavior data obtaining module configured to extract behavior data of the user in each preset behavior category according to the network access information
  • a behavior entropy calculation module configured to calculate a behavior entropy of the user according to behavior data of the user in each preset behavior category, where the behavior entropy is a degree of dispersion that represents a user's network access behavior;
  • an access category determining module configured to determine, according to the behavior entropy, an access category to which the user's network access behavior belongs.
  • a server includes a memory and a processor, the memory storing instructions that, when executed by the processor, cause the processor to perform the following steps:
  • One or more non-volatile readable storage media storing computer-executable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • the behavioral entropy is a degree of dispersion that characterizes the user's network access behavior
  • FIG. 1 is an application environment diagram of a network access behavior identification method in an embodiment
  • Figure 2 is a block diagram of a server in one embodiment
  • FIG. 3 is a flow chart of a method for identifying a network access behavior in an embodiment
  • FIG. 4 is a flow chart showing the steps of calculating a user's behavior entropy according to behavior data of a user in each preset behavior category in one embodiment
  • Figure 5 is a block diagram of a network access behavior identifying apparatus in an embodiment
  • FIG. 6 is a schematic structural diagram of a behavior category data acquiring module in an embodiment.
  • FIG. 1 is an application scenario diagram of a network access behavior identification method in an embodiment.
  • terminal 110 communicates with server 120 over a network.
  • The terminal 110 sends a network access request to the server 120, and the server 120 acquires the corresponding user's network access information within a preset time period according to the request. Alternatively, the server 120 can actively and periodically obtain the user's network access information for that time period from a database.
  • The server 120 extracts the user's behavior data in each preset behavior category from the network access information; calculates the user's behavior entropy from that behavior data, the behavior entropy characterizing the degree of dispersion of the user's network access behavior; and determines, according to the behavior entropy, the access category to which the user's network access behavior belongs.
  • the terminal 110 includes, but is not limited to, various personal computers, smart phones, tablet computers, notebook computers, portable wearable devices, etc., which are not enumerated here.
  • FIG. 2 shows a block diagram of the server 120 in one embodiment. The server includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected through a system bus.
  • This processor is used to provide computing and control capabilities to support the operation of the entire server.
  • the server's non-volatile storage medium stores operating system, database, and computer executable instructions.
  • the database is used to store related data involved in the process of implementing the network access behavior identification method, such as storing historical access data of related users to related web pages.
  • the computer executable instructions are executable by the processor to implement a network access behavior identification method suitable for the server as shown in FIG.
  • the internal memory in the server provides a cached operating environment for operating systems, databases, and computer executable instructions in a non-volatile storage medium.
  • the network interface is used for network communication with the terminal. It can be understood that the server can be implemented by a separate server or a server cluster composed of multiple servers.
  • FIG. 2 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the server to which the solution of the present application is applied.
  • A specific server may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • A network access behavior identification method is provided. It can be applied to determining whether a user's network access behavior is malicious, in particular whether a user's access to e-commerce or shopping websites is malicious. The network access may be made through an ordinary browser application or through other applications, such as social, e-commerce, or shopping apps.
  • the network access behavior identification method is applied to the server as described in FIG. 1 or 2, and the method specifically includes the following steps S302-S308.
  • Step S302 Obtain network access information of the user within a preset time period.
  • The user's network access information is the server's record of the user's ongoing or historical network access.
  • the user can access the network through one or more different terminals, which may be, but are not limited to, a personal computer, a notebook computer, a tablet computer, a smart phone, a wearable smart device, and the like.
  • the server can detect the user's network access information in real time and store the network access information.
  • the server may classify and record network access information of each user according to the user's username.
  • The network access information may include, but is not limited to, the user's basic information, such as age and contact information, as well as the user's login time, login name, search information, browsing information, purchase information, and the like.
  • the above-mentioned search information, browsing information, and purchase information may be information for browsing, searching, and purchasing by a user on an e-commerce website or a shopping-type website.
  • the preset time period may be the user's most recent one month, two months, or two weeks, and the like.
  • the server may set a detection period, which is a preset time period. The server periodically obtains network access information of the user in the current cycle according to the detection cycle. Alternatively, the server may start to obtain network access information of the user within a preset time period according to the detected browsing behavior of the user.
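Step S302 can be sketched as a filter over stored access records. This is a minimal sketch, not the patent's implementation: it assumes each record is a dict with a hypothetical `time` key holding a `datetime`, and treats the preset time period as the last `days` days ending at `now`.

```python
from datetime import datetime, timedelta


def records_in_window(records: list[dict], now: datetime, days: int = 30) -> list[dict]:
    """Return the access records whose timestamp lies in [now - days, now].

    Each record is assumed to carry a hypothetical "time" key with a datetime value.
    """
    start = now - timedelta(days=days)
    return [r for r in records if start <= r["time"] <= now]
```

For periodic detection, `now` would be the end of the current detection cycle and `days` its length.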
  • Step S304 extracting behavior data of the user in each preset behavior category according to the network access information.
  • the server presets a behavior category that needs to be detected and counted.
  • the preset behavior category may include, but is not limited to, one or more of a user's login behavior, purchase behavior, browsing behavior, and search behavior.
  • the behavior data includes but is not limited to one or more of data such as the number of logins of the user, the number of purchases, and the number of times of browsing.
  • the network access information of the user stored by the server is comprehensive information of the user's network access. Therefore, after obtaining the network access information, the network access information may be parsed to extract behavior data of the user in each preset behavior category.
  • The step of extracting the user's behavior data in each preset behavior category according to the network access information includes: preprocessing the network access information, and acquiring the user's behavior data in each preset behavior category from the preprocessed network access information, so that behavior data of the same category has the same format.
  • That is, to extract the behavior data of each category, the network access information may first be preprocessed.
  • The preprocessing of the network access information includes variable collection, min-max rule processing, missing value processing, and format processing.
  • Variable collection means collecting, from the network access information, the access time, login time, browsing information, search information, and purchase information of each of the user's network accesses, such as those recorded when the user visits a specific e-commerce website. While collecting this information, the server can call an associated accumulator or counter to count the user's logins, purchases, views, and searches within the preset time period.
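The counting done during variable collection might look like the following. This is an illustrative sketch only: it assumes each collected record has a hypothetical `action` key, and the four category labels are placeholders for the preset behavior categories.

```python
from collections import Counter

# Hypothetical labels for the four preset behavior categories.
PRESET_CATEGORIES = ("login", "purchase", "browse", "search")


def count_behaviors(records: list[dict]) -> dict[str, int]:
    """Count occurrences of each preset behavior category; other actions are ignored."""
    counts = Counter(r["action"] for r in records if r["action"] in PRESET_CATEGORIES)
    # Report every preset category, including those with a zero count.
    return {c: counts[c] for c in PRESET_CATEGORIES}
```

Reporting zero counts explicitly keeps the category set fixed, which matters later because C, the number of categories, is part of the entropy definition.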
  • Min-max rule processing handles the numerical values contained in the collected network access information, to reduce the interference of abnormal data with the classification of the user's behavior.
  • For example, the user's age in the collected network access information may be subjected to min-max rule processing: ages such as -1, 0, or 999, which are obviously not valid ages for a normal user, are handled by this rule.
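A minimal sketch of min-max rule processing for the age field follows. The bounds `MIN_AGE` and `MAX_AGE` are hypothetical values chosen for illustration, since the text only gives examples of invalid ages; out-of-range values are flagged as `None` rather than clipped.

```python
from typing import Optional

# Hypothetical plausibility bounds; the text gives only examples of invalid ages.
MIN_AGE, MAX_AGE = 5, 120


def apply_age_rule(age: Optional[int]) -> Optional[int]:
    """Keep an age inside [MIN_AGE, MAX_AGE]; return None for out-of-range values."""
    if age is None or not MIN_AGE <= age <= MAX_AGE:
        return None
    return age
```

If the sensitive-age weighting described later is also used, the raw age value should be kept alongside the cleaned one, because that check needs values such as -1 or 999 exactly.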
  • Missing value processing is performed when behavior data for a preset behavior category is absent from the collected network access information, for example by marking it as "0" or substituting other information. For instance, when a user accesses a shopping website anonymously or without logging in with a username, the server has no login information recorded for that user.
  • the server may perform missing value processing on the type of information, such as obtaining a unique identifier of the user's access terminal, and associating the unique identifier with the login name of the user.
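The missing-login case could be handled as sketched below. The record keys `login_name` and `terminal_id` are hypothetical; the function substitutes the terminal's unique identifier when available and falls back to the "0" marker otherwise.

```python
def fill_missing_login(record: dict) -> dict:
    """Return a copy of `record` whose missing login name is filled in.

    Uses the access terminal's unique ID when present, otherwise the marker "0".
    The "login_name" and "terminal_id" keys are hypothetical.
    """
    filled = dict(record)  # leave the caller's record unchanged
    if not filled.get("login_name"):
        filled["login_name"] = filled.get("terminal_id") or "0"
    return filled
```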
  • Format processing unifies the format of the time information contained in the network access information. For example, recorded login times such as 20091011, 2009-10-11, and October 11, 2009 can all be converted into a single format, such as 20091011.
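Format processing for the date example can be sketched with the standard library's `datetime.strptime`. The list of accepted input formats is an assumption covering only the three forms in the example; `%B` matches English month names under the default C locale.

```python
from datetime import datetime

# Input formats taken from the example above; extend as needed (assumption).
KNOWN_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%B %d, %Y")


def normalize_date(text: str) -> str:
    """Convert a date string in any known format to the unified YYYYMMDD form."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y%m%d")
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {text!r}")
```

Raising on unknown formats, rather than guessing, keeps ambiguous strings from being silently misread.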
  • Step S306 Calculate the behavior entropy of the user according to the user's behavior data in each preset behavior category.
  • Entropy describes the disordered state of a physical system and measures its degree of disorder. Behavior entropy reflects the uncertainty and disorder of human behavior and characterizes the degree of dispersion of a user's network access behavior. In general, the more regular a user's behavior, the smaller the behavior entropy, and the more likely it is that the behavior is being performed by a machine.
  • the server may calculate the behavior entropy of the user according to the behavior data in each preset behavior category.
  • Specifically, the probability of occurrence of each behavior category among all preset behavior categories may be calculated to obtain first-type probabilities; the logarithm of each class probability is then taken to obtain second-type values; finally, the first-type and second-type values are combined using preset arithmetic operations (addition, subtraction, multiplication, and division), and the result is taken as the behavior entropy.
  • the steps of calculating the user's behavior entropy according to the behavior data of the user in each preset behavior category include:
  • Step S402 calculating the statistical number and total number of times of each preset behavior category according to the behavior data of the user in each preset behavior category.
  • the number of statistics includes, but is not limited to, one or more of the number of logins, the number of purchases, the number of views, and the number of searches.
  • the total number of times is the sum of the statistics for each preset behavior category.
  • the statistics of the corresponding behavior category are the number of logins, the number of purchases, the number of views and the number of searches.
  • the total number of times is the sum of logins, purchases, views, and searches.
  • For example, the server counts the logins, searches, views, and purchases of User A and User B within the preset time period, as well as the total across these four categories.
  • Step S404 Calculate the class probability of each preset behavior category according to the statistics number and the total number of times of each preset behavior category.
  • The class probabilities of the four categories, namely the login probability, search probability, browsing probability, and purchase probability, can be calculated from the statistic counts and the total count of the four behavior categories.
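Steps S402 and S404 reduce to dividing each category's count by the total. This sketch takes the per-category counts as a dict; returning all-zero probabilities when there is no activity is an assumption, since the text does not cover that case.

```python
def class_probabilities(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Steps S402/S404: total count and per-category probability p_i = count_i / total."""
    total = sum(counts.values())
    if total == 0:
        # No activity at all (assumption: the text does not cover this case).
        return total, {c: 0.0 for c in counts}
    return total, {c: n / total for c, n in counts.items()}
```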
  • Step S406 Calculate the category entropy of each preset behavior category according to the class probability of each preset behavior category.
  • The category entropy is a degree of dispersion that characterizes the user's network access behavior in the corresponding category. In one embodiment, these are the login entropy, purchase entropy, browsing entropy, and search entropy, corresponding to the login, purchase, browsing, and search behaviors, respectively.
  • The category entropy may be expressed by the following formula:
  • $P_i = a\,p_i \log_b p_i, \quad i = 1, 2, \ldots, C$
  • where P_i represents the category entropy of the i-th behavior category;
  • a is a nonzero coefficient;
  • b is a coefficient greater than 0;
  • C represents the number of preset behavior categories; and
  • p_i represents the probability of the i-th behavior category, that is, its statistic count relative to the total count.
  • The login behavior, purchase behavior, browsing behavior, and search behavior may correspond to the first to fourth behavior categories, respectively.
  • P_1 to P_4 then represent the login entropy, purchase entropy, browsing entropy, and search entropy, respectively.
  • The coefficient a may be -1 and the coefficient b may be 2; that is, the category entropy may be calculated as:
  • $P_i = -p_i \log_2 p_i$
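A sketch of step S406 follows, assuming the form P_i = a·p_i·log_b(p_i) with b acting as the logarithm base (our reading of the coefficient definitions). It also applies the usual convention 0·log 0 = 0, which the text does not state explicitly.

```python
import math


def category_entropy(p: float, a: float = -1.0, b: float = 2.0) -> float:
    """Category entropy P_i = a * p_i * log_b(p_i) for one class probability p_i.

    With a = -1 and b = 2 this is the Shannon term -p_i * log2(p_i), in bits.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if a == 0 or b <= 0 or b == 1:
        # a must be nonzero; b > 0 as stated, and b != 1 so that log_b is defined.
        raise ValueError("need a != 0, b > 0 and b != 1")
    if p == 0.0:
        return 0.0  # convention: 0 * log 0 = 0, so an unused category adds nothing
    return a * p * math.log(p, b)
```

For example, class probabilities of 0.25, 0.125, 0.5, and 0.125 give category entropies of 0.5, 0.375, 0.5, and 0.375 bits.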
  • Step S408 calculating behavior entropy according to the category entropy of each preset behavior category.
  • the server may assign corresponding weights to each category entropy in advance. Further, the class entropy of the most influential behavior category may be assigned a relatively large or small weight. After calculating the value of each class entropy, multiplying it by the corresponding weight, and accumulating the product of all class entropies and corresponding weights, the weighted product sum is the behavior entropy.
  • In one embodiment, the behavior entropy is the sum of the category entropies of all preset behavior categories.
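Step S408 can be sketched as a (weighted) sum over category entropies. Passing no weights gives the plain sum; any weight values a caller supplies are deployment choices, not values from the text.

```python
from typing import Optional


def behavior_entropy(category_entropies: dict[str, float],
                     weights: Optional[dict[str, float]] = None) -> float:
    """Step S408: sum the category entropies, optionally weighting each one.

    With weights=None every category has weight 1, i.e. the plain sum.
    """
    if weights is None:
        return sum(category_entropies.values())
    return sum(weights[c] * h for c, h in category_entropies.items())
```

With category entropies of 0.5, 0.375, 0.5, and 0.375, the plain sum is 1.75 bits; for four categories the maximum is log2(4) = 2.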
  • the behavioral entropies of User A and User B can be calculated as 1.600727 and 0.470349, respectively.
  • Step S308 determining, according to the behavior entropy, an access category to which the user's network access behavior belongs.
  • the server may preset different access categories. For example, it can be set to three categories, and the corresponding behaviors are machine access behavior, suspicious access behavior, and normal access behavior, and the corresponding users are respectively set to machine users, suspicious users, and normal users.
  • the server may correspondingly set a range of behavior entropy corresponding to the user of each category according to the determined calculation formula of the behavior entropy. After calculating the behavior entropy, the range to which the behavioral entropy belongs is found, and then the category of the user is determined.
  • Taking the behavior entropies calculated from Table 1 as an example, the server sets the access categories as follows.
  • The behavior entropy ranges for machine access behavior, suspicious access behavior, and normal access behavior are set to 0 ≤ x < 0.5, 0.5 ≤ x < 1, and x ≥ 1, respectively, where x is the value of the user's behavior entropy.
  • For Table 1, user A's behavior entropy of 1.600727 falls in the range x ≥ 1, so user A is judged to be a normal user; user B's behavior entropy of 0.470349 falls in the range 0 ≤ x < 0.5, so user B is judged to be a machine user.
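The range lookup in step S308 can be sketched as below, using the example thresholds (0.5 and 1) with lower bounds inclusive; the category labels are hypothetical strings. Negative inputs are rejected because, with a = -1, the behavior entropy cannot be negative.

```python
def access_category(x: float) -> str:
    """Map a behavior entropy x to an access category using the example thresholds."""
    if x < 0:
        raise ValueError("behavior entropy cannot be negative when a = -1")
    if x < 0.5:
        return "machine"
    if x < 1:
        return "suspicious"
    return "normal"
```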
  • The user's behavior data in each preset behavior category is extracted from the user's network access information within a preset time period, and the user's behavior entropy is then calculated from that behavior data.
  • This behavior entropy can be used to determine the access category to which the user's network access behavior belongs. The above method can quickly and accurately identify whether a user is stealing data or browsing normally, so that corresponding measures can be taken based on the result.
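The whole method (steps S302 to S308, minus data collection) fits in one function when a = -1, b = 2, and the category entropies are summed without weights. This is a compact sketch under those assumptions; the category labels and the error on empty input are illustrative choices.

```python
import math
from collections import Counter

CATEGORIES = ("login", "purchase", "browse", "search")  # hypothetical labels


def identify(actions: list[str]) -> tuple[float, str]:
    """Count actions per category, compute the behavior entropy, and classify it."""
    counts = Counter(a for a in actions if a in CATEGORIES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no behavior data in the preset categories")
    entropy = 0.0
    for c in CATEGORIES:
        p = counts[c] / total
        if p > 0:
            entropy -= p * math.log2(p)  # category entropy with a = -1, b = 2
    if entropy < 0.5:
        return entropy, "machine"
    if entropy < 1:
        return entropy, "suspicious"
    return entropy, "normal"
```

A crawler that only ever browses scores 0 and is classed as a machine, while a user spread evenly over all four categories scores the maximum of 2.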
  • The method further includes: when the user's access category is determined to be machine access behavior, freezing the user's account; and when the user's access category is suspicious access behavior, obtaining the user's contact information and sending a suspicious-activity warning to that contact.
  • When the server determines that the user is a machine user, this indicates that the user is maliciously stealing data, such as product information on an e-commerce website, so the account can be frozen.
  • the server can obtain the contact information of the user, such as the user's mailbox, and send a warning email to the user.
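The follow-up measures can be sketched as a pure decision function that returns an action for the caller to execute. The action names and the `username` and `email` fields are hypothetical placeholders; actually freezing accounts or sending mail is left to the surrounding system.

```python
from typing import Optional


def plan_response(category: str, user: dict) -> tuple[str, Optional[str]]:
    """Decide the follow-up action for a user whose access category is known.

    The action names and the "username"/"email" fields are hypothetical.
    """
    if category == "machine":
        return ("freeze_account", user["username"])
    if category == "suspicious":
        # Send a suspicious-activity warning to the user's contact address, if any.
        email = user.get("email")
        return ("send_warning", email) if email else ("no_action", None)
    return ("no_action", None)
```

Keeping the decision separate from its side effects makes the policy easy to test and to review before any account is frozen.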
  • the network access information includes a login time of the user within a preset time period.
  • The step of calculating the user's behavior entropy according to the user's behavior data in each preset behavior category further includes: calculating the dispersion of the user's login times; determining the user's first behavior entropy weight according to the dispersion; and determining the user's new behavior entropy according to the first behavior entropy weight and the behavior entropy.
  • The login time is the time at which the user logs in to the account to start browsing; only the time of day is considered, not the login date.
  • The dispersion of the login times can be characterized by the variance.
  • The server may calculate the user's average login time from the login times obtained from the network access information within the preset time period, and then calculate the variance of the user's login times within that period. The number of login records and the variance are then used to determine the user's behavior entropy accordingly.
  • the server may set a first behavioral entropy weight corresponding to different variance or variance ranges under different login times.
  • the behavior entropy calculated above is multiplied by the determined first behavior entropy weight, and the product is used as a new behavior entropy, and the user's behavior category is determined by the new behavior entropy.
  • Table 2 and Table 3 record the login times of User A and User B in Table 1, respectively.
  • Table 4 is the server's correspondence table between time dispersion ranges and first behavior entropy weight values. If the server calculates that the time dispersion of user A is 1 and that of user B is 15, the first behavior entropy weights of users A and B obtained from this preset correspondence are 1 and 0.9, respectively. Based on these weights, the new behavior entropies of users A and B are calculated as 1.600727 and 0.4233141 (that is, 1.600727 × 1 and 0.470349 × 0.9).
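The login-time weighting can be sketched with `statistics.pvariance`. Table 4 itself is not reproduced in the text, so `DISPERSION_WEIGHTS` below is a hypothetical stand-in chosen only so that a dispersion of 1 maps to weight 1 and 15 maps to 0.9, matching the example; measuring dispersion in hours is also an assumption.

```python
from datetime import time
from statistics import pvariance
from typing import Sequence

# Hypothetical stand-in for Table 4, which is not reproduced in the text:
# (exclusive upper bound on dispersion, first behavior entropy weight).
# Chosen only so that dispersion 1 -> 1.0 and dispersion 15 -> 0.9, as in the example.
DISPERSION_WEIGHTS = [(10.0, 1.0), (float("inf"), 0.9)]


def login_dispersion(login_times: Sequence[time]) -> float:
    """Population variance of login times of day, in hours; the date is ignored."""
    hours = [t.hour + t.minute / 60 for t in login_times]
    return pvariance(hours)


def first_weight(dispersion: float) -> float:
    """Look up the first behavior entropy weight for a login-time dispersion."""
    for upper, weight in DISPERSION_WEIGHTS:
        if dispersion < upper:
            return weight
    raise ValueError("dispersion not covered by the table")
```

Treating time of day as a plain number overstates the spread of logins around midnight (23:30 and 00:30 look 23 hours apart); a circular statistic would avoid that if it matters in practice.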
  • the behavior entropy of the user is determined by combining the login time of the user, and then the behavior classification of the user can be determined according to the behavior entropy, thereby further improving the accuracy of identifying the user behavior.
  • the network access information includes the age of the user.
  • the step of calculating the behavior entropy of the user according to the behavior data of the user in each preset behavior category further includes: determining whether the age of the user belongs to a sensitive age, and if yes, acquiring a second behavior entropy weight corresponding to the sensitive age; The second behavior entropy weight and behavior entropy determine the user's new behavior entropy.
  • The server may designate several ages as sensitive ages. A sensitive age is an age value that was very likely generated automatically by a machine; for example, the sensitive ages may be set to -1, 0, 1, and 999.
  • the second behavior entropy weight corresponding to the sensitive age set by the server may be obtained.
  • The behavior entropy is multiplied by the second behavior entropy weight, and the product is taken as the user's new behavior entropy.
  • The new behavior entropy may also be determined from both the user's age and login time. In one embodiment, when the user's age is detected to be a sensitive age, the preliminarily calculated behavior entropy may be multiplied by both the first behavior entropy weight and the second behavior entropy weight, and the result is taken as the new behavior entropy.
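Combining both weights can be sketched as below. The sensitive ages come from the example; `SECOND_WEIGHT = 0.8` is a hypothetical placeholder, since the text gives no value for the second behavior entropy weight.

```python
# Sensitive ages from the example; the weight value is a hypothetical placeholder.
SENSITIVE_AGES = {-1, 0, 1, 999}
SECOND_WEIGHT = 0.8  # hypothetical; the text does not give a value


def new_behavior_entropy(entropy: float, age: int, first_weight: float = 1.0) -> float:
    """Apply the first weight (login time) and, for a sensitive age, the second weight."""
    result = entropy * first_weight
    if age in SENSITIVE_AGES:
        result *= SECOND_WEIGHT
    return result
```

Both weights below 1 push the entropy down, toward the machine-access range.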
  • the behavior entropy of the user is determined by combining the age of the user, and then the behavior classification of the user can be determined according to the behavior entropy, thereby further improving the accuracy of identifying the user behavior.
  • a network access behavior identifying apparatus is provided, which can be run in a server as shown in FIG. 1 or FIG. 2, including:
  • the network access information obtaining module 502 is configured to acquire network access information of the user within a preset time period.
  • the behavior data obtaining module 504 is configured to extract behavior data of the user in each preset behavior category according to the network access information.
  • the behavior entropy calculation module 506 is configured to calculate a behavior entropy of the user according to the behavior data of the user in each preset behavior category, and the behavior entropy is a degree of dispersion that represents the network access behavior of the user.
  • the access category determining module 508 is configured to determine, according to the behavior entropy, an access category to which the user's network access behavior belongs.
  • The behavior data obtaining module 504 is further configured to preprocess the network access information and obtain the user's behavior data in each preset behavior category from the preprocessed network access information, so that behavior data of the same category has the same format.
  • the behavior entropy calculation module 506 includes:
  • The count calculation unit 602 is configured to calculate the statistic count of each preset behavior category and the total count according to the user's behavior data in each preset behavior category, where the total count is the sum of the statistic counts of all preset behavior categories.
  • the class probability calculation unit 604 is configured to calculate a class probability of each preset behavior class according to a statistical number of times and a total number of times of each preset behavior category.
  • the category entropy calculation unit 606 is configured to calculate a category entropy of each preset behavior category according to a class probability of each preset behavior category, where the category entropy is a degree of dispersion that represents a network access behavior of the user in the corresponding category.
  • the behavior entropy calculation unit 608 calculates the behavior entropy according to the category entropy of each preset behavior category.
  • The formula for the category entropy is $P_i = a\,p_i \log_b p_i$ (i = 1, 2, ..., C), where P_i represents the category entropy of the i-th behavior category, a is a nonzero coefficient, b is a coefficient greater than 0, C represents the number of preset behavior categories, and p_i represents the probability of the i-th behavior category relative to the total count. The behavior entropy is the sum of the category entropies of all preset behavior categories.
  • the network access information includes a login time of the user within a preset time period.
  • the behavior entropy calculation module 506 is further configured to calculate a dispersion degree of the user login time; determine a first behavior entropy weight of the user according to the dispersion; and determine a new behavior entropy of the user according to the first behavior entropy weight and the behavior entropy.
  • the network access information includes the age of the user.
  • the behavior entropy calculation module 506 is further configured to determine whether the age of the user belongs to a sensitive age, and if yes, obtain a second behavior entropy weight corresponding to the sensitive age; and determine a new behavior entropy of the user according to the second behavior entropy weight and behavior entropy.
  • the network interface may be an Ethernet card or a wireless network card.
  • The above modules may be embedded in hardware form in, or independent of, the processor of the server, or stored in software form in the memory of the server, so that the processor can invoke them to perform the corresponding operations.
  • the processor can be a central processing unit (CPU), a microprocessor, a microcontroller, or the like.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computational Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

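A network access behavior identification method, comprising: acquiring network access information of a user within a preset time period; extracting, according to the network access information, behavior data of the user in each preset behavior category; calculating a behavior entropy of the user according to the behavior data of the user in each preset behavior category, the behavior entropy representing the degree of dispersion of the user's network access behavior; and determining, according to the behavior entropy, the access category to which the user's network access behavior belongs.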

Description

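Network access behavior identification method and device, server and storage medium
This application claims priority to Chinese patent application No. 2016101003580, entitled "Network access behavior identification method and device", filed with the China Patent Office on February 24, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer network technology, and in particular to a network access behavior identification method and device, a server and a storage medium.
Background
With the development of computer technology and growing awareness of the value of information, some bad actors use web crawler techniques they have developed to steal data from websites and platforms. In e-commerce in particular, such crawlers can rapidly scrape information such as product prices and buyer reviews, causing significant losses to the companies whose data is stolen.
Traditionally, manual review has been used to decide whether a user's network access behavior is malicious data theft or normal browsing. Because human reviewers tire, this manual identification method has low accuracy.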
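Summary
According to various embodiments disclosed in the present application, a network access behavior identification method and device, a server and a storage medium are provided.
A network access behavior identification method, comprising:
acquiring network access information of a user within a preset time period;
extracting, according to the network access information, behavior data of the user in each preset behavior category;
calculating a behavior entropy of the user according to the behavior data of the user in each preset behavior category, the behavior entropy representing the degree of dispersion of the user's network access behavior; and
determining, according to the behavior entropy, the access category to which the user's network access behavior belongs.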
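A network access behavior identification device, comprising:
a network access information acquisition module, configured to acquire network access information of a user within a preset time period;
a behavior data acquisition module, configured to extract, according to the network access information, behavior data of the user in each preset behavior category;
a behavior entropy calculation module, configured to calculate a behavior entropy of the user according to the behavior data of the user in each preset behavior category, the behavior entropy representing the degree of dispersion of the user's network access behavior; and
an access category determination module, configured to determine, according to the behavior entropy, the access category to which the user's network access behavior belongs.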
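A server, comprising a memory and a processor, the memory storing instructions that, when executed by the processor, cause the processor to perform the following steps:
acquiring network access information of a user within a preset time period;
extracting, according to the network access information, behavior data of the user in each preset behavior category;
calculating a behavior entropy of the user according to the behavior data of the user in each preset behavior category, the behavior entropy representing the degree of dispersion of the user's network access behavior; and
determining, according to the behavior entropy, the access category to which the user's network access behavior belongs.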
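One or more non-volatile readable storage media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring network access information of a user within a preset time period;
extracting, according to the network access information, behavior data of the user in each preset behavior category;
calculating a behavior entropy of the user according to the behavior data of the user in each preset behavior category, the behavior entropy representing the degree of dispersion of the user's network access behavior; and
determining, according to the behavior entropy, the access category to which the user's network access behavior belongs.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features, objects and advantages of the present application will become apparent from the description, the drawings and the claims.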
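Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of the application environment of the network access behavior identification method in one embodiment;
FIG. 2 is a block diagram of a server in one embodiment;
FIG. 3 is a flowchart of the network access behavior identification method in one embodiment;
FIG. 4 is a flowchart of the step of calculating the user's behavior entropy according to the user's behavior data in each preset behavior category in one embodiment;
FIG. 5 is a block diagram of the network access behavior identification device in one embodiment;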
FIG. 6 is a schematic structural diagram of the behavior entropy calculation module in one embodiment.
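Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are merely intended to explain the present invention, not to limit it.
FIG. 1 is a diagram of the application scenario of the network access behavior identification method in one embodiment. As shown in FIG. 1, a terminal 110 communicates with a server 120 over a network. The terminal 110 sends a network access request to the server 120, and the server 120 acquires, according to the request, the network access information of the corresponding user within a preset time period. Alternatively, the server 120 may periodically and proactively fetch users' network access information within the preset time period from a database. The server 120 extracts the user's behavior data in each preset behavior category from the network access information; calculates the user's behavior entropy from that behavior data, the behavior entropy representing the degree of dispersion of the user's network access behavior; and determines, according to the behavior entropy, the access category to which the user's network access behavior belongs.
It can be understood that the terminal 110 includes, but is not limited to, personal computers, smartphones, tablet computers, laptops, portable wearable devices and the like, which are not enumerated exhaustively here.
FIG. 2 shows a block diagram of the server 120 in one embodiment. The server includes a processor, a non-volatile storage medium, an internal memory and a network interface connected via a system bus. The processor provides computing and control capabilities and supports the operation of the entire server. The non-volatile storage medium of the server stores an operating system, a database and computer-executable instructions. The database stores data involved in carrying out the network access behavior identification method, for example each user's historical access data for relevant web pages. The computer-executable instructions can be executed by the processor to implement the network access behavior identification method for a server shown in FIG. 3. The internal memory of the server provides a cached runtime environment for the operating system, database and computer-executable instructions in the non-volatile storage medium. The network interface is used for network communication with terminals. It can be understood that the server may be implemented as a stand-alone server or as a cluster of multiple servers.
A person skilled in the art will understand that the structure shown in FIG. 2 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the server to which the solution is applied; a specific server may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, as shown in FIG. 3, a network access behavior identification method is provided. The method can be used to judge whether a user's network access behavior is malicious, in particular whether a user's access to e-commerce or shopping sites is malicious. The network access may be performed through a common browser application or through other applications, for example web browsing within social, e-commerce or shopping applications. This embodiment is described using the example of the method applied to the server shown in FIG. 1 or FIG. 2. The method specifically includes the following steps S302 to S308.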
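Step S302: acquire the user's network access information within a preset time period.
In this embodiment, the user's network access information is the information recorded by the server about the user's ongoing or historical network access. The user may access the network through one or more different terminals, which may be, but are not limited to, personal computers, laptops, tablets, smartphones, wearable smart devices and the like. The server may monitor the user's network access information in real time and store it. In one embodiment, the server records each user's network access information separately, keyed by user name. The network access information may include, but is not limited to, the user's basic information, such as age and contact details, and may also include the user's login times, login name, search information, browsing information, purchase information and so on. In one embodiment, the search, browsing and purchase information may be the user's browsing, searching and purchasing activity on e-commerce or shopping websites.
In this embodiment, the preset time period may be the user's most recent month, two months, two weeks, etc. In one embodiment, the server may set a detection period, which serves as the preset time period; the server periodically acquires the user's network access information for the current period according to the detection period. Alternatively, the server may start acquiring the user's network access information within the preset time period once it detects browsing behavior by the user.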
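Step S304: extract the user's behavior data in each preset behavior category according to the network access information.
In this embodiment, the server presets the behavior categories to be detected and counted. The preset behavior categories may include, but are not limited to, one or more of the user's login behavior, purchase behavior, browsing behavior and search behavior. Correspondingly, the behavior data includes, but is not limited to, one or more of the user's login count, purchase count, browse count and search count. Typically, the network access information stored by the server is aggregate information about the user's network access, so after it is acquired it can be parsed to extract the user's behavior data in each preset behavior category.
In one embodiment, the step of extracting the user's behavior data in each preset behavior category according to the network access information includes: preprocessing the network access information, and acquiring the user's behavior data in each preset behavior category from the preprocessed network access information, so that the acquired behavior data of the same category has the same format.
In this embodiment, the network access information may be preprocessed so that each category's behavior data can be extracted. Preprocessing includes variable collection, min/max rule processing, missing-value processing, format processing and the like.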
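Variable collection extracts from the network access information the access time, login time, browsing information, search information, purchase information and so on for each of the user's network accesses, for example for each visit to a particular e-commerce website. Having collected this information for each access, the server can call an appropriate accumulator or counter to tally the user's login, purchase, browse and search counts within the preset time period.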
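The tallying step can be sketched as follows. This is a minimal illustration only. It assumes each access record has already been reduced to a hypothetical `(user_id, category)` pair within the preset time period. The category labels, `CATEGORIES` and `count_behaviors` are illustrative names, not taken from the source.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical labels for the four preset behavior categories.
CATEGORIES = ("login", "search", "browse", "purchase")


def count_behaviors(events: Iterable[Tuple[str, str]], user_id: str) -> Dict[str, int]:
    """Tally how often one user performed each preset behavior category.

    `events` holds hypothetical (user_id, category) records that already
    fall inside the preset time period; unknown categories are ignored.
    """
    counts = Counter(cat for uid, cat in events if uid == user_id and cat in CATEGORIES)
    # Every preset category appears in the result, with 0 if it never occurred.
    return {cat: counts[cat] for cat in CATEGORIES}


events = [("A", "login"), ("A", "browse"), ("A", "browse"), ("B", "search"), ("A", "purchase")]
print(count_behaviors(events, "A"))  # {'login': 1, 'search': 0, 'browse': 2, 'purchase': 1}
```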
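Min/max rule processing handles the magnitudes of the values contained in the collected network access information, to reduce the interference of abnormal data with the classification of the user's behavior. In one embodiment, min/max rule processing may be applied to the user's age in the collected network access information; for example, ages such as -1, 0 or 999, which clearly do not correspond to a real user, are subjected to min/max rule processing.
Missing-value processing applies when behavior data for a preset behavior category does not exist in the collected network access information, for example by marking it as "0" or substituting other information. For instance, when a user accesses a shopping website anonymously or without logging in, the server has no login information for the user. The server can apply missing-value processing to such information, for example by obtaining the unique identifier of the user's access terminal and using that identifier in place of, and associated with, the user's login name.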
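The min/max rule and the missing-value rule are described only in outline, so the following is a sketch under stated assumptions. The age bounds (1 to 120), the choice to clamp rather than discard, and the helper names `bound_age` and `fill_login_name` are all hypothetical. The sketch returns an out-of-range flag alongside the clamped value so that the implausible raw age is not silently lost.

```python
from typing import Optional, Tuple

# Assumed plausible age bounds; the source gives no concrete limits.
AGE_MIN, AGE_MAX = 1, 120


def bound_age(age: int) -> Tuple[int, bool]:
    """Min/max rule (one possible reading): clamp an age into [AGE_MIN, AGE_MAX].

    The second element is True when the raw value was out of range.
    """
    clamped = min(max(age, AGE_MIN), AGE_MAX)
    return clamped, clamped != age


def fill_login_name(login_name: Optional[str], terminal_id: str) -> str:
    """Missing-value rule: use the terminal's unique ID when no login name was recorded."""
    return login_name if login_name else terminal_id


print(bound_age(999))                      # (120, True)
print(fill_login_name(None, "term-0042"))  # term-0042
```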
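Format processing unifies the format of the time information contained in the network access information. For example, recorded time information such as the user's login time may appear in forms such as 20091011, 2009-10-11 and 2009年10月11日 (October 11, 2009); all of these can be converted into a single format, such as 20091011.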
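Format processing of dates can be sketched with a single regular expression that covers the three forms named above. The helper `normalize_date` is hypothetical. Validating through `datetime.date`, which rejects impossible dates such as month 13, is an added assumption.

```python
import datetime
import re

# Matches 20091011, 2009-10-11 and 2009年10月11日.
_DATE_PATTERN = re.compile(r"(\d{4})(?:-|年)?(\d{1,2})(?:-|月)?(\d{1,2})日?")


def normalize_date(text: str) -> str:
    """Convert a recorded date into the unified YYYYMMDD form."""
    match = _DATE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized date format: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    # datetime.date raises ValueError for impossible dates such as month 13.
    return datetime.date(year, month, day).strftime("%Y%m%d")


for raw in ("20091011", "2009-10-11", "2009年10月11日"):
    print(normalize_date(raw))  # prints 20091011 for each form
```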
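Step S306: calculate the user's behavior entropy according to the user's behavior data in each preset behavior category.
In this embodiment, entropy describes the disordered state of a physical system and is a measure of its degree of disorder. Behavior entropy reflects the uncertainty and disorder of a person's behavior and represents the degree of dispersion of the user's network access behavior. In general, the more regular a user's behavior, the smaller its behavior entropy, and the more likely the behavior is being performed by a machine. After obtaining the behavior data of each preset behavior category, the server can calculate the user's behavior entropy from it. In one embodiment, the probability of each category's behavior data relative to all preset behavior categories is calculated, giving a first set of values (the class probabilities). The logarithm of each category's class probability is then taken, giving a second set of values. Finally, the first and second sets are combined by preset arithmetic operations, and the result is taken as the behavior entropy.
In one embodiment, as shown in FIG. 4, the step of calculating the user's behavior entropy according to the user's behavior data in each preset behavior category includes:
Step S402: calculate the count of each preset behavior category and the total count according to the user's behavior data in each preset behavior category.
In this embodiment, the counts include, but are not limited to, one or more of the login count, purchase count, browse count and search count. The total count is the sum of the counts of all preset behavior categories.
Taking the four preset behavior categories of login, purchase, browsing and search as an example, the corresponding category counts are the login count, purchase count, browse count and search count, and the total count is their sum. For example, as shown in Table 1, the server has calculated the login, search, browse and purchase counts of user A and user B within the preset time period, together with the total count over these four categories.
Step S404: calculate the class probability of each preset behavior category according to its count and the total count.
Continuing with Table 1, the class probabilities of the four categories (the login, search, browse and purchase probabilities) can be calculated from the four category counts and the total count.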
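Table 1
                        User A      User B
Login count             4           3
Search count            9           212
Browse count            21          1997
Purchase count          3           0
Total count             37          2212
Login probability       0.108108    0.001356
Search probability      0.243243    0.095841
Browse probability      0.567568    0.902803
Purchase probability    0.081081    0
Login entropy           0.346968    0.01292
Search entropy          0.496101    0.32425
Browse entropy          0.46378     0.133179
Purchase entropy        0.293878    0
Behavior entropy        1.600727    0.470349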
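Steps S402 and S404 amount to dividing each category count by the total count. The sketch below reproduces the probability rows of Table 1 from its count rows. The dictionary keys and the helper `class_probabilities` are illustrative assumptions.

```python
from typing import Dict


def class_probabilities(counts: Dict[str, int]) -> Dict[str, float]:
    """Step S404: divide each category count by the total count (step S402)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no behavior recorded in the preset time period")
    return {category: n / total for category, n in counts.items()}


# Count rows of Table 1.
user_a = {"login": 4, "search": 9, "browse": 21, "purchase": 3}
user_b = {"login": 3, "search": 212, "browse": 1997, "purchase": 0}
print({k: round(v, 6) for k, v in class_probabilities(user_a).items()})
# {'login': 0.108108, 'search': 0.243243, 'browse': 0.567568, 'purchase': 0.081081}
```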
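Step S406: calculate the category entropy of each preset behavior category according to its class probability.
In this embodiment, the category entropy represents the degree of dispersion of the user's network access behavior in the corresponding category. In one embodiment, it refers to the login entropy, purchase entropy, browse entropy and search entropy corresponding to the login, purchase, browsing and search behaviors.
In one embodiment, the category entropy may be expressed by the following formula: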
$$P_i = a\,p_i \log_b p_i, \qquad i = 1, 2, \ldots, C$$
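where P_i denotes the category entropy of the i-th behavior category, a is any non-zero coefficient, b is a coefficient greater than 0, C is the number of preset behavior categories, and p_i is the probability of the i-th category's behavior relative to the total count. When p_i is 0, the corresponding P_i is taken as 0.
For example, the login, purchase, browsing and search behaviors may be designated as behavior categories 1 to 4, so that C = 4. Correspondingly, p_1 to p_4 denote the login, purchase, browse and search probabilities, and P_1 to P_4 denote the login, purchase, browse and search entropies.
In a preferred embodiment, the coefficient a may be -1 and the coefficient b may be 2, i.e. the category entropy may be calculated as: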
$$P_i = -p_i \log_2 p_i, \qquad i = 1, 2, \ldots, C$$
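Calculating the category entropies of user A and user B in Table 1 with this formula (a = -1, b = 2) gives login, search, browse and purchase entropies of 0.346968, 0.496101, 0.46378 and 0.293878 for user A, and 0.01292, 0.32425, 0.133179 and 0 for user B.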
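The category entropy formula with the preferred coefficients can be checked against Table 1 with a short sketch. `category_entropy` is a hypothetical helper. Besides the constraints stated in the text (a non-zero, b greater than 0), it also rejects b = 1, for which log_b is undefined.

```python
import math


def category_entropy(p: float, a: float = -1.0, b: float = 2.0) -> float:
    """Category entropy P_i = a * p_i * log_b(p_i), taken as 0 when p_i is 0.

    The defaults a = -1, b = 2 are the preferred coefficients in the text.
    """
    if a == 0 or b <= 0 or b == 1:
        raise ValueError("need a != 0, b > 0 and b != 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if p == 0.0:
        return 0.0
    return a * p * math.log(p, b)


# User A's login probability in Table 1 is 4/37.
print(round(category_entropy(4 / 37), 6))  # 0.346968
```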
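Step S408: calculate the behavior entropy according to the category entropy of each preset behavior category.
In this embodiment, the server may assign a weight to each category entropy in advance. It may assign a relatively larger or smaller weight to the category entropy of a behavior category with greater influence. After the category entropies are calculated, each is multiplied by its weight and the products are summed; this weighted sum is the behavior entropy.
In one embodiment, when all weights are 1, the behavior entropy is simply the sum of the category entropies of all preset behavior categories. As shown in Table 1, the behavior entropies of user A and user B calculated in this way are 1.600727 and 0.470349, respectively.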
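Step S408 is a weighted sum of the category entropies. In the sketch below, the helper name `behavior_entropy` is hypothetical. Omitted weights default to 1, which reproduces the behavior entropies in Table 1. The weight of 2.0 in the second call is purely illustrative, since the text gives no concrete weights.

```python
from typing import Dict, Optional


def behavior_entropy(category_entropies: Dict[str, float],
                     weights: Optional[Dict[str, float]] = None) -> float:
    """Step S408: weighted sum of category entropies (missing weights default to 1)."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in category_entropies.items())


# Category entropy rows of Table 1.
user_a = {"login": 0.346968, "search": 0.496101, "browse": 0.46378, "purchase": 0.293878}
user_b = {"login": 0.01292, "search": 0.32425, "browse": 0.133179, "purchase": 0.0}
print(round(behavior_entropy(user_a), 6))                     # 1.600727
print(round(behavior_entropy(user_a, {"purchase": 2.0}), 6))  # 1.894605 (hypothetical weight)
```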
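Step S308: determine, according to the behavior entropy, the access category to which the user's network access behavior belongs.
In this embodiment, the server may preset different access categories. For example, three categories may be set, corresponding to machine access behavior, suspicious access behavior and normal access behavior, and the corresponding users are designated machine users, suspicious users and normal users, respectively.
The server can set a behavior entropy range for each category of user according to the behavior entropy formula in use. After calculating a user's behavior entropy, it looks up the range into which that value falls and thereby determines the user's category.
Again taking the behavior entropy calculation of Table 1 as an example, the server sets the behavior entropy ranges corresponding to machine access behavior, suspicious access behavior and normal access behavior.
For example, the ranges for machine access behavior, suspicious access behavior and normal access behavior are set to 0 ≤ x < 0.5, 0.5 ≤ x < 1 and x ≥ 1, respectively, where x is the user's behavior entropy. As shown in Table 1, user A's behavior entropy of 1.600727 falls in the range x ≥ 1, so user A is judged to be a normal user. User B's behavior entropy of 0.470349 falls in the range 0 ≤ x < 0.5, so user B is judged to be a machine user.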
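With the example ranges above, classification is a simple threshold lookup. The function name `access_category` and the category labels are illustrative assumptions.

```python
def access_category(x: float) -> str:
    """Map a behavior entropy x to an access category using the example ranges."""
    if x < 0:
        # With a = -1 and b = 2, every category entropy is non-negative.
        raise ValueError("behavior entropy cannot be negative")
    if x < 0.5:
        return "machine"
    if x < 1:
        return "suspicious"
    return "normal"


print(access_category(1.600727), access_category(0.470349))  # normal machine
```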
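In the network access behavior identification method provided in this embodiment, the user's behavior data in each preset behavior category is extracted from the user's network access information within the preset time period. The user's behavior entropy is then calculated from that data, and the access category of the user's network access behavior is determined from the behavior entropy. The method can quickly and accurately identify whether a user is maliciously stealing data or browsing normally, so that corresponding measures can be taken based on the result.
In one embodiment, the step of determining the access category according to the behavior entropy is followed by these actions. When the user's access category is determined to be machine access behavior, the user's account is frozen. When the user's access category is suspicious access behavior, the user's contact information is obtained and a suspicion warning is sent to it.
In this embodiment, when the user is judged to be a machine user, this indicates that the user is maliciously stealing data, for example product information on an e-commerce website, so the account can be frozen. When the user is judged to be a suspicious user, this indicates that the user is stealing data to some degree. The server can then obtain the user's contact information, such as an e-mail address, and send the user a warning e-mail.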
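The follow-up actions can be sketched as a small dispatcher. `freeze_account` and `send_warning` are hypothetical hooks standing in for the account system and the e-mail service. The text does not say what happens to a suspicious user with no contact details on file, and in that case the sketch simply takes no action.

```python
from typing import Callable, Optional


def respond(category: str, user_id: str, contact: Optional[str],
            freeze_account: Callable[[str], None],
            send_warning: Callable[[str], None]) -> str:
    """Dispatch the follow-up action for an access category (hooks are hypothetical)."""
    if category == "machine":
        freeze_account(user_id)  # machine access: freeze the account
        return "frozen"
    if category == "suspicious" and contact:
        send_warning(contact)  # suspicious access: warn via the contact details
        return "warned"
    return "none"


frozen, warned = [], []
print(respond("machine", "B", None, frozen.append, warned.append), frozen)  # frozen ['B']
```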
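Further, in one embodiment, the network access information includes the user's login times within the preset time period. The step of calculating the user's behavior entropy according to the behavior data in each preset behavior category includes: calculating the dispersion of the user's login times; determining a first behavior entropy weight of the user according to the dispersion; and determining a new behavior entropy of the user according to the first behavior entropy weight and the behavior entropy.
In this embodiment, the login time denotes the time of day at which the user logs in to the account to browse; the login date is ignored. The dispersion of login times can be characterized by the variance. The server can compute the user's mean login time from the login times in the network access information for the preset time period, and then the variance of the user's login times over that period. The user's behavior entropy is then determined according to the number of logins and the variance.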
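One possible reading of the login-time dispersion is sketched below. Each login is converted to a time of day (ignoring the date) and expressed in minutes, and the population variance is taken. The unit (minutes), the use of population rather than sample variance, and the helper names are assumptions. The sketch also ignores logins that straddle midnight, where a plain mean of clock times is misleading. It is not claimed to reproduce the dispersion values used in the worked example.

```python
from statistics import pvariance
from typing import Iterable


def seconds_of_day(hms: str) -> int:
    """Parse an H:MM:SS clock time; any date part is assumed already stripped."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def login_time_variance(times: Iterable[str], unit_seconds: int = 60) -> float:
    """Population variance of login times of day, in units of unit_seconds squared."""
    values = [seconds_of_day(t) / unit_seconds for t in times]
    return pvariance(values)


print(login_time_variance(["8:00:00", "8:02:00"]))  # 1.0
```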
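For different ranges of login counts, the server can define the first behavior entropy weight that corresponds to each variance or variance range. The behavior entropy calculated above is multiplied by the determined first behavior entropy weight, the product is taken as the new behavior entropy, and the user's behavior category is determined from this new behavior entropy.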
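Tables 2 to 4 give an example. Tables 2 and 3 record the login times of user A and user B from Table 1, respectively, and Table 4 is the server's preset mapping between time dispersion ranges and first behavior entropy weights. Suppose the server calculates a time dispersion of 1 for user A and 15 for user B. According to this preset mapping, the first behavior entropy weights of users A and B are then 0.7 and 0.9, respectively, and the new behavior entropies calculated with these weights are 1.1205089 and 0.4233141.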
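Table 2 (login times of user A)
8:01:23    7:58:32    7:59:59
Table 3 (login times of user B)
14:36:49    21:40:51    8:06:07    11:35:25
Table 4
Time dispersion range            0-10    10-100    100 and above
First behavior entropy weight    0.7     0.9       1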
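The lookup in Table 4 can be sketched with `bisect`. The printed ranges share their endpoints, so assigning a boundary value such as 10 or 100 to the higher band is an assumption. The helper names are hypothetical. Under the table as printed, a dispersion of 1 falls in the 0-10 band (weight 0.7) and a dispersion of 15 falls in the 10-100 band (weight 0.9).

```python
from bisect import bisect_right

# Table 4: band boundaries and the first behavior entropy weight of each band.
DISPERSION_BOUNDS = [10, 100]
FIRST_WEIGHTS = [0.7, 0.9, 1.0]


def first_weight(dispersion: float) -> float:
    """Look up the first behavior entropy weight for a login-time dispersion."""
    if dispersion < 0:
        raise ValueError("dispersion cannot be negative")
    return FIRST_WEIGHTS[bisect_right(DISPERSION_BOUNDS, dispersion)]


def reweighted_entropy(entropy: float, dispersion: float) -> float:
    """New behavior entropy = behavior entropy * first behavior entropy weight."""
    return entropy * first_weight(dispersion)


print(first_weight(1), first_weight(15))           # 0.7 0.9
print(round(reweighted_entropy(0.470349, 15), 7))  # 0.4233141
```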
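In this embodiment, the user's behavior entropy is determined in combination with the user's login times, and the user's behavior category is then determined from this behavior entropy, which can further improve the accuracy of identifying user behavior.
Further, in one embodiment, the network access information includes the user's age. The step of calculating the user's behavior entropy according to the behavior data in each preset behavior category further includes: judging whether the user's age is a sensitive age, and if so, obtaining a second behavior entropy weight corresponding to the sensitive age; and determining a new behavior entropy of the user according to the second behavior entropy weight and the behavior entropy.
In this embodiment, the server can designate several ages as sensitive ages, i.e. ages that are very likely to have been generated automatically by a machine; for example, the sensitive ages may be set to -1, 0, 1 and 999. When the user's age is detected to be a sensitive age, the server obtains the second behavior entropy weight it has set for sensitive ages. The initially calculated behavior entropy is then multiplied by the second behavior entropy weight, and the product is taken as the user's new behavior entropy.
In one embodiment, the new behavior entropy can also be determined by combining the user's age and login times. When the user's age is detected to be a sensitive age, the initially calculated behavior entropy is multiplied by both the first and the second behavior entropy weights, and the product is taken as the new behavior entropy.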
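Combining the two adjustments can be sketched as follows. The sensitive ages are the ones listed above. The second weight of 0.5 is a hypothetical placeholder, because the text does not state a value. The check is meant to run on the age as recorded, before any min/max clamping.

```python
# Sensitive ages from the text; SECOND_WEIGHT is a hypothetical placeholder value.
SENSITIVE_AGES = frozenset({-1, 0, 1, 999})
SECOND_WEIGHT = 0.5


def new_behavior_entropy(entropy: float, age: int, first_weight: float = 1.0,
                         second_weight: float = SECOND_WEIGHT) -> float:
    """Apply the first weight (login-time dispersion) and, for a sensitive age, the second weight."""
    result = entropy * first_weight
    if age in SENSITIVE_AGES:
        result *= second_weight
    return result


print(new_behavior_entropy(1.0, 999))  # 0.5
print(new_behavior_entropy(1.0, 30))   # 1.0
```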
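In this embodiment, the user's behavior entropy is determined in combination with the user's age, and the user's behavior category is then determined from this behavior entropy, which can further improve the accuracy of identifying user behavior.
In one embodiment, as shown in FIG. 5, a network access behavior identification device is provided. The device can run in the server shown in FIG. 1 or FIG. 2 and includes:
a network access information acquisition module 502, configured to acquire the user's network access information within a preset time period;
a behavior data acquisition module 504, configured to extract the user's behavior data in each preset behavior category according to the network access information;
a behavior entropy calculation module 506, configured to calculate the user's behavior entropy according to the user's behavior data in each preset behavior category, the behavior entropy representing the degree of dispersion of the user's network access behavior; and
an access category determination module 508, configured to determine, according to the behavior entropy, the access category to which the user's network access behavior belongs.
In one embodiment, the behavior data acquisition module 504 is further configured to preprocess the network access information. It then acquires the user's behavior data in each preset behavior category from the preprocessed network access information, so that the acquired behavior data of the same category has the same format.
In one embodiment, as shown in FIG. 6, the behavior entropy calculation module 506 includes:
a count calculation unit 602, configured to calculate the count of each preset behavior category and the total count according to the user's behavior data in each preset behavior category, the total count being the sum of the counts of all preset behavior categories;
a class probability calculation unit 604, configured to calculate the class probability of each preset behavior category according to its count and the total count;
a category entropy calculation unit 606, configured to calculate the category entropy of each preset behavior category according to its class probability, the category entropy representing the degree of dispersion of the user's network access behavior in the corresponding category; and
a behavior entropy calculation unit 608, configured to calculate the behavior entropy according to the category entropy of each preset behavior category.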
In one embodiment, the category entropy is calculated as:
$$P_i = a\,p_i \log_b p_i, \qquad i = 1, 2, \ldots, C$$
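where P_i denotes the category entropy of the i-th behavior category, a is any non-zero coefficient, b is a coefficient greater than 0, C is the number of preset behavior categories, and p_i is the probability of the i-th category's behavior relative to the total count. The behavior entropy is the sum of the category entropies of all preset behavior categories.
In one embodiment, the network access information includes the user's login times within the preset time period. The behavior entropy calculation module 506 is further configured to calculate the dispersion of the user's login times, determine a first behavior entropy weight of the user according to the dispersion, and determine a new behavior entropy of the user according to the first behavior entropy weight and the behavior entropy.
In one embodiment, the network access information includes the user's age. The behavior entropy calculation module 506 is further configured to judge whether the user's age is a sensitive age and, if so, obtain a second behavior entropy weight corresponding to the sensitive age, and determine a new behavior entropy of the user according to the second behavior entropy weight and the behavior entropy.
Each module of the above network access behavior identification device may be implemented wholly or partly in software, hardware or a combination thereof. The network interface may be an Ethernet card, a wireless network card or the like. The above modules may be embedded in hardware form in, or be independent of, the processor in the server, or may be stored in software form in the memory of the server, so that the processor can invoke them to perform the operations corresponding to each module. The processor may be a central processing unit (CPU), a microprocessor, a microcontroller or the like.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described. However, any combination of them that involves no contradiction should be regarded as falling within the scope of this specification.
The above embodiments express only several implementations of the present invention. They are described in relative detail, but they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, and all of these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.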

Claims (24)

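  1. A network access behavior identification method, comprising:
    acquiring network access information of a user within a preset time period;
    extracting, according to the network access information, behavior data of the user in each preset behavior category;
    calculating a behavior entropy of the user according to the behavior data of the user in each preset behavior category, the behavior entropy representing the degree of dispersion of the user's network access behavior; and
    determining, according to the behavior entropy, the access category to which the user's network access behavior belongs.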
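  2. The method according to claim 1, wherein extracting, according to the network access information, the behavior data of the user in each preset behavior category comprises:
    preprocessing the network access information; and
    acquiring, according to the preprocessed network access information, the behavior data of the user in each preset behavior category, so that the acquired behavior data of the same category has the same format.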
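  3. The method according to claim 1, wherein calculating the behavior entropy of the user according to the behavior data of the user in each preset behavior category comprises:
    calculating, according to the behavior data of the user in each preset behavior category, a count of each preset behavior category and a total count, the total count being the sum of the counts of all preset behavior categories;
    calculating a class probability of each preset behavior category according to the count of that category and the total count;
    calculating a category entropy of each preset behavior category according to its class probability, the category entropy representing the degree of dispersion of the user's network access behavior in the corresponding category; and
    calculating the behavior entropy according to the category entropy of each preset behavior category.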
  4. The method according to claim 3, wherein the category entropy is calculated as:
    $$P_i = a\,p_i \log_b p_i, \qquad i = 1, 2, \ldots, C$$
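    where P_i denotes the category entropy of the i-th behavior category, a is any non-zero coefficient, b is a coefficient greater than 0, C is the number of preset behavior categories, and p_i is the probability of the i-th category's behavior relative to the total count; and
    the behavior entropy is the sum of the category entropies of all preset behavior categories.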
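  5. The method according to claim 1, wherein the network access information includes login times of the user within the preset time period; and
    calculating the behavior entropy of the user according to the behavior data of the user in each preset behavior category comprises:
    calculating a dispersion of the user's login times;
    determining a first behavior entropy weight of the user according to the dispersion; and
    determining a new behavior entropy of the user according to the first behavior entropy weight and the behavior entropy.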
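  6. The method according to claim 1, wherein the network access information includes the age of the user; and
    calculating the behavior entropy of the user according to the behavior data of the user in each preset behavior category comprises:
    judging whether the age of the user is a sensitive age, and if so,
    obtaining a second behavior entropy weight corresponding to the sensitive age; and
    determining a new behavior entropy of the user according to the second behavior entropy weight and the behavior entropy.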
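  7. A network access behavior identification device, comprising:
    a network access information acquisition module, configured to acquire network access information of a user within a preset time period;
    a behavior data acquisition module, configured to extract, according to the network access information, behavior data of the user in each preset behavior category;
    a behavior entropy calculation module, configured to calculate a behavior entropy of the user according to the behavior data of the user in each preset behavior category, the behavior entropy representing the degree of dispersion of the user's network access behavior; and
    an access category determination module, configured to determine, according to the behavior entropy, the access category to which the user's network access behavior belongs.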
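  8. The device according to claim 7, wherein the behavior data acquisition module is further configured to preprocess the network access information, and to acquire, according to the preprocessed network access information, the behavior data of the user in each preset behavior category, so that the acquired behavior data of the same category has the same format.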
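  9. The device according to claim 7, wherein the behavior entropy calculation module comprises:
    a count calculation unit, configured to calculate, according to the behavior data of the user in each preset behavior category, a count of each preset behavior category and a total count, the total count being the sum of the counts of all preset behavior categories;
    a class probability calculation unit, configured to calculate a class probability of each preset behavior category according to the count of that category and the total count;
    a category entropy calculation unit, configured to calculate a category entropy of each preset behavior category according to its class probability, the category entropy representing the degree of dispersion of the user's network access behavior in the corresponding category; and
    a behavior entropy calculation unit, configured to calculate the behavior entropy according to the category entropy of each preset behavior category.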
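  10. The device according to claim 9, wherein the category entropy is calculated as $P_i = a\,p_i \log_b p_i$ (i = 1, 2, ..., C), where P_i denotes the category entropy of the i-th behavior category, a is any non-zero coefficient, b is a coefficient greater than 0, C is the number of preset behavior categories, and p_i is the probability of the i-th category's behavior relative to the total count; and
    the behavior entropy is the sum of the category entropies of all preset behavior categories.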
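  11. The device according to claim 7, wherein the network access information includes login times of the user within the preset time period; and
    the behavior entropy calculation module is further configured to calculate a dispersion of the user's login times, determine a first behavior entropy weight of the user according to the dispersion, and determine a new behavior entropy of the user according to the first behavior entropy weight and the behavior entropy.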
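  12. The device according to claim 7, wherein the network access information includes the age of the user; and
    the behavior entropy calculation module is further configured to judge whether the age of the user is a sensitive age and, if so, obtain a second behavior entropy weight corresponding to the sensitive age, and determine a new behavior entropy of the user according to the second behavior entropy weight and the behavior entropy.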
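  13. A server, comprising a memory and a processor, the memory storing instructions that, when executed by the processor, cause the processor to perform the following steps:
    acquiring network access information of a user within a preset time period;
    extracting, according to the network access information, behavior data of the user in each preset behavior category;
    calculating a behavior entropy of the user according to the behavior data of the user in each preset behavior category, the behavior entropy representing the degree of dispersion of the user's network access behavior; and
    determining, according to the behavior entropy, the access category to which the user's network access behavior belongs.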
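  14. The server according to claim 13, wherein the extracting, performed by the processor, of the behavior data of the user in each preset behavior category according to the network access information comprises:
    preprocessing the network access information; and
    acquiring, according to the preprocessed network access information, the behavior data of the user in each preset behavior category, so that the acquired behavior data of the same category has the same format.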
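  15. The server according to claim 13, wherein the calculating, performed by the processor, of the behavior entropy of the user according to the behavior data of the user in each preset behavior category comprises:
    calculating, according to the behavior data of the user in each preset behavior category, a count of each preset behavior category and a total count, the total count being the sum of the counts of all preset behavior categories;
    calculating a class probability of each preset behavior category according to the count of that category and the total count;
    calculating a category entropy of each preset behavior category according to its class probability, the category entropy representing the degree of dispersion of the user's network access behavior in the corresponding category; and
    calculating the behavior entropy according to the category entropy of each preset behavior category.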
  16. The server according to claim 15, wherein the category entropy is calculated as:
    $$P_i = a\,p_i \log_b p_i, \qquad i = 1, 2, \ldots, C$$
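    where P_i denotes the category entropy of the i-th behavior category, a is any non-zero coefficient, b is a coefficient greater than 0, C is the number of preset behavior categories, and p_i is the probability of the i-th category's behavior relative to the total count; and
    the behavior entropy is the sum of the category entropies of all preset behavior categories.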
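  17. The server according to claim 13, wherein the network access information includes login times of the user within the preset time period; and
    the calculating, performed by the processor, of the behavior entropy of the user according to the behavior data of the user in each preset behavior category comprises:
    calculating a dispersion of the user's login times;
    determining a first behavior entropy weight of the user according to the dispersion; and
    determining a new behavior entropy of the user according to the first behavior entropy weight and the behavior entropy.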
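  18. The server according to claim 13, wherein the network access information includes the age of the user; and
    the calculating, performed by the processor, of the behavior entropy of the user according to the behavior data of the user in each preset behavior category comprises:
    judging whether the age of the user is a sensitive age, and if so,
    obtaining a second behavior entropy weight corresponding to the sensitive age; and
    determining a new behavior entropy of the user according to the second behavior entropy weight and the behavior entropy.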
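  19. One or more non-volatile readable storage media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring network access information of a user within a preset time period;
    extracting, according to the network access information, behavior data of the user in each preset behavior category;
    calculating a behavior entropy of the user according to the behavior data of the user in each preset behavior category, the behavior entropy representing the degree of dispersion of the user's network access behavior; and
    determining, according to the behavior entropy, the access category to which the user's network access behavior belongs.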
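  20. The non-volatile readable storage media according to claim 19, wherein the extracting, performed by the one or more processors, of the behavior data of the user in each preset behavior category according to the network access information comprises:
    preprocessing the network access information; and
    acquiring, according to the preprocessed network access information, the behavior data of the user in each preset behavior category, so that the acquired behavior data of the same category has the same format.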
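  21. The non-volatile readable storage media according to claim 19, wherein the calculating, performed by the one or more processors, of the behavior entropy of the user according to the behavior data of the user in each preset behavior category comprises:
    calculating, according to the behavior data of the user in each preset behavior category, a count of each preset behavior category and a total count, the total count being the sum of the counts of all preset behavior categories;
    calculating a class probability of each preset behavior category according to the count of that category and the total count;
    calculating a category entropy of each preset behavior category according to its class probability, the category entropy representing the degree of dispersion of the user's network access behavior in the corresponding category; and
    calculating the behavior entropy according to the category entropy of each preset behavior category.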
  22. The non-volatile readable storage media according to claim 21, wherein the category entropy is calculated as:
    $$P_i = a\,p_i \log_b p_i, \qquad i = 1, 2, \ldots, C$$
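    where P_i denotes the category entropy of the i-th behavior category, a is any non-zero coefficient, b is a coefficient greater than 0, C is the number of preset behavior categories, and p_i is the probability of the i-th category's behavior relative to the total count; and
    the behavior entropy is the sum of the category entropies of all preset behavior categories.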
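  23. The non-volatile readable storage media according to claim 19, wherein the network access information includes login times of the user within the preset time period; and
    the calculating, performed by the one or more processors, of the behavior entropy of the user according to the behavior data of the user in each preset behavior category comprises:
    calculating a dispersion of the user's login times;
    determining a first behavior entropy weight of the user according to the dispersion; and
    determining a new behavior entropy of the user according to the first behavior entropy weight and the behavior entropy.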
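  24. The non-volatile readable storage media according to claim 19, wherein the network access information includes the age of the user; and
    the calculating, performed by the one or more processors, of the behavior entropy of the user according to the behavior data of the user in each preset behavior category comprises:
    judging whether the age of the user is a sensitive age, and if so,
    obtaining a second behavior entropy weight corresponding to the sensitive age; and
    determining a new behavior entropy of the user according to the second behavior entropy weight and the behavior entropy.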
PCT/CN2017/073615 2016-02-24 2017-02-15 网络访问行为识别方法和装置、服务器和存储介质 WO2017143934A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
EP17755763.4A EP3370169A4 (en) 2016-02-24 2017-02-15 METHOD AND DEVICE FOR IDENTIFYING NETWORK ACCESS BEHAVIOR, SERVER AND STORAGE MEDIUM
JP2018513718A JP6422617B2 (ja) 2016-02-24 2017-02-15 ネットワークアクセス動作識別プログラム、サーバ及び記憶媒体
KR1020187015207A KR20180118597A (ko) 2016-02-24 2017-02-15 네트워크 액세스 행동을 식별하는 방법 및 장치, 서버와 저장 매체
SG11201708944VA SG11201708944VA (en) 2016-02-24 2017-02-15 Method and device of identifying network access behavior, server and storage medium
US15/578,695 US20180359268A1 (en) 2016-02-24 2017-02-15 Method and Device of Identifying Network Access Behavior, Server and Storage Medium
AU2017221945A AU2017221945B2 (en) 2016-02-24 2017-02-15 Method and device of identifying network access behavior, server and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610100358.0 2016-02-24
CN201610100358.0A CN105808639B (zh) 2016-02-24 2016-02-24 网络访问行为识别方法和装置

Publications (1)

Publication Number Publication Date
WO2017143934A1 true WO2017143934A1 (zh) 2017-08-31

Family

ID=56466462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/073615 WO2017143934A1 (zh) 2016-02-24 2017-02-15 网络访问行为识别方法和装置、服务器和存储介质

Country Status (8)

Country Link
US (1) US20180359268A1 (zh)
EP (1) EP3370169A4 (zh)
JP (1) JP6422617B2 (zh)
KR (1) KR20180118597A (zh)
CN (1) CN105808639B (zh)
AU (1) AU2017221945B2 (zh)
SG (1) SG11201708944VA (zh)
WO (1) WO2017143934A1 (zh)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808639B (zh) * 2016-02-24 2021-02-09 平安科技(深圳)有限公司 网络访问行为识别方法和装置
CN107707509B (zh) * 2016-08-08 2020-09-29 阿里巴巴集团控股有限公司 识别及辅助识别虚假流量的方法、装置及系统
CN107527223A (zh) * 2016-12-22 2017-12-29 北京锐安科技有限公司 一种购票信息分析的方法及装置
CN108243142A (zh) * 2016-12-23 2018-07-03 阿里巴巴集团控股有限公司 识别方法和装置以及反垃圾内容系统
US10440037B2 (en) * 2017-03-31 2019-10-08 Mcafee, Llc Identifying malware-suspect end points through entropy changes in consolidated logs
CN108829572A (zh) * 2018-05-30 2018-11-16 北京奇虎科技有限公司 用户登录行为的分析方法及装置
CN108616545B (zh) * 2018-06-26 2021-06-29 中国科学院信息工程研究所 一种网络内部威胁的检测方法、系统及电子设备
CN109714636B (zh) * 2018-12-21 2021-04-23 武汉瓯越网视有限公司 一种用户识别方法、装置、设备及介质
CN109978627B (zh) * 2019-03-29 2023-08-08 电子科技大学中山学院 一种面向宽带接入网用户上网行为大数据的建模方法
CN110519257B (zh) * 2019-08-22 2022-04-01 北京天融信网络安全技术有限公司 一种网络信息的处理方法及装置
CN110543862B (zh) * 2019-09-05 2022-04-22 北京达佳互联信息技术有限公司 数据获取方法、装置及存储介质
CN112559840B (zh) * 2019-09-10 2023-08-18 中国移动通信集团浙江有限公司 上网行为识别方法、装置、计算设备及计算机存储介质
CN110675228B (zh) * 2019-09-27 2021-05-28 支付宝(杭州)信息技术有限公司 用户购票行为检测方法以及装置
CN111461545B (zh) * 2020-03-31 2023-11-10 北京深演智能科技股份有限公司 机器访问数据的确定方法及装置
CN112437197B (zh) * 2020-10-30 2021-06-18 中国人民解放军战略支援部队信息工程大学 一种基于通信行为信息熵的异常呼叫发现方法与装置
CN113486366A (zh) * 2021-06-08 2021-10-08 贵州电网有限责任公司 一种基于聚类分析的Web违规操作行为检测方法
CN113660277B (zh) * 2021-08-18 2023-01-06 广州优视云集科技有限公司 一种基于复用埋点信息的反爬虫方法及处理终端

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446979A (zh) * 2008-12-26 2009-06-03 北京科尔威视网络科技有限公司 动态热点跟踪的方法
CN101841529A (zh) * 2010-03-12 2010-09-22 北京工业大学 基于信息论和信任的隐私信息保护方法
US8495375B2 (en) * 2007-12-21 2013-07-23 Research In Motion Limited Methods and systems for secure channel initialization
CN103793426A (zh) * 2012-11-01 2014-05-14 腾讯科技(深圳)有限公司 一种网页访问记录保存方法及装置
CN105808639A (zh) * 2016-02-24 2016-07-27 平安科技(深圳)有限公司 网络访问行为识别方法和装置

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2006263653A1 (en) * 2005-06-29 2007-01-04 Trustees Of Boston University Whole-network anomaly diagnosis
CA2531410A1 (en) * 2005-12-23 2007-06-23 Snipe Network Security Corporation Behavioural-based network anomaly detection based on user and group profiling
US8244752B2 (en) * 2008-04-21 2012-08-14 Microsoft Corporation Classifying search query traffic
US7974970B2 (en) * 2008-10-09 2011-07-05 Yahoo! Inc. Detection of undesirable web pages
US20110131652A1 (en) * 2009-05-29 2011-06-02 Autotrader.Com, Inc. Trained predictive services to interdict undesired website accesses
US10187353B2 (en) * 2010-06-02 2019-01-22 Symantec Corporation Behavioral classification of network data flows
JP2012048360A (ja) * 2010-08-25 2012-03-08 Sony Corp Id価値評価装置、id価値評価システム、及びid価値評価方法
US20120090027A1 (en) * 2010-10-12 2012-04-12 Electronics And Telecommunications Research Institute Apparatus and method for detecting abnormal host based on session monitoring
US20120158953A1 (en) * 2010-12-21 2012-06-21 Raytheon Bbn Technologies Corp. Systems and methods for monitoring and mitigating information leaks
JP5579140B2 (ja) * 2011-09-05 2014-08-27 日本電信電話株式会社 文書検索装置及び方法及びプログラム
CN102271091B (zh) * 2011-09-06 2013-09-25 电子科技大学 一种网络异常事件分类方法
CN102752288B (zh) * 2012-06-06 2015-07-08 华为技术有限公司 网络访问行为识别方法和装置
US20140257919A1 (en) * 2013-03-09 2014-09-11 Hewlett- Packard Development Company, L.P. Reward population grouping
US9380066B2 (en) * 2013-03-29 2016-06-28 Intel Corporation Distributed traffic pattern analysis and entropy prediction for detecting malware in a network environment
CN103793484B (zh) * 2014-01-17 2017-03-15 五八同城信息技术有限公司 分类信息网站中的基于机器学习的欺诈行为识别系统
CN104836702B (zh) * 2015-05-06 2018-06-19 华中科技大学 一种大流量环境下主机网络异常行为检测及分类方法
CN104883363A (zh) * 2015-05-11 2015-09-02 北京交通大学 异常访问行为分析方法及装置
CN104994056B (zh) * 2015-05-11 2018-01-19 中国电力科学研究院 一种电力信息网络中流量识别模型的动态更新方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8495375B2 (en) * 2007-12-21 2013-07-23 Research In Motion Limited Methods and systems for secure channel initialization
CN101446979A (zh) * 2008-12-26 2009-06-03 北京科尔威视网络科技有限公司 动态热点跟踪的方法
CN101841529A (zh) * 2010-03-12 2010-09-22 北京工业大学 基于信息论和信任的隐私信息保护方法
CN103793426A (zh) * 2012-11-01 2014-05-14 腾讯科技(深圳)有限公司 一种网页访问记录保存方法及装置
CN105808639A (zh) * 2016-02-24 2016-07-27 平安科技(深圳)有限公司 网络访问行为识别方法和装置

Also Published As

Publication number Publication date
EP3370169A4 (en) 2019-06-12
US20180359268A1 (en) 2018-12-13
JP2018516421A (ja) 2018-06-21
JP6422617B2 (ja) 2018-11-14
SG11201708944VA (en) 2017-11-29
KR20180118597A (ko) 2018-10-31
AU2017221945A1 (en) 2017-11-23
AU2017221945B2 (en) 2019-11-07
EP3370169A1 (en) 2018-09-05
CN105808639B (zh) 2021-02-09
CN105808639A (zh) 2016-07-27

Similar Documents

Publication Publication Date Title
WO2017143934A1 (zh) 网络访问行为识别方法和装置、服务器和存储介质
US11710054B2 (en) Information recommendation method, apparatus, and server based on user data in an online forum
US9215252B2 (en) Methods and apparatus to identify privacy relevant correlations between data values
CA2985028C (en) Gating decision system and methods for determining whether to allow material implications to result from online activities
CN107872436B (zh) 一种账号识别方法、装置及系统
KR101939554B1 (ko) 일시적 거래 한도 결정
US11403643B2 (en) Utilizing a time-dependent graph convolutional neural network for fraudulent transaction identification
WO2015085961A1 (zh) 构建用户画像的方法及装置
WO2020048084A1 (zh) 资源推荐方法、装置、计算机设备及计算机可读存储介质
JP5551704B2 (ja) オンライン・マーケティング効率の評価
US10909145B2 (en) Techniques for determining whether to associate new user information with an existing user
CN107622197A (zh) 设备识别方法及装置、用于设备识别的权重计算方法及装置
CN111415167B (zh) 网络欺诈交易检测方法及装置、计算机存储介质和终端
CN110753065B (zh) 网络行为检测方法、装置、设备及存储介质
TWI780355B (zh) 維修對象的定損方法及裝置、電子設備
TWI639093B (zh) Object set and processing method and device thereof
EP3693871A1 (en) A computer implemented large-scale method, a system and computer program for optin-redundant personalized data aggregation and content delivery in telecommunication networks
CN113822691A (zh) 用户账号的识别方法、装置、系统和介质
CN115859176A (zh) 文本处理方法、装置、计算机设备和存储介质
Takano¹ et al. Check for updates Privacy-Protective Distributed Machine Learning Between Rich Devices and Edge Servers Using Confidence Level
CN111798282A (zh) 一种信息处理方法、终端及存储介质
TWM621545U (zh) 能夠判別匿名用戶是否屬於特定用戶族群的用戶管理裝置
CN114254112A (zh) 用于敏感信息预分类的方法、系统、装置和介质
TW202312061A (zh) 能夠判別匿名用戶是否屬於特定用戶族群的用戶管理裝置、方法與儲存該方法的儲存媒介
CN116010695A (zh) 资源推荐方法、推荐模型训练方法、装置及存储介质

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 11201708944V

Country of ref document: SG

ENP Entry into the national phase

Ref document number: 2017221945

Country of ref document: AU

Date of ref document: 20170215

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2018513718

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20187015207

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE