CN115603947A - Abnormal access detection method and device - Google Patents

Abnormal access detection method and device


Publication number
CN115603947A
Authority
CN
China
Prior art keywords
access
abnormal
sequence
user
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211121064.8A
Other languages
Chinese (zh)
Inventor
李任鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211121064.8A
Publication of CN115603947A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure provides an abnormal access detection method and device, relating to the technical field of computers and in particular to the field of big data. The specific implementation scheme is as follows: determining target users accessing a first service line in a first period; acquiring access resource information corresponding to the user identifier of each target user; clustering the user identifiers based on the access resource information to determine a plurality of user clusters; and inspecting the user clusters to determine the abnormal user clusters with abnormal access. The access resource information of the users serves as the clustering feature, and abnormal access teams are found from the clustering results. Compared with manual mining and analysis, this is more timely and can also uncover abnormal access teams that are otherwise hard to find.

Description

Abnormal access detection method and device
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to the field of big data technology.
Background
Web crawler traffic is network traffic generated when a script automatically fetches web content according to set rules. Because this differs from the way a normal user obtains information, web crawler traffic is regarded as cheating traffic, also called abnormal traffic.
To maintain the security of web information, web crawler traffic needs to be detected.
Disclosure of Invention
The present disclosure provides an abnormal access detection method, apparatus, electronic device, computer-readable storage medium, and computer program product.
According to a first aspect of the present disclosure, there is provided an abnormal access detection method, the method including:
determining a target user accessing a first service line in a first period;
acquiring access resource information corresponding to the user identification of each target user; the access resource information represents access resources used when the target user initiates an access request;
clustering the user identification based on the access resource information, and determining a plurality of clustered user clusters;
and detecting the user cluster, and determining an abnormal user cluster with abnormal access.
According to a second aspect of the present disclosure, there is provided an abnormal access detection apparatus, the apparatus including:
the target user determining module is used for determining a target user accessing the first service line in a first period;
the information acquisition module is used for acquiring access resource information corresponding to the user identification of each target user; the access resource information represents access resources used when the target user initiates an access request;
the first clustering module is used for clustering the user identifications based on the access resource information and determining a plurality of clustered user clusters;
and the detection module is used for detecting the user cluster and determining the abnormal user cluster with abnormal access.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform an abnormal access detection method.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to execute an abnormal access detection method.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements an abnormal access detection method.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flowchart of an abnormal access detection method according to an embodiment of the present disclosure;
fig. 2 is another schematic flow chart of an abnormal access detection method provided in the embodiment of the present disclosure;
fig. 3 is a schematic flowchart of another abnormal access detection method provided in the embodiment of the present disclosure;
fig. 4 is a schematic diagram of an abnormal access detection method provided in the embodiment of the present disclosure;
FIG. 5 is a block diagram of an apparatus for implementing the abnormal access detection method of an embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device provided by an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Web crawler traffic is network traffic generated when a script automatically fetches web content according to set rules. Because this differs from the way a normal user obtains information, web crawler traffic is regarded as cheating traffic, also called abnormal traffic.
To maintain the security of web information, web crawler traffic needs to be detected.
Crawling is generally carried out by teams. In the related art, abnormal access teams are analyzed and mined manually. More sophisticated crawlers, however, spread out their resource usage, for example by using an IP pool or by cracking and taking over multiple accounts. As a result, abnormal teams are hard to spot directly from log traffic, their specific behavior patterns are poorly understood, the resource pools they use cannot be effectively associated, and teams that reuse the same resource pools in related services cannot be located and tracked in time.
In order to solve the technical problem, the present disclosure provides an abnormal access detection method and apparatus.
In one embodiment of the present disclosure, an abnormal access detection method is provided, and the method includes:
determining a target user accessing a first service line in a first period;
acquiring access resource information corresponding to the user identification of each target user;
clustering the user identification based on the access resource information, and determining a plurality of clustered user clusters;
and detecting the user cluster, and determining an abnormal user cluster with abnormal access.
Abnormal access has the following characteristics: an abnormal access team is usually associated with a single resource pool and frequently switches access resources drawn from that pool to avoid detection. In view of these characteristics, the embodiment of the present disclosure uses the access resource information of the users as the clustering feature, so that users with similar access resource information are clustered into one class. Because an abnormal access team draws on the same resource pool, clustering by access resource information gathers the identifiers of the user accounts used by that team into one class. After clustering, it is easy to identify whether each user cluster is an abnormal user cluster; in other words, abnormal access teams are mined from the clustering results. Compared with manual mining and analysis, this is more timely and can also uncover abnormal access teams that are otherwise hard to find.
The following describes the abnormal access detection method provided by the embodiment of the present disclosure in detail.
Referring to fig. 1, fig. 1 is a schematic flowchart of an abnormal access detection method provided in an embodiment of the present disclosure, and as shown in fig. 1, the method may include the following steps:
S101: Determine the target users accessing the first service line during the first period.
In the embodiment of the present disclosure, a traffic log is obtained and standardized, for example by performing data cleaning, field extraction, and writing to a database in sequence.
The users accessing the service line in a specific time period can be determined according to the standardized log, and for convenience of description, the users accessing the first service line in the first time period are taken as target users.
S102: and acquiring access resource information corresponding to the user identifier of each target user, wherein the access resource information represents access resources used when the target user initiates an access request.
The User identifier may be a User Identification (UID), that is, a numerical value generated by the network side when the User registers, and may be used as a unique identifier of the User.
And the access resource information corresponding to the user identifier of each target user can be obtained through the standardized log, and represents the access resource used when the target user initiates an access request.
As an example, an IP Address (Internet Protocol Address) is a resource that is necessary when a user initiates an access request, and thus the IP Address can be used as access resource information. For convenience of description, the IP address is hereinafter referred to as IP.
S103: and clustering the user identifications based on the access resource information, and determining a plurality of clustered user clusters.
In the embodiment of the present disclosure, the access resource information of each user identifier may be represented by a feature vector.
Before clustering, the feature vectors may be normalized. After the normalization process, clustering is performed using a related clustering algorithm.
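The text does not say which normalization is used. As a minimal sketch, assuming z-score standardization (subtract the column mean and divide by the column standard deviation), the preprocessing could look as follows. The function name `standardize` and the sample values are hypothetical.

```python
import math

def standardize(vectors):
    """Z-score each feature column: subtract the mean, divide by the std deviation."""
    columns = list(zip(*vectors))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    # A constant column carries no information; map it to 0.0 instead of dividing by zero.
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in vectors
    ]

# Hypothetical 4-dimensional vectors: [IP count, IPC count, cookie count, UA count]
features = [[1, 1, 1, 1], [2, 1, 3, 1], [300, 40, 250, 1]]
scaled = standardize(features)
```

Standardizing keeps a feature with a large numeric range, such as the IP count, from dominating the distance computation in the clustering step.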
As an example, the number of categories is first determined using the elbow method commonly applied with k-means clustering, and then the user identifiers are clustered using a k-means clustering model.
The k-means clustering algorithm may include the following steps:
1) Determine the number of categories k and select k initial samples as the initial cluster centers.
2) Calculate the distance from each sample in the data set to each of the k cluster centers, and assign the sample to the category of the nearest cluster center.
3) For each category, recalculate the cluster center.
4) Repeat steps 2) and 3) until a termination condition is reached.
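The four steps above can be sketched in plain Python. This is an illustrative sketch, not the patent's implementation. It assumes that the first k samples serve as the initial centers, that distance is squared Euclidean, and that iteration stops when assignments no longer change. The helper `inertia` (total within-cluster squared distance) is a hypothetical addition that could be compared across several values of k for an elbow-style choice.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(samples, k, max_iter=100):
    # Step 1: take the first k samples as the initial cluster centers (assumption).
    centers = [list(s) for s in samples[:k]]
    labels = [-1] * len(samples)
    for _ in range(max_iter):
        # Step 2: assign each sample to the nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(s, centers[c]))
            for s in samples
        ]
        # Step 4: terminate once the assignments stop changing.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # keep the previous center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def inertia(samples, labels, centers):
    """Sum of squared distances from each sample to its assigned center."""
    return sum(squared_distance(s, centers[lab]) for s, lab in zip(samples, labels))
```

In practice a library implementation with better initialization would usually be preferable; the sketch only mirrors the listed steps.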
In the embodiment of the present disclosure, the input of the k-means clustering model is a normalized feature vector of the access resource information of each user identifier.
After clustering is completed, a plurality of user identification clusters can be obtained, and the user identification clusters can also be understood as user clusters.
S104: and detecting the user cluster and determining the abnormal user cluster with abnormal access.
In the embodiment of the disclosure, in order to avoid online detection, an abnormal access team (e.g., a web crawler team) may create a resource pool such as an IP pool, and frequently change access resources, while a normal access user may not change access resources in a frequency band, so that clustering is performed based on access resource information, and then abnormal access teams using the same resource pool can be accurately mined.
Therefore, the abnormal user cluster with abnormal access can be determined by detecting the user cluster.
In the embodiment of the present disclosure, a specific rule may be set, for example, if the average number of IPs of a user cluster exceeds a set value, or the overall traffic scale of the user cluster is large, the user cluster is considered to be an abnormal user cluster.
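A rule of this kind can be sketched as a per-cluster aggregation. The function name, the argument layout (one entry per user), and both thresholds are assumptions made for illustration; the text only says that a set value is used.

```python
def find_abnormal_clusters(labels, ip_counts, request_counts,
                           max_avg_ips, max_total_requests):
    """Return the labels of clusters whose average distinct-IP count or
    total request volume exceeds the given (hypothetical) thresholds."""
    totals = {}
    for label, ips, requests in zip(labels, ip_counts, request_counts):
        stats = totals.setdefault(label, {"users": 0, "ips": 0, "requests": 0})
        stats["users"] += 1
        stats["ips"] += ips
        stats["requests"] += requests
    abnormal = set()
    for label, stats in totals.items():
        avg_ips = stats["ips"] / stats["users"]
        if avg_ips > max_avg_ips or stats["requests"] > max_total_requests:
            abnormal.add(label)
    return abnormal
```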
Therefore, in the embodiment of the present disclosure, the access resource information of the users is used as the clustering feature, so that users with similar access resource information are clustered into one class. Because an abnormal access team draws on the same resource pool, clustering by access resource information gathers the identifiers of the user accounts used by that team into one class. After clustering, it is easy to identify whether each user cluster is an abnormal user cluster; in other words, abnormal access teams are mined from the clustering results. Compared with manual mining and analysis, this is more timely and can also uncover abnormal access teams that are otherwise hard to find.
In one embodiment of the present disclosure, the access resource information may include one or more of: the deduplicated count (number of distinct values) of IPs, the deduplicated count of IP network segments, the deduplicated count of user identity cache identifiers, and the deduplicated count of browser user agents.
The user identity cache identifier may be a cookie, which is data generated by the website to identify the user and stored on the user's local terminal. The IP network segment (IPC) consists of a part of the fields of the IP address. A browser User Agent (UA) identifies browser client information.
Therefore, in the embodiment of the disclosure, when an abnormal access team accesses a service line, the IP, the user identity cache identifier, and the browser user agent are frequently replaced, so that the access resource information is used as a basis for clustering, and the abnormal access team is mined according to a clustering result.
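As a rough sketch, the four deduplicated counts can be computed per UID with sets. The record fields (`uid`, `ip`, `cookie`, `ua`) are hypothetical names for fields of the standardized log. Treating the IPC segment as the first three octets of an IPv4 address is also an assumption, since the text only says it is part of the address.

```python
from collections import defaultdict

def ip_segment(ip):
    # Assumption: the "IPC" segment is the first three octets of an IPv4 address.
    return ".".join(ip.split(".")[:3])

def build_features(records):
    """Map each UID to [distinct IPs, distinct IPC segments, distinct cookies, distinct UAs]."""
    seen = defaultdict(lambda: {"ip": set(), "ipc": set(), "cookie": set(), "ua": set()})
    for rec in records:
        sets = seen[rec["uid"]]
        sets["ip"].add(rec["ip"])
        sets["ipc"].add(ip_segment(rec["ip"]))
        sets["cookie"].add(rec["cookie"])
        sets["ua"].add(rec["ua"])
    return {
        uid: [len(s["ip"]), len(s["ipc"]), len(s["cookie"]), len(s["ua"])]
        for uid, s in seen.items()
    }
```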
In one embodiment of the present disclosure, a target user may be a user whose number of accesses to the first service line in the first period exceeds a set value.
Because the access times of the abnormal access users are larger, the users with larger access times can be obtained by performing preliminary screening according to the access times, and the users may relate to abnormal access and are taken as target users.
Therefore, in the embodiment of the disclosure, the preliminary screening is performed according to the access times, so that the data amount participating in clustering is reduced, and the abnormal access detection efficiency is further improved.
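The preliminary screening amounts to counting requests per user and keeping those above a threshold. A minimal sketch, assuming each log record carries a hypothetical `uid` field:

```python
from collections import Counter

def select_target_users(records, min_requests):
    """Keep the UIDs whose request count in the period exceeds min_requests."""
    counts = Counter(rec["uid"] for rec in records)
    return {uid for uid, n in counts.items() if n > min_requests}
```

The threshold is a tunable parameter; a later example in the text uses 1000 requests per day.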
In an embodiment of the present disclosure, on the basis of the method shown in fig. 1, the method may further include:
marking the access resource information of the abnormal user cluster as abnormal resource information;
and marking requests detected online that use the abnormal resource information as abnormal access requests.
Specifically, since the user account in the abnormal user cluster is an account adopted by the abnormal access team, the corresponding access resource also belongs to the resource pool created by the abnormal access team, and therefore the access resource information is marked as abnormal resource information, and in the subsequent detection process, if a request for accessing by adopting the abnormal resource information is detected, the request is directly marked as an abnormal access request.
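The marking step can be sketched as building a set of resources used by the abnormal cluster and checking incoming requests against it. For brevity this sketch assumes only IPs are marked; the field names `uid` and `ip` and both function names are hypothetical.

```python
def collect_abnormal_ips(records, abnormal_uids):
    """Gather the IPs used by users in the abnormal cluster (the team's presumed pool)."""
    return {rec["ip"] for rec in records if rec["uid"] in abnormal_uids}

def mark_requests(requests, abnormal_ips):
    """Return copies of the requests with an added boolean 'abnormal' flag."""
    return [dict(req, abnormal=req["ip"] in abnormal_ips) for req in requests]
```

A set lookup keeps the online check cheap even when the marked pool is large.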
Therefore, in the embodiment of the disclosure, clustering is performed based on the access resource information, an abnormal access team is mined, the access resource information adopted by the abnormal access team is marked, understanding of a specific behavior mode of the abnormal access team is facilitated, a resource pool used by the abnormal access team is effectively associated, and subsequently, when an abnormal access team using the same resource pool appears in related services, positioning and tracking can be performed in time.
As an example, one day of traffic data for a certain service line is first obtained from the standardized log. Features are aggregated by UID to obtain the deduplicated IP count, deduplicated IPC count, deduplicated cookie count, and deduplicated UA count of each UID as clustering features. UIDs with more than 1000 requests are then selected, leaving 4039 UIDs for the service line, so the corresponding access resource information can be represented as 4039 four-dimensional feature vectors.
The UID feature vectors are standardized and clustered with the k-means algorithm to obtain the category label of each UID. The user cluster of each category is then inspected, and several typical abnormal access teams are finally located.
Furthermore, the access resource information of the abnormal user cluster is marked as abnormal resource information for online detection, so that abnormal access teams adopting the same resource pool can be positioned and tracked in time.
In an embodiment of the present disclosure, the abnormal risk pattern mining may be further performed based on clustering, the abnormal features of the risk pattern are located, and the online detection rule is perfected, specifically referring to fig. 2, where fig. 2 is another schematic flow diagram of the abnormal access detection method provided in the embodiment of the present disclosure, and the method may include:
S201: Determine the candidate IPs accessing the second service line within the second period.
In particular, the IP that accesses the service line at a particular time period, i.e., the IP employed by the user accessing the service line, can be determined from the standardized log.
For convenience of description, the IP accessing the second service line in the second period is taken as an example and is denoted as a candidate IP.
S202: and acquiring a first time sequence access sequence of each candidate IP, wherein the first time sequence access sequence comprises the access times of the candidate IP in each sub-period in the second period.
The first sequence of time-ordered accesses for each candidate IP can be further obtained from the standardized log. The first time sequence access sequence comprises the access times of the candidate IP in each subinterval in the second time period.
As an example, if the second time interval is a day and each sub-interval is 1 minute, the first time access sequence may be represented as a feature vector with dimension 1440, each value representing the number of accesses to the second service line by the candidate IP in the corresponding sub-interval.
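A per-minute sequence of this kind can be built by bucketing request timestamps. The sketch below assumes timestamps are given in seconds and that `period_start` marks the beginning of the day; both names are hypothetical.

```python
def minute_counts(timestamps, period_start, minutes=1440):
    """Count requests per 1-minute sub-period; returns a vector of length `minutes`."""
    counts = [0] * minutes
    for ts in timestamps:
        idx = int((ts - period_start) // 60)
        if 0 <= idx < minutes:  # ignore requests outside the period
            counts[idx] += 1
    return counts
```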
S203: and screening out target IPs which accord with preset abnormal access characteristics from the candidate IPs based on the first time sequence access sequence.
In the embodiment of the present disclosure, the abnormal access feature may be set according to the detection experience of the abnormal traffic. For example, the number of access requests by normal users is not generally smooth, typically having peaks and valleys, i.e., higher daytime access and lower nighttime access, while abnormal access is controlled by a script, typically smooth throughout the day.
Therefore, in an embodiment of the present disclosure, it may be determined whether the first time-series access sequence of a candidate IP is a stationary time series. If so, the candidate IP conforms to the abnormal access characteristic and is determined to be a target IP.
In other words, the access requests of a normal IP are considered not to be stationary over time, whereas the access requests of an abnormal IP are. If the time-series access sequence of a candidate IP is judged to be a stationary sequence, the candidate IP is determined to conform to the abnormal access characteristic. Abnormal IPs can thus be screened out efficiently.
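One common way to judge stationarity is a unit root test of the Dickey-Fuller family. The sketch below is a simplified, non-augmented Dickey-Fuller test (lag 0, with a constant term) written in plain Python for illustration. It is not the full ADF test, and a production system would more likely call a statistics library. The 5% critical value of -2.86 is the standard asymptotic value for the constant-only case; the function names and sample series are hypothetical.

```python
import math
import random

DF_CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, regression with a constant

def dickey_fuller_stat(series):
    """t-statistic of gamma in: y[t] - y[t-1] = alpha + gamma * y[t-1] + error."""
    x = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    if sxx == 0:
        return float("-inf")  # a constant series is trivially stationary
    sxy = sum((xv - mean_x) * (dv - mean_dy) for xv, dv in zip(x, dy))
    gamma = sxy / sxx
    alpha = mean_dy - gamma * mean_x
    rss = sum((dv - alpha - gamma * xv) ** 2 for xv, dv in zip(x, dy))
    if rss == 0:
        return float("-inf") if gamma < 0 else 0.0
    se = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se

def is_stationary(series, critical=DF_CRITICAL_5PCT):
    # Reject the unit-root hypothesis when the statistic falls below the critical value.
    return dickey_fuller_stat(series) < critical

# Script-like traffic: a steady per-minute rate with small jitter.
rng = random.Random(0)
steady = [20 + rng.randint(-3, 3) for _ in range(1440)]
# Human-like traffic: one slow day/night cycle.
diurnal = [50 + 40 * math.sin(2 * math.pi * t / 1440) for t in range(1440)]
```

Under this sketch the steady series is classified as stationary and the slow day/night cycle is not, which matches the screening logic described above.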
S204: and clustering the target IPs based on the time sequence access sequence of each target IP, and determining a plurality of clustered IP clusters.
The target IPs are then clustered based on the time-ordered access sequences in order to mine common features of the abnormal IP clusters.
S205: and mining abnormal IP characteristics based on the IP cluster, and updating the online deployed abnormal access detection rules based on the abnormal IP characteristics.
Specifically, abnormal IP characteristics, namely characteristics shared by the whole abnormal IP cluster, can be more intuitively mined through clustering, online detection rules are updated according to the characteristics, and the accuracy of online detection of abnormal access flow can be improved.
As an example, if business feedback shows that some abnormal IPs go undetected, this confirms that a type of traffic that persists all day at low frequency is bypassing the online detection rules. The abnormal characteristics of this low-frequency traffic must then be located so that the online detection rules can be improved.
Specifically, a standardized log is obtained, and one day of traffic data for the service line is extracted. Features are aggregated by IP, and the time-series request sequence of each IP is used as the clustering feature. IPs with more than 30000 requests are selected; this request-count threshold can be configured for the specific service scenario. A total of 580 IPs are selected. The ADF (Augmented Dickey-Fuller) unit root test is then used to keep the IPs whose sequences are stationary, so the final input of the clustering algorithm can be represented as 121 feature vectors of dimension 1440.
The feature vectors are standardized and clustered with the k-means algorithm to determine the category label of each IP.
As an example, the abnormal IPs finally determined are all of the IDC (Internet Data Center) type. An Internet data center has complete facilities (including high-speed Internet access bandwidth, a high-performance local area network, and a safe and reliable computer room environment) and a professionally managed service platform. In addition, the corresponding users frequently change their nicknames, and the nicknames were generated recently. These features can be used to further improve the online detection rules.
Therefore, in the embodiment of the disclosure, the IPs which conform to the abnormal access characteristics are clustered based on the time sequence access sequence, so that the abnormal IP characteristics, that is, the characteristics common to the entire abnormal IP cluster, can be more intuitively mined through clustering, the online detection rule is updated according to the characteristics, and the accuracy of online detection of the abnormal access traffic can be improved.
In an embodiment of the present disclosure, the online misjudgment result may be corrected based on clustering, specifically referring to fig. 3, where fig. 3 is another schematic flow diagram of the abnormal access detection method provided in the embodiment of the present disclosure, and the method may include:
S301: Determine the unnatural person identifiers whose number of service accesses in the third period exceeds a preset threshold.
Specifically, the unnatural person identifiers that access a service line in a specific period can be determined from the standardized log. For convenience of description, the unnatural person identifiers accessing the third service line in the third period are taken as an example.
Wherein the unnatural people identification can comprise one or more of an IP, a browser user agent, and a client fingerprint, which can be a JA3 fingerprint.
Therefore, in the embodiment of the present disclosure, the unnatural person identifier may cover several types of information, including an IP, a browser user agent, and a client fingerprint. Such identifier information is generated whenever a user (a natural person) accesses a service line, and the time-series access sequence corresponding to an unnatural person identifier can be determined efficiently by counting access requests.
S302: and determining a second time sequence access sequence corresponding to the unnatural person identifier, wherein the second time sequence access sequence contains the access times of the unnatural person identifier in each sub-period in the third period.
And further acquiring a second time sequence access sequence corresponding to each unnatural person identifier according to the standardized log. As an example, if the third time period is a day, and each sub-period is 1 minute, then the second time series of accesses may be represented as a feature vector of dimension 1440, each numerical value representing the number of accesses to the second service line by the unnatural person's logo within the corresponding sub-period.
It is easy to understand that, in the embodiment of the present disclosure, the access times of the unnatural person identifier for the service line are substantially the access times of the user who performs service access by using the unnatural person identifier for the service line.
S303: and clustering the unnatural person identifiers based on the second time sequence access sequence, and determining a plurality of clustered unnatural person identifier clusters and a clustering time sequence access sequence of each clustered unnatural person identifier cluster.
And then clustering the unnatural person identifiers based on the second time sequence access sequence to obtain a plurality of unnatural person identifier clusters, and determining the clustering time sequence access sequence of each clustered unnatural person identifier cluster.
S304: and judging whether the clustering time sequence access sequence of the unnatural person identification cluster conforms to the preset natural person access characteristic, if so, marking the unnatural person identification cluster as a non-abnormal access identification cluster.
And then sequentially judging whether the clustering time sequence access sequence of each unnatural person identification cluster conforms to the access characteristics of natural persons.
For example, if the cluster-ordered access sequence is non-stationary, has a peak period and a trough period, it may be determined to conform to natural human access characteristics.
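The text does not define how the cluster sequence is computed or how peaks and troughs are tested. As one possible sketch, the cluster sequence is taken to be the element-wise mean of its members' 1440-minute sequences, and a simple day/night ratio stands in for the peak-and-trough check. The window boundaries and the `ratio` parameter are assumptions, not values from the source.

```python
def cluster_sequence(member_sequences):
    """Element-wise mean of the members' per-minute access sequences."""
    n = len(member_sequences)
    return [sum(values) / n for values in zip(*member_sequences)]

def looks_like_natural_person(sequence, ratio=2.0):
    """Heuristic stand-in for a peak/trough check on a 1440-minute sequence:
    daytime traffic must clearly exceed night-time traffic."""
    night = sequence[0:360]     # 00:00-06:00 (assumed trough window)
    day = sequence[540:1260]    # 09:00-21:00 (assumed peak window)
    night_mean = sum(night) / len(night)
    day_mean = sum(day) / len(day)
    return day_mean > ratio * night_mean
```

A flat, script-like sequence fails this check because its day and night means are equal.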
S305: and carrying out misjudgment correction on the abnormal identification detected on the line based on the non-abnormal access identification cluster.
Specifically, the detection rule configured on the line is inevitably adopted, so that misjudgment can occur. For example, in the service test process, the service test traffic is greatly different from the access traffic of a normal user, so the service test traffic is easily identified as crawler traffic, but when a tester performs the service test, the generated traffic also conforms to the access characteristics of natural people, that is, the time sequence behavior of the whole day can be kept consistent with that of normal people.
Therefore, if the abnormal mark detected on the line belongs to the non-abnormal access mark cluster, the abnormal mark belongs to the misjudgment condition, and the abnormal mark is corrected.
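The correction step can be sketched as a set operation: identifiers flagged online are cleared if they fall in a cluster marked as non-abnormal. All names below (`flagged_ids`, `cluster_of`, `non_abnormal_clusters`) are hypothetical.

```python
def correct_misjudgments(flagged_ids, cluster_of, non_abnormal_clusters):
    """Split online-flagged identifiers into (still flagged, cleared as misjudged).

    flagged_ids: set of identifiers flagged by the online rules
    cluster_of: dict mapping identifier -> cluster label
    non_abnormal_clusters: set of labels that match natural person access characteristics
    """
    cleared = {i for i in flagged_ids if cluster_of.get(i) in non_abnormal_clusters}
    return flagged_ids - cleared, cleared
```

Identifiers that were never clustered stay flagged, since there is no evidence to clear them.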
Therefore, in the embodiment of the present disclosure, the unnatural person identifiers are clustered by their time-series access sequences, and each cluster's sequence is then checked against the access characteristics of natural persons. If it conforms, the unnatural person identifiers in that cluster are not abnormal access identifiers. If the detection rules deployed online have nevertheless flagged such an identifier as abnormal, the online detection is a misjudgment and can be corrected, which further improves the detection rules and the accuracy of abnormal traffic detection.
As an example, suppose the unnatural person identifiers (including IP, UA, and JA3) that produce more than 30000 requests on each service line are selected, statistics are gathered on their time sequence access sequences, and a total of 13354 time sequence access sequences are finally produced. The input of the clustering algorithm can then be represented as 13354 feature vectors of 1440 dimensions each.
The feature vectors are standardized and then clustered with a k-means clustering algorithm to determine the class label of each unnatural person identifier. It is then judged whether each cluster conforms to the normal time sequence characteristics, namely the natural person access characteristics. If a cluster conforms to the natural person access characteristics, the cluster is not an abnormal access team, and misjudgment correction can be performed on the abnormal identifiers detected online based on that cluster.
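The standardize-then-cluster step can be sketched in plain Python as below. This is a small illustrative implementation, not the one used in the disclosure: column-wise z-score standardization, random initialization from the data, and the parameter defaults are all assumptions. For 13354 vectors of 1440 dimensions, a numerical library would be the practical choice.

```python
import math
import random


def standardize(rows):
    """Column-wise z-score; constant columns are mapped to 0."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / n)
        for c, m in zip(cols, means)
    ]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def kmeans(rows, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(row, centers[j])),
            )
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [rows[i] for i, lab in enumerate(labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers
```

Each row would be one identifier's time sequence access sequence, and the returned label is the class label of that identifier.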
For ease of understanding, the abnormal access detection method provided by the embodiment of the present disclosure is further described below with reference to FIG. 4.
Referring to FIG. 4, FIG. 4 is a schematic diagram of an abnormal access detection method provided in the embodiment of the present disclosure. First, a standardized service traffic log is obtained. Then a cluster ID (i.e., a clustering object) and the corresponding clustering features are determined according to the scenario. The clustering object may include a UID, an IP, an unnatural person identifier, and the like; the clustering features may include the IP deduplication count, the UA deduplication count, the time sequence access sequence, and the like. The clustering features are then preprocessed and clustered by a clustering algorithm.
When the clustering object is the UID and the clustering feature is one or more of the IP deduplication count, the IP network segment (IPC) deduplication count, the cookie deduplication count, and the UA deduplication count, abnormal access teams can be mined;
when the clustering object is the IP and the clustering feature is the time sequence access sequence, abnormal IP characteristics can be mined and the online detection rules refined;
when the clustering object is the unnatural person identifier and the clustering feature is the time sequence access sequence, online misjudgments can be corrected.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an abnormal access detection apparatus provided in an embodiment of the present disclosure, where the apparatus may include:
a target user determining module 501, configured to determine a target user accessing a first service line in a first period;
an information obtaining module 502, configured to obtain access resource information corresponding to the user identifier of each target user;
a first clustering module 503, configured to cluster the user identifiers based on the access resource information, and determine a plurality of clustered user clusters;
a detecting module 504, configured to detect the user cluster, and determine an abnormal user cluster with abnormal access.
Therefore, in the embodiment of the present disclosure, the access resource information of the users is used as the clustering feature, so that users with similar access resource information are clustered into one class. Because an abnormal access team draws on the same resource pool, clustering according to the access resource information groups the user account identifiers used by that team into one class. After clustering, it is easy to identify whether each user cluster is an abnormal user cluster; that is, the abnormal access team is mined through the clustering result. Compared with manual mining and analysis, this approach is more timely and can also uncover abnormal access teams that are not easy to find.
In an embodiment of the present disclosure, the access resource information includes one or more of: the deduplication count of IP addresses, the deduplication count of IP network segments, the deduplication count of user identity cache identifiers (cookies), and the deduplication count of browser user agents.
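As a rough sketch, the four deduplication counts could be computed per user identifier from log records as follows. The record field names (`uid`, `ip`, `cookie`, `ua`) and the use of a /24 prefix as the "IP network segment" are assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict


def access_resource_features(records, prefix_len=24):
    """Return {uid: [ip_count, segment_count, cookie_count, ua_count]}.

    records    -- iterable of dicts with keys uid, ip, cookie, ua
                  (hypothetical field names)
    prefix_len -- prefix length used as the IP network segment (assumed)
    """
    seen = defaultdict(
        lambda: {"ip": set(), "segment": set(), "cookie": set(), "ua": set()}
    )
    for r in records:
        s = seen[r["uid"]]
        s["ip"].add(r["ip"])
        # strict=False lets a host address stand in for its network.
        net = ipaddress.ip_network(f'{r["ip"]}/{prefix_len}', strict=False)
        s["segment"].add(str(net))
        s["cookie"].add(r["cookie"])
        s["ua"].add(r["ua"])
    return {
        uid: [len(s["ip"]), len(s["segment"]), len(s["cookie"]), len(s["ua"])]
        for uid, s in seen.items()
    }
```

The resulting four-element vectors are the kind of feature that the first clustering module would take as input.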
Therefore, in the embodiment of the disclosure, because an abnormal access team frequently changes its IP, user identity cache identifier, and browser user agent when accessing a service line, the access resource information is used as the basis for clustering, and the abnormal access team is mined according to the clustering result.
In one embodiment of the present disclosure, the target user is a user who accesses the first service line more than a set number of times in the first period.
Therefore, in the embodiment of the disclosure, the preliminary screening is performed according to the access times, the data amount participating in clustering is reduced, and the abnormal access detection efficiency is further improved.
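The preliminary screening can be sketched with a simple counter. The field name `uid` and the function name are hypothetical.

```python
from collections import Counter


def select_target_users(records, min_requests):
    """Return the set of user identifiers with more than min_requests accesses."""
    counts = Counter(r["uid"] for r in records)
    return {uid for uid, n in counts.items() if n > min_requests}
```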
In one embodiment of the present disclosure, the apparatus further includes:
the first marking module, configured to mark the access resource information of the abnormal user cluster as abnormal resource information;
and the second marking module, configured to mark a request that is detected online and that uses the abnormal resource information for access as an abnormal access request.
Therefore, in the embodiment of the disclosure, clustering is performed based on the access resource information, abnormal access teams are mined, and the access resource information used by each abnormal access team is marked. This helps in understanding the specific behavior pattern of the team and effectively associates the resource pool it uses, so that when an abnormal access team using the same resource pool later appears in a related service, it can be located and tracked in time.
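The two marking steps might look like the sketch below. The field names and the choice of which fields to match on are assumptions; in particular, matching on the user agent alone is left out of the default because a user agent string is often shared with normal users.

```python
def collect_abnormal_resources(records, abnormal_uids):
    """Gather the IPs, cookies, and UAs used by users in abnormal clusters."""
    pool = {"ip": set(), "cookie": set(), "ua": set()}
    for r in records:
        if r["uid"] in abnormal_uids:
            for field in pool:
                pool[field].add(r[field])
    return pool


def is_abnormal_request(request, pool, match_on=("ip", "cookie")):
    """Flag a request if any selected field appears in the abnormal pool."""
    return any(request.get(field) in pool[field] for field in match_on)
```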
In one embodiment of the present disclosure, the apparatus further includes:
the candidate IP determining module, configured to determine candidate IPs that access the second service line in a second period;
a first sequence obtaining module, configured to obtain a first time sequence access sequence of each candidate IP, where the first time sequence access sequence includes the number of accesses of the candidate IP in each sub-period of the second period;
the screening module, configured to screen out, from the candidate IPs, target IPs that conform to preset abnormal access characteristics based on the first time sequence access sequence;
the second clustering module, configured to cluster the target IPs based on the time sequence access sequence of each target IP and determine a plurality of clustered IP clusters;
and the characteristic mining module, configured to mine abnormal IP characteristics based on the IP clusters and update the online detection rules for abnormal access traffic based on the abnormal IP characteristics.
Therefore, in the embodiment of the disclosure, the IPs which conform to the abnormal access characteristics are clustered based on the time sequence access sequence, so that the abnormal IP characteristics, that is, the characteristics common to the entire abnormal IP cluster, can be more intuitively mined through clustering, the online detection rule is updated according to the characteristics, and the accuracy of online detection of the abnormal access traffic can be improved.
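A time sequence access sequence is a vector of access counts per sub-period. The sketch below builds one per IP, assuming one-minute sub-periods over a day (1440 slots, matching the 1440-dimensional vectors mentioned in the example) and assuming each record is an `(ip, timestamp)` pair; both are illustrative choices.

```python
from collections import defaultdict
from datetime import datetime


def time_sequence_access_sequences(records, period_start,
                                   n_slots=1440, slot_seconds=60):
    """Return {ip: [count per sub-period]} for accesses inside the period.

    records      -- iterable of (ip, datetime) pairs (assumed layout)
    period_start -- datetime marking the start of the period
    """
    seqs = defaultdict(lambda: [0] * n_slots)
    for ip, ts in records:
        slot = int((ts - period_start).total_seconds() // slot_seconds)
        if 0 <= slot < n_slots:  # ignore accesses outside the period
            seqs[ip][slot] += 1
    return dict(seqs)
```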
In an embodiment of the present disclosure, the screening module is specifically configured to:
judge whether the first time sequence access sequence of the candidate IP is a time sequence stable (stationary) sequence; if so, determine that the candidate IP conforms to the preset abnormal access characteristics, and determine the candidate IP as a target IP.
Therefore, in the embodiment of the disclosure, it is considered that the access requests of a normal IP are non-stationary over time (they show peak and trough periods), whereas the access requests of an abnormal IP tend to be stationary. Accordingly, if the time sequence access sequence of a candidate IP is judged to be a time sequence stable sequence, the candidate IP is determined to conform to the abnormal access characteristics. In this way, abnormal IPs can be screened out efficiently.
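The disclosure does not state how stationarity is tested. One simple heuristic, shown here purely as an assumption, is to compare the mean access count across equal-sized chunks of the sequence and treat a low coefficient of variation as stationary. A formal unit-root test such as the augmented Dickey-Fuller test could be substituted.

```python
from statistics import mean, pstdev


def is_stationary(seq, n_chunks=24, max_cv=0.2):
    """Heuristic stationarity check based on chunk means.

    n_chunks and max_cv are assumed values, not taken from the source.
    """
    size = len(seq) // n_chunks
    if size == 0:
        raise ValueError("sequence is shorter than n_chunks")
    chunk_means = [
        mean(seq[i * size:(i + 1) * size]) for i in range(n_chunks)
    ]
    overall = mean(chunk_means)
    if overall == 0:
        return False  # no traffic, nothing to screen
    # Low relative spread of chunk means suggests a constant access rate.
    return pstdev(chunk_means) / overall <= max_cv


def screen_target_ips(sequences):
    """Return candidate IPs whose sequences look stationary (target IPs)."""
    return [ip for ip, seq in sequences.items() if is_stationary(seq)]
```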
In one embodiment of the present disclosure, the apparatus further includes:
the identifier determining module, configured to determine unnatural person identifiers whose number of service accesses in a third period is greater than a preset threshold;
a second sequence determining module, configured to determine a second time sequence access sequence corresponding to each unnatural person identifier, where the second time sequence access sequence includes the number of accesses of the unnatural person identifier in each sub-period of the third period;
a third clustering module, configured to cluster the unnatural person identifiers based on the second time sequence access sequence, determine a plurality of clustered unnatural person identifier clusters, and determine a cluster time sequence access sequence of each unnatural person identifier cluster;
the third marking module, configured to judge whether the cluster time sequence access sequence of an unnatural person identifier cluster conforms to the preset natural person access characteristics, and if so, mark the unnatural person identifier cluster as a non-abnormal access identifier cluster;
and the correcting module, configured to perform misjudgment correction on the abnormal identifiers detected online based on the non-abnormal access identifier clusters.
Therefore, in the embodiment of the disclosure, the unnatural person identifiers are clustered based on the time sequence access sequence, and clusters whose cluster time sequence access sequence conforms to the natural person access characteristics are marked as non-abnormal. Abnormal identifiers detected online that fall into such clusters can be treated as misjudgments and corrected, which improves the accuracy of online detection of abnormal access traffic.
In one embodiment of the present disclosure, the unnatural person identifier includes one or more of an IP, a browser user agent, and a client fingerprint.
Therefore, in the embodiment of the present disclosure, the unnatural person identifier may cover various types of information, including the IP, the browser user agent, and the client fingerprint. This information is generated whenever a user (a natural person) accesses a service line, so the time sequence access sequence corresponding to an unnatural person identifier can be determined efficiently by counting access requests.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 6, the device 600 comprises a computing unit 601, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, and the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 executes the respective methods and processes described above, such as the abnormal access detection method. For example, in some embodiments, the abnormal access detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the abnormal access detection method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the abnormal access detection method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that the flows shown above may be used in various forms, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (19)

1. An anomalous access detection method comprising:
determining a target user accessing a first service line in a first period;
acquiring access resource information corresponding to the user identification of each target user; the access resource information represents access resources used when the target user initiates an access request;
clustering the user identification based on the access resource information, and determining a plurality of clustered user clusters;
and detecting the user cluster, and determining an abnormal user cluster with abnormal access.
2. The method of claim 1, wherein,
the access resource information includes: one or more of the duplication removal number of the Internet protocol address IP, the duplication removal number of the IP network segment, the duplication removal number of the user identity cache identifier and the duplication removal number of the browser user agent.
3. The method of claim 1, wherein the target user is a user that accesses the first service line more than a set number of times within the first period.
4. The method of claim 1, further comprising:
marking the access resource information of the abnormal user cluster as abnormal resource information;
and marking a request that is detected online and that uses the abnormal resource information for access as an abnormal access request.
5. The method of claim 1, further comprising:
determining candidate IPs for accessing a second service line in a second time period;
acquiring a first time sequence access sequence of each candidate IP, wherein the first time sequence access sequence comprises the access times of the candidate IP in each sub-period in the second period;
screening out a target IP which accords with preset abnormal access characteristics from the candidate IPs based on the first time sequence access sequence;
clustering the target IPs based on the time sequence access sequence of each target IP, and determining a plurality of clustered IP clusters;
and mining abnormal IP characteristics based on the IP clusters, and updating the abnormal access detection rules deployed online based on the abnormal IP characteristics.
6. The method of claim 5, wherein the step of screening out the target IPs that conform to the preset abnormal access characteristics from the candidate IPs based on the first time sequence access sequence comprises:
and judging whether the first time sequence access sequence of the candidate IP is a time sequence stable sequence, if so, determining that the candidate IP accords with preset abnormal access characteristics, and determining the candidate IP as a target IP.
7. The method of any of claims 1-6, further comprising:
determining an unnatural person identifier of which the service access times are greater than a preset threshold value in a third time period;
determining a second time sequence access sequence corresponding to the unnatural person identifier, wherein the second time sequence access sequence contains the access times of the unnatural person identifier in each sub-period in the third period;
clustering the unnatural person identifiers based on the second time sequence access sequence, and determining a plurality of clustered unnatural person identifier clusters and a clustering time sequence access sequence of each clustered unnatural person identifier cluster;
judging whether the clustering time sequence access sequence of the unnatural person identification cluster conforms to the preset natural person access characteristics, if so, marking the unnatural person identification cluster as a non-abnormal access identification cluster;
and performing misjudgment correction on the abnormal identifiers detected online based on the non-abnormal access identifier cluster.
8. The method of claim 7, wherein the unnatural person identification comprises one or more of an IP, a browser user agent, and a client fingerprint.
9. An abnormal access detection apparatus comprising:
the target user determining module is used for determining a target user accessing the first service line in a first period;
the information acquisition module is used for acquiring access resource information corresponding to the user identification of each target user; the access resource information represents access resources used when the target user initiates an access request;
the first clustering module is used for clustering the user identifications based on the access resource information and determining a plurality of clustered user clusters;
and the detection module is used for detecting the user cluster and determining the abnormal user cluster with abnormal access.
10. The apparatus of claim 9, wherein,
the access resource information includes: one or more of the deduplication count of IP addresses, the deduplication count of IP network segments, the deduplication count of user identity cache identifiers, and the deduplication count of browser user agents.
11. The apparatus of claim 9, wherein the target user is a user who accesses the first service line more than a set number of times in the first period.
12. The apparatus of claim 9, further comprising:
the first marking module is used for marking the access resource information of the abnormal user cluster as abnormal resource information;
and the second marking module is used for marking a request that is detected online and that uses the abnormal resource information for access as an abnormal access request.
13. The apparatus of claim 9, further comprising:
the candidate IP determining module is used for determining candidate IPs for accessing the second service line in a second time period;
a first sequence obtaining module, configured to obtain a first time sequence access sequence of each candidate IP, where the first time sequence access sequence includes the number of accesses of the candidate IP in each sub-period of the second period;
the screening module is used for screening a target IP which accords with preset abnormal access characteristics from the candidate IPs based on the first time sequence access sequence;
the second clustering module is used for clustering the target IPs based on the time sequence access sequence of each target IP and determining a plurality of clustered IP clusters;
and the characteristic mining module is used for mining abnormal IP characteristics based on the IP clusters and updating the abnormal access detection rules deployed online based on the abnormal IP characteristics.
14. The apparatus according to claim 13, wherein the screening module is specifically configured to:
and judging whether the first time sequence access sequence of the candidate IP is a time sequence stable sequence, if so, determining that the candidate IP accords with preset abnormal access characteristics, and determining the candidate IP as a target IP.
15. The apparatus of any of claims 9-14, further comprising:
the identification determining module is used for determining the unnatural person identification of which the service access times in the third time period are greater than a preset threshold value;
a second sequence determining module, configured to determine a second time sequence access sequence corresponding to the unnatural person identifier, where the second time sequence access sequence includes the number of accesses of the unnatural person identifier in each sub-period of the third period;
a third clustering module, configured to cluster the unnatural person identifiers based on the second time sequence access sequence, determine a plurality of clustered unnatural person identifier clusters, and determine a clustering time sequence access sequence of each clustered unnatural person identifier cluster;
the third marking module is used for judging whether the clustering time sequence access sequence of the unnatural person identification cluster accords with the preset natural person access characteristic or not, and if so, marking the unnatural person identification cluster as a non-abnormal access identification cluster;
and the correcting module is used for performing misjudgment correction on the abnormal identifiers detected online based on the non-abnormal access identifier cluster.
16. The apparatus of claim 15, wherein the unnatural person identification comprises one or more of IP, a browser user agent, and a client fingerprint.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202211121064.8A 2022-09-15 2022-09-15 Abnormal access detection method and device Pending CN115603947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211121064.8A CN115603947A (en) 2022-09-15 2022-09-15 Abnormal access detection method and device

Publications (1)

Publication Number Publication Date
CN115603947A true CN115603947A (en) 2023-01-13

Family

ID=84842762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211121064.8A Pending CN115603947A (en) 2022-09-15 2022-09-15 Abnormal access detection method and device

Country Status (1)

Country Link
CN (1) CN115603947A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination