CN115603947A - Abnormal access detection method and device - Google Patents
- Publication number
- CN115603947A CN202211121064.8A
- Authority
- CN
- China
- Prior art keywords
- access
- abnormal
- sequence
- user
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The disclosure provides an abnormal access detection method and device, and relates to the technical field of computers, in particular to the field of big data. The specific implementation scheme is as follows: determining target users accessing a first service line in a first period; acquiring access resource information corresponding to the user identification of each target user; clustering the user identifications based on the access resource information, and determining a plurality of clustered user clusters; and detecting the user clusters, and determining an abnormal user cluster with abnormal access. The access resource information of the users is used as the clustering feature, and abnormal access teams are found from the clustering results. Compared with manual mining and analysis, this is more timely and can uncover abnormal access teams that are otherwise hard to find.
Description
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to the field of big data technology.
Background
Web crawler traffic is web traffic generated automatically by scripts that fetch pages according to certain rules. It differs from the traffic produced when normal users obtain information, and is therefore regarded as cheating traffic, also called abnormal traffic.
To maintain the security of web information, web crawler traffic needs to be detected.
Disclosure of Invention
The present disclosure provides an abnormal access detection method, apparatus, electronic device, computer-readable storage medium, and computer program product.
According to a first aspect of the present disclosure, there is provided an abnormal access detection method, the method including:
determining a target user accessing a first service line in a first period;
acquiring access resource information corresponding to the user identification of each target user; the access resource information represents access resources used when the target user initiates an access request;
clustering the user identification based on the access resource information, and determining a plurality of clustered user clusters;
and detecting the user cluster, and determining an abnormal user cluster with abnormal access.
According to a second aspect of the present disclosure, there is provided an abnormal access detection apparatus, the apparatus including:
the target user determining module is used for determining a target user accessing the first service line in a first period;
the information acquisition module is used for acquiring access resource information corresponding to the user identification of each target user; the access resource information represents access resources used when the target user initiates an access request;
the first clustering module is used for clustering the user identifications based on the access resource information and determining a plurality of clustered user clusters;
and the detection module is used for detecting the user cluster and determining the abnormal user cluster with abnormal access.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the abnormal access detection method.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to execute the abnormal access detection method.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the abnormal access detection method.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flowchart of an abnormal access detection method according to an embodiment of the present disclosure;
fig. 2 is another schematic flow chart of an abnormal access detection method provided in the embodiment of the present disclosure;
fig. 3 is a schematic flowchart of another abnormal access detection method provided in the embodiment of the present disclosure;
fig. 4 is a schematic diagram of an abnormal access detection method provided in the embodiment of the present disclosure;
fig. 5 is a block diagram of an apparatus for implementing the abnormal access detection method of an embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device provided by an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Because crawler behavior is generally carried out by teams, in the related technology abnormal access teams are analyzed and mined manually. However, more sophisticated crawlers spread their resource use more widely, for example by using an IP pool or by cracking and taking over multiple accounts. As a result, abnormal teams are difficult to spot directly from log traffic, their specific behavior patterns are poorly understood, the resource pools they use cannot be effectively associated, and teams that later use the same resource pools in related services cannot be located and tracked in time.
In order to solve the technical problem, the present disclosure provides an abnormal access detection method and apparatus.
In one embodiment of the present disclosure, an abnormal access detection method is provided, and the method includes:
determining a target user accessing a first service line in a first period;
acquiring access resource information corresponding to the user identification of each target user;
clustering the user identification based on the access resource information, and determining a plurality of clustered user clusters;
and detecting the user cluster, and determining an abnormal user cluster with abnormal access.
Abnormal access has the following feature: an abnormal access team is usually associated with a single resource pool and frequently switches access resources drawn from that pool to avoid detection. In view of this feature, the embodiment of the present disclosure uses the access resource information of the users as the clustering feature, so that users with similar access resource information are clustered into one class. Because an abnormal access team uses the same resource pool, clustering on the access resource information gathers the identifiers of the user accounts used by the team into one class. After clustering, it can easily be identified whether each user cluster is an abnormal user cluster; that is, abnormal access teams are mined from the clustering result. Compared with manual mining and analysis, this is more timely and can uncover abnormal access teams that are otherwise hard to find.
The following describes the abnormal access detection method provided by the embodiment of the present disclosure in detail.
Referring to fig. 1, fig. 1 is a schematic flowchart of an abnormal access detection method provided in an embodiment of the present disclosure, and as shown in fig. 1, the method may include the following steps:
S101: Determine the target users accessing the first service line in the first period.
In the embodiment of the present disclosure, a traffic log is obtained and standardized, for example by performing data cleaning, field extraction, and loading into a database in sequence.
The users accessing a service line in a specific period can be determined from the standardized log. For convenience of description, the users accessing the first service line in the first period are taken as the target users.
S102: Acquire access resource information corresponding to the user identifier of each target user, where the access resource information represents the access resources used when the target user initiates an access request.
The user identifier may be a User Identification (UID), that is, a value generated by the network side when the user registers, which serves as the unique identifier of the user.
The access resource information corresponding to the user identifier of each target user can be obtained from the standardized log.
As an example, an IP Address (Internet Protocol Address) is a resource that is necessary when a user initiates an access request, and thus the IP Address can be used as access resource information. For convenience of description, the IP address is hereinafter referred to as IP.
S103: Cluster the user identifiers based on the access resource information, and determine a plurality of clustered user clusters.
In the embodiment of the present disclosure, the access resource information of each user identifier may be represented by a feature vector.
Before clustering, the feature vectors may be normalized. After normalization, clustering is performed using a clustering algorithm.
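The normalization step can be sketched as follows. The text does not say which normalization is used, so this sketch assumes a per-feature z-score; the function name `standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(vectors):
    """Z-score each feature column: (x - column mean) / column std."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; use 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in vectors
    ]

# Hypothetical per-user feature vectors (values are illustrative only).
features = [[2, 1, 3, 1], [40, 12, 35, 20], [3, 1, 2, 1]]
scaled = standardize(features)
```

Scaling matters here because distance-based clustering would otherwise be dominated by whichever feature has the largest numeric range.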
As an example, the number of categories is first determined using the elbow method commonly applied with the k-means clustering algorithm, and the user identifiers are then clustered using a k-means clustering model.
The k-means clustering algorithm may include the following steps:
1) Determine the number k of categories, and select the initial k samples as the initial cluster centers.
2) Calculate the distance from each sample in the data set to each of the k cluster centers, and assign the sample to the category of the nearest cluster center.
3) For each category, recalculate the cluster center.
4) Repeat steps 2) and 3) until a termination condition is reached.
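The four steps above can be sketched in plain Python as follows. This is an illustration rather than the disclosed implementation: it assumes Euclidean distance, reads "the initial k samples" as the first k samples, and takes "no assignment changes, or a maximum number of iterations" as the termination condition. The name `kmeans` and the sample points are hypothetical.

```python
import math

def kmeans(samples, k, max_iter=100):
    """Plain k-means following the four steps above; returns (labels, centers)."""
    # Step 1: take the first k samples as the initial cluster centers.
    centers = [list(s) for s in samples[:k]]
    labels = [None] * len(samples)
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centers[c]))
            for s in samples
        ]
        # Step 4: stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
labels, centers = kmeans(points, k=2)
```

In practice a library implementation with better initialization would likely be preferred; the sketch only mirrors the steps as listed.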
In the embodiment of the present disclosure, the input of the k-means clustering model is a normalized feature vector of the access resource information of each user identifier.
After clustering is completed, a plurality of user identifier clusters are obtained, which can also be understood as user clusters.
S104: Detect the user clusters and determine the abnormal user cluster with abnormal access.
In the embodiment of the disclosure, to evade online detection, an abnormal access team (e.g., a web crawler team) may create a resource pool such as an IP pool and frequently switch access resources, whereas a normal user does not switch access resources frequently. Clustering based on the access resource information therefore makes it possible to accurately mine abnormal access teams that use the same resource pool.
Therefore, the abnormal user cluster with abnormal access can be determined by detecting the user clusters.
In the embodiment of the present disclosure, a specific rule may be set. For example, if the average number of IPs of a user cluster exceeds a set value, or the overall traffic scale of the user cluster is large, the user cluster is considered to be an abnormal user cluster.
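A minimal sketch of the first rule (average number of IPs per cluster above a set value) might look like this. The function name, the threshold of 50, and the sample data are assumptions chosen for illustration; the text does not give concrete values.

```python
def find_abnormal_clusters(ip_counts, labels, max_avg_ips):
    """Return the cluster labels whose mean distinct-IP count exceeds the threshold."""
    totals, sizes = {}, {}
    for count, label in zip(ip_counts, labels):
        totals[label] = totals.get(label, 0) + count
        sizes[label] = sizes.get(label, 0) + 1
    return sorted(
        label for label in totals
        if totals[label] / sizes[label] > max_avg_ips
    )

# Distinct-IP count per user and the cluster label assigned to each user.
ip_counts = [2, 3, 120, 95, 1]
labels = [0, 0, 1, 1, 0]
abnormal = find_abnormal_clusters(ip_counts, labels, max_avg_ips=50)
```

The traffic-scale rule could be added in the same way by summing request counts per cluster.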
Therefore, in the embodiment of the present disclosure, the access resource information of the users is used as the clustering feature, so that users with similar access resource information are clustered into one class. Because an abnormal access team uses the same resource pool, clustering on the access resource information gathers the identifiers of the user accounts used by the team into one class. After clustering, it can easily be identified whether each user cluster is an abnormal user cluster; that is, abnormal access teams are mined from the clustering result. Compared with manual mining and analysis, this is more timely and can uncover abnormal access teams that are otherwise hard to find.
In one embodiment of the present disclosure, the access resource information may include one or more of: the deduplicated number of IPs, the deduplicated number of IP network segments, the deduplicated number of user identity cache identifiers, and the deduplicated number of browser user agents.
The user identity cache identifier may be a cookie, which is data generated by a website to identify the user and stored on the user's local terminal. An IP network segment (IPC) consists of part of the fields of an IP address. A browser User Agent (UA) identifies browser client information.
In the embodiment of the disclosure, because an abnormal access team frequently switches IPs, user identity cache identifiers, and browser user agents when accessing a service line, this access resource information is used as the basis for clustering, and abnormal access teams are mined from the clustering result.
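The four deduplicated counts can be computed per UID roughly as follows. The record layout `(uid, ip, cookie, ua)` and the function name are hypothetical, and the sketch assumes that the IP network segment is the first three octets of an IPv4 address, which the text does not state explicitly.

```python
from collections import defaultdict

def resource_features(records):
    """Map each UID to [distinct IPs, distinct IP segments, distinct cookies, distinct UAs]."""
    seen = defaultdict(lambda: (set(), set(), set(), set()))
    for uid, ip, cookie, ua in records:
        ips, segments, cookies, uas = seen[uid]
        ips.add(ip)
        # Assumption: the segment is the first three octets of an IPv4 address.
        segments.add(ip.rsplit(".", 1)[0])
        cookies.add(cookie)
        uas.add(ua)
    return {uid: [len(s) for s in sets] for uid, sets in seen.items()}

records = [
    ("u1", "10.0.0.1", "c1", "ua-a"),
    ("u1", "10.0.0.2", "c2", "ua-a"),
    ("u1", "10.0.1.9", "c3", "ua-b"),
    ("u2", "192.168.1.5", "c9", "ua-a"),
    ("u2", "192.168.1.5", "c9", "ua-a"),
]
features = resource_features(records)
```

Here `u1` switches IPs, cookies, and user agents across requests, while `u2` repeats the same resources, which is the contrast the clustering is meant to pick up.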
In one embodiment of the present disclosure, the target user may be a user whose number of accesses to the first service line in the first period exceeds a set value.
Because abnormal access users make a large number of accesses, a preliminary screening by access count yields the users with large access counts. These users may be involved in abnormal access and are taken as the target users.
Therefore, in the embodiment of the disclosure, preliminary screening by access count reduces the amount of data participating in clustering and thereby improves the efficiency of abnormal access detection.
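The preliminary screening can be sketched as a simple count filter. The function name and the threshold used below are illustrative assumptions; the set value would be chosen per service line.

```python
from collections import Counter

def select_target_users(request_uids, min_requests):
    """Keep the UIDs whose request count in the period exceeds min_requests."""
    counts = Counter(request_uids)
    return {uid for uid, n in counts.items() if n > min_requests}

# One entry per logged request, carrying the UID that issued it.
request_uids = ["u1"] * 5 + ["u2"] * 2 + ["u3"] * 9
targets = select_target_users(request_uids, min_requests=4)
```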
In an embodiment of the present disclosure, on the basis of the method shown in fig. 1, the method may further include:
marking the access resource information of the abnormal user cluster as abnormal resource information;
and marking a request that is detected online and that uses the abnormal resource information as an abnormal access request.
Specifically, since the user accounts in the abnormal user cluster are accounts used by an abnormal access team, the corresponding access resources also belong to the resource pool created by that team. The access resource information is therefore marked as abnormal resource information, and in subsequent detection, any request found to use the abnormal resource information is directly marked as an abnormal access request.
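As an illustration of the marking step, the sketch below collects the IPs used by members of abnormal clusters and checks incoming requests against them. Only IPs are shown; cookies and user agents could be handled the same way. All names and data are hypothetical.

```python
def collect_abnormal_resources(user_ips, labels, abnormal_labels):
    """Union the IPs used by every user that belongs to an abnormal cluster."""
    blocked = set()
    for uid, label in labels.items():
        if label in abnormal_labels:
            blocked |= user_ips[uid]
    return blocked

def is_abnormal_request(request_ip, blocked):
    """Online check: a request is abnormal if it uses a marked resource."""
    return request_ip in blocked

user_ips = {"u1": {"1.1.1.1", "1.1.1.2"}, "u2": {"2.2.2.2"}, "u3": {"1.1.1.3"}}
labels = {"u1": 1, "u2": 0, "u3": 1}
blocked = collect_abnormal_resources(user_ips, labels, abnormal_labels={1})
```

A new account that shows up later using `1.1.1.2` would then be flagged immediately, without waiting for another clustering run.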
Therefore, in the embodiment of the disclosure, clustering is performed based on the access resource information to mine abnormal access teams, and the access resource information used by those teams is marked. This helps in understanding the specific behavior patterns of abnormal access teams and effectively associates the resource pools they use, so that when an abnormal access team using the same resource pool later appears in related services, it can be located and tracked in time.
As an example, one day of traffic data of a certain service line is first obtained from the standardized log, and feature aggregation is performed per UID to obtain the IP deduplication number, IPC deduplication number, cookie deduplication number, and UA deduplication number of each UID as clustering features. The UIDs with more than 1000 requests are then screened, leaving 4039 UIDs for the service line, so that the corresponding access resource information can be represented as 4039 four-dimensional feature vectors.
The UID feature vectors are standardized and clustered by the k-means clustering algorithm to obtain the category label of each UID. The user clusters obtained by clustering are then detected and identified, and several typical abnormal access teams are finally located.
Furthermore, the access resource information of the abnormal user clusters is marked as abnormal resource information for online detection, so that abnormal access teams using the same resource pool can be located and tracked in time.
In an embodiment of the present disclosure, abnormal risk patterns may further be mined based on clustering, so as to locate the abnormal features of a risk pattern and improve the online detection rules. Specifically, referring to fig. 2, fig. 2 is another schematic flow diagram of the abnormal access detection method provided in the embodiment of the present disclosure, and the method may include:
S201: Determine candidate IPs accessing the second service line in the second period.
Specifically, the IPs accessing a service line in a specific period, that is, the IPs used by the users accessing the service line, can be determined from the standardized log.
For convenience of description, the IPs accessing the second service line in the second period are taken as an example and are denoted as candidate IPs.
S202: Acquire a first time sequence access sequence of each candidate IP, where the first time sequence access sequence contains the number of accesses by the candidate IP in each sub-period of the second period.
The first time sequence access sequence of each candidate IP can be further obtained from the standardized log.
As an example, if the second period is one day and each sub-period is 1 minute, the first time sequence access sequence may be represented as a feature vector of dimension 1440, each value representing the number of accesses to the second service line by the candidate IP in the corresponding sub-period.
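Building the 1440-dimensional sequence for one IP can be sketched as below. The sketch assumes request timestamps are available as Unix seconds; the function name and the start time are hypothetical.

```python
def access_sequence(timestamps, period_start, minutes=1440):
    """Count accesses per one-minute sub-period; timestamps are Unix seconds."""
    counts = [0] * minutes
    for ts in timestamps:
        slot = int(ts - period_start) // 60
        if 0 <= slot < minutes:  # ignore requests outside the period
            counts[slot] += 1
    return counts

start = 1_700_000_000  # hypothetical start of the period
seq = access_sequence([start, start + 30, start + 61, start + 86_399], start)
```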
S203: Screen out, from the candidate IPs, the target IPs that conform to a preset abnormal access feature, based on the first time sequence access sequence.
In the embodiment of the present disclosure, the abnormal access feature may be set according to experience in detecting abnormal traffic. For example, the number of access requests from normal users is generally not stationary: it typically has peaks and troughs, with more accesses in the daytime and fewer at night. Abnormal access, by contrast, is controlled by scripts and is typically stationary throughout the day.
Therefore, in an embodiment of the present disclosure, it may be determined whether the first time sequence access sequence of a candidate IP is a stationary time series; if so, the candidate IP is determined to conform to the abnormal access feature and is taken as a target IP.
In the embodiment of the disclosure, the access requests of a normal IP are considered not to be stationary over time, whereas those of an abnormal IP are. Therefore, if the time sequence access sequence of a candidate IP is judged to be a stationary time series, the candidate IP is determined to conform to the abnormal access feature, so that abnormal IPs can be screened out efficiently.
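One way to judge stationarity is a unit root test. The sketch below implements a plain Dickey-Fuller regression with a constant and no lag terms, which is a simplified stand-in and not the full Augmented Dickey-Fuller procedure. The critical value of -2.86 is the commonly tabulated 5% value for this constant-only case and is used here as an assumption; function names and the two sample series are hypothetical.

```python
import math

def dickey_fuller_t(series):
    """t-statistic of beta in: diff[t] = alpha + beta * series[t-1] + error."""
    x = series[:-1]
    d = [b - a for a, b in zip(series, series[1:])]
    n = len(d)
    x_mean, d_mean = sum(x) / n, sum(d) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        return float("-inf")  # a constant series is trivially stationary
    beta = sum((xi - x_mean) * (di - d_mean) for xi, di in zip(x, d)) / sxx
    alpha = d_mean - beta * x_mean
    ssr = sum((di - alpha - beta * xi) ** 2 for xi, di in zip(x, d))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    if std_err == 0:
        return float("-inf") if beta < 0 else 0.0
    return beta / std_err

def is_stationary(series, critical_value=-2.86):
    """Reject the unit-root hypothesis when the statistic is below the critical value."""
    return dickey_fuller_t(series) < critical_value

# Scripted traffic: near-constant request rate with small jitter.
scripted = [10 + (t % 3 - 1) for t in range(1440)]
# Human-like traffic: one smooth daily cycle with a peak and a trough.
human = [50 + 40 * math.sin(2 * math.pi * t / 1440) for t in range(1440)]
```

With these inputs, `is_stationary(scripted)` holds and `is_stationary(human)` does not. A production system would more likely rely on a tested statistics library than on this hand-rolled regression.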
S204: Cluster the target IPs based on the time sequence access sequence of each target IP, and determine a plurality of clustered IP clusters.
The target IPs are then clustered based on their time sequence access sequences in order to mine the common features of abnormal IP clusters.
S205: Mine abnormal IP features based on the IP clusters, and update the abnormal access detection rules deployed online based on the abnormal IP features.
Specifically, clustering makes it easier to mine abnormal IP features, that is, features shared by an entire abnormal IP cluster. Updating the online detection rules according to these features can improve the accuracy of online detection of abnormal access traffic.
As an example, if service feedback indicates that some abnormal IPs went undetected, and it is confirmed that a type of traffic that persists all day at low frequency is bypassing the online detection rules, the abnormal features of this low-frequency traffic need to be located so as to improve the online detection rules.
Specifically, one day of traffic data of the service line is obtained from the standardized log. Feature aggregation is performed per IP to obtain the time sequence request sequence of each IP as the clustering feature, and the IPs with more than 30000 requests are screened; this request-count threshold can be configured according to the specific service scenario. 580 IPs remain after screening. The stationary time series among them are then identified using the ADF (Augmented Dickey-Fuller) unit root test, so that the input of the final clustering algorithm can be represented as 121 feature vectors of dimension 1440.
The feature vectors are standardized and clustered using the k-means clustering algorithm to determine the category label of each IP.
As an example, the abnormal IPs finally determined are all of the IDC (Internet Data Center) type. An Internet data center has complete facilities (including high-speed Internet access bandwidth, a high-performance local area network, and a safe and reliable machine room environment) and a professionally managed service platform. In addition, the corresponding users frequently change their nicknames, and the nicknames were generated recently. These features can be used to further improve the online detection rules.
Therefore, in the embodiment of the disclosure, the IPs that conform to the abnormal access feature are clustered based on their time sequence access sequences, so that abnormal IP features, that is, features common to an entire abnormal IP cluster, can be mined more intuitively. Updating the online detection rules according to these features can improve the accuracy of online detection of abnormal access traffic.
In an embodiment of the present disclosure, online misjudgment results may be corrected based on clustering. Specifically, referring to fig. 3, fig. 3 is another schematic flow diagram of the abnormal access detection method provided in the embodiment of the present disclosure, and the method may include:
S301: Determine the unnatural person identifiers whose number of service accesses in the third period is greater than a preset threshold.
Specifically, the unnatural person identifiers accessing a service line in a specific period can be determined from the standardized log. For convenience of description, the unnatural person identifiers accessing the third service line in the third period are taken as an example.
The unnatural person identifier may include one or more of an IP, a browser user agent, and a client fingerprint, where the client fingerprint may be a JA3 fingerprint.
Therefore, in the embodiment of the present disclosure, the unnatural person identifier may cover various types of information, including the IP, the browser user agent, and the client fingerprint. Such identifier information is generated whenever a user (a natural person) accesses a service line, and the time sequence access sequence corresponding to an unnatural person identifier can be determined efficiently by counting access requests.
S302: and determining a second time sequence access sequence corresponding to the unnatural person identifier, wherein the second time sequence access sequence contains the access times of the unnatural person identifier in each sub-period in the third period.
And further acquiring a second time sequence access sequence corresponding to each unnatural person identifier according to the standardized log. As an example, if the third time period is a day, and each sub-period is 1 minute, then the second time series of accesses may be represented as a feature vector of dimension 1440, each numerical value representing the number of accesses to the second service line by the unnatural person's logo within the corresponding sub-period.
It is easy to understand that, in the embodiment of the present disclosure, the access times of the unnatural person identifier for the service line are substantially the access times of the user who performs service access by using the unnatural person identifier for the service line.
S303: cluster the unnatural person identifiers based on the second time sequence access sequences, and determine the resulting unnatural person identifier clusters and the clustering time sequence access sequence of each cluster.
The unnatural person identifiers are then clustered based on their second time sequence access sequences to obtain a plurality of unnatural person identifier clusters, and the clustering time sequence access sequence of each cluster is determined.
S304: judge whether the clustering time sequence access sequence of an unnatural person identifier cluster conforms to the preset natural person access characteristic, and if so, mark the cluster as a non-abnormal access identifier cluster.
Each unnatural person identifier cluster is checked in turn to determine whether its clustering time sequence access sequence conforms to the natural person access characteristic.
For example, if the clustering time sequence access sequence is non-stationary, with both a peak period and a trough period, it can be judged to conform to the natural person access characteristic.
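One possible way to test for a peak period and a trough period is sketched below. The disclosure does not specify the test, so the hourly bucketing, the thresholds `min_peak_to_mean` and `max_trough_to_mean`, and the function name are assumptions chosen only to make the idea concrete.

```python
def conforms_to_natural_access(sequence, buckets=24,
                               min_peak_to_mean=1.5, max_trough_to_mean=0.5):
    """Heuristic: True if coarse-grained traffic shows a clear peak and a clear trough.

    The bucket count and both thresholds are illustrative assumptions.
    """
    if len(sequence) % buckets != 0:
        raise ValueError("sequence length must be divisible by the number of buckets")
    size = len(sequence) // buckets
    totals = [sum(sequence[i * size:(i + 1) * size]) for i in range(buckets)]
    mean = sum(totals) / buckets
    if mean == 0:
        return False  # no traffic at all, nothing to judge
    has_peak = max(totals) >= min_peak_to_mean * mean
    has_trough = min(totals) <= max_trough_to_mean * mean
    return has_peak and has_trough


# A day that is silent for the first 12 hours and busy for the last 12.
day_night_pattern = [0] * 720 + [5] * 720
# A day with the same request rate in every minute.
flat_pattern = [3] * 1440

natural = conforms_to_natural_access(day_night_pattern)
machine_like = conforms_to_natural_access(flat_pattern)
```

Aggregating to hourly totals before comparing keeps the check insensitive to minute-level noise; a production system would likely tune the thresholds against labeled traffic.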
S305: perform misjudgment correction on the abnormal identifiers detected online, based on the non-abnormal access identifier clusters.
Specifically, the detection rules deployed online inevitably produce some misjudgments. For example, during service testing, the test traffic differs greatly from the access traffic of normal users and is therefore easily identified as crawler traffic. However, the traffic generated when a tester performs service tests still conforms to the natural person access characteristic, that is, its time sequence behavior over the whole day is consistent with that of a normal person.
Therefore, if an abnormal identifier detected online belongs to a non-abnormal access identifier cluster, the detection is a misjudgment, and the abnormal identifier is corrected.
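The correction step amounts to a set-membership check, which can be sketched as follows. The function name and the string form of the identifiers are hypothetical; how the corrected identifiers feed back into the online rules is not covered here.

```python
def correct_misjudgments(online_abnormal_ids, non_abnormal_clusters):
    """Split online-detected identifiers into confirmed and misjudged sets.

    `non_abnormal_clusters` is an iterable of identifier collections, one per
    cluster that was marked as a non-abnormal access identifier cluster.
    """
    cleared = set()
    for members in non_abnormal_clusters:
        cleared.update(members)
    flagged = set(online_abnormal_ids)
    misjudged = flagged & cleared   # flagged online, but in a natural-looking cluster
    confirmed = flagged - cleared   # still treated as abnormal
    return confirmed, misjudged


confirmed, misjudged = correct_misjudgments(
    {"ip:a", "ip:b", "ua:tester"},
    [{"ua:tester", "ip:c"}, {"ip:b"}],
)
```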
Thus, in this embodiment of the present disclosure, the unnatural person identifiers are clustered according to their time sequence access sequences, and each cluster's time sequence access sequence is then checked against the natural person access characteristic. If it conforms, the unnatural person identifiers in that cluster are not abnormal access identifiers. If the detection rules deployed online have nevertheless flagged such an identifier as abnormal, the online detection can be determined to be a misjudgment and corrected, which in turn refines the detection rules and improves the accuracy of abnormal access traffic detection.
As an example, suppose the unnatural person identifiers (including IP, UA, and JA3) that generate more than 30000 requests on each service line are selected and their time sequence access sequences are counted, yielding 13354 time sequence access sequences in total. The input to the clustering algorithm can then be represented as 13354 feature vectors, each of dimension 1440.
The feature vectors are standardized and then clustered with the k-means clustering algorithm to determine the class label of each unnatural person identifier. Each cluster is then checked for normal time sequence characteristics, that is, the natural person access characteristic. If a cluster conforms to the natural person access characteristic, it is not an abnormal access team, and the abnormal identifiers detected online can be corrected for misjudgment based on that cluster.
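The standardize-then-cluster step can be sketched in plain Python as below. This is an illustration under stated assumptions, not the disclosed implementation: it uses z-score standardization, Lloyd's k-means iteration, and a deterministic farthest-point initialization (chosen here only for reproducibility; the source does not say how centroids are initialized). The four short vectors stand in for the 1440-dimensional sequences.

```python
import math


def standardize(vectors):
    """Z-score each feature column; constant columns become all zeros."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    stds = [math.sqrt(sum((v[j] - means[j]) ** 2 for v in vectors) / n)
            for j in range(dim)]
    return [[(v[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(dim)] for v in vectors]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50):
    """Lloyd's algorithm; returns one cluster label per input vector."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors,
                       key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy stand-ins for access sequences: two identifiers active early, two active late.
raw = [[10, 12, 0, 0], [11, 13, 0, 1], [0, 1, 10, 12], [1, 0, 12, 11]]
labels = kmeans(standardize(raw), k=2)
```

For a workload of the size mentioned above (13354 vectors of dimension 1440), a vectorized library implementation would be the practical choice; the pure-Python version is only meant to show the steps.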
For ease of understanding, the abnormal access detection method provided by the embodiments of the present disclosure is further described below with reference to fig. 4.
Referring to fig. 4, which is a schematic diagram of an abnormal access detection method provided in an embodiment of the present disclosure, a standardized log of service traffic is first obtained, and a clustering ID (i.e., the clustering object) and the corresponding clustering features are then determined according to the scenario. The clustering object may be a UID, an IP, an unnatural person identifier, and the like; the clustering features may include the deduplicated IP count, the deduplicated UA count, the time sequence access sequence, and the like. The clustering features are then preprocessed and clustered by a clustering algorithm.
When the clustering object is the UID and the clustering feature is one or more of the deduplicated IP count, the deduplicated IPC count, the deduplicated cookie count, and the deduplicated UA count, abnormal access teams can be mined;
when the clustering object is the IP and the clustering feature is the time sequence access sequence, abnormal IP features can be mined and the online detection rules refined;
when the clustering object is the unnatural person identifier and the clustering feature is the time sequence access sequence, online misjudgments can be corrected.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an abnormal access detection apparatus provided in an embodiment of the present disclosure, where the apparatus may include:
a target user determining module 501, configured to determine a target user accessing a first service line in a first period;
an information obtaining module 502, configured to obtain access resource information corresponding to the user identifier of each target user;
a first clustering module 503, configured to cluster the user identifiers based on the access resource information, and determine a plurality of clustered user clusters;
a detecting module 504, configured to detect the user cluster, and determine an abnormal user cluster with abnormal access.
Thus, in this embodiment of the present disclosure, the access resource information of users is used as the clustering feature, so that users with similar access resource information are clustered into one class. Because an abnormal access team draws on the same resource pool, clustering on access resource information groups the identifiers of the user accounts used by that team into one class, and after clustering it is easy to identify whether each user cluster is an abnormal user cluster; that is, abnormal access teams are mined from the clustering result. Compared with manual mining and analysis, this approach is more timely and can also uncover abnormal access teams that are otherwise hard to find.
In an embodiment of the present disclosure, the access resource information includes one or more of: the deduplicated count of IP addresses, the deduplicated count of IP network segments, the deduplicated count of user identity cache identifiers, and the deduplicated count of browser user agents.
Thus, in this embodiment of the present disclosure, because an abnormal access team frequently changes its IP, user identity cache identifier, and browser user agent when accessing a service line, the access resource information is used as the basis for clustering, and abnormal access teams are mined from the clustering result.
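The deduplicated counts that make up the access resource information can be sketched as follows. The request tuple layout `(uid, ip, cookie, user_agent)`, the use of the first three IPv4 octets as the network segment, and the treatment of the cookie as the user identity cache identifier are assumptions made for this example.

```python
from collections import defaultdict


def access_resource_features(requests):
    """Return {uid: [distinct IPs, distinct segments, distinct cookies, distinct UAs]}."""
    seen = defaultdict(lambda: (set(), set(), set(), set()))
    for uid, ip, cookie, user_agent in requests:
        ips, segments, cookies, agents = seen[uid]
        ips.add(ip)
        segments.add(ip.rsplit(".", 1)[0])  # assumed: IPv4, segment = first three octets
        cookies.add(cookie)
        agents.add(user_agent)
    return {uid: [len(s) for s in sets] for uid, sets in seen.items()}


# Hypothetical requests: u1 rotates resources, u2 does not.
requests = [
    ("u1", "198.51.100.1", "c1", "UA-A"),
    ("u1", "198.51.100.2", "c2", "UA-B"),
    ("u1", "203.0.113.9", "c3", "UA-A"),
    ("u2", "192.0.2.10", "c9", "UA-C"),
    ("u2", "192.0.2.10", "c9", "UA-C"),
]
features = access_resource_features(requests)
```

The resulting per-user vectors are the kind of input that would then be clustered; an account that rotates resources shows much larger counts than an ordinary one.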
In one embodiment of the present disclosure, the target user is a user whose number of accesses to the first service line in the first period exceeds a set value.
Thus, in this embodiment of the present disclosure, preliminary screening by number of accesses reduces the amount of data that participates in clustering, which further improves the efficiency of abnormal access detection.
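A minimal sketch of this preliminary screening, assuming the log has already been reduced to one user identifier per request; the function name and the threshold value are illustrative.

```python
from collections import Counter


def select_target_users(request_uids, set_value):
    """Keep only users whose request count in the period exceeds `set_value`."""
    counts = Counter(request_uids)
    return {uid for uid, n in counts.items() if n > set_value}


# One entry per request observed on the service line in the period.
uids = ["u1", "u2", "u1", "u3", "u1", "u2"]
targets = select_target_users(uids, set_value=1)
```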
In one embodiment of the present disclosure, the apparatus further includes:
a first marking module, configured to mark the access resource information of the abnormal user cluster as abnormal resource information;
a second marking module, configured to mark a request that is detected online and that uses the abnormal resource information as an abnormal access request.
Thus, in this embodiment of the present disclosure, clustering on access resource information mines abnormal access teams, and the access resource information they use is marked. This helps in understanding the specific behavior pattern of an abnormal access team and effectively associates the resource pool it uses, so that when an abnormal access team using the same resource pool later appears in a related service, it can be located and tracked promptly.
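The two marking steps can be sketched as building a pool of abnormal resources and then checking incoming requests against it. The dictionary layouts and function names here are hypothetical, and matching on any single resource type is an assumption of the sketch rather than a rule stated in the disclosure.

```python
def collect_abnormal_resources(abnormal_uids, resources_by_uid):
    """Union of the resources (grouped by kind) used by an abnormal user cluster."""
    pool = {}
    for uid in abnormal_uids:
        for kind, values in resources_by_uid.get(uid, {}).items():
            pool.setdefault(kind, set()).update(values)
    return pool


def is_abnormal_request(request, pool):
    """Flag a request if any of its resources appears in the abnormal pool."""
    return any(request.get(kind) in values for kind, values in pool.items())


# Hypothetical resources observed per user; u1 and u2 form an abnormal cluster.
resources_by_uid = {
    "u1": {"ip": {"198.51.100.1"}, "cookie": {"c1"}},
    "u2": {"ip": {"198.51.100.2"}, "cookie": {"c2"}},
    "u3": {"ip": {"192.0.2.5"}, "cookie": {"c7"}},
}
pool = collect_abnormal_resources(["u1", "u2"], resources_by_uid)
hit = is_abnormal_request({"ip": "198.51.100.2", "cookie": "zz"}, pool)
miss = is_abnormal_request({"ip": "192.0.2.5", "cookie": "c7"}, pool)
```

In practice, widely shared resources such as a common browser user agent would probably need stricter matching than this any-match rule, to avoid flagging ordinary users.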
In one embodiment of the present disclosure, the apparatus further includes:
a candidate IP determining module, configured to determine candidate IPs that access a second service line in a second period;
a first sequence obtaining module, configured to obtain a first time sequence access sequence of each candidate IP, where the first time sequence access sequence includes the number of accesses made by the candidate IP in each sub-period of the second period;
a screening module, configured to screen out, from the candidate IPs and based on the first time sequence access sequence, target IPs that conform to a preset abnormal access characteristic;
a second clustering module, configured to cluster the target IPs based on the time sequence access sequence of each target IP and determine a plurality of IP clusters;
a feature mining module, configured to mine abnormal IP features based on the IP clusters and update the online detection rules for abnormal access traffic based on the abnormal IP features.
Thus, in this embodiment of the present disclosure, the IPs that conform to the abnormal access characteristic are clustered based on their time sequence access sequences, so that abnormal IP features, that is, the features common to an entire abnormal IP cluster, can be mined more intuitively. The online detection rules are updated according to these features, which improves the accuracy of online detection of abnormal access traffic.
In an embodiment of the present disclosure, the screening module is specifically configured to:
judge whether the first time sequence access sequence of a candidate IP is a stationary time sequence, and if so, determine that the candidate IP conforms to the preset abnormal access characteristic and take it as a target IP.
Thus, in this embodiment of the present disclosure, the access requests of a normal IP are considered not to be stationary over time (they show peak and trough periods), whereas the access requests of an abnormal IP are stationary. Therefore, if the time sequence access sequence of a candidate IP is judged to be a stationary time sequence, the candidate IP is determined to conform to the abnormal access characteristic. In this way, abnormal IPs can be screened out efficiently.
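The screening can be sketched with a simple stationarity heuristic: split the sequence into windows and treat it as stationary when the window means barely vary. The disclosure does not name a specific test, so the window count, the `max_cv` threshold, and the function names are assumptions; a formal unit-root test could be substituted.

```python
import statistics


def is_stationary(sequence, windows=24, max_cv=0.2):
    """Heuristic: stationary if window means have a low coefficient of variation."""
    size = len(sequence) // windows
    if size == 0:
        raise ValueError("sequence is shorter than the number of windows")
    means = [statistics.fmean(sequence[i * size:(i + 1) * size])
             for i in range(windows)]
    overall = statistics.fmean(means)
    if overall == 0:
        return False  # no traffic, so nothing to classify
    return statistics.pstdev(means) / overall <= max_cv


def screen_target_ips(sequences_by_ip, **kwargs):
    """Return the candidate IPs whose access sequence looks stationary."""
    return [ip for ip, seq in sequences_by_ip.items() if is_stationary(seq, **kwargs)]


# Hypothetical candidates: one constant-rate IP and one with a day/night pattern.
candidates = {
    "198.51.100.20": [4] * 1440,
    "198.51.100.21": [0] * 720 + [8] * 720,
}
target_ips = screen_target_ips(candidates)
```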
In one embodiment of the present disclosure, the apparatus further includes:
an identifier determining module, configured to determine unnatural person identifiers whose number of service accesses in a third period is greater than a preset threshold;
a second sequence determining module, configured to determine a second time sequence access sequence corresponding to the unnatural person identifier, where the second time sequence access sequence includes the number of accesses made by the unnatural person identifier in each sub-period of the third period;
a third clustering module, configured to cluster the unnatural person identifiers based on the second time sequence access sequences, and determine the resulting unnatural person identifier clusters and the clustering time sequence access sequence of each cluster;
a third marking module, configured to judge whether the clustering time sequence access sequence of an unnatural person identifier cluster conforms to the preset natural person access characteristic, and if so, mark the cluster as a non-abnormal access identifier cluster;
a correcting module, configured to perform misjudgment correction on the abnormal identifiers detected online, based on the non-abnormal access identifier clusters.
Thus, in this embodiment of the present disclosure, the unnatural person identifiers are clustered based on their time sequence access sequences, and clusters whose clustering time sequence access sequence conforms to the natural person access characteristic are marked as non-abnormal. Abnormal identifiers detected online that fall in such clusters can then be corrected as misjudgments, which improves the accuracy of online detection of abnormal access traffic.
In one embodiment of the present disclosure, the unnatural person identifier includes one or more of an IP, a browser user agent, and a client fingerprint.
Thus, in this embodiment of the present disclosure, the unnatural person identifier can cover several types of information, including the IP, the browser user agent, and the client fingerprint. This identifier information is generated whenever a user (a natural person) accesses a service line, so the time sequence access sequence corresponding to each unnatural person identifier can be determined efficiently by counting access requests.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 comprises a computing unit 601, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, and the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that the flows shown above may be used in various forms, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Claims (19)
1. An abnormal access detection method comprising:
determining a target user accessing a first service line in a first period;
acquiring access resource information corresponding to the user identification of each target user; the access resource information represents access resources used when the target user initiates an access request;
clustering the user identification based on the access resource information, and determining a plurality of clustered user clusters;
and detecting the user cluster, and determining an abnormal user cluster with abnormal access.
2. The method of claim 1, wherein,
the access resource information includes one or more of: the deduplicated count of Internet Protocol (IP) addresses, the deduplicated count of IP network segments, the deduplicated count of user identity cache identifiers, and the deduplicated count of browser user agents.
3. The method of claim 1, wherein the target user is a user who accesses the first service line more than a set number of times within the first period.
4. The method of claim 1, further comprising:
marking the access resource information of the abnormal user cluster as abnormal resource information;
and marking a request that is detected online and that uses the abnormal resource information as an abnormal access request.
5. The method of claim 1, further comprising:
determining candidate IPs for accessing a second service line in a second time period;
acquiring a first time sequence access sequence of each candidate IP, wherein the first time sequence access sequence comprises the access times of the candidate IP in each sub-period in the second period;
screening out a target IP which accords with preset abnormal access characteristics from the candidate IPs based on the first time sequence access sequence;
clustering the target IPs based on the time sequence access sequence of each target IP, and determining a plurality of clustered IP clusters;
and mining abnormal IP features based on the IP clusters, and updating the abnormal access detection rules deployed online based on the abnormal IP features.
6. The method of claim 5, wherein screening out the target IPs that conform to the preset abnormal access characteristic from the candidate IPs based on the first time sequence access sequence comprises:
judging whether the first time sequence access sequence of a candidate IP is a stationary time sequence, and if so, determining that the candidate IP conforms to the preset abnormal access characteristic and taking the candidate IP as a target IP.
7. The method of any of claims 1-6, further comprising:
determining an unnatural person identifier of which the service access times are greater than a preset threshold value in a third time period;
determining a second time sequence access sequence corresponding to the unnatural person identifier, wherein the second time sequence access sequence contains the access times of the unnatural person identifier in each sub-period in the third period;
clustering the unnatural person identifiers based on the second time sequence access sequence, and determining a plurality of clustered unnatural person identifier clusters and a clustering time sequence access sequence of each clustered unnatural person identifier cluster;
judging whether the clustering time sequence access sequence of the unnatural person identification cluster conforms to the preset natural person access characteristics, if so, marking the unnatural person identification cluster as a non-abnormal access identification cluster;
and performing misjudgment correction on the abnormal identifiers detected online, based on the non-abnormal access identifier clusters.
8. The method of claim 7, wherein the unnatural person identification comprises one or more of an IP, a browser user agent, and a client fingerprint.
9. An abnormal access detection apparatus comprising:
a target user determining module, configured to determine a target user accessing a first service line in a first period;
an information obtaining module, configured to obtain access resource information corresponding to the user identifier of each target user, where the access resource information represents the access resources used when the target user initiates an access request;
a first clustering module, configured to cluster the user identifiers based on the access resource information and determine a plurality of user clusters;
and a detection module, configured to detect the user clusters and determine an abnormal user cluster with abnormal access.
10. The apparatus of claim 9, wherein,
the access resource information includes one or more of: the deduplicated count of IP addresses, the deduplicated count of IP network segments, the deduplicated count of user identity cache identifiers, and the deduplicated count of browser user agents.
11. The apparatus of claim 9, wherein the target user is a user who accesses the first service line more than a set number of times in the first period.
12. The apparatus of claim 9, further comprising:
a first marking module, configured to mark the access resource information of the abnormal user cluster as abnormal resource information;
and a second marking module, configured to mark a request that is detected online and that uses the abnormal resource information as an abnormal access request.
13. The apparatus of claim 9, further comprising:
a candidate IP determining module, configured to determine candidate IPs that access a second service line in a second period;
a first sequence obtaining module, configured to obtain a first time sequence access sequence of each candidate IP, where the first time sequence access sequence includes the number of accesses made by the candidate IP in each sub-period of the second period;
a screening module, configured to screen out, from the candidate IPs and based on the first time sequence access sequence, target IPs that conform to a preset abnormal access characteristic;
a second clustering module, configured to cluster the target IPs based on the time sequence access sequence of each target IP and determine a plurality of IP clusters;
and a feature mining module, configured to mine abnormal IP features based on the IP clusters and update the abnormal access detection rules deployed online based on the abnormal IP features.
14. The apparatus according to claim 13, wherein the screening module is specifically configured to:
judge whether the first time sequence access sequence of a candidate IP is a stationary time sequence, and if so, determine that the candidate IP conforms to the preset abnormal access characteristic and take the candidate IP as a target IP.
15. The apparatus of any of claims 9-14, further comprising:
an identifier determining module, configured to determine unnatural person identifiers whose number of service accesses in a third period is greater than a preset threshold;
a second sequence determining module, configured to determine a second time sequence access sequence corresponding to the unnatural person identifier, where the second time sequence access sequence includes the number of accesses made by the unnatural person identifier in each sub-period of the third period;
a third clustering module, configured to cluster the unnatural person identifiers based on the second time sequence access sequences, and determine the resulting unnatural person identifier clusters and the clustering time sequence access sequence of each cluster;
a third marking module, configured to judge whether the clustering time sequence access sequence of an unnatural person identifier cluster conforms to the preset natural person access characteristic, and if so, mark the cluster as a non-abnormal access identifier cluster;
and a correcting module, configured to perform misjudgment correction on the abnormal identifiers detected online, based on the non-abnormal access identifier clusters.
16. The apparatus of claim 15, wherein the unnatural person identification comprises one or more of IP, a browser user agent, and a client fingerprint.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211121064.8A CN115603947A (en) | 2022-09-15 | 2022-09-15 | Abnormal access detection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115603947A true CN115603947A (en) | 2023-01-13 |
Family
ID=84842762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211121064.8A Pending CN115603947A (en) | 2022-09-15 | 2022-09-15 | Abnormal access detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115603947A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807487A (en) * | 2019-10-31 | 2020-02-18 | 北京邮电大学 | Method and device for identifying user based on domain name system flow record data |
WO2021114454A1 (en) * | 2019-12-13 | 2021-06-17 | 网宿科技股份有限公司 | Method and apparatus for detecting crawler request |
CN113518058A (en) * | 2020-04-09 | 2021-10-19 | 中国移动通信集团海南有限公司 | Abnormal login behavior detection method and device, storage medium and computer equipment |
CN114338171A (en) * | 2021-12-29 | 2022-04-12 | 中国建设银行股份有限公司 | Black product attack detection method and device |
Non-Patent Citations (1)
Title |
---|
许彩滇;刘晓丽;: "基于改进K-means算法的网络入侵行为取证研究", 中国人民公安大学学报(自然科学版), no. 02, 15 May 2020 (2020-05-15) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107809331B (en) | | Method and device for identifying abnormal flow |
CN108090567B (en) | | Fault diagnosis method and device for power communication system |
CN110442712B (en) | | Risk determination method, risk determination device, server and text examination system |
CN111259952A (en) | | Abnormal user identification method and device, computer equipment and storage medium |
CN113360918A (en) | | Vulnerability rapid scanning method, device, equipment and storage medium |
CN110995687B (en) | | Cat pool equipment identification method, device, equipment and storage medium |
CN116743474A (en) | | Decision tree generation method and device, electronic equipment and storage medium |
CN117499148A (en) | | Network access control method, device, equipment and storage medium |
CN113204695A (en) | | Website identification method and device |
CN117093627A (en) | | Information mining method, device, electronic equipment and storage medium |
CN116820826A (en) | | Root cause positioning method, device, equipment and storage medium based on call chain |
CN115603947A (en) | | Abnormal access detection method and device |
CN115599687A (en) | | Method, device, equipment and medium for determining software test scene |
CN115687406A (en) | | Sampling method, device and equipment of call chain data and storage medium |
CN115062304A (en) | | Risk identification method and device, electronic equipment and readable storage medium |
CN115344627A (en) | | Data screening method and device, electronic equipment and storage medium |
CN114444087A (en) | | Unauthorized vulnerability detection method and device, electronic equipment and storage medium |
CN113434432A (en) | | Performance test method, device, equipment and medium for recommendation platform |
CN115378746B (en) | | Network intrusion detection rule generation method, device, equipment and storage medium |
CN117119434B (en) | | Personnel identification method, device, equipment and storage medium |
CN116070601B (en) | | Data splicing method and device, electronic equipment and storage medium |
CN115499231A (en) | | Flow detection method and device, electronic equipment and storage medium |
CN115619413A (en) | | Method, device, equipment and storage medium for determining abnormal transactions |
CN113360688A (en) | | Information base construction method, device and system |
CN113961898A (en) | | Detection method, device and equipment for anchor in live broadcast room and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||