CN113726783A - Abnormal IP address identification method and device, electronic equipment and readable storage medium

Info

Publication number
CN113726783A
CN113726783A CN202111012528.7A CN202111012528A CN113726783A CN 113726783 A CN113726783 A CN 113726783A CN 202111012528 A CN202111012528 A CN 202111012528A CN 113726783 A CN113726783 A CN 113726783A
Authority
CN
China
Prior art keywords
group
access
address
abnormal
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111012528.7A
Other languages
Chinese (zh)
Other versions
CN113726783B (en)
Inventor
唐华阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Knownsec Information Technology Co Ltd
Original Assignee
Beijing Knownsec Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Knownsec Information Technology Co Ltd
Priority to CN202111012528.7A
Publication of CN113726783A
Application granted
Publication of CN113726783B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1466 Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Embodiments of the present application provide an abnormal IP address identification method and device, an electronic device, and a readable storage medium, and relate to the field of computers. The method comprises: acquiring an access record set of a target domain name for each of a plurality of consecutive preset periods; dividing the low-frequency IP addresses in each access record set into at least one IP group, wherein the IP addresses in one IP group have similar first access characteristics, each derived from the access description information of the corresponding IP address; calculating the similarity between every two IP groups that belong to two adjacent preset periods, the two IP groups of each pair corresponding to different preset periods; determining IP group sequences according to the similarities and the time order of the preset periods, wherein every two adjacent IP groups in one IP group sequence correspond to two adjacent preset periods; and acquiring a comprehensive access feature set of each IP group sequence, and identifying, based on the comprehensive access feature set, whether the IP addresses in the IP group sequence are abnormal. In this way, whether low-frequency IP addresses are abnormal can be identified.

Description

Abnormal IP address identification method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for identifying an abnormal IP address, an electronic device, and a readable storage medium.
Background
Currently, an IP address is considered abnormal when a client uses it to access a domain name frequently. To bypass the various cloud defense systems built on this criterion, an attacker may instead use many machines to simulate normal access, each accessing the domain name at a low frequency. The server providing the service is then still adversely affected. Therefore, how to identify abnormal IP addresses that may exist among low-frequency IP addresses is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
Embodiments of the present application provide an abnormal IP address identification method and device, an electronic device, and a readable storage medium, which can identify abnormal IP addresses that may exist among low-frequency IP addresses, so that access requests from low-frequency abnormal IP addresses can be intercepted based on the identification result, thereby protecting the server providing the service.
Embodiments of the present application may be implemented as follows:
In a first aspect, an embodiment of the present application provides an abnormal IP address identification method, including:
acquiring an access record set of a target domain name for each of a plurality of consecutive preset periods, wherein each access record set includes the IP addresses used by access devices when accessing the target domain name and the corresponding access description information;
dividing the low-frequency IP addresses in each access record set into at least one IP group according to the IP addresses and the access description information, wherein the IP addresses in one IP group have similar first access characteristics, each derived from the access description information of the corresponding IP address;
calculating the similarity between every two IP groups that belong to two adjacent preset periods, wherein the two IP groups of each pair correspond to different preset periods;
determining IP group sequences according to the similarities and the time order of the preset periods, wherein every two adjacent IP groups in one IP group sequence correspond to two adjacent preset periods; and
acquiring a comprehensive access feature set of each IP group sequence, and identifying, according to the comprehensive access feature set of each IP group sequence, whether the IP addresses in that IP group sequence are abnormal.
In a second aspect, an embodiment of the present application provides an abnormal IP address identification apparatus, including:
a record obtaining module, configured to obtain an access record set of a target domain name for each of a plurality of consecutive preset periods, wherein each access record set includes the IP addresses used by access devices when accessing the target domain name and the corresponding access description information;
a dividing module, configured to divide the low-frequency IP addresses in each access record set into at least one IP group according to the IP addresses and the access description information, wherein the IP addresses in one IP group have similar first access characteristics, each derived from the access description information of the corresponding IP address;
a similarity calculation module, configured to calculate the similarity between every two IP groups that belong to two adjacent preset periods, wherein the two IP groups of each pair correspond to different preset periods;
a sequence determining module, configured to determine IP group sequences according to the similarities and the time order of the preset periods, wherein every two adjacent IP groups in one IP group sequence correspond to two adjacent preset periods; and
an identification module, configured to obtain a comprehensive access feature set of each IP group sequence and to identify, according to the comprehensive access feature set of each IP group sequence, whether the IP addresses in that IP group sequence are abnormal.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory, where the memory stores machine executable instructions that can be executed by the processor, and the processor can execute the machine executable instructions to implement the method for identifying an abnormal IP address described in the foregoing embodiment.
In a fourth aspect, the present application provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the abnormal IP address identification method according to the foregoing embodiment.
According to the abnormal IP address identification method and device, the electronic device, and the readable storage medium provided by the embodiments of the present application, an access record set of the target domain name is first obtained for each of a plurality of consecutive preset periods. Then, for each access record set, the low-frequency IP addresses in the set are divided into at least one IP group according to the IP addresses used by the access devices when accessing the target domain name and the corresponding access description information in the set, wherein the IP addresses in one IP group have similar first access characteristics, each derived from the access description information of the corresponding IP address. Next, the similarity between every two IP groups that belong to two adjacent preset periods is calculated, the two IP groups of each pair corresponding to different preset periods, and IP group sequences are determined according to the similarities and the time order of the consecutive preset periods, wherein adjacent IP groups in one IP group sequence correspond to two adjacent preset periods. Finally, a comprehensive access feature set of each IP group sequence is obtained, and whether the IP addresses in the IP group sequence are abnormal is identified based on that set. In this way, the access behavior is extended along the time axis to determine the IP group sequences corresponding to similar access behavior, and whether the IP addresses in an IP group sequence are abnormal is identified based on the comprehensive access feature set of that sequence, so that access requests from low-frequency abnormal IP addresses can be intercepted based on the identification result, thereby protecting the server providing the service.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a block diagram of an electronic device according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of an abnormal IP address identification method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating an example of identifying an abnormal IP address according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of the sub-steps included in step S130 of FIG. 2;
FIG. 5 is a first schematic flowchart of the sub-steps included in step S150 of FIG. 2;
FIG. 6 is a second schematic flowchart of the sub-steps included in step S150 of FIG. 2;
FIG. 7 is a schematic block diagram of an abnormal IP address identification apparatus according to an embodiment of the present application.
Reference numerals: 100-electronic device; 110-memory; 120-processor; 130-communication unit; 200-abnormal IP address identification apparatus; 210-record obtaining module; 220-dividing module; 230-similarity calculation module; 240-sequence determining module; 250-identification module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Currently, an IP address is considered abnormal when a client uses it to access a domain name frequently. Existing cloud defense systems therefore mainly identify and protect against a single abnormal IP address. In a practical scenario, to bypass the cloud defense systems built on this criterion, an attacker may instead use many machines and a large number of IP addresses to simulate normal access, with each IP address continuously accessing the domain name at a low frequency. Although the number of accesses from any single IP address is small, the total number of accesses is very large, so the same attack purpose can be achieved. Therefore, how to identify abnormal IP addresses that may exist among low-frequency IP addresses is a technical problem that urgently needs to be solved by those skilled in the art.
In view of the above, embodiments of the present application provide an abnormal IP address identification method and device, an electronic device, and a readable storage medium, which extend the access behavior along the time axis to determine IP group sequences corresponding to similar access behavior, and then identify whether the IP addresses in an IP group sequence are abnormal based on the comprehensive access feature set of that sequence, so that access requests from low-frequency abnormal IP addresses can be intercepted based on the identification result, thereby implementing protection.
The defects of the above solutions are results obtained by the inventors through practice and careful study. Therefore, both the discovery of the above problems and the solutions proposed for them in the following embodiments of the present application should be regarded as contributions made by the inventors in the course of the present application.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to FIG. 1, FIG. 1 is a block diagram of an electronic device 100 according to an embodiment of the present application. The electronic device 100 may be, but is not limited to, a computer, a server, etc. The electronic device 100 may include a memory 110, a processor 120, and a communication unit 130. The memory 110, the processor 120, and the communication unit 130 are electrically connected to one another, directly or indirectly, to enable data transmission or interaction. For example, these components may be electrically connected to one another via one or more communication buses or signal lines.
The memory 110 is used to store programs or data. The memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 120 is used to read/write data or programs stored in the memory 110 and perform corresponding functions. For example, the memory 110 stores an abnormal IP address identification apparatus 200, which includes at least one software functional module that can be stored in the memory 110 in the form of software or firmware. The processor 120 executes various functional applications and data processing by running the software programs and modules stored in the memory 110, such as the abnormal IP address identification apparatus 200 in the embodiment of the present application, so as to implement the abnormal IP address identification method in the embodiment of the present application.
The communication unit 130 is used for establishing a communication connection between the electronic device 100 and another communication terminal via a network, and for transceiving data via the network.
It should be understood that the structure shown in fig. 1 is only a schematic structural diagram of the electronic device 100, and the electronic device 100 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, fig. 2 is a schematic flow chart of an abnormal IP address identification method according to an embodiment of the present application. The method may be applied to the electronic device 100 described above. The specific flow of the abnormal IP address identification method is explained in detail below.
Step S110, obtaining respective access record sets of the target domain name in a plurality of continuous preset periods.
In this embodiment, the access records of the target domain name within a certain time length (e.g., 50 s) can be obtained. One access record corresponds to one access. An access record may include the IP address used by the access device for that access to the target domain name, and access description information describing the access. The access description information may include information describing any feature of the access and may be determined according to actual requirements. The target domain name is the domain name that needs to be protected and can be determined according to the actual application scenario.
The access records of the target domain name within the certain time length can be divided into a plurality of access record sets according to a preset period, where one preset period corresponds to one access record set. The access time of every access recorded in an access record set falls within the preset period corresponding to that set. The length of the preset period may be set according to actual requirements, for example, 10 s.
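The division into per-period access record sets can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the `AccessRecord` fields, the function name `split_into_periods`, and the use of timestamps in seconds are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class AccessRecord:
    """One access to the target domain name (field names are hypothetical)."""
    timestamp: float  # access time in seconds
    ip: str           # IP address used by the access device
    url: str          # one piece of access description information


def split_into_periods(records, start, period_len=10.0, num_periods=5):
    """Group records into consecutive preset periods of `period_len` seconds.

    Returns a list with one access record set (a list) per preset period.
    Records outside the observed time length are dropped.
    """
    record_sets = [[] for _ in range(num_periods)]
    for rec in records:
        index = int((rec.timestamp - start) // period_len)
        if 0 <= index < num_periods:
            record_sets[index].append(rec)
    return record_sets
```

With the default arguments, a 50 s window starting at `start` yields five access record sets of 10 s each, matching the example values used in this embodiment.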
Step S120, dividing the low-frequency IP addresses in each access record set into at least one IP group according to the IP addresses and the access description information.
In this embodiment, for each access record set, the first access characteristic corresponding to each low-frequency IP address is determined according to the IP addresses and the access description information included in the set, and the low-frequency IP addresses with similar first access characteristics are then placed in one IP group. In this way, the low-frequency IP addresses in one access record set are divided into at least one IP group. A low-frequency IP address is an IP address with a relatively small number of access requests within a given time range.
Step S130, calculating the similarity between every two IP groups that belong to two adjacent preset periods.
In this embodiment, one IP group may be selected from each of two adjacent preset periods, and the similarity between the two selected IP groups may then be calculated. Repeating this processing yields the similarity between every two IP groups of the two adjacent preset periods, the two IP groups of each pair corresponding to different preset periods. Performing the processing for every two adjacent preset periods yields the similarity between any two IP groups of any two adjacent preset periods. The similarity may be calculated from the access description information of the two IP groups, from the first access characteristics, or in other ways, which is not specifically limited herein.
Step S140, determining IP group sequences according to the similarities and the time order of the preset periods.
In this embodiment, according to the similarities obtained in step S130 and the time order of the consecutive preset periods, IP groups that belong to adjacent preset periods and have a high similarity may be linked to form an IP group sequence. Every two adjacent IP groups in one IP group sequence therefore correspond to two adjacent preset periods. In this manner, the access behavior is extended along the time axis.
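One possible way to link groups into sequences is sketched below. The text only states that groups in adjacent periods with high similarity form a sequence; the fixed `threshold`, the greedy best-match rule, the restriction to one successor per group, and the names `build_sequences` and `jaccard` are all assumptions made for this illustration.

```python
def jaccard(group_a, group_b):
    """Example similarity: shared IPs divided by distinct IPs in both groups."""
    union = group_a | group_b
    return len(group_a & group_b) / len(union) if union else 0.0


def build_sequences(groups_per_period, similarity=jaccard, threshold=0.5):
    """Chain IP groups of adjacent periods whose similarity reaches `threshold`.

    groups_per_period: list (in time order) of lists of IP groups (sets of IPs).
    Returns a list of sequences; each sequence is a list of
    (period_index, group_index) pairs with consecutive period indices.
    """
    sequences = []
    open_seqs = {}  # group index in the previous period -> sequence ending there
    for period, groups in enumerate(groups_per_period):
        next_open = {}
        for idx, group in enumerate(groups):
            # Look for the most similar group of the previous period
            # that has not been extended yet.
            best_prev, best_sim = None, threshold
            for prev_idx in open_seqs:
                sim = similarity(groups_per_period[period - 1][prev_idx], group)
                if sim >= best_sim:
                    best_prev, best_sim = prev_idx, sim
            if best_prev is None:
                seq = [(period, idx)]           # start a new sequence
                sequences.append(seq)
            else:
                seq = open_seqs.pop(best_prev)  # extend an existing sequence
                seq.append((period, idx))
            next_open[idx] = seq
        open_seqs = next_open  # sequences not extended in this period are closed
    return sequences
```

Because a sequence is closed as soon as one period passes without a sufficiently similar group, every two adjacent groups in a returned sequence come from adjacent preset periods, as the step requires.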
Step S150, acquiring a comprehensive access feature set of each IP group sequence, and identifying whether the IP addresses in each IP group sequence are abnormal according to the comprehensive access feature set of that IP group sequence.
Once the IP group sequences are determined, the comprehensive access feature set of each IP group sequence is determined according to the access description information corresponding to the IP addresses in that sequence, and whether the IP addresses in the sequence are abnormal is then identified according to the comprehensive access feature set.
In this way, the access behavior is extended along the time axis to determine the IP group sequences corresponding to similar access behavior, and whether the IP addresses in an IP group sequence are abnormal is identified based on the comprehensive access feature set of that sequence. The method can thus identify abnormal IP addresses that are low-frequency and numerous, that is, a large number of low-frequency IP addresses whose access behaviors are similar and abnormal, for example, IP addresses that all access only one URL (Uniform Resource Locator) with the same Cookie/User_agent. After abnormal IP addresses are determined by this scheme, if an access based on one of them is found during an attack, the access can be determined to be abnormal and intercepted so that subsequent accesses are blocked, thereby achieving protection.
As an optional implementation, when the access records of the target domain name within a certain time length (e.g., 50 s) are obtained, the total number of accesses of each IP address within that time length may be obtained from the access records. The total number of accesses of each IP address is then compared with a preset total access count, and the IP addresses whose total number of accesses is less than the preset total access count are taken as low-frequency IP addresses. In this manner, the low-frequency IP addresses are determined.
As another optional implementation, when the access records of the target domain name within a certain time length (e.g., 50 s) are obtained, the access records may be grouped according to the duration of the preset period to obtain a plurality of access record sets. Then, for each access record set, the number of accesses of each IP address in the set is counted, and the IP addresses whose number of accesses is less than a preset access count are selected as the low-frequency IP addresses. Because the threshold applies to a short period rather than to the whole time length, it does not have to be set to a large value, which would let more normal IP addresses enter the subsequent processing flow; the amount of processing can therefore be reduced.
Referring to FIG. 3, FIG. 3 is a schematic diagram illustrating an example of identifying an abnormal IP address according to an embodiment of the present application. The process of determining the low-frequency IP addresses is illustrated below in conjunction with FIG. 3.
The access records of the target domain name within 50 s may be divided into 5 access record sets according to the access time of the access request corresponding to each access record, with 10 s as the preset period. The preset period of access record set 1 is the 1st 10 s, and the preset periods of access record sets 2-5 are the 2nd, 3rd, 4th, and 5th 10 s, respectively.
Assume that the preset access count is 10; that is, if the number of accesses of an IP address within 10 s is less than 10, the IP address may be considered a low-frequency IP address.
For access record set 1, the number of accesses of each IP address in the set may be counted. Then, for each IP address, it is judged whether its number of accesses is less than 10; if so, the IP address is determined to be a low-frequency IP address; otherwise, it is determined not to be a low-frequency IP address. In this manner, the low-frequency IP addresses included in access record set 1 are obtained. The same processing is performed for access record sets 2-5 to obtain the low-frequency IP addresses they include.
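The per-period screening in this example can be sketched as follows. The function name `low_frequency_ips` and the input shape (one IP string per access record) are assumptions made for the illustration; the default threshold of 10 is the example value given above.

```python
from collections import Counter


def low_frequency_ips(record_set_ips, preset_count=10):
    """Return the IPs whose access count within one period is below `preset_count`.

    record_set_ips: iterable with one IP string per access record of the period.
    """
    counts = Counter(record_set_ips)
    return {ip for ip, n in counts.items() if n < preset_count}
```

Calling the function once per access record set gives the low-frequency IP addresses of each preset period separately.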
Once the low-frequency IP addresses in an access record set are determined, all access description information of each low-frequency IP address can be obtained from the set, IP address by IP address, and feature extraction is then performed on that information to obtain the first access characteristic corresponding to each IP address. In this way, one first access characteristic is extracted for each low-frequency IP address in each preset period.
Optionally, the access description information may include: the URL accessed by the access device, a user_agent, a URI (Uniform Resource Identifier), a host, a record, a Cookie value, the status code in the received reply message, the message size of the sent message, the message size of the reply message, and the like. The Cookie value may be a field value returned to the access device by the server providing the service; the specific allocation rule of the value may be determined by the server in combination with a preset rule. Typically, the Cookie value of an access device does not change. It should be understood that the above description of the content of the access description information is only an example; the content may be determined according to the actual situation and is not specifically limited herein.
Illustratively, the feature dimensions of the first access characteristic obtained from the access description information may include: the mean/variance of the transmitted or received quantity, status code entropy/percentage, URL depth entropy, Cookie entropy/percentage, and the like. The mean of the transmitted quantity is the average message size of the messages sent by an IP address in a preset period; the mean of the received quantity is the average message size of the reply messages received by the IP address in a preset period. It should be noted that the above is only an example, and the specific feature dimensions of the first access characteristic may be determined in combination with the actual application scenario.
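A feature vector of this kind could be computed as in the sketch below. The dictionary keys (`sent_size`, `recv_size`, `status`, `url`, `cookie`), the function names, the use of Shannon entropy in bits, the population variance, and the definition of URL depth as the number of path segments are assumptions for illustration; the text does not fix these details.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def url_depth(url_path):
    """Number of non-empty path segments, e.g. '/a/b/c' has depth 3."""
    return len([seg for seg in url_path.split("/") if seg])


def first_access_feature(records):
    """Build the feature vector of one low-frequency IP in one preset period.

    records: non-empty list of dicts, one per access made by this IP.
    """
    sent = [r["sent_size"] for r in records]
    recv = [r["recv_size"] for r in records]
    return [
        mean(sent), pvariance(sent),                      # transmitted quantity
        mean(recv), pvariance(recv),                      # received quantity
        entropy([r["status"] for r in records]),          # status code entropy
        entropy([url_depth(r["url"]) for r in records]),  # URL depth entropy
        entropy([r["cookie"] for r in records]),          # Cookie entropy
    ]
```

An IP address that always requests the same URL with the same Cookie yields zero for the corresponding entropy dimensions, which is the kind of uniform behavior the later steps look for across many IP addresses.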
In the case of obtaining the first access characteristic of each low-frequency IP address in one access record set, the low-frequency IP addresses in the access record set may be grouped based on the first access characteristic in any manner, so as to obtain at least one IP group.
As an optional implementation, as shown in FIG. 3, the low-frequency IP addresses in each access record set may be clustered according to their first access characteristics, so that a plurality of clusters is obtained for each preset period, where one cluster is one IP group. C_1_0, C_1_4, C_2_0, C_5_2, etc. in FIG. 3 all denote clusters obtained by clustering; that is, each of C_1_0, etc. denotes an IP group. The clustering method used may be DBSCAN. Of course, the at least one IP group may also be obtained in other ways.
In a possible implementation, the similarity between every two IP groups that belong to two adjacent preset periods may be calculated directly for the plurality of IP groups obtained in step S120. Optionally, the similarity between any two IP groups may be calculated according to the IP addresses in the two IP groups and/or the second access characteristic of each IP group.
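The clustering step can be illustrated with a small, dependency-free DBSCAN. This is a teaching sketch under stated assumptions: the parameter values `eps` and `min_pts`, the Euclidean distance, the function names, and the decision to leave noise points out of every IP group are not taken from the text. In practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would more likely be used.

```python
import math


def dbscan(points, eps=1.0, min_pts=2):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def group_ips(features_by_ip, eps=1.0, min_pts=2):
    """Map cluster label -> set of IPs; noise points (label -1) are left out."""
    ips = list(features_by_ip)
    labels = dbscan([features_by_ip[ip] for ip in ips], eps, min_pts)
    groups = {}
    for ip, label in zip(ips, labels):
        if label != -1:
            groups.setdefault(label, set()).add(ip)
    return groups
```

Feature dimensions with very different scales (for example message sizes versus entropies) would normally be normalized before a distance-based clustering like this one; that preprocessing is omitted here.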
When calculating the similarity according to the IP addresses in any two IP groups, a ratio between the number of the same IP addresses in the two IP groups and the number of the IP addresses included in the two IP groups (the repeated IP addresses are calculated as one) may be calculated, and the calculated ratio is used as the similarity between any two IP groups.
For example, assuming that the ith period and the nth period are two adjacent preset periods, the similarity between the jth IP group of the ith period and the mth IP group of the nth period may be expressed as:
$$\mathrm{sim}(ij, nm) = \frac{\left|\mathrm{Set}(ip_{ij}) \cap \mathrm{Set}(ip_{nm})\right|}{\left|\mathrm{Set}(ip_{ij}) \cup \mathrm{Set}(ip_{nm})\right|}$$
where $\mathrm{sim}(ij, nm)$ represents the similarity; $\mathrm{Set}(ip_{ij})$ represents the set of IP addresses in the jth IP group of the ith period; and $\mathrm{Set}(ip_{nm})$ represents the set of IP addresses in the mth IP group of the nth period.
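The ratio described above (shared IP addresses divided by all distinct IP addresses of the two groups) can be sketched in a few lines. The function name and the choice to return 0.0 for two empty groups are assumptions of this example.

```python
def ip_similarity(group_a, group_b):
    """Ratio of shared IP addresses to all distinct IP addresses in both groups."""
    a, b = set(group_a), set(group_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty groups
    return len(a & b) / len(union)
```

For instance, two groups of three addresses that share two addresses contain four distinct addresses in total, giving a similarity of 0.5.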
When the similarity is calculated according to the respective second access characteristics of any two IP groups, the second access characteristic of each IP group may be calculated according to the first access characteristic of each IP address in each IP group. Then, the similarity between the second access characteristics of the two IP groups is calculated as the similarity of any two IP groups.
The second access characteristic of one IP group is the mean and/or variance of at least one characteristic dimension of the first access characteristics corresponding to the IP addresses in the IP group. That is, for each selected characteristic dimension, the mean and/or variance is taken over the characteristic values of all IP addresses included in the IP group.
For example, when the first access characteristic includes a URL depth entropy value, a mean and/or a variance of the URL depth entropy values of each IP address in an IP group may be calculated as a second access characteristic of the IP group; the mean and/or variance of the first access characteristic may also be directly calculated as the second access characteristic of the IP group. In the case where an IP group is a cluster, the second access characteristic of the IP group corresponds to the characteristic vector of the cluster.
When calculating the similarity between the jth IP group of the ith period and the mth IP group of the nth period, the process of calculating the similarity according to the second access characteristic may be expressed as $\mathrm{sim}(ij, nm) = \cos(feat_{ij}, feat_{nm})$, where $feat_{ij}$ represents the second access characteristic of the jth IP group of the ith period, and $feat_{nm}$ represents the second access characteristic of the mth IP group of the nth period.
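A minimal sketch of this calculation follows. It assumes that each first access characteristic is a numeric vector of equal length, that the second access characteristic concatenates the per-dimension means and population variances, and that a zero vector yields a similarity of 0.0; none of these details is fixed by the text.

```python
import math
from statistics import mean, pvariance


def second_access_feature(first_features):
    """Per-dimension means followed by per-dimension variances.

    first_features: list of equal-length numeric vectors, one per IP address.
    """
    columns = list(zip(*first_features))
    return [mean(c) for c in columns] + [pvariance(c) for c in columns]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

Because cosine similarity ignores vector magnitude, feature dimensions with large raw values (such as message sizes) dominate the angle; scaling the dimensions beforehand is a common precaution, although the text does not mention it.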
The similarity may also be calculated according to the IP addresses in any two IP groups and the second access characteristic of each IP group:
[Formula image BDA0003239403150000111: the combined similarity expression is not reproduced in the text.]
It should be noted that the above calculation methods of the similarity are only examples, and the similarity between two IP groups may also be obtained through other calculation methods.
In another possible implementation, the similarity used in step S150 may be obtained by the method shown in fig. 4. Referring to fig. 4, fig. 4 is a flowchart illustrating sub-steps included in step S130 in fig. 2. Step S130 may include substeps S131 through substep S133.
Substep S131: counting the number of IP addresses included in each IP group.
Substep S132: screening out the IP groups whose number of IP addresses is greater than a third preset number.
Substep S133: for the screened IP groups, calculating the similarity between every two IP groups in adjacent preset periods.
In this embodiment, the number of IP addresses included in each IP group may be counted, and then the number may be compared with a third preset number; if the number is larger than the third preset number, the IP group is reserved; if the number is not greater than the third preset number, the IP group may be rejected. Wherein the third preset number may be set in combination with actual conditions, for example, set to 10. Thus, the IP groups with the number of IP addresses included in the divided IP groups larger than the third preset number can be screened out. Then, for the screened IP groups, the similarity between any two IP groups can be calculated according to the IP addresses in any two IP groups and/or the second access characteristics of each IP group in the adjacent preset period. For the description of the manner of calculating the similarity according to the specific IP group, reference may be made to the above description, which is not repeated herein.
Because an attacker cannot achieve the attack goal by mounting a low-frequency multi-IP attack with only a small number of machines, an IP group containing few IP addresses is more likely to consist of normal IP addresses, so such IP groups can be filtered out directly. This reduces the amount of calculation in the subsequent processing.
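The screening in substeps S131 and S132 amounts to a size filter, sketched below. The function name is hypothetical, and the default of 10 is simply the example value given above.

```python
def filter_groups(groups, third_preset_number=10):
    """Keep only the IP groups that contain more IP addresses than the threshold."""
    return [g for g in groups if len(g) > third_preset_number]
```

Note that the comparison is strict: a group with exactly the third preset number of IP addresses is rejected, matching the "not greater than" rule stated above.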
After the similarity is calculated, the IP group sequences can then be found. Optionally, the IP group sequence may be found according to a preset similarity and the calculated similarity.
For example, as shown in fig. 3, assume that the IP groups for which the similarity is to be calculated are C_1_0 to C_1_4 for the 1st 10s, C_2_0 to C_2_3 for the 2nd 10s, C_3_0 to C_3_2 for the 3rd 10s, C_4_0 to C_4_5 for the 4th 10s, and C_5_0 to C_5_2 for the 5th 10s.
The similarity may then be calculated for the IP groups in time order: between each IP group of the 1st 10s and each IP group of the 2nd 10s, between each IP group of the 2nd 10s and each IP group of the 3rd 10s, between each IP group of the 3rd 10s and each IP group of the 4th 10s, and between each IP group of the 4th 10s and each IP group of the 5th 10s.
Assuming that the preset similarity is 0.87, each calculated similarity may be compared with 0.87, and IP groups that belong to adjacent preset periods and whose similarity is greater than 0.87 form an IP group sequence. For example, since the similarity between C_1_0 and C_2_1, between C_2_1 and C_3_1, and between C_3_1 and C_4_0 is in each case greater than 0.87 and the associated preset periods are adjacent, one IP group sequence can be determined as: C_1_0, C_2_1, C_3_1, C_4_0.
When the time sequence is extended in this manner, if the similarity between one IP group and each of several IP groups in an adjacent preset period is greater than 0.87, several IP group sequences can be determined. For example, if the similarity between C_1_0 and C_2_0 is also greater than 0.87, but the similarity between C_2_0 and every IP group in the 3rd 10s is less than 0.87, a further IP group sequence can be determined as: C_1_0, C_2_0.
Alternatively, if the similarity between one IP group and each of several IP groups in an adjacent preset period is greater than 0.87, the time sequence may be extended only along the maximum similarity. For example, if similarity 1 between C_1_0 and C_2_0 and similarity 2 between C_1_0 and C_2_1 are both greater than 0.87, similarity 2 is greater than similarity 1, and the similarity between each of C_2_0 and C_2_1 and every IP group in the 3rd 10s is less than 0.87, then only one IP group sequence is determined: C_1_0, C_2_1. This simplifies the calculation.
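The maximum-similarity variant of the time sequence extension can be sketched as follows. The data layout (a list of group identifiers per period and a dict of pairwise similarities), the function name, the rule that a group already reached from an earlier period does not start a new sequence, and the dropping of single-group sequences are assumptions made for this example.

```python
def build_sequences(periods, sim, threshold=0.87):
    """Greedily chain IP groups across adjacent periods.

    periods: list of lists of group ids, in time order.
    sim: dict mapping (earlier_group, later_group) -> similarity.
    Returns a list of sequences, each a list of group ids.
    """
    sequences = []
    continued = set()  # groups already reached from an earlier period
    for t, groups in enumerate(periods):
        for start in groups:
            if start in continued:
                continue  # avoid emitting a suffix of an existing sequence
            seq = [start]
            for later in periods[t + 1:]:
                # Pick the most similar group in the next period.
                best = max(later, key=lambda g: sim.get((seq[-1], g), 0.0), default=None)
                if best is None or sim.get((seq[-1], best), 0.0) <= threshold:
                    break  # the time line is interrupted here
                seq.append(best)
                continued.add(best)
            if len(seq) > 1:
                sequences.append(seq)
    return sequences
```

With the example values above, C_1_0 is extended through C_2_1 (the higher of its two candidate similarities) to C_3_1 and C_4_0, while C_2_0 finds no successor and forms no sequence of its own.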
Once the IP group sequences are determined, for each IP group sequence, a comprehensive access feature set of the IP group sequence may be obtained according to the access description information of each IP address in the IP group sequence in the plurality of preset periods to which the IP group sequence belongs, and whether the IP addresses in the IP group sequence are abnormal low-frequency IP addresses is then identified on the basis of the comprehensive access feature set. The preset periods to which one IP group sequence belongs are the preset periods corresponding to the access record sets from which the IP groups in the IP group sequence were divided. For example, as shown in fig. 3, for the IP group sequence C_1_0, C_2_1, C_3_1, C_4_0, the plurality of preset periods are the 1st to 4th 10s; for the IP group sequence C_1_2, C_2_3, C_3_2, C_4_2, the plurality of preset periods are the 1st to 5th 10s.
Optionally, the integrated access feature set may include the first feature and/or the second feature. The first and second features may be obtained as follows.
For each IP group sequence, the first feature of the IP group sequence as a whole in the plurality of preset periods is obtained according to the access description information of the IP group sequence in the plurality of preset periods. Here, one IP group sequence is regarded as a single IP, the access description information corresponding to the IP group sequence is treated as the access description information of that IP, and feature extraction is performed on it to obtain the first feature. The feature extraction step is similar to the process of obtaining the first access feature.
The first feature may include: the mean/variance of the received or transmitted quantity, the status code entropy/percentage, and so on. The mean of the transmitted quantity in the first feature is the average size of the messages sent by the corresponding IP group sequence as a whole in the plurality of preset periods. The feature dimensions of the first feature may be set according to actual requirements, as long as they reflect the access features of the IP group sequence as a whole in the plurality of preset periods to which it belongs, so that abnormal IP addresses can be identified.
For each IP group sequence, the second feature related to multiple IP addresses in the IP group sequence is obtained according to the access description information of each IP address in the IP group sequence in the plurality of preset periods to which the IP group sequence belongs and each IP address included in the IP group sequence. Here, the IP group sequence is regarded as access by multiple IPs, and features related to multiple IPs are extracted.
The second feature may include: the IP entropy/percentage, the percentage of IPs of the IDC (Internet Data Center) class, the percentage of IPs for which IP and Cookie are in bijection, and so on. The IP percentage is the ratio of the number of accesses of each IP address to the total number of accesses of all IP addresses in the IP group sequence in the plurality of preset periods to which the IP group sequence belongs. The percentage of IPs of the IDC class is the ratio of the number of IP addresses belonging to the IDC class to the number of IP addresses occurring in the IP group sequence (when counting, identical IP addresses are counted once). An IP for which IP and Cookie are in bijection is an IP address that corresponds one-to-one with a Cookie. The feature dimensions of the second feature may be set according to actual requirements, as long as they reflect the access features related to multiple IPs in the plurality of preset periods to which the IP group sequence belongs, so that abnormal IP addresses can be identified.
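The multi-IP features described above can be sketched as follows. The input layout (a non-empty list of `(ip, cookie)` pairs, one per access), the `idc_ips` lookup set, the function name, and the use of the number of distinct IP addresses as the denominator of the bijection percentage are assumptions of this example.

```python
import math
from collections import Counter, defaultdict


def second_feature(records, idc_ips):
    """Multi-IP features of one IP group sequence.

    records: non-empty list of (ip, cookie) pairs, one per access.
    idc_ips: hypothetical set of IP addresses known to be of the IDC class.
    """
    counts = Counter(ip for ip, _ in records)
    total = sum(counts.values())
    # Share of all accesses made by each IP address.
    ip_percentage = {ip: c / total for ip, c in counts.items()}
    ip_entropy = sum(-p * math.log2(p) for p in ip_percentage.values())

    cookies_of_ip = defaultdict(set)
    ips_of_cookie = defaultdict(set)
    for ip, cookie in records:
        cookies_of_ip[ip].add(cookie)
        ips_of_cookie[cookie].add(ip)
    # An IP is in bijection with its Cookie when it uses exactly one Cookie
    # and that Cookie is used by no other IP.
    bijective = [
        ip for ip, cs in cookies_of_ip.items()
        if len(cs) == 1 and len(ips_of_cookie[next(iter(cs))]) == 1
    ]
    distinct = len(counts)  # identical IP addresses are counted once
    return {
        "ip_entropy": ip_entropy,
        "ip_percentage": ip_percentage,
        "idc_ip_percentage": sum(ip in idc_ips for ip in counts) / distinct,
        "bijective_ip_percentage": len(bijective) / distinct,
    }
```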
Optionally, as a first optional implementation, the comprehensive access feature set of each IP group sequence may be obtained directly in the above manner, and whether the low-frequency IP addresses included in the IP group sequence are all normal IP addresses or all abnormal IP addresses may then be identified in any manner based on the comprehensive access feature set. Optionally, the identification may use the comprehensive access feature set together with a classification model trained in advance on sample comprehensive access feature sets and the corresponding sample classification results.
Optionally, as a second optional implementation manner, the identification of the abnormal IP address may also be performed in a manner shown in fig. 5. Referring to fig. 5, fig. 5 is a flowchart illustrating one of the sub-steps included in step S150 in fig. 2. Step S150 may include sub-step S151 to sub-step S153.
Substep S151: for each IP group sequence, determining IP group subsequences from the IP group sequence according to a first preset number.
Every two adjacent IP groups in one IP group subsequence correspond to two adjacent preset periods, and the number of preset periods to which one IP group subsequence belongs equals the first preset number. In this way, IP group subsequences of fixed time length are obtained. The first preset number is greater than or equal to 2 and may be set according to actual requirements, for example, set to 3.
As shown in fig. 3, for the IP group sequence C_1_0, C_2_1, C_3_1, C_4_0, at least one IP group subsequence may be determined with a first preset number of 3. Optionally, the IP group subsequences may be determined with a step size of 1: C_1_0, C_2_1, C_3_1; and C_2_1, C_3_1, C_4_0. The IP group subsequences may also be determined so that no IP group is repeated across them; in the above example, only one IP group subsequence is then determined: C_1_0, C_2_1, C_3_1. The specific manner may be set according to actual requirements.
Optionally, so that every IP group in the IP group sequence takes part in the subsequent identification, the subsequences may be divided with padding when the IP group subsequences are determined. For example, the IP group sequence C_1_0, C_2_1, C_3_1, C_4_0 can be divided into the IP group subsequences: C_1_0, C_2_1, C_3_1; and C_4_0, 0.
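The three ways of dividing a sequence described above can be sketched with two small helpers. The function names are hypothetical, and padding the last chunk with 0 up to the full first preset number is an assumption about how the padding is carried out.

```python
def sliding_subsequences(sequence, n, step=1):
    """Windows of n consecutive IP groups, advancing by step groups each time."""
    return [sequence[i:i + n] for i in range(0, len(sequence) - n + 1, step)]


def padded_subsequences(sequence, n, pad=0):
    """Non-overlapping chunks of n IP groups; the last chunk is padded to length n."""
    chunks = [sequence[i:i + n] for i in range(0, len(sequence), n)]
    return [c + [pad] * (n - len(c)) for c in chunks]
```

Calling `sliding_subsequences` with `step` equal to `n` gives the variant in which no IP group is repeated and leftover groups at the end are dropped.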
Substep S152: obtaining the comprehensive access characteristics of each IP group subsequence.
Under the condition that the IP group subsequences are determined, aiming at each IP group subsequence, the comprehensive access characteristics of the IP group subsequence can be obtained according to the access description information and the IP address corresponding to the IP group subsequence. The comprehensive access characteristics of one IP group subsequence comprise first characteristics of the whole IP group subsequence in a plurality of preset periods and/or second characteristics related to multiple IP addresses in the IP group subsequence. For the description of the obtaining process of the first feature and the second feature, reference may be made to the above description, which is not repeated herein. The comprehensive access feature set of one IP group sequence comprises comprehensive access features of at least one IP group subsequence, and the at least one IP group subsequence is a subsequence of the IP group sequence.
Substep S153: determining whether the IP addresses in each IP group subsequence are abnormal IP addresses according to the comprehensive access characteristics of the IP group subsequence.
In the case of obtaining the comprehensive access characteristic of one IP group subsequence, it is possible to identify whether the low-frequency IP addresses included in the IP group subsequence are all normal IP addresses or abnormal IP addresses in an arbitrary manner based on the comprehensive access characteristic.
Optionally, it may be determined that the IP address in the IP group subsequence is an abnormal IP address or a normal IP address according to the comprehensive access feature and a pre-trained classification model, where the classification model is obtained by training according to the sample comprehensive access feature of the sample IP group subsequence and a sample classification result.
When the classification model is trained, whether training is finished may be determined according to whether the recall rate of the model meets the requirement. Of course, the end of training may also be determined by other criteria. The classification model may be CatBoost. The classification model may be trained in advance by the electronic device 100 or by another device, which is not specifically limited herein.
Optionally, if an IP address appears in different IP group sequences or different IP group subsequences and the identification results obtained for it differ, the final identification result of the IP address may be determined in combination with actual requirements. For example, if IP group subsequence 1 containing an IP address is identified as normal, IP group subsequence 2 containing the same IP address is identified as abnormal, and the application scenario is more concerned with not misidentifying a normal IP address as abnormal, the IP address can finally be determined to be a normal IP address.
Because every IP group subsequence covers the same number of preset periods, this approach avoids the loss of reliability that arises when the number of periods behind the features used in model training differs greatly from the number of periods behind the features obtained at identification time.
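One possible way of merging conflicting results is sketched below. The boolean encoding (True meaning abnormal), the `prefer_normal` switch, and the two merge rules are assumptions made for this example; the text only states that the final result depends on actual requirements.

```python
def final_label(results, prefer_normal=True):
    """Merge per-subsequence results for one IP address.

    results: list of booleans, True meaning the subsequence was abnormal.
    prefer_normal: if True, the IP is abnormal only when every result is
    abnormal; if False, a single abnormal result is enough.
    """
    if not results:
        return False  # no evidence: treat as normal (assumed convention)
    return all(results) if prefer_normal else any(results)
```

With `prefer_normal=True`, the example above (one normal and one abnormal result) yields a normal IP address, which suits scenarios where false positives are costly.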
As a third alternative, the identification of the abnormal IP address may also be performed in the manner shown in fig. 6. Referring to fig. 6, fig. 6 is a second schematic flowchart illustrating the sub-steps included in step S150 in fig. 2. Step S150 may include sub-steps S1501 to S1503.
Substep S1501: obtaining the number of preset periods to which each IP group sequence belongs.
Substep S1502: screening out the IP group sequences whose number of periods is greater than a second preset number.
Substep S1503: obtaining the comprehensive access feature set of each screened IP group sequence, and identifying whether the IP addresses in the IP group sequence are abnormal IP addresses according to the comprehensive access feature set of each screened IP group sequence.
In this embodiment, for each IP group sequence, the number of preset periods to which the IP group sequence belongs may be counted; this equals the number of IP groups in the sequence, since each IP group in a sequence corresponds to a different preset period. The number of periods is then compared with a second preset number: if the number of periods is greater than the second preset number, the IP group sequence is retained; otherwise, the IP group sequence may be rejected. The second preset number may be set in combination with actual conditions, for example, set to 3. In this way, the IP group sequences whose number of periods is greater than the second preset number are screened out. Then, for each screened IP group sequence, the comprehensive access feature set of the IP group sequence is obtained through feature extraction, and whether the IP addresses in the IP group sequence are abnormal IP addresses is identified according to the comprehensive access feature set.
Because the attack behavior of the same attacker has a certain continuity, an attack whose time line breaks off after only a few consecutive periods cannot achieve its purpose. An IP group sequence spanning few periods is therefore more likely to consist of normal IP addresses and can be filtered out directly. This reduces the amount of calculation in the subsequent processing.
Once the IP group sequences whose number of periods is greater than the second preset number are screened out, each of them may, as in the first implementation, be used directly as the processing object: the first feature and/or the second feature of the IP group sequence is obtained, and based on it the IP addresses in the IP group sequence are determined to be normal IP addresses or abnormal IP addresses.
Alternatively, as in the second implementation, IP group subsequences covering the first preset number of preset periods are determined from each IP group sequence, the comprehensive access characteristics of each IP group subsequence are obtained, and whether the IP addresses in the corresponding IP group subsequence are all normal IP addresses or all abnormal IP addresses is identified according to those characteristics. In this embodiment, the first preset number is less than or equal to the second preset number.
Generally, the attack behaviors of the same attacker are consistent and have a certain continuity, that is, the attack cannot be performed only in a certain second, and no subsequent attack is performed. Some highly-sophisticated attackers employ many machines to bypass the existing various cloud defense systems (the existing defense systems mainly use single-IP request interception), and each machine only sends a few requests, so that the total amount of the requests is very high, and the same attack purpose is achieved.
The method and the device can group all low-frequency requests within a certain time range, and the IP addresses used by the low-frequency requests with similar access behaviors are gathered in the same group, so that a plurality of groups can be obtained; then, similarity calculation can be carried out between each group of adjacent time periods to obtain time sequence extension of different access behaviors; finally, feature extraction is performed on the time sequence extension of each access behavior, and classification is performed based on the extracted features, so that whether the access behavior is abnormal or not is judged. Therefore, whether the access request with low frequency is an abnormal access condition or not and whether the IP address used with low frequency is an abnormal IP address or not can be identified. When the access behavior using the abnormal IP address is discovered again later, the behavior can be intercepted, and the subsequent access can not be carried out, thereby achieving the purpose of protection. And extracting behavior characteristics aiming at the abnormal access behavior of the identified abnormal IP address, and then identifying whether the newly-appeared access behavior is the abnormal behavior or not based on the behavior characteristics.
In order to execute the corresponding steps in the above embodiments and their various possible implementations, an implementation of the abnormal IP address identification apparatus 200 is given below. Optionally, the abnormal IP address identification apparatus 200 may adopt the device structure of the electronic device 100 shown in fig. 1. Further, referring to fig. 7, fig. 7 is a block diagram illustrating an abnormal IP address identification apparatus 200 according to an embodiment of the present disclosure. It should be noted that the basic principle and technical effect of the abnormal IP address identification apparatus 200 provided in this embodiment are the same as those of the above embodiments; for brevity, for anything not mentioned in this embodiment, reference may be made to the corresponding content of the above embodiments. The abnormal IP address identification apparatus 200 may include: a record obtaining module 210, a dividing module 220, a similarity calculation module 230, a sequence determination module 240, and an identification module 250.
The record obtaining module 210 is configured to obtain respective access record sets of the target domain name in a plurality of consecutive preset periods. Each access record set includes the IP addresses used by the access devices when accessing the target domain name and the corresponding access description information.
The dividing module 220 is configured to divide the low-frequency IP address in each access record set into at least one IP group according to the IP address and the access description information. Wherein, the first access characteristics corresponding to the access description information corresponding to the IP address in one IP group are similar.
The similarity calculating module 230 is configured to calculate a similarity between every two arbitrary IP groups in adjacent preset periods. And the corresponding preset periods of any two IP groups are different.
The sequence determining module 240 is configured to determine an IP group sequence according to the similarity and the time sequence of the multiple preset periods. Wherein, every two adjacent IP groups in one IP group sequence correspond to two adjacent preset periods.
The identifying module 250 is configured to obtain a comprehensive access feature set of each IP group sequence, and identify whether an IP address in each IP group sequence is abnormal according to the comprehensive access feature set of each IP group sequence.
Optionally, in this embodiment, the integrated access feature set includes a first feature and/or a second feature, and the identifying module 250 obtains the integrated access feature set of each IP group sequence by: aiming at each IP group sequence, obtaining a first characteristic of the whole IP group sequence in a plurality of preset periods according to the access description information of the IP group sequence in the plurality of preset periods; and obtaining second characteristics related to the multiple IP addresses in the IP group sequence according to the access description information of each IP address in the IP group sequence in a plurality of preset periods to which the IP group sequence belongs and each IP address included in the IP group sequence.
Optionally, in this embodiment, the integrated access feature set includes an integrated access feature of at least one IP group subsequence, and the identifying module 250 is specifically configured to: for each IP group sequence, determining IP group subsequences from the IP group sequence according to a first preset number, wherein every two adjacent IP groups in one IP group subsequence correspond to two adjacent preset periods, and the number of the preset periods to which one IP group subsequence belongs is the first preset number; acquiring comprehensive access characteristics of each IP group subsequence, wherein the comprehensive access characteristics comprise first characteristics of the whole IP group subsequence in a plurality of preset periods and/or second characteristics related to multiple IP addresses in the IP group subsequence; and determining whether the IP address in each IP group subsequence is an abnormal IP address or not according to the comprehensive access characteristics of each IP group subsequence.
Optionally, in this embodiment, the identifying module 250 is specifically configured to: and determining that the IP address in the IP group subsequence is an abnormal IP address or a normal IP address according to the comprehensive access characteristics and a pre-trained classification model. And the classification model is obtained by training according to the sample comprehensive access characteristics of the sample IP group subsequence and the sample classification result.
Optionally, in this embodiment, the identifying module 250 is specifically configured to: acquiring the number of cycles of a preset period to which each IP group sequence belongs; screening IP group sequences with the cycle number larger than a second preset number; and acquiring the comprehensive access characteristic set of each screened IP group sequence, and identifying whether the IP address in the IP group sequence is an abnormal IP address according to the comprehensive access characteristic set of each screened IP group sequence.
Optionally, in this embodiment, the similarity calculation module 230 is specifically configured to: and calculating the similarity according to the IP addresses in any two IP groups and/or the second access characteristic of each IP group. The second access characteristic of one IP group is the mean and/or variance of at least one characteristic dimension in the first access characteristic corresponding to each IP address in the IP group.
Optionally, in this embodiment, the similarity calculation module 230 is specifically configured to: counting the number of IP addresses included in each IP group; screening out IP groups with the number larger than a third preset number; and calculating the similarity between every two arbitrary IP groups in the adjacent preset periods aiming at the screened IP groups.
Optionally, the modules may be stored in the memory 110 shown in fig. 1 in the form of software or firmware, or may be built into the Operating System (OS) of the electronic device 100, and may be executed by the processor 120 in fig. 1. Data, program code, and the like required to execute the above modules may also be stored in the memory 110.
An embodiment of the present application further provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for identifying an abnormal IP address.
To sum up, the embodiment of the present application provides a method, an apparatus, an electronic device and a readable storage medium for identifying an abnormal IP address, where first, respective access record sets of a target domain name in a plurality of consecutive preset periods are obtained; then, for each access record set, dividing the low-frequency IP addresses in the access record set into at least one IP group according to the IP address used by the access equipment when the target domain name is accessed by the access equipment and the corresponding access description information in the access record set, wherein the first access characteristics corresponding to the access description information corresponding to the IP address in one IP group are similar; then, the similarity between every two arbitrary IP groups in adjacent preset periods can be calculated, and an IP group sequence is determined according to the similarity and the time sequence of a plurality of continuous preset periods, wherein the preset periods corresponding to the arbitrary two IP groups are different, and the adjacent IP groups in one IP group sequence correspond to the two adjacent preset periods; and finally, obtaining a comprehensive access feature set of each IP group sequence, and identifying whether the IP address in the IP group sequence is an abnormal IP address or not based on the comprehensive access feature set. Therefore, the IP group sequence corresponding to the similar access behavior can be determined through the time sequence extension of the access behavior, and whether the IP address in the IP group sequence is abnormal or not is identified based on the comprehensive access characteristic set of the IP group sequence, so that the access request based on the low-frequency abnormal IP address is intercepted based on the identification result, and the server providing the service is protected.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatuses, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The foregoing describes only optional embodiments of the present application and is not intended to limit the present application; those skilled in the art may make various modifications and variations to it. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. An abnormal IP address identification method is characterized by comprising the following steps:
acquiring respective access record sets of a target domain name in a plurality of consecutive preset periods, wherein each access record set comprises the IP addresses used by access devices when accessing the target domain name and the corresponding access description information;
dividing the low-frequency IP address in each access record set into at least one IP group according to the IP address and the access description information, wherein the first access characteristics corresponding to the access description information corresponding to the IP address in one IP group are similar;
calculating the similarity between any two IP groups in adjacent preset periods, wherein the preset periods corresponding to the two IP groups are different;
determining IP group sequences according to the similarity and the time sequence of the preset periods, wherein every two adjacent IP groups in one IP group sequence correspond to two adjacent preset periods;
and acquiring a comprehensive access feature set of each IP group sequence, and identifying whether the IP address in each IP group sequence is abnormal or not according to the comprehensive access feature set of each IP group sequence.
2. The method of claim 1, wherein the integrated access feature set comprises a first feature and/or a second feature, and wherein obtaining the integrated access feature set for each IP group sequence comprises:
for each IP group sequence, obtaining a first characteristic of the IP group sequence as a whole over a plurality of preset periods according to the access description information of the IP group sequence in the plurality of preset periods;
and obtaining second characteristics related to the multiple IP addresses in the IP group sequence according to the access description information of each IP address in the IP group sequence in a plurality of preset periods to which the IP group sequence belongs and each IP address included in the IP group sequence.
3. The method according to claim 1 or 2, wherein the comprehensive access feature set includes comprehensive access features of at least one IP group subsequence, and the obtaining of the comprehensive access feature set of each IP group sequence and identifying whether the IP address in each IP group sequence is abnormal according to the comprehensive access feature set of each IP group sequence includes:
for each IP group sequence, determining IP group subsequences from the IP group sequence according to a first preset number, wherein every two adjacent IP groups in one IP group subsequence correspond to two adjacent preset periods, and the number of the preset periods to which one IP group subsequence belongs is the first preset number;
acquiring comprehensive access characteristics of each IP group subsequence, wherein the comprehensive access characteristics comprise first characteristics of the whole IP group subsequence in a plurality of preset periods and/or second characteristics related to multiple IP addresses in the IP group subsequence;
and determining whether the IP address in each IP group subsequence is an abnormal IP address or not according to the comprehensive access characteristics of each IP group subsequence.
4. The method of claim 3, wherein determining whether the IP address in each IP group subsequence is an abnormal IP address according to the integrated access characteristics of the IP group subsequence comprises:
and determining that the IP address in the IP group subsequence is an abnormal IP address or a normal IP address according to the comprehensive access characteristic and a pre-trained classification model, wherein the classification model is obtained by training according to the sample comprehensive access characteristic and the sample classification result of the sample IP group subsequence.
5. The method of claim 1, wherein the obtaining the comprehensive access feature set of each IP group sequence and identifying whether the IP address in the IP group sequence is an abnormal IP address according to the comprehensive access feature set of each IP group sequence comprises:
acquiring the number of cycles of a preset period to which each IP group sequence belongs;
screening IP group sequences with the cycle number larger than a second preset number;
and acquiring the comprehensive access characteristic set of each screened IP group sequence, and identifying whether the IP address in the IP group sequence is an abnormal IP address according to the comprehensive access characteristic set of each screened IP group sequence.
6. The method according to claim 1, wherein the calculating the similarity between any two IP groups in adjacent preset periods comprises:
calculating the similarity according to the IP addresses in the two IP groups and/or the second access characteristics of each IP group, wherein the second access characteristics of one IP group are the mean value and/or the variance of at least one characteristic dimension in the first access characteristics corresponding to the IP addresses in the IP group.
7. The method according to claim 1 or 6, wherein the calculating the similarity between any two IP groups in adjacent preset periods comprises:
counting the number of IP addresses included in each IP group;
screening out IP groups with the number larger than a third preset number;
and, for the screened IP groups, calculating the similarity between any two IP groups in adjacent preset periods.
8. An abnormal IP address recognition apparatus, comprising:
a record obtaining module, configured to obtain respective access record sets of a target domain name in a plurality of consecutive preset periods, wherein each access record set comprises the IP addresses used by access devices when accessing the target domain name and the corresponding access description information;
the dividing module is used for dividing the low-frequency IP address in each access record set into at least one IP group according to the IP address and the access description information, wherein the first access characteristics corresponding to the access description information corresponding to the IP address in one IP group are similar;
the similarity calculation module is used for calculating the similarity between any two IP groups in adjacent preset periods, wherein the preset periods corresponding to the two IP groups are different;
the sequence determining module is used for determining IP group sequences according to the similarity and the time sequence of the preset periods, wherein every two adjacent IP groups in one IP group sequence correspond to two adjacent preset periods;
and the identification module is used for obtaining the comprehensive access feature set of each IP group sequence and identifying whether the IP address in the IP group sequence is abnormal or not according to the comprehensive access feature set of each IP group sequence.
9. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor to implement the method of identifying an abnormal IP address of any one of claims 1 to 7.
10. A readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the method of identifying an abnormal IP address according to any one of claims 1 to 7.
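By way of non-limiting illustration of claims 3 and 6, the sketch below computes the second access characteristics of an IP group (per-dimension mean and variance of the first access characteristics), combines them with the IP overlap into one similarity value, and cuts an IP group sequence into subsequences of a fixed number of periods. The function names, the `ip_weight` parameter, the choice of population variance, and the mapping of a Euclidean distance between means to a similarity via `1 / (1 + distance)` are all assumptions for illustration; the claims do not specify a formula.

```python
from statistics import mean, pvariance


def second_access_feature(first_features):
    """Per-dimension mean and variance over one IP group.

    first_features: list of equal-length numeric vectors, one per IP
    address in the group (its first access characteristics).
    Returns (means, variances), each a list with one value per dimension.
    """
    dims = list(zip(*first_features))
    return [mean(d) for d in dims], [pvariance(d) for d in dims]


def group_similarity(ips_a, feats_a, ips_b, feats_b, ip_weight=0.5):
    """Assumed similarity: weighted mix of IP overlap and feature closeness."""
    union = ips_a | ips_b
    ip_sim = len(ips_a & ips_b) / len(union) if union else 0.0
    mean_a, _ = second_access_feature(feats_a)
    mean_b, _ = second_access_feature(feats_b)
    distance = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b)) ** 0.5
    feature_sim = 1.0 / (1.0 + distance)
    return ip_weight * ip_sim + (1.0 - ip_weight) * feature_sim


def subsequences(sequence, window):
    """All runs of `window` consecutive IP groups in one IP group sequence.

    `window` plays the role of the first preset number; a sequence
    shorter than the window yields no subsequence.
    """
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]
```

Each subsequence returned by `subsequences` could then be turned into a comprehensive access feature vector and passed to a pre-trained classifier as in claim 4; that step depends on features and a model that the claims leave open, so it is not sketched here.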
CN202111012528.7A 2021-08-31 2021-08-31 Abnormal IP address identification method and device, electronic equipment and readable storage medium Active CN113726783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111012528.7A CN113726783B (en) 2021-08-31 2021-08-31 Abnormal IP address identification method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN113726783A (en) 2021-11-30
CN113726783B (en) 2023-03-24

Family

ID=78679729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111012528.7A Active CN113726783B (en) 2021-08-31 2021-08-31 Abnormal IP address identification method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113726783B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant