CN110392032B - Method, device and storage medium for detecting abnormal URL - Google Patents


Info

Publication number
CN110392032B
Authority
CN
China
Prior art keywords
url
access
attribute
record
host
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810368224.6A
Other languages
Chinese (zh)
Other versions
CN110392032A (en
Inventor
才宇东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201810368224.6A
Publication of CN110392032A
Application granted
Publication of CN110392032B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a method, a device, and a storage medium for detecting an abnormal URL, and belongs to the technical field of network security. The method includes: determining a URL set accessed by a first host within a first preset time period, selecting a URL from the URL set, determining a dynamic access attribute set of the selected URL, and performing anomaly detection on the selected URL according to that dynamic access attribute set. The dynamic access attribute set includes a first attribute and a second attribute. The first attribute is the number of times the first host accesses the selected URL within the first preset time period, and the second attribute is the number of times the selected URL appears in the web pages pointed to by all URLs accessed by the first host from the initialization time to the current time.

Description

Method, device and storage medium for detecting abnormal URL
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method and an apparatus for detecting an abnormal Uniform Resource Locator (URL), and a storage medium.
Background
At present, an enterprise usually deploys a local area network, and each host in the local area network exchanges information with the external Internet through a gateway. To prevent hackers from exploiting vulnerabilities to attack hosts in the local area network during this exchange, a security device is generally deployed to inspect the URLs accessed by each host and determine whether any of them is an abnormal URL.
In the related art, a traffic collection device may be deployed at the gateway. The traffic collection device is a device that collects network traffic data: according to its configuration, it filters and gathers statistics on the traffic passing through it, and sends the resulting traffic data to a server for comprehensive analysis through a standard or custom communication protocol. In existing networks, the server that comprehensively analyzes the traffic data is a network security intelligence system (CIS). The traffic collection device deployed at the gateway can determine the URLs accessed by hosts in the local area network and send them to the CIS. When the CIS receives a URL from the traffic collection device, it determines a static access attribute set of the URL and generates a feature vector from that set. The CIS then processes the feature vector through an anomaly detection model to determine whether the URL is an abnormal URL. The static access attribute set includes the random entropy of the domain name, the number of consecutive characters in the domain name, the N-Gram (a language processing model) frequency of the domain name, the country code top-level domain (ccTLD) of the domain name, and the like. The anomaly detection model is obtained by the CIS through training in advance on a plurality of known abnormal URLs. For example, the training proceeds as follows: a static access attribute set is determined for each abnormal URL, a feature vector is generated from each static access attribute set so that the feature vectors correspond one-to-one to the abnormal URLs, and the anomaly detection model is obtained by training on these feature vectors.
The above method in effect decides whether a URL is abnormal by checking whether the domain name in the URL is abnormal. Although this detection method is simple, its practicality is limited and it cannot find some potentially illegal network access behaviors, such as the network access behavior in the penetration phase of an Advanced Persistent Threat (APT), or the leaking of secure data through a backdoor in the web pages of a legitimate website. The accuracy of the above method is therefore low.
Disclosure of Invention
In order to solve the problem of high error probability of determining an abnormal URL in the related art, the present application provides a method for detecting an abnormal URL, where the method includes:
the method comprises the steps of obtaining URL access records of a plurality of hosts in a local area network in a first preset time period, wherein the first preset time period is before the current time and the duration of the first preset time period is a first preset duration;
determining a URL set accessed by a first host within the first preset time period according to the URL access record, wherein URLs in the URL set are different from each other, and the first host is one of the hosts;
selecting a URL from the URL set, and executing the following processing aiming at the selected URL until each URL in the URL set is processed:
determining a dynamic access attribute set of the selected URL, wherein the dynamic access attribute set comprises a first attribute and a second attribute, the first attribute refers to the number of times that the first host accesses the selected URL within the first preset time period, and the second attribute refers to the number of times that the selected URL appears in the web pages pointed to by all URLs accessed by the first host from the initialization time to the current time;
and carrying out anomaly detection on the selected URL according to the dynamic access attribute set of the selected URL and a first anomaly detection model, wherein the first anomaly detection model is obtained by training in advance according to a plurality of abnormal sample URLs accessed by the first host and the dynamic access attribute set of each abnormal sample URL in the plurality of abnormal sample URLs.
In this method, URL access records of a plurality of hosts in a local area network within a first preset time period are obtained, a URL set accessed by the first host within the first preset time period is determined from the URL access records, a URL is then selected from the URL set, a dynamic access attribute set of the selected URL is determined, and anomaly detection is performed on the selected URL according to its dynamic access attribute set and a first anomaly detection model. The dynamic access attribute set includes a first attribute and a second attribute. The first attribute is the number of times the first host accesses the selected URL within the first preset time period, and the second attribute is the number of times the selected URL appears in the web pages pointed to by all URLs accessed by the first host from the initialization time to the current time, so the dynamic access attribute set can represent the behavior characteristics of the first host when accessing the selected URL. That is, in the present application, whether the selected URL is an abnormal URL is determined according to the behavior of the first host in accessing it. The detection process focuses on the behavior characteristics of the host user when accessing URLs, for example the correlation between behaviors while accessing a series of URLs, so the detection result can reflect the particularities of different users rather than being determined only by the static access attributes of the selected URL. Potentially illegal network access behaviors can therefore be found, and the detection accuracy for abnormal URLs is improved.
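The per-host loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout `(host, url, access_time)`, the function names, the precomputed `second_attr` lookup, and the `is_abnormal` callable standing in for the first anomaly detection model are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def url_counts_in_window(records, host, now, window):
    """Count how often `host` accessed each distinct URL in [now - window, now].

    `records` is an iterable of (host, url, access_time) tuples (assumed layout).
    The keys of the result form the URL set; the values are the first attribute.
    """
    start = now - window
    return Counter(
        url for h, url, t in records
        if h == host and start <= t <= now
    )


def detect(records, host, now, window, second_attr, is_abnormal):
    """Process every URL in the host's URL set exactly once.

    `second_attr` maps a URL to its second attribute (assumed precomputed).
    `is_abnormal` is a stand-in for the trained anomaly detection model.
    """
    results = {}
    for url, first in url_counts_in_window(records, host, now, window).items():
        attrs = (first, second_attr.get(url, 0))
        results[url] = is_abnormal(attrs)
    return results
```

Because the URL set is built from the keys of a counter, each URL in it is distinct, which matches the requirement that URLs in the set differ from each other.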
Optionally, the dynamic access attribute set of the selected URL further includes a third attribute and a fourth attribute, where the third attribute refers to the URLs appearing in the web page pointed to by the selected URL, and the fourth attribute refers to the PageRank (PR) of the web page pointed to by the selected URL.
The third attribute and the fourth attribute are used as supplements, and whether the selected URL is an abnormal URL or not is comprehensively judged by combining the first attribute and the second attribute of the selected URL, so that the detection accuracy is further improved.
Optionally, the determining the dynamic access attribute set of the selected URL includes:
acquiring an access behavior information set of the first host, where the access behavior information set includes at least one access information record, each access information record corresponds to one URL, and different access information records correspond to different URLs. Among the at least one access information record, a first access information record corresponding to a first URL includes multiple access times of the first URL, a second attribute of the first URL, a third attribute of the first URL, and a fourth attribute of the first URL. The multiple access times are the numbers of times the first host accessed the first URL in each of multiple time periods. The multiple time periods are obtained by dividing the period that starts at the initialization time of the first host and ends at the current time, each of the multiple time periods lasts a second preset duration, and the first preset duration is greater than or equal to the second preset duration;
and determining a dynamic access attribute set of the selected URL according to the access behavior information set of the first host.
In the application, in order to improve efficiency of detecting URL anomalies, for a first host, an access behavior information set of the first host is predetermined, where the access behavior information set is used to record access information records corresponding to URLs visited by the first host, so that when a dynamic access attribute set of a certain URL needs to be determined subsequently, a first attribute, a second attribute, a third attribute, and a fourth attribute of the selected URL may be determined directly according to the access behavior information set of the first host.
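One possible in-memory layout for the access behavior information set is sketched below. The class name, field names, and the choice of a dictionary keyed by URL are assumptions made for illustration; the source only specifies what each access information record contains.

```python
from dataclasses import dataclass, field


@dataclass
class AccessInfoRecord:
    """One access information record; field names are hypothetical."""
    url: str
    # counts_per_period[i] is the number of accesses in the i-th time period
    # of the second preset duration since the host was initialized.
    counts_per_period: list = field(default_factory=list)
    second_attribute: int = 0                              # appearances in pages of accessed URLs
    third_attribute: list = field(default_factory=list)    # URLs appearing in this URL's page
    fourth_attribute: float = 0.0                          # PageRank of the page


def add_record(behavior_set, record):
    """Insert a record into the set (a dict keyed by URL).

    Different records must correspond to different URLs, so a duplicate
    URL is rejected rather than silently overwritten.
    """
    if record.url in behavior_set:
        raise ValueError(f"record for {record.url} already exists")
    behavior_set[record.url] = record
    return behavior_set
```

Keying the set by URL makes the later lookup of "the access information record whose URL is the selected URL" a single dictionary access.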
Optionally, the determining a set of dynamic access attributes of the selected URL according to the set of access behavior information of the first host includes:
if the time for updating the access behavior information set last time is the current time, acquiring an access information record from the access behavior information set, wherein the URL corresponding to the acquired access information record is the selected URL;
searching for the access times of the corresponding time period within the first preset time period from the multiple access times included in the acquired access information record;
accumulating the searched access times to obtain a first attribute of the selected URL;
and generating a dynamic access attribute set of the selected URL according to the first attribute of the selected URL and the second attribute, the third attribute and the fourth attribute included in the acquired access information record.
Since the first attribute in the dynamic access attribute set refers to the number of times that the first host accesses the selected URL within the first preset time period, and the second attribute refers to the number of times that the selected URL appears in the web pages pointed to by all URLs accessed by the first host from the initialization time to the current time, it is necessary to determine whether the access behavior information set was last updated at the current time. If it was, the dynamic access attribute set of the selected URL can be determined directly from the access behavior information set.
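When the access behavior information set is up to date, the first attribute reduces to summing the most recent per-period counts. The sketch below assumes that both durations are given in whole seconds, that the first preset duration is an integer multiple of the second, and that the record is a plain dictionary with hypothetical keys; none of these details come from the source.

```python
def first_attribute(counts_per_period, first_duration, second_duration):
    """Sum the per-period counts that fall inside the first preset time period.

    Assumes the last element of `counts_per_period` is the period ending now.
    """
    if first_duration < second_duration:
        raise ValueError("first preset duration must be >= second preset duration")
    n_periods = first_duration // second_duration  # always >= 1 here
    return sum(counts_per_period[-n_periods:])


def dynamic_attribute_set(record, first_duration, second_duration):
    """Combine the computed first attribute with the stored attributes."""
    return {
        "first": first_attribute(record["counts_per_period"], first_duration, second_duration),
        "second": record["second"],
        "third": record["third"],
        "fourth": record["fourth"],
    }
```

For example, with hourly periods and a three-hour window, only the last three stored counts contribute to the first attribute.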
Optionally, the determining a set of dynamic access attributes of the selected URL according to the set of access behavior information of the first host includes:
if the time for updating the access behavior information set last time is different from the current time, acquiring an incremental URL access record of the first host, wherein the incremental URL access record is a URL access record which is newly added between the time for updating the access behavior information set last time and the current time of the first host;
if the access behavior information set has an access information record corresponding to the selected URL, acquiring the access information record from the access behavior information set, wherein the URL corresponding to the acquired access information record is the selected URL;
determining the incremental access times of the first host accessing the selected URL from the time of updating the access behavior information set last time to the current time according to the incremental URL access record;
searching for the access times of the corresponding time period within the first preset time period from the multiple access times included in the acquired access information record;
accumulating the increment access times and the searched access times to obtain a first attribute of the selected URL;
and determining a dynamic access attribute set of the selected URL according to the first attribute of the selected URL, the incremental URL access record, and the second attribute, the third attribute and the fourth attribute included in the obtained access information record.
Accordingly, if the time of updating the access behavior information set last time is not the current time, at this time, an access information record corresponding to the selected URL may exist in the access behavior information set, or an access information record corresponding to the selected URL may not exist. When the access information record corresponding to the selected URL exists in the access behavior information set, the dynamic access attribute set of the selected URL may be determined in the above manner.
Optionally, the determining a dynamic access attribute set of the selected URL according to the first attribute of the selected URL, the incremental URL access record, and the second attribute, the third attribute, and the fourth attribute included in the obtained access information record includes:
if all URLs appearing in the incremental URL access record exist among the URLs corresponding to the access information records included in the access behavior information set, generating a dynamic access attribute set of the selected URL according to the first attribute of the selected URL and the second attribute, the third attribute and the fourth attribute included in the obtained access information record;
if not all URLs appearing in the incremental URL access records exist in the URLs corresponding to the access information records included in the access behavior information set, obtaining at least one URL from the incremental URL access record, wherein the obtained at least one URL is different from the URL corresponding to the access information record included in the access behavior information set, determining a first number of times that the selected URL appears in the acquired webpage pointed by the at least one URL, determining a second attribute of the selected URL according to the first number of times and a second attribute included in the acquired access information record, and generating a dynamic access attribute set of the selected URL according to the first attribute and the second attribute of the selected URL and the third attribute and the fourth attribute included in the acquired access information record.
Further, when there is an access information record corresponding to the selected URL in the access behavior information set, if a new URL appears in the incremental URL access record, the second attribute of the selected URL is no longer the second attribute of the access information record corresponding to the selected URL, and therefore, the second attribute of the selected URL needs to be determined in the above manner.
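The case where the set is stale but already holds a record for the selected URL can be sketched as follows. Several points are assumptions: `links_on_page` is a hypothetical lookup standing in for fetching and parsing a page, `n_periods` is the number of stored periods that still fall inside the first preset time period, and the new second attribute is taken to be the stored value plus the first number of times (the source says only that it is determined "according to" both).

```python
def attributes_with_increment(selected, record, incremental_urls,
                              known_urls, links_on_page, n_periods):
    """Derive the dynamic attribute set from a stale record plus the increment.

    incremental_urls: URLs accessed since the last update, one entry per access.
    known_urls: URLs that already have a record in the behavior set.
    links_on_page: dict mapping a URL to the list of URLs on its page (assumed).
    """
    # First attribute: incremental accesses plus the stored counts in the window.
    incremental_count = incremental_urls.count(selected)
    recent = record["counts_per_period"][-n_periods:] if n_periods > 0 else []
    first = incremental_count + sum(recent)

    # Second attribute: only pages of newly seen URLs can add appearances.
    new_urls = set(incremental_urls) - set(known_urls)
    first_number = sum(links_on_page.get(u, []).count(selected) for u in new_urls)
    second = record["second"] + first_number

    return {"first": first, "second": second,
            "third": record["third"], "fourth": record["fourth"]}
```

If the increment contains no new URLs, `first_number` is zero and the stored second attribute is used unchanged, which corresponds to the first branch above.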
Optionally, after obtaining the incremental URL access record of the first host, the method further includes:
if the access behavior information set does not have an access information record corresponding to the selected URL, determining the incremental access times of the first host accessing the selected URL from the time of updating the access behavior information set last time to the current time according to the incremental URL access record, and determining the incremental access times as a first attribute of the selected URL;
determining a second number of times that the selected URL appears in the webpage pointed by other URLs except the selected URL in the incremental URL access record;
determining a second attribute of the selected URL according to the second number of times and a third attribute of the URL corresponding to each access information record in the access behavior information set;
determining a third attribute and a fourth attribute of the selected URL according to the webpage pointed by the selected URL;
and generating a dynamic access attribute set of the selected URL according to the first attribute, the second attribute, the third attribute and the fourth attribute of the selected URL.
On the other hand, when there is no access information record corresponding to the selected URL in the access behavior information set, the dynamic access attribute set of the selected URL may be determined in the manner described above.
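For a selected URL with no existing record, all four attributes have to be built from the incremental record and the stored third attributes. In this sketch, `links_on_page` and `page_rank` are hypothetical lookups standing in for page retrieval and PR computation, third attributes are stored as lists so repeated links can be counted, and incremental URLs that already have a record are skipped when counting the second number of times so that no page is counted twice. That last choice is an assumption the source does not spell out.

```python
def attributes_for_new_url(selected, incremental_urls, behavior_set,
                           links_on_page, page_rank):
    """Build the dynamic attribute set for a URL absent from the behavior set.

    behavior_set: dict mapping URL -> {"third": [urls on that page], ...}.
    """
    # First attribute: the URL was only seen in the increment.
    first = incremental_urls.count(selected)

    # Second number of times: appearances in pages of other new incremental URLs.
    others = set(incremental_urls) - {selected} - set(behavior_set)
    second_number = sum(links_on_page.get(u, []).count(selected) for u in others)

    # Appearances already captured by the third attributes of stored records.
    from_known = sum(rec["third"].count(selected) for rec in behavior_set.values())

    return {
        "first": first,
        "second": second_number + from_known,
        "third": list(links_on_page.get(selected, [])),
        "fourth": page_rank.get(selected, 0.0),
    }
```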
Optionally, before the obtaining the access behavior information set of the first host, the method further includes:
after the first host is initialized, determining whether the current time is a preset update time, wherein a preset update time is reached each time the second preset duration elapses after the first host is initialized;
if the current time is a preset update time, acquiring an incremental URL access record of the first host, wherein the incremental URL access record is the URL access record newly added by the first host between the time when the access behavior information set was last updated and the current time;
and updating the most recently updated access behavior information set according to the incremental URL access record, wherein the acquired access behavior information set of the first host is the access behavior information set as updated at the current time.
In this application, since the network access behavior of the first host changes with time, after the first host is initialized, the access behavior information set may be updated every second preset duration.
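The periodic update check can be illustrated with a few lines. Representing times as integer seconds and the function name are assumptions made for the example.

```python
def is_update_time(init_time, now, second_duration):
    """Return True when `now` lies a positive whole number of
    second preset durations after the host's initialization time.

    All arguments are integer seconds (assumed representation).
    """
    elapsed = now - init_time
    return elapsed > 0 and elapsed % second_duration == 0
```

With a one-hour second preset duration, the check succeeds exactly one hour, two hours, and so on after initialization, and fails at every other instant.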
Optionally, the updating, according to the incremental URL access record, the access behavior information set after the last update includes:
acquiring a second URL and the access times of the second URL from the incremental URL access record;
if the second URL already exists in the URL corresponding to the access information record included in the access behavior information set after the latest update, adding the access times of the second URL to the access information record corresponding to the second URL in the access behavior information set after the latest update;
and if the second URL does not exist in the URL corresponding to the access information record included in the access behavior information set after the latest update, performing incremental update on the access behavior information set after the latest update according to the second URL, the access times of the second URL and the incremental URL access record, so as to add the access information record corresponding to the second URL to the access behavior information set after the latest update, and realize the update of the second attribute of the URL corresponding to the access information record existing in the access behavior information set after the latest update.
When updating the access behavior information set after the last update, two cases need to be considered, one case is: the second URL in the incremental URL access record has a corresponding access information record already in the access behavior information set after the last update. The other situation is that: the second URL in the incremental URL access record has no corresponding access information record in the access behavior information set after the last update. That is, in the present application, how to update the access behavior information set after the last update needs to be considered for the two cases, respectively.
Optionally, the incrementally updating the access behavior information set after the latest update according to the second URL, the number of times of accessing the second URL, and the incremental URL access record includes:
determining a third number of times, wherein the third number of times is the number of times that the second URL appears in the web pages pointed to by URLs other than the second URL in the incremental URL access record;
determining a second attribute of the second URL according to the third number of times and a third attribute of the URL corresponding to each access information record in the access behavior information set after the latest update;
determining a third attribute and a fourth attribute of the second URL according to the webpage pointed by the second URL;
generating an access information record corresponding to the second URL according to the access times of the second URL, the second attribute, the third attribute and the fourth attribute of the second URL;
adding the access information record corresponding to the second URL to the access behavior information set after the latest update;
and updating the second attribute of the URL corresponding to each access information record in the access behavior information set after the latest update according to the third attribute of the second URL.
Further, when the second URL in the incremental URL access record does not have a corresponding access information record in the access behavior information set after the last update, the access behavior information set after the last update may be incrementally updated in the manner described above.
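Both update cases can be combined in one routine, sketched below. The per-period counts are kept in a dictionary keyed by a period index, `links_on_page` and `page_rank` are hypothetical lookups, and the second attribute of a new URL is taken as the sum of the third number of times and its appearances in the stored third attributes. To avoid counting a page twice, the sketch snapshots which URLs already had records before the update. These are assumptions for illustration, not details given in the source.

```python
from collections import Counter


def update_behavior_set(behavior_set, incremental_urls, period,
                        links_on_page, page_rank):
    """Update the access behavior information set in place for one period."""
    counts = Counter(incremental_urls)
    old_urls = set(behavior_set)          # records existing before this update
    new_urls = set(counts) - old_urls     # URLs that need a new record

    # Case 1: the URL already has a record, so only its access count is added.
    for url in counts.keys() & old_urls:
        behavior_set[url]["counts"][period] = counts[url]

    # Case 2: create a record for each URL seen for the first time.
    for url in new_urls:
        third_number = sum(links_on_page.get(o, []).count(url)
                           for o in new_urls - {url})
        from_old = sum(behavior_set[o]["third"].count(url) for o in old_urls)
        behavior_set[url] = {
            "counts": {period: counts[url]},
            "second": third_number + from_old,
            "third": list(links_on_page.get(url, [])),
            "fourth": page_rank.get(url, 0.0),
        }

    # Pages of the new URLs may link to URLs that already had records.
    for o in old_urls:
        behavior_set[o]["second"] += sum(
            behavior_set[u]["third"].count(o) for u in new_urls)
    return behavior_set
```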
Optionally, after determining the dynamic access attribute set of the selected URL, the method further includes:
determining a static access attribute set of the selected URL, wherein the static access attribute set comprises the random entropy of the domain name of the selected URL, the number of consecutive characters in the domain name, the N-Gram frequency of the domain name, and the ccTLD of the domain name;
correspondingly, the performing anomaly detection on the selected URL according to the dynamic access attribute set of the selected URL and the first anomaly detection model includes:
performing anomaly detection on the selected URL according to the dynamic access attribute set, the static access attribute set and the first anomaly detection model of the selected URL;
the first anomaly detection model is obtained by training in advance according to a plurality of anomaly sample URLs accessed by the first host and a dynamic access attribute set and a static access attribute set of each anomaly sample URL in the plurality of anomaly sample URLs.
In the application, whether the URL is an abnormal URL or not can be comprehensively judged according to the selected dynamic access attribute set and the static access attribute set of the URL, and the accuracy of determining the abnormal URL is further improved.
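The static access attributes can be approximated with the Python standard library. The source does not define them precisely, so this sketch makes its own choices: "random entropy" is computed as Shannon entropy over the characters of the host name, "number of consecutive characters" as the longest run of one repeated character, the N-Gram frequency as bigram counts, and the ccTLD as a two-letter final label. The function name and these interpretations are assumptions.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def static_attributes(url):
    """Compute illustrative static attributes from the host name of `url`."""
    host = urlsplit(url).hostname or ""
    total = len(host)

    # Shannon entropy of the character distribution (assumed meaning of "random entropy").
    entropy = 0.0
    if total:
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(host).values())

    # Longest run of one repeated character.
    longest = run = 0
    prev = None
    for ch in host:
        run = run + 1 if ch == prev else 1
        longest = max(longest, run)
        prev = ch

    # Character bigram frequencies.
    bigrams = Counter(host[i:i + 2] for i in range(len(host) - 1))

    # A two-letter final label is treated as a country code TLD.
    labels = host.split(".")
    last = labels[-1]
    cctld = last if len(labels) > 1 and len(last) == 2 and last.isalpha() else None

    return {"entropy": entropy, "longest_run": longest,
            "bigrams": bigrams, "cctld": cctld}
```

These values could be concatenated with the dynamic access attribute set to form the feature vector passed to the first anomaly detection model.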
In a second aspect, an apparatus for detecting an abnormal URL is provided, where the apparatus for detecting an abnormal URL has a function of implementing the behavior of the method for detecting an abnormal URL in the first aspect. The apparatus for detecting an abnormal URL includes at least one module, where the at least one module is configured to implement the method for detecting an abnormal URL provided in the first aspect.
In a third aspect, an apparatus for detecting an abnormal URL is provided, where the apparatus includes a processor and a memory. The memory is used to store a program that supports the apparatus in executing the method for detecting an abnormal URL provided in the first aspect, and to store data used to implement that method. The processor is configured to execute the program stored in the memory. The apparatus may further include a communication bus for establishing a connection between the processor and the memory.
In a fourth aspect, there is provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the method for detecting an anomalous URL of the first aspect described above.
In a fifth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for detecting an anomalous URL as described in the first aspect above.
The technical effects obtained by the above second, third, fourth and fifth aspects are similar to the technical effects obtained by the corresponding technical means in the first aspect, and are not described herein again.
The technical solutions provided in this application bring the following beneficial effects:
In this method, URL access records of a plurality of hosts in a local area network within a first preset time period are obtained, a URL set accessed by the first host within the first preset time period is determined from the URL access records, a URL is then selected from the URL set, a dynamic access attribute set of the selected URL is determined, and anomaly detection is performed on the selected URL according to its dynamic access attribute set and a first anomaly detection model. The dynamic access attribute set includes a first attribute and a second attribute. The first attribute is the number of times the first host accesses the selected URL within the first preset time period, and the second attribute is the number of times the selected URL appears in the web pages pointed to by all URLs accessed by the first host from the initialization time to the current time, so the dynamic access attribute set can represent the behavior characteristics of the first host when accessing the selected URL. In the present application, whether a URL to be detected is an abnormal URL is determined according to the behavior characteristics of the first host when accessing that URL and an anomaly detection model generated from the prior network access behavior of the first host, which can improve the accuracy of determining abnormal URLs.
Drawings
FIG. 1 is a schematic diagram of a system for detecting an abnormal URL according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for detecting an abnormal URL according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for determining whether a selected URL is an abnormal URL according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an apparatus for detecting an abnormal URL according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of a processing module according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before explaining the embodiments of the present application in detail, the application background of the present application will be described:
At present, the emergence of APTs poses an unprecedented challenge to enterprise data security. An APT is a "malicious commercial espionage threat": a long-premeditated network attack that hackers launch against a customer in order to steal core data. In terms of attack principle, what makes an APT attack "advanced" is that the attacker must accurately collect information about the business processes of the target before launching the attack. While collecting this information, the attacker actively analyzes the vulnerabilities of the applications used by the target and exploits those vulnerabilities to carry out the attack. A typical APT exploits vulnerabilities in e-mail, instant messaging software, social networks, or applications to lure users into visiting malicious web pages, and then attacks the user's network through malicious code embedded in those pages. For example, hackers using APT techniques often start by sending phishing e-mails to specific employees as the source of the attack. Network resources such as web pages are typically identified by URLs. A URL is a compact representation of the location of, and the method of access to, a resource available on the Internet, and is the address of a standard resource on the Internet. Each file on the Internet has a unique URL, which contains information indicating the location of the file. Detecting malicious URLs in order to identify malicious web pages is therefore a primary task in defending against APT attacks.
The method for detecting an abnormal URL provided in this application is applied to the APT scenario, so that abnormal URLs can be detected quickly in that scenario and threats to enterprise data can be avoided.
Fig. 1 is a schematic diagram of a system for detecting an abnormal URL according to an embodiment of the present application. As shown in fig. 1, the system includes at least one host 101, a gateway 102, a traffic collection device 103, and a network security device 104. Each host 101 is connected to the gateway 102 in a wired or wireless manner for communication, the traffic collection device 103 is deployed in the gateway 102, and the traffic collection device 103 is connected to the network security device 104 in a wired or wireless manner for communication. Optionally, the network security device may be a CIS.
The at least one host 101 is a host in a local area network, and each host 101 is connected with the external internet through a gateway 102 for information interaction. The traffic collection device 103 is configured to collect URLs visited by each host, periodically report the collected URLs to the network security device 104, and the network security device 104 detects which URLs are abnormal URLs.
The host 101 may be any communication device in a local area network, and the communication device may be a computer, a server, or the like. In addition, fig. 1 is only illustrated by taking 3 hosts as an example, and the number of hosts in the local area network is not limited in the present application.
It should be noted that the network security device is an anomaly detection system. Specifically, the network security device may extract key information from the acquired data and process it using big data analysis or machine learning methods, so as to detect abnormal behavior, accurately identify and defend against APT attacks, and protect the core information of an enterprise from attack.
At present, a network security device is composed of a plurality of modules, for example, a URL anomaly detection module, a hidden channel detection module, a mail anomaly detection module, and a log analysis module. The modules can be deployed in a distributed manner on different hosts or on the same host. The URL anomaly detection module included in the network security device implements the detection of abnormal URLs provided by the embodiments of the present application.
In addition, in this application, whether a URL is an abnormal URL is determined according to the behavior characteristics of the URLs accessed by the host, and the web page to which a URL points must be acquired to determine those behavior characteristics. The network security device therefore further includes a web page crawling (webAgent) module, which accesses the internet to acquire the web page to which a specified URL points.
Optionally, for network security, the URL anomaly detection module and the web page crawling module in the network security device are implemented by separate hosts. The URL anomaly detection module resides in the local area network and cannot directly access the external network; a firewall applies access control so that the URL anomaly detection module can access only devices in the local area network. The web page crawling module can access the external network; the firewall grants it external network access so that it can reach devices in the external network. The URL anomaly detection module and the web page crawling module are connected through the local area network to exchange information.
Fig. 2 is a schematic structural diagram of a computer device according to an embodiment of the present application. Any of the modules included in the network security device described above may be implemented by the computer device shown in fig. 2. Referring to fig. 2, the computer device comprises a processor 201, a communication bus 202, a memory 203 and at least one network interface 204.
The processor 201 may be a Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the present disclosure.
The communication bus 202 may include a path that conveys information between the aforementioned components.
The Memory 203 may be a Read-Only Memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these. The memory 203 may be self-contained and coupled to the processor 201 via the communication bus 202. The memory 203 may also be integrated with the processor 201.
Network interface 204, using any transceiver or the like, is used to communicate with other devices or communication Networks, such as ethernet, Radio Access Network (RAN), Wireless Local Area Network (WLAN), etc.
In particular implementations, processor 201 may include one or more CPUs, as one embodiment.
In particular implementations, a computer device may include multiple processors, as one embodiment. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In particular implementations, a computer device may also include an output device and an input device, as one embodiment. An output device, which is in communication with the processor 201, may display information in a variety of ways. For example, the output device may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device is in communication with the processor 201 and may receive user input in a variety of ways. For example, the input device may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
The computer device may be a general purpose computer device or a special purpose computer device. In a specific implementation, the computer device may be a desktop computer, a laptop computer, a network server, a Personal Digital Assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, or an embedded device. The embodiment of the application does not limit the type of the computer equipment.
The memory 203 is used for storing program codes for executing the scheme of the application, and the processor 201 controls the execution. The processor 201 is used to execute program code stored in the memory 203. One or more software modules may be included in the program code.
Optionally, when the URL anomaly detection module and the web page crawling module in the network security device are implemented by separate hosts, the structure of each host is as shown in fig. 2. The network interface of the host that implements the URL anomaly detection module is connected to the network interface of the host that implements the web page crawling module through a dedicated link to exchange information. Alternatively, the two network interfaces are connected through an Ethernet network that includes forwarding devices such as switches.
The following explains the method for detecting an abnormal URL according to the embodiment of the present application in detail:
Fig. 3 is a flowchart of a method for detecting an abnormal URL according to an embodiment of the present application, where the method is applied to the network security device shown in fig. 1. As shown in fig. 3, the method comprises the following steps:
Step 301: Obtain URL access records of a plurality of hosts in a local area network within a first preset time period, where the first preset time period precedes the current time and the duration of the first preset time period is a first preset duration.
Because the traffic collection device is deployed in the gateway, and the hosts in the local area network must exchange information with the external internet through the gateway, the embodiment of the application can determine, through the traffic collection device, the URL access records of the hosts in the local area network within a given period of time. Specifically, the traffic collection device may extract information from the network traffic in a given time period through a port-mirroring or optical-splitting link, so as to determine the URL access records of the hosts in the local area network in that time period.
In a possible implementation, while determining the URL access records, the traffic collection device may also report them to the network security device periodically, with a third preset duration as the cycle length. That is, the traffic collection device reports a URL access record every third preset duration, and each reported URL access record describes the URLs visited by the hosts in the local area network within the corresponding third preset duration. When the network security device receives a URL access record reported by the traffic collection device, it stores the record. The network security device therefore stores URL access records for different time periods.
Based on the above description, when the network security device determines that the abnormality detection needs to be performed on the URL access records of the first preset time period, the network security device may obtain URL access records of corresponding time periods within the first preset time period from the stored plurality of URL access records, so as to obtain URL access records of the plurality of hosts in the local area network within the first preset time period.
The third preset duration is a preset duration and may be 1 second, 2 seconds, 3 seconds, or the like. For example, when the third preset duration is 1 second, the traffic collection device reports the URL access record every 1 second.
In addition, the first preset duration is also a preset duration and may be 1 hour, 2 hours, 4 hours, or the like. For example, if the end time of the first preset time period is the current time and the first preset duration is 1 hour, abnormal URLs need to be detected among the URLs visited by the hosts in the local area network within the last 1 hour. When the first preset duration is 4 hours, abnormal URLs need to be detected among the URLs visited by the hosts in the local area network within the last 4 hours.
It should be noted that the first preset duration may be greater than or equal to the third preset duration. In this case, when the network security device receives a URL access record reported by the traffic collection device, it stores the received record directly. For example, if the third preset duration is 1 second and the first preset duration is 4 hours, the traffic collection device reports a URL access record every 1 second, and the network security device stores one URL access record for each second before the current time. When the network security device needs to detect abnormal URLs, it obtains, from the stored URL access records, those for the time periods within the 4 hours before the current time.
Optionally, the first preset duration may also be less than the third preset duration. In this case, when the network security device receives a URL access record reported by the traffic collection device, it may divide the received record according to a preset time period and store the divided records, where the duration of the preset time period is less than the first preset duration. For example, the third preset duration is 12 hours and the first preset duration is 4 hours. The traffic collection device reports a URL access record once every 12 hours; on receiving it, the network security device divides the record into 1-hour time periods and stores the divided records. When the network security device needs to detect abnormal URLs, it obtains, from the stored URL access records, those for the time periods within the 4 hours before the current time.
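The storage and retrieval described above can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the names RecordInfo, AccessRecordStore, store_report and window are hypothetical, and it is assumed that each piece of record information carries a timestamp in seconds. Reports are split into fixed-length buckets when stored, so the same code covers both the case where a report is shorter than the first preset duration and the case where it is longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordInfo:
    timestamp: float  # assumed field: time of the access, in seconds
    host_id: str      # identifier of the host that accessed the URL
    url: str


class AccessRecordStore:
    """Hypothetical store that keeps record information in time buckets."""

    def __init__(self, bucket_seconds: int) -> None:
        self.bucket_seconds = bucket_seconds
        self.buckets: dict[int, list[RecordInfo]] = {}

    def store_report(self, report: list[RecordInfo]) -> None:
        # Divide the reported record by bucket, so a report covering a long
        # period (for example 12 hours) is stored at a finer granularity.
        for rec in report:
            key = int(rec.timestamp // self.bucket_seconds)
            self.buckets.setdefault(key, []).append(rec)

    def window(self, now: float, first_preset_duration: float) -> list[RecordInfo]:
        # Collect the records whose timestamp lies in
        # [now - first_preset_duration, now].
        start = now - first_preset_duration
        first_key = int(start // self.bucket_seconds)
        last_key = int(now // self.bucket_seconds)
        result = []
        for key in range(first_key, last_key + 1):
            for rec in self.buckets.get(key, []):
                if start <= rec.timestamp <= now:
                    result.append(rec)
        return result
```

With 1-hour buckets and a first preset duration of 4 hours, a call such as window(now, 4 * 3600) reads only the buckets that overlap the last 4 hours.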
After the network security device obtains, through step 301, the URL access records of the hosts in the local area network within the first preset time period, it processes the URL access records to determine the URL set accessed by each host within the first preset time period, and then detects abnormal URLs based on those URL sets. Because the network security device detects abnormal URLs in essentially the same way for every host's URL set, the following steps 302 to 303 take a first host as an example, where the first host is one of the multiple hosts.
Step 302: Determine, according to the URL access records, the URL set accessed by the first host within the first preset time period, where the URLs in the URL set are different from each other.
In one possible implementation, the URL access record includes a plurality of pieces of record information, and each piece of record information includes a URL and a host identifier. The host identifier in a piece of record information identifies the host that accessed the URL in that piece of record information. Therefore, the pieces of record information whose host identifier is that of the first host can be found among the plurality of pieces of record information, the URLs they contain can be collected, and the URL set accessed by the first host within the first preset time period can be determined from the collected URLs.
The URL set accessed by the first host within the first preset time period is determined from the collected URLs as follows: repeated occurrences of a URL are removed from the collected URLs, and the remaining URLs are combined into the URL set accessed by the first host within the first preset time period. For example, if the collected URLs are URL1, URL1, URL2, URL3, URL4, URL4 and URL5, the repeated URLs are one URL1 and one URL4. Removing one URL1 and one URL4 leaves URL1, URL2, URL3, URL4 and URL5, which are combined into the URL set accessed by the first host within the first preset time period.
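Step 302 can be sketched as follows, assuming that each piece of record information is available as a (host identifier, URL) pair. The function name url_set_for_host and the host identifiers are hypothetical. A dictionary is used instead of a plain set only so that the URLs keep the order in which they were first seen.

```python
def url_set_for_host(record_info, host_id):
    """Return the distinct URLs accessed by host_id, in first-seen order.

    record_info: iterable of (host identifier, URL) pairs (assumed layout).
    """
    seen = {}
    for rec_host, url in record_info:
        if rec_host == host_id:
            # setdefault keeps the first occurrence and ignores repeats
            seen.setdefault(url, None)
    return list(seen)


records = [
    ("host1", "URL1"), ("host1", "URL1"), ("host2", "URL9"),
    ("host1", "URL2"), ("host1", "URL3"), ("host1", "URL4"),
    ("host1", "URL4"), ("host1", "URL5"),
]
first_host_urls = url_set_for_host(records, "host1")
```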
Step 303: Determine whether each URL in the URL set is an abnormal URL.
Specifically, a URL is selected from the URL set and, with reference to fig. 4, the following three steps are performed for the selected URL until every URL in the URL set has been processed. That is, for each URL in the URL set, the network security device may determine whether the URL is an abnormal URL by referring to steps 3031 to 3033 in fig. 4.
Step 3031: Acquire an access behavior information set of the first host. The access behavior information set comprises at least one access information record; each access information record corresponds to one URL, and different access information records correspond to different URLs. A first access information record, corresponding to a first URL, comprises a plurality of access counts of the first URL, a second attribute of the first URL, a third attribute of the first URL, and a fourth attribute of the first URL. The plurality of access counts are the numbers of times the first host accessed the first URL in each of a plurality of time periods. The plurality of time periods are obtained by dividing the time from the initialization time of the first host to the current time, the duration of each time period is a second preset duration, and the first preset duration is greater than or equal to the second preset duration.
The second preset duration is a preset duration and may be 1 hour, 2 hours, or 3 hours.
In the embodiment of the present application, in order to improve the accuracy of determining an abnormal URL, it is necessary to determine whether a selected URL is an abnormal URL by using a dynamic access attribute set of the selected URL. Further, in order to improve the efficiency of determining the dynamic access attribute set of the selected URL, an access behavior information set may be maintained in advance for the first host, so that the network security device may determine the dynamic access attribute set of the selected URL according to the access behavior information set.
For convenience in the following description, the generation process of the access behavior information set is first explained in detail.
It should be noted that, in the embodiment of the present application, in fact, one access behavior information set is maintained for each host, so that a dynamic access attribute set of a URL accessed by a certain host may be determined directly according to the access behavior information set of the host. Since the generation process of the access behavior information sets of the respective hosts is substantially the same, the following description will take the example of generating the access behavior information set of the first host. The generation process of the access behavior information set of the other host may refer to the generation process of the access behavior information set described below.
In this embodiment, the access behavior information set of the first host may be generated when a second preset duration has elapsed after the initialization of the first host. However, because the access counts and the second attribute of a URL keep changing, the access behavior information set of the first host must be updated every second preset duration after it is generated.
That is, after the first host is initialized, it is determined whether the current time is a preset update time, where a preset update time is a time that arrives each time a second preset duration elapses after the initialization of the first host. If the current time is a preset update time and is the time reached after exactly one second preset duration has elapsed since initialization, the URL access record of the first host for that period is obtained, and the access behavior information set of the first host is generated from it. If the current time is a preset update time and is the time reached after at least two second preset durations have elapsed since initialization, an incremental URL access record of the first host is acquired, where the incremental URL access record is the URL access record newly added between the last update of the access behavior information set and the current time, and the most recently updated access behavior information set is updated according to the incremental URL access record. In this way, the access behavior information set of the first host acquired in step 3031 is the one updated at the current time.
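The decision between generating and updating can be sketched as follows. This is an illustrative sketch under stated assumptions: times are integer seconds, and the function names update_action and incremental_records and the returned labels are hypothetical, not terms of the embodiment.

```python
def update_action(init_time: int, now: int, second_preset_duration: int) -> str:
    """Decide what to do with a host's access behavior information set.

    Returns "generate", "update" or "none". All arguments are in seconds.
    """
    elapsed = now - init_time
    if elapsed <= 0 or elapsed % second_preset_duration != 0:
        return "none"        # the current time is not a preset update time
    if elapsed // second_preset_duration == 1:
        return "generate"    # exactly one interval has elapsed
    return "update"          # two or more intervals have elapsed


def incremental_records(records, last_update: int, now: int):
    """Return the records added after the last update, up to the current time.

    records: iterable of (timestamp, URL) pairs for one host (assumed layout).
    """
    return [(t, url) for t, url in records if last_update < t <= now]
```

For a second preset duration of 1 hour, update_action(0, 3600, 3600) selects generation and update_action(0, 7200, 3600) selects an incremental update.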
The generation and update of the access behavior information set can be explained in detail in two steps as follows:
(1) When the first interval of the second preset duration elapses, generate the access behavior information set of the first host.
When the first interval of the second preset duration elapses, the URL access record of the first host within that interval is determined. The URLs accessed by the first host within the interval, and the number of times each URL was accessed, are determined from the URL access record. An access information record is then generated for each URL according to the URLs accessed by the first host in this period and the access count of each URL. The generated access information records are added to a blank access behavior information set to obtain the access behavior information set of the first host.
The access information record corresponding to each URL may be generated as follows: acquire the web page pointed to by each URL accessed by the first host in this period, and determine the URLs appearing in each of those web pages. For any URL accessed by the first host in this period, determine, from the URLs appearing in the web pages pointed to by the accessed URLs, the number of times the URL appears in the web pages pointed to by other URLs; this number is the second attribute of the URL. Then generate the access information record corresponding to the URL according to the access count of the URL, the second attribute of the URL, the URLs appearing in the web page pointed to by the URL, and the PR of the web page pointed to by the URL. Access information records for the other URLs accessed by the first host in this period are generated in the same way and are not described again here.
Optionally, after the URLs accessed by the first host in this period are obtained, the URLs of depth 2 to depth M associated with each URL may also be determined, where M is a positive integer greater than or equal to 2. In this case, the network security device generates the access behavior information set of the first host according to the URLs accessed by the first host in this period and the URLs of depth 2 to depth M associated with each of them, as follows: for any URL among the URLs accessed by the first host within the first interval of the second preset duration and the associated URLs of depth 2 to depth M, determine the number of times the URL appears in the web pages pointed to by other URLs to obtain the second attribute of the URL, and determine the access information record corresponding to the URL according to the access count of the URL, the second attribute of the URL, the URLs appearing in the web page pointed to by the URL, and the PR of the web page pointed to by the URL. Here, the other URLs are all URLs, other than the URL itself, among the URLs accessed by the first host within the first interval of the second preset duration and the associated URLs of depth 2 to depth M.
It should be noted that, for any URL of depth 2 to depth M, the access count of the URL within the first interval of the second preset duration is set to 0. In addition, in the embodiment of the present application, a URL of depth 1 is a URL directly accessed by the first host, and a URL of depth i is a URL appearing in a web page pointed to by a URL of depth i-1, where i is a positive integer greater than or equal to 2 and less than or equal to M.
As can be seen from the above description, in the embodiment of the present application, the access behavior information set of the first host may be determined directly according to the URL directly accessed by the first host, and the access behavior information set of the first host may also be determined according to the URL directly accessed by the first host and the URL indirectly accessed by the first host, which is not specifically limited herein. The URL indirectly accessed by the first host refers to a URL with a depth of 2 to a depth of M associated with the URL directly accessed by the first host.
The following takes M = 2 as an example to illustrate how the access behavior information set of the first host is determined according to the URLs directly accessed by the first host and the URLs indirectly accessed by the first host:
After the URLs accessed by the first host within the first interval of the second preset duration are obtained, they are determined as the URLs of depth 1. The web page crawling module of the network security device acquires the web page pointed to by each URL of depth 1, and the URLs appearing in those web pages are parsed to obtain the URLs of depth 2. The web page crawling module then acquires the web page pointed to by each URL of depth 2, and those web pages are parsed, so that the URLs appearing in the web page pointed to by each URL of depth 1 and each URL of depth 2 are obtained. From these, the number of times each URL of depth 1 and depth 2 appears in the web pages pointed to by other URLs is determined. Access information records corresponding to the URLs of depth 1 and depth 2 are generated according to the number of times each URL appears in the web pages pointed to by other URLs, the access count of each URL, the URLs appearing in the web page pointed to by each URL, and the PR of the web page pointed to by each URL. The generated access information records are added to a blank access behavior information set to obtain the access behavior information set of the first host.
It should be noted that, when the URLs appearing in the web pages pointed to by the URLs of depth 1 are parsed to obtain the URLs of depth 2, the URLs of depth 1 must be filtered out. That is, when another URL of depth 1 appears in the web page pointed to by a URL of depth 1, it cannot be used as a URL of depth 2.
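The derivation of the URLs of depth 2 to depth M can be sketched as a level-by-level expansion. In this sketch the dictionary page_links is a hypothetical stand-in for the web page crawling module: it maps a URL to the URLs appearing in the web page it points to. The function name urls_by_depth is also hypothetical. The text above only states that URLs of depth 1 are filtered out of depth 2; filtering every shallower depth for M greater than 2 is an assumption of this sketch.

```python
def urls_by_depth(direct_urls, page_links, max_depth):
    """Return {depth: [URLs]} for depth 1 to max_depth.

    direct_urls: URLs directly accessed by the host (depth 1).
    page_links: assumed mapping from a URL to the URLs found in its page.
    """
    levels = {1: list(dict.fromkeys(direct_urls))}  # drop repeats, keep order
    seen = set(levels[1])
    for depth in range(2, max_depth + 1):
        current = []
        for url in levels[depth - 1]:
            for link in page_links.get(url, []):
                # A URL already assigned a shallower depth is filtered out.
                if link not in seen:
                    seen.add(link)
                    current.append(link)
        levels[depth] = current
    return levels
```

Keeping a single seen set means each URL is assigned the smallest depth at which it is reached.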
In addition, the URLs appearing in the web pages pointed to by the URLs of depth 1 and the URLs of depth 2 may be determined in either of the following two implementations:
In a first implementation, for any URL among the URLs of depth 1 and the URLs of depth 2, all URLs appearing in the web page pointed to by the URL are directly determined as the URLs appearing in the web page pointed to by the URL.
In a second implementation, isolated URLs are avoided, that is, the case in which the URL corresponding to an access information record in the access behavior information set does not appear in the web pages pointed to by the URLs corresponding to the other access information records. For any URL among the URLs of depth 1 and the URLs of depth 2, all URLs appearing in the web page pointed to by the URL are determined; from them, those identical to some URL of depth 1 or depth 2 are selected, and the selected URLs are determined as the URLs appearing in the web page pointed to by the URL. If none of the URLs appearing in the web page pointed to by the URL is identical to a URL of depth 1 or depth 2, the URL is determined to be an isolated URL. In this case, the isolated URL is removed from the URLs of depth 1 and the URLs of depth 2, and access information records need to be determined only for the remaining URLs of depth 1 and depth 2.
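The second implementation can be sketched as follows. The function name restrict_and_prune and the example domain names are hypothetical. Two points are assumptions of this sketch rather than statements of the embodiment: a link from a page to its own URL is ignored, and isolated URLs are removed in a single pass without re-checking whether the removal leaves further URLs isolated.

```python
def restrict_and_prune(all_links):
    """Apply the second implementation to a link mapping.

    all_links: maps each URL of depth 1 or 2 to every URL found in its page.
    Returns (links restricted to the remaining URLs, set of isolated URLs).
    """
    known = set(all_links)
    # Keep only the links that are identical to another URL in the set.
    kept = {
        url: [v for v in links if v in known and v != url]
        for url, links in all_links.items()
    }
    # A URL whose page contains none of the other URLs is isolated.
    isolated = {url for url, links in kept.items() if not links}
    # Single pass: drop isolated URLs and the links that point to them.
    pruned = {
        url: [v for v in links if v not in isolated]
        for url, links in kept.items()
        if url not in isolated
    }
    return pruned, isolated


example_links = {
    "p.example": ["q.example", "x.example"],
    "q.example": ["p.example", "r.example"],
    "r.example": ["y.example"],
}
pruned_links, isolated_urls = restrict_and_prune(example_links)
```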
For example, suppose the URLs accessed by the first host are first determined to be a.com, b.com, and c.com; that is, the URLs of depth 1 are a.com, b.com, and c.com. Acquiring the web pages pointed to by a.com, b.com, and c.com gives the following results:
All URLs appearing in the web page pointed to by a.com are: c.com, b.com and a.a.com;
All URLs appearing in the web page pointed to by b.com are: c.com and b.a.com;
All URLs appearing in the web page pointed to by c.com are: a.com and c.a.com.
Thus, the URLs of depth 2 are a.a.com, b.a.com, and c.a.com.
Continuing to acquire the URLs appearing in the web page pointed to by each URL of depth 2 gives the following results:
All URLs appearing in the web page pointed to by a.a.com are: b.com and a.b.com;
All URLs appearing in the web page pointed to by b.a.com are: c.com and b.b.com;
All URLs appearing in the web page pointed to by c.a.com are: c.b.com and c.c.com.
At this point, the URLs of depth 1 and the URLs of depth 2 are: a.com, b.com and c.com, and a.a.com, b.a.com and c.a.com.
When the URLs appearing in the web page pointed to by each URL of depth 1 and depth 2 are determined according to the second implementation, the following results are obtained:
The URLs appearing in the web page pointed to by a.com are: c.com, b.com and a.a.com;
The URLs appearing in the web page pointed to by b.com are: c.com and b.a.com;
The URLs appearing in the web page pointed to by c.com are: a.com and c.a.com;
The URL appearing in the web page pointed to by a.a.com is: b.com;
The URL appearing in the web page pointed to by b.a.com is: c.com;
No other URL appears in the web page pointed to by c.a.com.
Since no other URL appears in the web page pointed to by c.a.com, c.a.com is removed from the URLs of depth 1 and depth 2, and the URLs appearing in the web page pointed to by each of the remaining URLs of depth 1 and depth 2 are respectively:
The URLs appearing in the web page pointed to by a.com are: c.com, b.com and a.a.com;
The URLs appearing in the web page pointed to by b.com are: c.com and b.a.com;
The URL appearing in the web page pointed to by c.com is: a.com;
The URL appearing in the web page pointed to by a.a.com is: b.com;
The URL appearing in the web page pointed to by b.a.com is: c.com.
Thus, it can be determined that the second attribute of a.com is 1, the second attribute of b.com is 2, the second attribute of c.com is 3, the second attribute of a.a.com is 1, and the second attribute of b.a.com is 1.
Suppose the PR of the web page pointed to by a.com is 5, that of b.com is 6, that of c.com is 1, that of a.a.com is 5, and that of b.a.com is 6. Further, suppose the number of accesses of a.com is 10, the number of accesses of b.com is 12, and the number of accesses of c.com is 13. For convenience of explanation, each access information record is represented as (URL, number of accesses, third attribute, second attribute, fourth attribute), and the following access information records are obtained:
The access information record of a.com is: (a.com, 10, (c.com, b.com and a.a.com), 1, 5);
The access information record of b.com is: (b.com, 12, (c.com and b.a.com), 2, 6);
The access information record of c.com is: (c.com, 13, (a.com), 3, 1);
The access information record of a.a.com is: (a.a.com, 0, (b.com), 1, 5);
The access information record of b.a.com is: (b.a.com, 0, (c.com), 1, 6).
The access information records obtained for these URLs are then added to an empty access behavior information set to obtain the access behavior information set of the first host.
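The worked example above can be reproduced with a short sketch. It assumes a hypothetical in-memory `links` map in place of real page parsing, and the function name `build_records` and the tuple layout (URL, number of accesses, third attribute, second attribute, fourth attribute) are chosen only to mirror the example. Pruning is done in a single pass, as in the example; whether further passes are needed is not stated in the text.

```python
from collections import Counter


def build_records(links, access_counts, page_rank):
    """Return {url: (url, accesses, third, second, fourth)} as in the example."""
    # Keep only URLs whose page contains at least one other URL.
    kept = {url for url, targets in links.items() if targets}
    # Third attribute: URLs on each kept page, ignoring removed URLs.
    third = {
        url: tuple(t for t in links[url] if t in kept)
        for url in links
        if url in kept
    }
    # Second attribute: how often each URL appears on the kept pages.
    second = Counter(t for targets in third.values() for t in targets)
    return {
        url: (url, access_counts.get(url, 0), third[url], second[url], page_rank[url])
        for url in third
    }


links = {
    "a.com": ["c.com", "b.com", "a.a.com"],
    "b.com": ["c.com", "b.a.com"],
    "c.com": ["a.com", "c.a.com"],
    "a.a.com": ["b.com"],
    "b.a.com": ["c.com"],
    "c.a.com": [],
}
access_counts = {"a.com": 10, "b.com": 12, "c.com": 13}
page_rank = {"a.com": 5, "b.com": 6, "c.com": 1, "a.a.com": 5, "b.a.com": 6}

records = build_records(links, access_counts, page_rank)
for record in records.values():
    print(record)
```

URLs that the first host reached only indirectly (a.a.com and b.a.com) get an access count of 0, as in the records listed above.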
It should be noted that the access information record and the access behavior information set are concepts defined in the embodiments of the present application. In practical applications, they may be replaced by other terms; for example, the access information record may be called five-tuple information. This is not specifically limited in the embodiments of the present application.
The above process illustrates how the access behavior information set of the first host is generated when the second preset duration is reached for the first time. Each time the second preset duration is reached again, the access behavior information set may be updated by the following step (2). Updating the newly generated access behavior information set and subsequently updating the most recently updated access behavior information set are basically the same process, so step (2) is described using the latter as an example.
(2) After the first host is initialized, it is determined whether the current time is a preset update time. If it is, the incremental URL access record of the first host is obtained, where the incremental URL access record refers to the URL access records newly added by the first host between the time the access behavior information set was last updated and the current time, and the most recently updated access behavior information set is updated according to the incremental URL access record.
Since there may be multiple URLs in the incremental URL access record, and for any one of the multiple URLs, the process of updating the access behavior information set by the URL is substantially the same, in this embodiment of the present application, it is described how to update the access behavior information set of the first host according to the second URL by taking the second URL in the incremental URL record as an example.
Specifically, the second URL and the number of times of access of the second URL are acquired from the incremental URL access record, and then the access behavior information set after the last update is updated according to the second URL and the number of times of access of the second URL.
The second URL may or may not already exist among the URLs corresponding to the access information records included in the most recently updated access behavior information set. Therefore, there are two scenarios for updating that set according to the second URL and the number of accesses of the second URL:
Scenario one: if the second URL already exists among the URLs corresponding to the access information records included in the most recently updated access behavior information set, the number of accesses of the second URL is added to the access information record corresponding to the second URL in that set.
In scenario one, the access information record corresponding to the second URL already exists in the most recently updated access behavior information set, so it is only necessary to add the number of accesses of the second URL to that access information record.
Scenario two: if the second URL does not exist among the URLs corresponding to the access information records included in the most recently updated access behavior information set, the set is incrementally updated according to the second URL, the number of accesses of the second URL and the incremental URL access record. This adds the access information record corresponding to the second URL to the set and updates the second attribute of the URLs whose access information records already exist in the set.
In scenario two, there is no access information record corresponding to the second URL in the most recently updated access behavior information set. An access information record corresponding to the second URL therefore has to be generated and added to the set. This process of updating the set is referred to as incremental update.
Specifically, the incremental update of the most recently updated access behavior information set according to the second URL, the number of accesses of the second URL and the incremental URL access record is implemented as follows. A third number of times is determined, which is the number of times the second URL appears in the web pages pointed to by the URLs other than the second URL in the incremental URL access record. The second attribute of the second URL is determined according to the third number of times and the third attribute of the URL corresponding to each access information record in the most recently updated access behavior information set. The third attribute and the fourth attribute of the second URL are determined according to the web page pointed to by the second URL. An access information record corresponding to the second URL is generated from the number of accesses, the second attribute, the third attribute and the fourth attribute of the second URL, and is added to the most recently updated access behavior information set.
The second attribute of the second URL is determined from the third number of times and the third attributes as follows: a fourth number of times is determined according to the third attribute of the URL corresponding to each access information record in the most recently updated access behavior information set, where the fourth number of times is the number of times the second URL appears in the web pages pointed to by the URLs corresponding to all of those access information records; the fourth number of times and the third number of times are then added to obtain the second attribute of the second URL.
The third attribute and the fourth attribute of the second URL are determined from the web page pointed to by the second URL as follows: the URLs appearing in that web page are parsed to obtain the third attribute of the second URL, and the PR of that web page is determined and used as the fourth attribute of the second URL.
In addition, since the second URL does not exist in the URL corresponding to the access information record included in the access behavior information set after the last update, it may indicate that the first host accesses the new URL, and since the new URL may affect the second attribute in the access information record included in the access behavior information set after the last update, the second attribute of the URL corresponding to the access information record already existing in the access behavior information set after the last update needs to be updated. Specifically, for any access information record in the access behavior information set after the last update, according to the third attribute of the second URL, it is determined whether the URL corresponding to the access information record is a URL appearing in the web page pointed by the second URL. And if so, adding 1 to the value corresponding to the second attribute included in the access information record to update the second attribute of the URL corresponding to the access information record. And if not, not updating the second attribute of the URL corresponding to the access information record.
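The two update scenarios can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the record layout (a dict with `accesses`, `second`, `third`, `fourth`), the `links` and `page_rank` lookups and the function name `apply_increment` are all hypothetical. One further assumption is made to avoid counting the same page twice: the third number of times is counted only over URLs in the incremental record that are new to the set, while the fourth number of times is counted over records that existed before this update.

```python
def apply_increment(records, increment, links, page_rank):
    """Update `records` in place from an incremental record {url: accesses}."""
    existing = set(records)  # snapshot of the set before this update
    new_urls = [url for url in increment if url not in existing]

    # Scenario one: the URL already has a record, so only add its accesses.
    for url, count in increment.items():
        if url in existing:
            records[url]["accesses"] += count

    # Scenario two: the URL is new, so generate a record for it.
    for url in new_urls:
        # Third number of times: appearances on pages of other new URLs.
        third_count = sum(links.get(o, []).count(url) for o in new_urls if o != url)
        # Fourth number of times: appearances on pages already in the set.
        fourth_count = sum(records[e]["third"].count(url) for e in existing)
        third_attr = list(links.get(url, []))
        records[url] = {
            "accesses": increment[url],
            "second": third_count + fourth_count,
            "third": third_attr,
            "fourth": page_rank[url],
        }
        # Existing URLs that appear on the new page gain one in-page reference.
        for e in existing:
            if e in third_attr:
                records[e]["second"] += 1
    return records


records = {
    "a.com": {"accesses": 10, "second": 1, "third": ["c.com"], "fourth": 5},
    "c.com": {"accesses": 13, "second": 1, "third": ["a.com"], "fourth": 1},
}
increment = {"a.com": 3, "d.com": 2, "e.com": 1}
links = {"d.com": ["a.com", "c.com"], "e.com": ["d.com"]}
page_rank = {"d.com": 4, "e.com": 2}
apply_increment(records, increment, links, page_rank)
print(records["d.com"])
```

In this toy run, a.com only gains accesses (scenario one), while d.com and e.com receive new records (scenario two) and raise the second attribute of a.com and c.com by 1 each.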
Optionally, the access behavior information set of the first host may also be determined according to both the URLs directly accessed by the first host and the URLs indirectly accessed by the first host. In this case, in scenario two, in addition to the access information record corresponding to the second URL, an access information record also needs to be generated for each of the URLs with depth 2 to depth M associated with the second URL, which is not elaborated herein. The URLs indirectly accessed by the first host are the URLs with depth 2 to depth M associated with the URLs directly accessed by the first host.
In addition, some of the URLs with depth 2 to depth M associated with the second URL may already have access information records in the most recently updated access behavior information set. Therefore, before access information records are generated for the URLs with depth 2 to depth M associated with the second URL, it is first determined which of these URLs have no corresponding access information record in that set. Access information records are generated only for the URLs so determined and are added to the most recently updated access behavior information set.
The above process is used to explain how the access behavior information set of the first host is generated and updated, so that when the network security device needs to perform the anomaly detection on the URL accessed by the first host, the network security device first obtains the access behavior information set after the latest update, and determines the dynamic access attribute set of the URL to be detected according to the obtained access behavior information set.
Step 3032: determine a dynamic access attribute set of the selected URL according to the access behavior information set of the first host.
The dynamic access attribute set includes a first attribute and a second attribute. The first attribute is the number of times the first host accesses the selected URL within a first preset duration, and the second attribute is the number of times the selected URL appears in the web pages pointed to by all URLs accessed by the first host from the initialization time to the current time. Further, the dynamic access attribute set may also include a third attribute and a fourth attribute. The third attribute is the URLs appearing in the web page pointed to by the selected URL, and the fourth attribute is the PR of the web page pointed to by the selected URL.
It should be noted that, no matter whether the dynamic access attribute set includes the third attribute and the fourth attribute, each access information record in the access behavior information set of the first host in step 3031 may include the number of accesses of the corresponding URL, the second attribute, the third attribute, and the fourth attribute. Optionally, if the dynamic access attribute set includes only the first attribute and the second attribute, the access information record in the access behavior information set of the first host may include only the number of accesses of the corresponding URL and the second attribute, and the embodiment of the present application is not specifically limited herein.
Step 3032 is described below with an example that each access information record in the access behavior information set of the first host includes the access frequency, the second attribute, the third attribute, and the fourth attribute of the corresponding URL, and the dynamic access attribute set includes the first attribute, the second attribute, the third attribute, and the fourth attribute.
Since the access behavior information set of the first host is in the process of being continuously updated, in step 3032, the following two implementation manners may be used to determine the dynamic access attribute set of the selected URL according to the access behavior information set of the first host:
First implementation: this applies to the scenario in which the time of the most recent update of the access behavior information set is the current time.
In this case, the access information record whose URL is the selected URL is acquired from the access behavior information set. From the multiple numbers of accesses included in the acquired access information record, those whose time periods fall within the first preset duration are looked up and accumulated, and the accumulated result is used as the first attribute of the selected URL. The dynamic access attribute set of the selected URL is then generated from the first attribute of the selected URL and the second attribute, the third attribute and the fourth attribute included in the acquired access information record.
For example, if the second preset time duration is 1 hour, and the first preset time duration is the latest 4-hour time duration, the access times within the latest 4 hours are directly searched from the multiple access times, so as to obtain the first attribute of the selected URL.
In practical applications, after the first host is initialized, the times at which the access behavior information set is generated and updated may be set to whole hours. For example, if the second preset duration is 1 hour, the access behavior information set is updated at each of the 24 whole hours of the day. In this case, each number of accesses in the access information record corresponding to the selected URL also corresponds to a whole-hour period, for example, 00:00 to 1:00 corresponds to one number of accesses, 1:00 to 2:00 corresponds to one number of accesses, and so on.
Correspondingly, the times at which the network security device detects abnormal URLs may also be set to whole hours, for example, every 4 hours starting from 0:00 each day. In this case, the network security device can directly look up, from the multiple numbers of accesses included in the acquired access information record, those whose time periods fall within the first preset duration.
Optionally, if the access information record includes multiple numbers of accesses recorded per whole hour but the first time period currently to be detected is not aligned to whole hours, the numbers of accesses whose time periods fall within the first time period may still be obtained from the multiple numbers of accesses, and their sum is used as the first attribute of the selected URL. For example, suppose the numbers of accesses in the access information record of the selected URL correspond to whole hours (00:00 to 1:00, 1:00 to 2:00, and so on) and the first time period currently to be detected is 2:30 to 4:30. In this case, the number of accesses corresponding to 3:00 to 4:00 is obtained from the multiple numbers of accesses, and the first attribute of the selected URL is determined according to it.
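The lookup of per-hour access counts can be sketched as below. The representation is an assumption made for illustration: `hourly_counts` maps the start of each one-hour period (in minutes since 00:00) to its number of accesses, and `first_attribute` is a hypothetical name. Following the 2:30 to 4:30 example, only periods that lie entirely inside the window are summed.

```python
def first_attribute(hourly_counts, window_start, window_end, bucket=60):
    """Sum the access counts of periods lying entirely inside the window.

    All times are minutes since 00:00; `bucket` is the period length.
    """
    return sum(
        count
        for start, count in hourly_counts.items()
        if window_start <= start and start + bucket <= window_end
    )


# Hypothetical counts for 00:00-1:00, 1:00-2:00, ..., 4:00-5:00.
hourly_counts = {0: 2, 60: 1, 120: 4, 180: 3, 240: 5}

aligned = first_attribute(hourly_counts, 0, 240)       # 0:00 to 4:00
unaligned = first_attribute(hourly_counts, 150, 270)   # 2:30 to 4:30
print(aligned, unaligned)
```

For the aligned window all four hourly counts are used; for the 2:30 to 4:30 window only the 3:00 to 4:00 count contributes.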
Second implementation: this applies to the scenario in which the time of the most recent update of the access behavior information set differs from the current time.
In this embodiment of the present application, when the time of the most recent update of the access behavior information set differs from the current time, there are two strategies for determining the dynamic access attribute set of the selected URL:
Strategy one: determine the dynamic access attribute set of the selected URL according to the most recently updated access behavior information set and the incremental URL access record. The incremental URL access record refers to the URL access records newly added by the first host between the time the access behavior information set was last updated and the current time.
In the first strategy, the behavior of the network security device for detecting the abnormal URL and the behavior for updating the access behavior information set are two completely independent behaviors, if the time for updating the access behavior information set at the last time is different from the current time, only the selected dynamic access attribute set of the URL needs to be determined, and the access behavior information set does not need to be updated.
Specifically, the following two situations exist in the process of determining the dynamic access attribute set of the selected URL according to the access behavior information set and the incremental URL access record updated last time:
(1) The access behavior information set contains an access information record corresponding to the selected URL. In this case, the third attribute and the fourth attribute of the selected URL can be obtained directly from the access information record corresponding to the selected URL. However, since the time of the most recent update of the access behavior information set differs from the current time, the first attribute and the second attribute of the selected URL need to be determined according to the incremental URL access record and the most recently updated access behavior information set.
Specifically, according to the incremental URL access record, determining incremental access times of the first host accessing the selected URL from the time of updating the access behavior information set last time to the current time, searching for access times of a corresponding time period within a first preset time period from a plurality of access times included in the acquired access information record, and accumulating the incremental access times and the searched access times to obtain a first attribute of the selected URL. And determining a dynamic access attribute set of the selected URL according to the first attribute of the selected URL, the incremental URL access record and the second attribute, the third attribute and the fourth attribute included in the obtained access information record.
The dynamic access attribute set of the selected URL is determined from the first attribute of the selected URL, the incremental URL access record, and the second, third and fourth attributes included in the acquired access information record as follows. If all URLs appearing in the incremental URL access record already exist among the URLs corresponding to the access information records included in the access behavior information set, the dynamic access attribute set of the selected URL is generated from the first attribute of the selected URL and the second, third and fourth attributes included in the acquired access information record. If not all URLs appearing in the incremental URL access record exist among those URLs, at least one URL is acquired from the incremental URL access record, where each acquired URL differs from every URL corresponding to an access information record included in the access behavior information set, and a first number of times is determined, which is the number of times the selected URL appears in the web pages pointed to by the acquired at least one URL. The second attribute of the selected URL is then determined from the first number of times and the second attribute included in the acquired access information record. Finally, the dynamic access attribute set of the selected URL is generated from the first attribute and the second attribute of the selected URL and the third and fourth attributes included in the acquired access information record.
When all URLs appearing in the incremental URL access record exist among the URLs corresponding to the access information records included in the access behavior information set, the first host has not accessed any new URL between the time the access behavior information set was last updated and the current time, so the second attribute of the selected URL is the second attribute in the access information record corresponding to the selected URL. The dynamic access attribute set of the selected URL can therefore be generated directly from the first attribute of the selected URL and the second, third and fourth attributes included in the acquired access information record. If not all URLs appearing in the incremental URL access record exist among those URLs, the first host has accessed a new URL between the time the access behavior information set was last updated and the current time, and the selected URL may appear in the web page pointed to by that new URL. The second attribute of the selected URL may therefore differ from the second attribute included in the acquired access information record. In this case, the first number of times is determined in the above manner and added to the second attribute included in the acquired access information record to obtain the second attribute of the selected URL.
Optionally, when the access behavior information set of the first host is determined according to both the URLs directly accessed by the first host and the URLs indirectly accessed by the first host, the first number of times is determined as follows: determine the URLs with depth 2 to depth M associated with each of the acquired at least one URL, excluding any URL that already has an access information record in the most recently updated access behavior information set; then count the number of times the selected URL appears in the web pages pointed to by the acquired at least one URL and by each of the determined URLs with depth 2 to depth M, to obtain the first number of times.
(2) The access behavior information set contains no access information record corresponding to the selected URL. In this case, the incremental number of accesses by the first host to the selected URL between the time the access behavior information set was last updated and the current time is determined according to the incremental URL access record and is used as the first attribute of the selected URL. A second number of times is determined, which is the number of times the selected URL appears in the web pages pointed to by the URLs other than the selected URL in the incremental URL access record. The second attribute of the selected URL is determined according to the second number of times and the third attribute of the URL corresponding to each access information record in the access behavior information set. The third attribute and the fourth attribute of the selected URL are determined according to the web page pointed to by the selected URL. The dynamic access attribute set of the selected URL is then generated from the first, second, third and fourth attributes of the selected URL.
That is, when there is no access information record corresponding to the selected URL in the access behavior information set, the first attribute, the second attribute, the third attribute, and the fourth attribute of the selected URL need to be determined in the above manner, so as to generate a dynamic access attribute set of the selected URL.
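Strategy one, covering cases (1) and (2), can be sketched as a read-only computation that leaves the access behavior information set untouched. The names and data layout here are hypothetical: `records[url]["accesses"]` is assumed to already hold the number of accesses within the first preset duration, `links` stands in for page parsing, and `page_rank` for the PR lookup. As an additional assumption to avoid double counting, appearances of the selected URL are counted only on pages of incremental URLs that have no record in the set.

```python
def dynamic_attributes(selected, records, increment, links, page_rank):
    """Return the dynamic access attribute set of `selected` without updating `records`."""
    # URLs in the incremental record that are new to the set.
    new_urls = [u for u in increment if u not in records and u != selected]
    # Appearances of the selected URL on the pages of those new URLs.
    seen_in_new = sum(links.get(u, []).count(selected) for u in new_urls)
    delta = increment.get(selected, 0)  # incremental number of accesses

    if selected in records:
        # Case (1): reuse the stored third and fourth attributes.
        rec = records[selected]
        return {
            "first": rec["accesses"] + delta,
            "second": rec["second"] + seen_in_new,
            "third": list(rec["third"]),
            "fourth": rec["fourth"],
        }

    # Case (2): no record yet, so derive every attribute.
    seen_in_set = sum(rec["third"].count(selected) for rec in records.values())
    return {
        "first": delta,
        "second": seen_in_new + seen_in_set,
        "third": list(links.get(selected, [])),
        "fourth": page_rank[selected],
    }


records = {
    "a.com": {"accesses": 10, "second": 1, "third": ["c.com"], "fourth": 5},
    "c.com": {"accesses": 13, "second": 1, "third": ["a.com"], "fourth": 1},
}
increment = {"a.com": 2, "d.com": 1}
links = {"d.com": ["a.com"]}
page_rank = {"d.com": 4}

known = dynamic_attributes("a.com", records, increment, links, page_rank)
unknown = dynamic_attributes("d.com", records, increment, links, page_rank)
print(known, unknown)
```

Because the function only reads `records`, detection and updating of the set stay independent, which is the defining property of strategy one.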
Strategy two: update the most recently updated access behavior information set according to the incremental URL access record, and determine the dynamic access attribute set of the selected URL according to the updated access behavior information set. Alternatively, determine the dynamic access attribute set of the selected URL according to the most recently updated access behavior information set and the incremental URL access record, and then update that set according to the incremental URL access record.
In the second strategy, the behavior when the network security device detects an abnormal URL and the behavior when updating the access behavior information set are not two completely independent behaviors. And if the time for updating the access behavior information set last time is different from the current time, finishing updating the access behavior information set while determining the dynamic access attribute set of the selected URL.
The implementation manner of updating the access behavior information set after the last update according to the incremental URL access record may refer to the process of updating the access behavior information set according to the determined incremental URL access record every second preset time, which is not described in detail herein.
In practical applications, if the first preset duration is an integer multiple of the second preset duration and each time at which the network security device detects abnormal URLs is exactly a preset update time, the dynamic access attribute set of the selected URL may be determined using strategy two.
If the first preset duration and the second preset duration do not satisfy this relationship, and the times at which the network security device detects abnormal URLs are unrelated to the preset update times, strategy one may be used to determine the dynamic access attribute set of the selected URL.
Of course, strategy two may also be used in this case. However, when the most recently updated access behavior information set is then updated according to the incremental URL access record, the number of accesses added for the second URL in the incremental URL access record may not correspond to a full second preset duration; the incremental number of accesses of the second URL is nevertheless added to the access information record of the second URL. When the network security device subsequently updates the access behavior information set, it reorganizes the numbers of accesses in the access information record corresponding to the second URL so that each of them again corresponds to one second preset duration.
In addition, in step 3031 and step 3032, the dynamic access attribute set of the selected URL is determined according to the access behavior information set of the first host. Of course, optionally, the network security device may also directly generate the dynamic access attribute set of the selected URL according to the access record in the first preset time period and all URLs visited by the first host from the initialization time to the current time, where at this time, the network security device does not need to maintain an access behavior information set for the first host in advance. However, determining a dynamic access attribute set for a selected URL in this manner is inefficient.
Step 3033: perform anomaly detection on the selected URL according to the dynamic access attribute set of the selected URL and a first anomaly detection model, where the first anomaly detection model is obtained by training in advance according to a plurality of abnormal sample URLs accessed by the first host and the dynamic access attribute set of each of the plurality of abnormal sample URLs.
After obtaining the dynamic access attribute set of the selected URL according to steps 3031 and 3032, anomaly detection may be performed on the selected URL by a first anomaly detection model.
Since the first anomaly detection model is obtained by training in advance according to the plurality of anomaly sample URLs visited by the first host and the dynamic access attribute set of each anomaly sample URL in the plurality of anomaly sample URLs, when the dynamic access attribute set of the selected URL is processed by the first anomaly detection model, the first anomaly detection model can directly output a detection result for the selected URL, wherein the detection result is used for indicating whether the selected URL is an anomaly URL or a normal URL. For example, the detection result is a probability value that the selected URL is an abnormal URL, and/or a probability value that the selected URL is a normal URL.
That is, in the embodiment of the present application, a first abnormality detection model is determined in advance for the first host. The implementation manner of determining the first anomaly detection model may be: a plurality of exception sample URLs visited by the first host and a dynamic visit attribute set of each exception sample URL are obtained. And training the initialized detection model through the dynamic access attribute set of each abnormal sample URL to obtain a first abnormal detection model. That is, in the embodiment of the present application, the first anomaly detection model is obtained by training a plurality of anomaly sample URLs previously accessed by the first host.
It should be noted that, when the dynamic access attribute set includes the first attribute and the second attribute, since the first attribute and the second attribute are attributes related to a time period, the time period used in determining the first attribute of the exception sample URL and the first time period are consistent at least in terms of time length. For example, the first preset time duration is 4 hours, that is, the abnormal URL in the latest 4 hours needs to be detected currently, and the time duration adopted when determining the first attribute of the abnormal sample URL is also 4 hours. The second attribute of the abnormal sample URL is also determined based on a time period after the initialization of the first host and up to the current time, which is the time when the second attribute of the abnormal sample URL is determined.
Optionally, in this embodiment of the present application, after the dynamic access attribute set of the selected URL is determined, a static access attribute set of the selected URL may also be determined. In this case, anomaly detection is performed on the selected URL according to the dynamic access attribute set of the selected URL, the static access attribute set of the selected URL, and the first anomaly detection model.
That is, in the embodiment of the present application, it may also be comprehensively determined whether the selected URL is an abnormal URL according to the dynamic access attribute set and the static access attribute set of the selected URL, so as to further improve the accuracy of determining the abnormal URL.
The static access attribute set comprises random entropy of the domain name of the selected URL, the number of continuous characters in the domain name, N-Gram frequency of the domain name, ccTLD of the domain name and the like.
The static access attribute set of the selected URL is determined as follows: the domain name included in the selected URL is determined, and the random entropy of the domain name, the number of continuous characters in the domain name, the N-Gram frequency of the domain name, and the ccTLD of the domain name are determined from that domain name to obtain the static access attribute set of the selected URL.
The random entropy of the domain name describes the degree of randomness of the domain name and can be determined by the entropy calculation methods of the related art, which are not described in detail herein. N-Gram is a language processing model, also known as a string text analysis model. In this model, it is assumed that the occurrence of the Nth word is related only to the preceding N-1 words and not to any other words. The N-Gram frequency is the product of the probabilities of occurrence of the individual characters. A ccTLD is a country code top-level domain, for example .cn, .in, .jp, .kr, .tw, .hk, .sg, .vn, .cc, and the like.
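As an illustration only, the static attributes described above might be computed along the following lines. The text does not fix the exact formulas, so this sketch makes several assumptions: Shannon entropy over characters stands in for the "random entropy", the "number of continuous characters" is read as the longest run of one repeated character, the N-Gram frequency is taken as the product of per-character probabilities (kept in log space), and `CCTLDS`, `char_prob` and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical ccTLD list built from the examples in the text; not exhaustive.
CCTLDS = {"cn", "in", "jp", "kr", "tw", "hk", "sg", "vn", "cc"}


def shannon_entropy(s):
    """Shannon entropy, in bits per character, of the characters in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def longest_run(s):
    """Length of the longest run of one repeated character (assumed reading)."""
    best = cur = 0
    prev = None
    for ch in s:
        cur = cur + 1 if ch == prev else 1
        prev = ch
        best = max(best, cur)
    return best


def log_char_frequency(s, char_prob, floor=1e-6):
    """Log of the product of per-character occurrence probabilities.

    char_prob is an assumed lookup table of character probabilities;
    characters missing from it fall back to a small floor value.
    """
    return sum(math.log(char_prob.get(ch, floor)) for ch in s)


def static_attributes(domain, char_prob):
    """Build a static access attribute set for one domain name."""
    labels = domain.lower().rstrip(".").split(".")
    tld = labels[-1]
    name = "".join(labels[:-1])
    return {
        "entropy": shannon_entropy(name),
        "longest_run": longest_run(name),
        "log_ngram_freq": log_char_frequency(name, char_prob),
        "cctld": tld if tld in CCTLDS else None,
    }
```

Working in log space avoids numerical underflow when many small probabilities are multiplied for a long domain name.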
It should be noted that, when whether the selected URL is an abnormal URL is determined jointly from its dynamic access attribute set and its static access attribute set, the first anomaly detection model is trained in advance on the plurality of abnormal sample URLs accessed by the first host together with the dynamic access attribute set and the static access attribute set of each abnormal sample URL, so that the first anomaly detection model can detect whether the selected URL is an abnormal URL from the dynamic access attribute set and the static access attribute set of the selected URL.
Optionally, in this embodiment of the application, after the detection result of the selected URL is determined according to the above steps, the detection result may also be displayed so that an administrator can check whether it is correct.
Further, if the administrator determines that the detection result of the selected URL is incorrect, the first anomaly detection model may be trained further with the dynamic access attribute set of the selected URL to optimize the first anomaly detection model.
In the embodiment of the application, URL access records of a plurality of hosts in a local area network within a first preset time period are obtained, a URL set accessed by the first host within the first preset time period is determined according to the URL access records, a URL is then selected from the URL set, a dynamic access attribute set of the selected URL is determined, and anomaly detection is performed on the selected URL according to the dynamic access attribute set of the selected URL and a first anomaly detection model. The dynamic access attribute set includes a first attribute and a second attribute, where the first attribute refers to the number of times that the first host accesses the selected URL within the first preset time period, and the second attribute refers to the number of times that the selected URL appears in the web pages pointed to by all URLs accessed by the first host from the initialization time until the current time, so that the dynamic access attribute set of the selected URL can represent the behavior characteristics of the first host accessing the selected URL. In other words, whether the URL to be detected is an abnormal URL is judged according to the behavior characteristics of the first host accessing the URL to be detected and the anomaly detection model generated from the prior network access behavior of the first host, which can improve the accuracy of determining abnormal URLs.
Referring to fig. 5, an embodiment of the present application provides an apparatus 500 for detecting an abnormal URL, which is applied to the network security device shown in fig. 1. As shown in fig. 5, the apparatus 500 includes a first obtaining module 501, a determining module 502, and a processing module 503:
a first obtaining module 501, configured to perform step 301 in fig. 3;
a determining module 502 for performing step 302 in fig. 3;
a processing module 503, configured to select a URL from the URL set, and execute the following processing for the selected URL until each URL in the URL set is processed:
determining a dynamic access attribute set of the selected URL, wherein the dynamic access attribute set comprises a first attribute and a second attribute, the first attribute refers to the number of times that the first host accesses the selected URL within a first preset time period, and the second attribute refers to the number of times that the selected URL appears in the web pages pointed to by all URLs accessed by the first host from the initialization time until the current time;
and carrying out anomaly detection on the selected URL according to the dynamic access attribute set of the selected URL and a first anomaly detection model, wherein the first anomaly detection model is obtained by training in advance according to a plurality of abnormal sample URLs accessed by the first host and the dynamic access attribute set of each abnormal sample URL in the plurality of abnormal sample URLs.
Optionally, the dynamic access attribute set of the selected URL further includes a third attribute and a fourth attribute, the third attribute refers to a URL appearing in the web page pointed to by the selected URL, and the fourth attribute refers to the web page rank PR of the web page pointed to by the selected URL.
Optionally, referring to fig. 6, the processing module 503 comprises an obtaining unit 5031 and a determining unit 5032:
an obtaining unit 5031 configured to perform step 3031 in fig. 4;
a determining unit 5032 configured to perform step 3032 in fig. 4.
Optionally, the determining unit 5032 is specifically configured to:
if the time for updating the access behavior information set last time is the current time, acquiring an access information record from the access behavior information set, wherein the URL corresponding to the acquired access information record is the selected URL;
searching the access times of the corresponding time period within a first preset time period from a plurality of access times included in the acquired access information record;
accumulating the searched access times to obtain a first attribute of the selected URL;
and generating a dynamic access attribute set of the selected URL according to the first attribute of the selected URL and the second attribute, the third attribute and the fourth attribute included in the acquired access information record.
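The case above, in which the access behavior information set is already up to date, can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the `AccessInfoRecord` layout, its field names and the two functions are hypothetical, and the sketch assumes that the per-period access counts are stored oldest first and that the first preset duration is a whole multiple of the second preset duration.

```python
from dataclasses import dataclass, field


@dataclass
class AccessInfoRecord:
    """Hypothetical layout of one access information record."""
    url: str
    # One access count per period of the second preset duration,
    # oldest first, starting at the host's initialization time.
    period_counts: list
    second_attr: int = 0                            # occurrences in visited pages
    third_attr: list = field(default_factory=list)  # URLs on the page
    fourth_attr: float = 0.0                        # page rank (PR)


def first_attribute(record, first_preset_seconds, second_preset_seconds):
    """Sum the counts of the stored periods that fall inside the window."""
    n = first_preset_seconds // second_preset_seconds
    return sum(record.period_counts[-n:]) if n > 0 else 0


def dynamic_attribute_set(record, first_preset_seconds, second_preset_seconds):
    """Combine the computed first attribute with the stored attributes."""
    return {
        "first": first_attribute(record, first_preset_seconds, second_preset_seconds),
        "second": record.second_attr,
        "third": list(record.third_attr),
        "fourth": record.fourth_attr,
    }
```

For instance, with one-hour periods and a four-hour window, only the last four stored counts contribute to the first attribute.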
Optionally, the determining unit 5032 comprises:
the first obtaining subunit is configured to obtain an incremental URL access record of the first host if the time for updating the access behavior information set last time is different from the current time, where the incremental URL access record is a URL access record added by the first host between the time for updating the access behavior information set last time and the current time;
a second obtaining subunit, configured to, if an access information record corresponding to the selected URL exists in the access behavior information set, obtain an access information record from the access behavior information set, where the URL corresponding to the obtained access information record is the selected URL;
the first determining subunit is used for determining the incremental access times of the first host for accessing the selected URL from the time of updating the access behavior information set last time to the current time according to the incremental URL access record;
the searching subunit is used for searching the access times of the corresponding time period within a first preset time period from the multiple access times included in the acquired access information record;
the accumulation subunit is used for accumulating the increment access times and the searched access times to obtain a first attribute of the selected URL;
and the second determining subunit is used for determining a dynamic access attribute set of the selected URL according to the first attribute of the selected URL, the incremental URL access record, and the second attribute, the third attribute and the fourth attribute included in the acquired access information record.
Optionally, the second determining subunit is specifically configured to:
if every URL appearing in the incremental URL access record already exists among the URLs corresponding to the access information records included in the access behavior information set, generating a dynamic access attribute set of the selected URL according to the first attribute of the selected URL and the second attribute, the third attribute and the fourth attribute included in the acquired access information record;
and if not every URL appearing in the incremental URL access record exists among the URLs corresponding to the access information records included in the access behavior information set, acquiring at least one URL from the incremental URL access record, wherein each acquired URL is different from the URLs corresponding to the access information records included in the access behavior information set; determining a first number of times, the first number of times being the number of times that the selected URL appears in the web pages pointed to by the acquired at least one URL; determining a second attribute of the selected URL according to the first number of times and the second attribute included in the acquired access information record; and generating a dynamic access attribute set of the selected URL according to the first attribute and the second attribute of the selected URL and the third attribute and the fourth attribute included in the acquired access information record.
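The incremental case handled by these subunits might look roughly like the sketch below. It rests on several assumptions that the text does not state: the stored record is a plain dictionary with hypothetical keys, `links_of` is a hypothetical callable returning the URLs that appear on the page a URL points to, each newly seen URL is counted once, and the second attribute is updated by adding the first number of times to the stored value.

```python
def update_dynamic_attributes(selected, stored, known_urls, incremental,
                              links_of, window_periods):
    """Dynamic attribute set when the stored set lags behind the current time.

    selected       -- the URL under test
    stored         -- its access information record (hypothetical dict layout)
    known_urls     -- URLs that already have an access information record
    incremental    -- URLs accessed by the host since the last update
    links_of       -- hypothetical callable: URL -> list of URLs on its page
    window_periods -- number of stored periods inside the first preset period
    """
    # First attribute: incremental accesses plus the stored in-window counts.
    incremental_count = sum(1 for u in incremental if u == selected)
    counts = stored["period_counts"]
    in_window = sum(counts[-window_periods:]) if window_periods > 0 else 0
    first_attr = incremental_count + in_window

    # Second attribute: only pages of URLs without a record can add occurrences.
    new_urls = set(incremental) - set(known_urls)
    first_number = sum(links_of(u).count(selected) for u in new_urls)
    second_attr = stored["second"] + first_number

    return {
        "first": first_attr,
        "second": second_attr,
        "third": stored["third"],
        "fourth": stored["fourth"],
    }
```

When `new_urls` is empty, the stored second attribute is returned unchanged, which corresponds to the first branch above.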
Optionally, the determining unit further includes:
a third determining subunit, configured to determine, according to the incremental URL access record, incremental access times for the first host to access the selected URL from the time when the access behavior information set is updated last to the current time if the access behavior information set does not have an access information record corresponding to the selected URL, and determine the incremental access times as a first attribute of the selected URL;
a fourth determining subunit, configured to determine a second number, where the second number is the number of times that the selected URL appears in the web page pointed by the other URL except the selected URL in the incremental URL access record;
a fifth determining subunit, configured to determine a second attribute of the selected URL according to the second number of times and a third attribute of the URL corresponding to each access information record in the access behavior information set;
a sixth determining subunit, configured to determine a third attribute and a fourth attribute of the selected URL according to the webpage to which the selected URL points;
and the seventh determining subunit is used for generating a dynamic access attribute set of the selected URL according to the first attribute, the second attribute, the third attribute and the fourth attribute of the selected URL.
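For a selected URL that has no access information record yet, the third to seventh determining subunits could be sketched as below. This is one possible reading under stated assumptions: `records` maps each known URL to a dictionary whose `"third"` entry lists the URLs on its page, `links_of` and `page_rank_of` are hypothetical callables, and the second attribute is taken as the second number of times plus the occurrences of the selected URL in the stored third attributes. The sketch does not deduplicate pages that appear both in the incremental record and in the stored set.

```python
def attributes_for_unrecorded_url(selected, records, incremental,
                                  links_of, page_rank_of):
    """Dynamic attribute set for a URL with no access information record.

    records      -- known URL -> {"third": [URLs on its page], ...} (assumed layout)
    incremental  -- URLs accessed by the host since the last update
    links_of     -- hypothetical callable: URL -> list of URLs on its page
    page_rank_of -- hypothetical callable: URL -> page rank (PR) of its page
    """
    # First attribute: the incremental access count is the only data available.
    first_attr = sum(1 for u in incremental if u == selected)

    # Second number: occurrences on pages of the other incrementally accessed URLs.
    others = set(incremental) - {selected}
    second_number = sum(links_of(u).count(selected) for u in others)

    # Occurrences already captured by the stored third attributes.
    from_history = sum(rec["third"].count(selected) for rec in records.values())

    return {
        "first": first_attr,
        "second": second_number + from_history,
        "third": links_of(selected),
        "fourth": page_rank_of(selected),
    }
```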
Optionally, the apparatus 500 further comprises:
the judging module is used for judging whether the current time is preset updating time or not after the first host is initialized, wherein the preset updating time is the time which arrives after every second preset time after the first host is initialized;
the second acquisition module is used for acquiring an incremental URL access record of the first host if the current time is the preset updating time, wherein the incremental URL access record is a URL access record newly added between the time when the access behavior information set is updated last time and the current time by the first host;
and the updating module is used for updating, according to the incremental URL access record, the access behavior information set obtained from the last update, wherein the resulting access behavior information set of the first host is the access behavior information set as updated at the current time.
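The periodic update performed by these three modules can be illustrated with the following sketch. It covers only the per-period access counts; refreshing the second, third and fourth attributes is omitted. The dictionary layout, the function names and the use of integer timestamps in seconds are assumptions made for the example.

```python
from collections import Counter


def is_update_time(now, init_time, second_preset_seconds):
    """True when a whole number of update periods has elapsed since initialization."""
    elapsed = now - init_time
    return elapsed > 0 and elapsed % second_preset_seconds == 0


def update_access_set(access_set, incremental, periods_so_far):
    """Append one new period of access counts to every record.

    access_set     -- URL -> {"period_counts": [...]} (hypothetical layout)
    incremental    -- URLs accessed by the host since the last update
    periods_so_far -- number of periods already stored before this update
    """
    counts = Counter(incremental)
    # URLs seen for the first time get a record padded with zero counts.
    for url in counts.keys() - access_set.keys():
        access_set[url] = {"period_counts": [0] * periods_so_far}
    # Every record receives the count for the period that just ended.
    for url, record in access_set.items():
        record["period_counts"].append(counts.get(url, 0))
    return access_set
```

Padding new records with zeros keeps every list of counts the same length, so the most recent periods always sit at the same positions for every URL.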
In the embodiment of the application, URL access records of a plurality of hosts in a local area network within a first preset time period are obtained, a URL set accessed by the first host within the first preset time period is determined according to the URL access records, a URL is then selected from the URL set, a dynamic access attribute set of the selected URL is determined, and anomaly detection is performed on the selected URL according to the dynamic access attribute set of the selected URL and a first anomaly detection model. The dynamic access attribute set includes a first attribute and a second attribute, where the first attribute refers to the number of times that the first host accesses the selected URL within the first preset time period, and the second attribute refers to the number of times that the selected URL appears in the web pages pointed to by all URLs accessed by the first host from the initialization time until the current time, so that the dynamic access attribute set of the selected URL can represent the behavior characteristics of the first host accessing the selected URL. In other words, whether the URL to be detected is an abnormal URL is judged according to the behavior characteristics of the first host accessing the URL to be detected and the anomaly detection model generated from the prior network access behavior of the first host, which can improve the accuracy of determining abnormal URLs.
It should be noted that: in the apparatus for detecting an abnormal URL according to the foregoing embodiment, when detecting an abnormal URL, the foregoing division of each function module is merely used as an example, and in practical applications, the function distribution may be completed by different function modules according to needs, that is, the internal structure of the device is divided into different function modules, so as to complete all or part of the functions described above. In addition, the apparatus for detecting an abnormal URL and the method for detecting an abnormal URL provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Versatile Disc (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above-mentioned embodiments are provided for the purpose of illustration and not limitation, and any modifications, equivalents, improvements, etc. made within the principles of the disclosure of the above-mentioned embodiments of the present application are intended to be included within the scope of the present application.

Claims (18)

1. A method of detecting an anomalous uniform resource locator, URL, the method comprising:
obtaining URL access records of a plurality of hosts in a local area network within a first preset time period, wherein the first preset time period is before the current time and the duration of the first preset time period is a first preset duration;
determining a URL set accessed by a first host within the first preset time period according to the URL access record, wherein URLs in the URL set are different from each other, and the first host is one of the hosts;
selecting a URL from the URL set, and executing the following processing aiming at the selected URL until each URL in the URL set is processed:
determining a dynamic access attribute set of the selected URL, wherein the dynamic access attribute set comprises a first attribute and a second attribute, the first attribute refers to the number of times that the first host accesses the selected URL within the first preset time period, and the second attribute refers to the number of times that the selected URL appears in the webpages pointed to by all accessed URLs of the first host from the initialization time and until the current time;
and carrying out anomaly detection on the selected URL according to the dynamic access attribute set of the selected URL and a first anomaly detection model, wherein the first anomaly detection model is obtained by training in advance according to a plurality of abnormal sample URLs accessed by the first host and the dynamic access attribute set of each abnormal sample URL in the plurality of abnormal sample URLs.
2. The method of claim 1, wherein the set of dynamic access attributes of the selected URL further includes a third attribute and a fourth attribute, the third attribute referring to URLs appearing in web pages pointed to by the selected URL, and the fourth attribute referring to a web page rank PR of web pages pointed to by the selected URL.
3. The method of claim 2, wherein the determining the set of dynamic access attributes for the selected URL comprises:
acquiring an access behavior information set of the first host, where the access behavior information set includes at least one access information record, each access information record in the at least one access information record corresponds to a URL, and URLs corresponding to different access information records are different, a first access information record corresponding to a first URL in the at least one access information record includes multiple access times of the first URL, a second attribute of the first URL, a third attribute of the first URL, and a fourth attribute of the first URL, the multiple access times refer to the times of accessing the first URL by the first host in multiple time periods, the multiple time periods are obtained by dividing time periods starting at an initialization time of the first host and ending at a current time, and a duration of each time period in the multiple time periods is a second preset duration, the first preset time length is greater than or equal to the second preset time length;
and determining a dynamic access attribute set of the selected URL according to the access behavior information set of the first host.
4. The method of claim 3, wherein determining the set of dynamic access attributes for the selected URL based on the set of access behavior information for the first host comprises:
if the time for updating the access behavior information set last time is the current time, acquiring an access information record from the access behavior information set, wherein the URL corresponding to the acquired access information record is the selected URL;
searching for the access times of the corresponding time period within the first preset time period from the multiple access times included in the acquired access information record;
accumulating the searched access times to obtain a first attribute of the selected URL;
and determining a dynamic access attribute set of the selected URL according to the first attribute of the selected URL and the second attribute, the third attribute and the fourth attribute included in the acquired access information record.
5. The method of claim 3, wherein determining the set of dynamic access attributes for the selected URL based on the set of access behavior information for the first host comprises:
if the time for updating the access behavior information set last time is different from the current time, acquiring an incremental URL access record of the first host, wherein the incremental URL access record is a URL access record which is newly added between the time for updating the access behavior information set last time and the current time of the first host;
if the access behavior information set has an access information record corresponding to the selected URL, acquiring the access information record from the access behavior information set, wherein the URL corresponding to the acquired access information record is the selected URL;
determining the incremental access times of the first host accessing the selected URL from the time of updating the access behavior information set last time to the current time according to the incremental URL access record;
searching for the access times of the corresponding time period within the first preset time period from the multiple access times included in the acquired access information record;
accumulating the increment access times and the searched access times to obtain a first attribute of the selected URL;
and determining a dynamic access attribute set of the selected URL according to the first attribute of the selected URL, the incremental URL access record, and the second attribute, the third attribute and the fourth attribute included in the obtained access information record.
6. The method of claim 5, wherein determining the set of dynamic access attributes for the selected URL based on the first attribute of the selected URL, the incremental URL access record, the second attribute, the third attribute, and the fourth attribute included in the retrieved access information record comprises:
if the URLs appearing in the incremental URL access records exist in URLs corresponding to the access information records included in the access behavior information set, generating a dynamic access attribute set of the selected URL according to the first attribute of the selected URL, the second attribute, the third attribute and the fourth attribute included in the obtained access information records;
if not all URLs appearing in the incremental URL access records exist in the URLs corresponding to the access information records included in the access behavior information set, obtaining at least one URL from the incremental URL access record, wherein the obtained at least one URL is different from the URL corresponding to the access information record included in the access behavior information set, determining a first number of times that the selected URL appears in the acquired webpage pointed by the at least one URL, determining a second attribute of the selected URL according to the first number of times and a second attribute included in the acquired access information record, and generating a dynamic access attribute set of the selected URL according to the first attribute and the second attribute of the selected URL and the third attribute and the fourth attribute included in the acquired access information record.
7. The method of claim 5, wherein after obtaining the incremental URL access record of the first host, the method further comprises:
if the access behavior information set does not have an access information record corresponding to the selected URL, determining the incremental access times of the first host accessing the selected URL from the time of updating the access behavior information set last time to the current time according to the incremental URL access record, and determining the incremental access times as a first attribute of the selected URL;
determining a second number of times that the selected URL appears in the webpage pointed by other URLs except the selected URL in the incremental URL access record;
determining a second attribute of the selected URL according to the second times and a third attribute of the URL corresponding to each access information record in the access behavior information set;
determining a third attribute and a fourth attribute of the selected URL according to the webpage pointed by the selected URL;
and generating a dynamic access attribute set of the selected URL according to the first attribute, the second attribute, the third attribute and the fourth attribute of the selected URL.
8. The method of claim 3, wherein prior to obtaining the set of access behavior information for the first host, further comprising:
after the first host is initialized, judging whether the current time is preset updating time or not, wherein the preset updating time is the time which is reached after every second preset time after the first host is initialized;
if the current time is the preset updating time, acquiring an incremental URL access record of the first host, wherein the incremental URL access record is a URL access record newly added between the time when the access behavior information set is updated last time and the current time by the first host;
and updating the access behavior information set after the last update according to the incremental URL access record, wherein the obtained access behavior information set of the first host is the access behavior information set after the current time is updated.
9. An apparatus for detecting an anomalous uniform resource locator, URL, the apparatus comprising:
a first obtaining module, configured to obtain URL access records of a plurality of hosts in a local area network within a first preset time period, wherein the first preset time period is before the current time and the duration of the first preset time period is a first preset duration;
a determining module, configured to determine, according to the URL access record, a URL set that a first host accesses within the first preset time period, where URLs in the URL set are different from each other, and the first host is one of the multiple hosts;
a processing module, configured to select a URL from the URL set, and execute the following processing for the selected URL until each URL in the URL set is processed:
determining a dynamic access attribute set of the selected URL, wherein the dynamic access attribute set comprises a first attribute and a second attribute, the first attribute refers to the number of times that the first host accesses the selected URL within the first preset time period, and the second attribute refers to the number of times that the selected URL appears in the webpages pointed to by all accessed URLs of the first host from the initialization time and until the current time;
and carrying out anomaly detection on the selected URL according to the dynamic access attribute set of the selected URL and a first anomaly detection model, wherein the first anomaly detection model is obtained by training in advance according to a plurality of abnormal sample URLs accessed by the first host and the dynamic access attribute set of each abnormal sample URL in the plurality of abnormal sample URLs.
10. The apparatus of claim 9, wherein the set of dynamic access attributes of the selected URL further includes a third attribute and a fourth attribute, the third attribute referring to URLs appearing in web pages pointed to by the selected URL, and the fourth attribute referring to a web page rank PR of web pages pointed to by the selected URL.
11. The apparatus of claim 10, wherein the processing module comprises:
an obtaining unit, configured to obtain an access behavior information set of the first host, where the access behavior information set includes at least one access information record, each access information record in the at least one access information record corresponds to a URL, and URLs corresponding to different access information records are different, a first access information record corresponding to a first URL in the at least one access information record includes multiple access times of the first URL, a second attribute of the first URL, a third attribute of the first URL, and a fourth attribute of the first URL, where the multiple access times refer to times when the first host accesses the first URL in multiple time periods, the multiple time periods are obtained by dividing time periods starting at an initialization time of the first host and ending at a current time, and a duration of each time period in the multiple time periods is a second preset duration, the first preset time length is greater than or equal to the second preset time length;
and the determining unit is used for determining the dynamic access attribute set of the selected URL according to the access behavior information set of the first host.
12. The apparatus according to claim 11, wherein the determining unit is specifically configured to:
if the time for updating the access behavior information set last time is the current time, acquiring an access information record from the access behavior information set, wherein the URL corresponding to the acquired access information record is the selected URL;
searching for the access times of the corresponding time period within the first preset time period from the multiple access times included in the acquired access information record;
accumulating the searched access times to obtain a first attribute of the selected URL;
and determining a dynamic access attribute set of the selected URL according to the first attribute of the selected URL and the second attribute, the third attribute and the fourth attribute included in the acquired access information record.
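By way of illustration only, the accumulation recited in claim 12 can be sketched as follows. The record layout (`AccessRecord`), the field names, and the assumption that the first preset duration is measured in the same unit as the second preset duration are hypothetical and are not taken from the claims; the sketch treats the "access times" as per-period counts stored oldest first.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AccessRecord:
    """Hypothetical layout of one access information record (one per URL)."""
    url: str
    period_counts: List[int]            # access times per time period, oldest first
    second_attr: float = 0.0            # placeholder for the second attribute
    third_attr: List[str] = field(default_factory=list)  # URLs in the pointed-to page
    fourth_attr: float = 0.0            # page rank (PR) of the pointed-to page


def first_attribute(record: AccessRecord, first_duration: int, second_duration: int) -> int:
    """Sum the counts of the most recent periods covered by the first preset duration."""
    if second_duration <= 0 or first_duration < second_duration:
        raise ValueError("expected first_duration >= second_duration > 0")
    # Assumption: only whole periods inside the window are counted.
    periods_in_window = first_duration // second_duration
    return sum(record.period_counts[-periods_in_window:])


def dynamic_attribute_set(record: AccessRecord, first_duration: int,
                          second_duration: int) -> Tuple[int, float, Tuple[str, ...], float]:
    """Combine the computed first attribute with the stored second to fourth attributes."""
    return (
        first_attribute(record, first_duration, second_duration),
        record.second_attr,
        tuple(record.third_attr),
        record.fourth_attr,
    )
```

For example, with per-period counts `[3, 1, 4, 1, 5]`, a second preset duration of one hour and a first preset duration of three hours, the first attribute under these assumptions is the sum of the last three counts.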
13. The apparatus of claim 11, wherein the determining unit comprises:
a first obtaining subunit, configured to obtain an incremental URL access record of the first host if the time at which the access behavior information set was last updated is different from the current time, where the incremental URL access record is a URL access record newly added by the first host between the time at which the access behavior information set was last updated and the current time;
a second obtaining subunit, configured to, if an access information record corresponding to the selected URL exists in the access behavior information set, obtain an access information record from the access behavior information set, where the URL corresponding to the obtained access information record is the selected URL;
a first determining subunit, configured to determine, according to the incremental URL access record, incremental access times for which the first host accesses the selected URL between the time at which the access behavior information set was last updated and the current time;
a searching subunit, configured to search, among the multiple access times included in the obtained access information record, for the access times of the time periods that fall within the first preset duration;
an accumulation subunit, configured to accumulate the incremental access times and the found access times to obtain a first attribute of the selected URL;
a second determining subunit, configured to determine a dynamic access attribute set of the selected URL according to the first attribute of the selected URL, the incremental URL access record, and the second attribute, the third attribute, and the fourth attribute included in the obtained access information record.
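By way of illustration only, the accumulation performed by the first determining subunit, the searching subunit, and the accumulation subunit of claim 13 can be sketched as follows. The function name, the representation of the incremental URL access record as a plain sequence of URLs, and the parameter `periods_in_window` (the number of stored periods that fall within the first preset duration) are hypothetical and are not taken from the claims.

```python
from collections import Counter
from typing import Iterable, Sequence


def incremental_first_attribute(selected_url: str,
                                incremental_record: Iterable[str],
                                period_counts: Sequence[int],
                                periods_in_window: int) -> int:
    """Add accesses seen since the last update to the stored in-window counts."""
    # Incremental access times: occurrences of the selected URL in the new accesses.
    increment = Counter(incremental_record)[selected_url]
    # Found access times: stored counts of the most recent periods in the window.
    found = sum(period_counts[-periods_in_window:]) if periods_in_window > 0 else 0
    return increment + found
```

Whether the oldest stored period should leave the window once incremental accesses are added is not settled by the claim language reproduced here; this sketch simply adds the two quantities as recited.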
14. The apparatus of claim 13, wherein the second determining subunit is specifically configured to:
if all URLs appearing in the incremental URL access record exist among the URLs corresponding to the access information records included in the access behavior information set, generating a dynamic access attribute set of the selected URL according to the first attribute of the selected URL and the second attribute, the third attribute, and the fourth attribute included in the obtained access information record;
if not all URLs appearing in the incremental URL access record exist among the URLs corresponding to the access information records included in the access behavior information set, obtaining at least one URL from the incremental URL access record, wherein each obtained URL is different from every URL corresponding to an access information record included in the access behavior information set; determining a first number of times that the selected URL appears in the web pages pointed to by the obtained at least one URL; determining a second attribute of the selected URL according to the first number of times and the second attribute included in the obtained access information record; and generating a dynamic access attribute set of the selected URL according to the first attribute and the second attribute of the selected URL and the third attribute and the fourth attribute included in the obtained access information record.
15. The apparatus of claim 13, wherein the determining unit further comprises:
a third determining subunit, configured to, if no access information record corresponding to the selected URL exists in the access behavior information set, determine, according to the incremental URL access record, incremental access times for which the first host accesses the selected URL between the time at which the access behavior information set was last updated and the current time, and determine the incremental access times as a first attribute of the selected URL;
a fourth determining subunit, configured to determine a second number of times that the selected URL appears in the web pages pointed to by the URLs in the incremental URL access record other than the selected URL;
a fifth determining subunit, configured to determine a second attribute of the selected URL according to the second number of times and a third attribute of the URL corresponding to each access information record in the access behavior information set;
a sixth determining subunit, configured to determine a third attribute and a fourth attribute of the selected URL according to the web page to which the selected URL points;
a seventh determining subunit, configured to generate a dynamic access attribute set of the selected URL according to the first attribute, the second attribute, the third attribute, and the fourth attribute of the selected URL.
16. The apparatus of claim 11, wherein the apparatus further comprises:
a judging module, configured to determine whether the current time is a preset update time after the first host is initialized, where a preset update time is reached each time the second preset duration elapses after the first host is initialized;
a second obtaining module, configured to obtain an incremental URL access record of the first host if the current time is the preset update time, where the incremental URL access record is a URL access record newly added by the first host between the time at which the access behavior information set was last updated and the current time;
an updating module, configured to update the most recently updated access behavior information set according to the incremental URL access record, where the obtained access behavior information set of the first host is the access behavior information set as updated at the current time.
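By way of illustration only, the periodic update of claim 16 can be sketched as follows. The integer timestamps, the dictionary that maps each URL to its list of per-period counts, and the zero-padding of newly seen URLs are hypothetical choices, not features recited in the claims; the sketch also updates only the access counts and leaves the second to fourth attributes aside.

```python
from collections import Counter
from typing import Dict, Iterable, List


def is_preset_update_time(now: int, init_time: int, second_duration: int) -> bool:
    """True each time a whole second preset duration has elapsed since initialization."""
    elapsed = now - init_time
    return elapsed > 0 and elapsed % second_duration == 0


def update_info_set(info_set: Dict[str, List[int]],
                    incremental_record: Iterable[str],
                    periods_so_far: int) -> Dict[str, List[int]]:
    """Append one new per-period count to every URL's history, in place."""
    counts = Counter(incremental_record)
    # URLs seen for the first time get zero counts for all earlier periods.
    for url in counts:
        info_set.setdefault(url, [0] * periods_so_far)
    # Every record gains exactly one entry for the period that just ended.
    for url, history in info_set.items():
        history.append(counts.get(url, 0))
    return info_set
```

Here `periods_so_far` is the number of periods already stored before the update, so that all histories keep the same length after new URLs are added.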
17. An apparatus for detecting an abnormal uniform resource locator (URL), the apparatus comprising a memory and a processor;
the memory is configured to store a program that supports the apparatus in performing the method of any one of claims 1-8, and to store data involved in implementing the method of any one of claims 1-8;
the processor is configured to execute the program stored in the memory.
18. A computer-readable storage medium having stored therein instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810368224.6A CN110392032B (en) 2018-04-23 2018-04-23 Method, device and storage medium for detecting abnormal URL

Publications (2)

Publication Number Publication Date
CN110392032A CN110392032A (en) 2019-10-29
CN110392032B CN110392032B (en) 2021-03-30

Family

ID=68284486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810368224.6A Active CN110392032B (en) 2018-04-23 2018-04-23 Method, device and storage medium for detecting abnormal URL

Country Status (1)

Country Link
CN (1) CN110392032B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368163B (en) * 2020-02-24 2024-03-26 网宿科技股份有限公司 Crawler data identification method, system and equipment
CN111614614B (en) * 2020-04-14 2022-08-05 瑞数信息技术(上海)有限公司 Safety monitoring method and device applied to Internet of things
CN115208597B (en) * 2021-04-09 2023-07-21 中国移动通信集团辽宁有限公司 Abnormal equipment determining method, device, equipment and computer storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622435A (en) * 2012-02-29 2012-08-01 百度在线网络技术(北京)有限公司 Method and device for detecting black chain
EP2590379A2 (en) * 2011-11-04 2013-05-08 Hitachi Ltd. Filtering system and filtering method
CN103368957A (en) * 2013-07-04 2013-10-23 北京奇虎科技有限公司 Method, system, client and server for processing webpage access behavior
CN104615695A (en) * 2015-01-23 2015-05-13 腾讯科技(深圳)有限公司 Malicious website detecting method and system
CN107508809A (en) * 2017-08-17 2017-12-22 腾讯科技(深圳)有限公司 Identify the method and device of website type

Also Published As

Publication number Publication date
CN110392032A (en) 2019-10-29

Similar Documents

Publication Publication Date Title
US10121000B1 (en) System and method to detect premium attacks on electronic networks and electronic devices
US11134094B2 (en) Detection of potential security threats in machine data based on pattern detection
US10609059B2 (en) Graph-based network anomaly detection across time and entities
US10986106B2 (en) Method and system for generating an entities view with risk-level scoring for performing computer security monitoring
US10102372B2 (en) Behavior profiling for malware detection
US8260914B1 (en) Detecting DNS fast-flux anomalies
US20130318603A1 (en) Security threat detection based on indications in big data of access to newly registered domains
US10601847B2 (en) Detecting user behavior activities of interest in a network
US10454967B1 (en) Clustering computer security attacks by threat actor based on attack features
US11178160B2 (en) Detecting and mitigating leaked cloud authorization keys
CN110392032B (en) Method, device and storage medium for detecting abnormal URL
US11423099B2 (en) Classification apparatus, classification method, and classification program
US11533323B2 (en) Computer security system for ingesting and analyzing network traffic
US11133977B2 (en) Anonymizing action implementation data obtained from incident analysis systems
US11770388B1 (en) Network infrastructure detection
US11962618B2 (en) Systems and methods for protection against theft of user credentials by email phishing attacks
CN111385248B (en) Attack defense method and attack defense device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant