CN109302383B - URL monitoring method and device - Google Patents

Info

Publication number
CN109302383B
Authority
CN
China
Prior art keywords
target
url
uniform resource
resource locator
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811018419.4A
Other languages
Chinese (zh)
Other versions
CN109302383A (en)
Inventor
熊庆昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811018419.4A
Publication of CN109302383A
Application granted
Publication of CN109302383B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Abstract

The embodiments of the application disclose a URL monitoring method and device. The method comprises: obtaining N target features corresponding to target data; obtaining a target URL carried in a received target request; if the target URL does not belong to a monitoring set, detecting whether the page content requested by the target request matches M of the N target features; and, if it does, monitoring the target URL. The embodiments make it possible to monitor URLs and thereby the target data, which improves monitoring efficiency, reduces monitoring cost, and helps prevent the property loss that leakage of the target data would cause users.

Description

URL monitoring method and device
Technical Field
The application relates to the technical field of internet, in particular to a URL monitoring method and device.
Background
Currently, data in a website is presented through the page content of web pages. However, the data held by existing websites (such as policy data and customer information) is not monitored, so it can easily be exploited maliciously. For example, the data may leak when the website has an unauthorized-access vulnerability, or a person with permission to view the data may be induced to leak it deliberately. Once this data is leaked, it can cause serious property loss to the customer.
At present, such data is monitored mainly by hand, but manual monitoring is time-consuming and labor-intensive, with low processing efficiency and high monitoring cost.
Disclosure of Invention
The embodiment of the application provides a URL monitoring method and device, which can improve the monitoring processing efficiency and reduce the monitoring cost.
In a first aspect, an embodiment of the present application provides a URL monitoring method, where the method includes:
acquiring N target characteristics corresponding to target data;
acquiring a target Uniform Resource Locator (URL) carried in a received target request;
if the target URL does not belong to a monitoring set, detecting whether the page content requested by the target request is matched with M target features in the N target features, wherein the monitoring set comprises a monitored URL, and the page content corresponding to the monitored URL is matched with at least one target feature in the N target features;
if the page content requested by the target request is matched with M target features in the N target features, monitoring the target URL;
wherein, N and M are integers which are more than or equal to 1, and M is less than or equal to N.
With reference to the first aspect, in a possible implementation manner, after obtaining a target URL carried in a received target request, the method further includes:
detecting whether a filename suffix of the target URL matches a target suffix, the target suffix including a filename suffix of at least one non-monitored file; and if the file name suffix of the target URL does not match with the target suffix, detecting whether the target URL belongs to a monitoring set.
With reference to the first aspect, in one possible implementation manner, the monitoring set includes monitored URLs of historical monitoring records. Detecting whether the target URL belongs to the monitoring set comprises: calculating the hash value of the target URL; determining the hash value of each monitored URL in the monitoring set, and detecting whether any of these hash values matches the hash value of the target URL; and if none of the hash values of the monitored URLs matches the hash value of the target URL, determining that the target URL does not belong to the monitoring set.
With reference to the first aspect, in one possible implementation manner, the monitoring set contains no monitored URLs (it is empty). Detecting whether the target URL belongs to the monitoring set comprises: if the monitoring set is empty, determining that the target URL does not belong to the monitoring set.
With reference to the first aspect, in one possible implementation, the method further includes: if the target URL belongs to the monitoring set, acquiring a request frequency aiming at the target URL; and if the request frequency is out of the target range, outputting alarm prompt information. The alarm prompt information comprises the target URL, and the alarm prompt information is used for prompting that the request frequency of the target URL is abnormal.
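The frequency check described in this implementation can be illustrated with a minimal sketch. The function name `frequency_alert`, the parameters `low` and `high` standing for the bounds of the target range, and the wording of the alarm text are assumptions made for illustration only; the application does not prescribe them.

```python
from typing import Optional


def frequency_alert(url: str, request_frequency: float,
                    low: float, high: float) -> Optional[str]:
    """Return alarm prompt information if the frequency is outside [low, high].

    `low` and `high` are hypothetical bounds of the target range.
    """
    if low <= request_frequency <= high:
        # Frequency is within the target range: no alarm is produced.
        return None
    # The alarm prompt information includes the target URL.
    return (f"ALERT: abnormal request frequency for {url}: "
            f"{request_frequency} (expected {low} to {high})")
```

A caller would typically invoke this only for URLs that already belong to the monitoring set and forward any non-empty result to monitoring personnel.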
With reference to the first aspect, in a possible implementation manner, if the page content requested by the target request matches M target features of the N target features, monitoring the target URL includes: and if the page content requested by the target request is matched with M target features in the N target features, monitoring the request frequency of the target URL, and adding the target URL into the monitoring set.
In a second aspect, an embodiment of the present application provides a URL monitoring apparatus, where the apparatus includes:
the first acquisition module is used for acquiring N target characteristics corresponding to the target data;
the second acquisition module is used for acquiring a target Uniform Resource Locator (URL) carried in the received target request;
a first detection module, configured to detect whether page content requested by the target request matches M target features of the N target features when the target URL does not belong to a monitoring set, where the monitoring set includes a monitored URL, and the page content corresponding to the monitored URL matches at least one target feature of the N target features;
the monitoring module is used for monitoring the target URL when the page content requested by the target request is matched with M target characteristics in the N target characteristics;
wherein, N and M are integers which are more than or equal to 1, and M is less than or equal to N.
With reference to the second aspect, in one possible implementation, the apparatus further includes:
a second detection module for detecting whether a filename suffix of the target URL matches a target suffix, the target suffix including a filename suffix of at least one non-monitored file;
and the third detection module is used for detecting whether the target URL belongs to the monitoring set or not when the file name suffix of the target URL does not match with the target suffix.
With reference to the second aspect, in one possible implementation, the monitoring set includes monitored URLs of historical monitoring records. The third detection module is specifically configured to calculate a hash value of the target URL; determining hash values of all monitored URLs in the monitoring set, and detecting whether hash values matched with the hash value of the target URL exist in the hash values of all monitored URLs; and if the hash value of each monitored URL does not have a hash value matched with the hash value of the target URL, determining that the target URL does not belong to the monitoring set.
With reference to the second aspect, in one possible implementation, the monitoring set contains no monitored URLs (it is empty). The third detection module is further specifically configured to: when the monitoring set is empty, determine that the target URL does not belong to the monitoring set.
With reference to the second aspect, in one possible implementation, the apparatus further includes: a third obtaining module, configured to obtain a request frequency for the target URL when the target URL belongs to the monitoring set; and the output module is used for outputting alarm prompt information when the request frequency is out of the target range. The alarm prompt information comprises the target URL, and the alarm prompt information is used for prompting that the request frequency of the target URL is abnormal.
With reference to the second aspect, in a possible implementation manner, the monitoring module is specifically configured to monitor the request frequency of the target URL and add the target URL to the monitoring set when the page content requested by the target request matches M target features of the N target features.
In a third aspect, an embodiment of the present application provides a server, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program that supports the server to execute the above method, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the URL monitoring method of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the URL monitoring method of the first aspect.
In the embodiments of the application, N target features corresponding to target data are obtained, and the target URL carried in a received target request is obtained. If the target URL does not belong to a monitoring set, it is detected whether the page content requested by the target request matches M of the N target features, and if it does, the target URL is monitored. In this way the URL, and through it the target data, can be monitored, which improves monitoring efficiency, reduces monitoring cost, and helps prevent the property loss that leakage of the target data would cause users.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a URL monitoring method provided by an embodiment of the present application;
FIG. 2 is another schematic flow chart of a URL monitoring method provided by an embodiment of the present application;
FIG. 3 is a schematic block diagram of a URL monitoring apparatus provided in an embodiment of the present application;
FIG. 4 is a schematic block diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It is to be understood that the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
It should also be appreciated that reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
The target request in the embodiment of the present application may be a hypertext transfer protocol (HTTP) request. The target data related to the embodiment of the application are all presented by the page content of the webpage. Therefore, the embodiment of the application can find whether the page content includes the target data by detecting whether the page content returned by the HTTP request carries the characteristics of the target data. Since the HTTP request carries a Uniform Resource Locator (URL), the page content returned by the server for the HTTP request is usually the page content corresponding to the URL. Therefore, the embodiment of the application can monitor the URLs corresponding to the page content containing the target data to achieve the purpose of monitoring the target data, and can prevent and find the problem of leakage of the target data to a certain extent by monitoring the URLs, so that the monitoring processing efficiency is improved, the monitoring cost and the monitoring difficulty are reduced, and the property loss of the user caused by leakage of the target data is prevented.
The following describes a URL monitoring method and apparatus provided in an embodiment of the present application with reference to fig. 1 to 4.
Referring to fig. 1, which is a schematic flowchart of a URL monitoring method according to an embodiment of the present disclosure, as shown in fig. 1, the URL monitoring method may include:
S101, obtaining N target features corresponding to target data.
In some possible embodiments, the server may obtain N manually preset target features. The target features may be used to represent target data, the target data may be user-defined data in a website that needs to be monitored, and N may be an integer greater than or equal to 1. The target data in the embodiment of the present application generally refers to data and/or information closely related to the customer, such as policy data and customer information. The target features may include key fields such as customer name, mobile phone number, email address, ID card number, license plate number, vehicle frame number, policy number, address, age, gender, user name, password, bank card number, and order number; the target features may also include file types, such as PDF document and Word document.
In some possible embodiments, the server may obtain at least one manually preset feature, use big data analysis to determine which key fields represent each preset feature across all page content crawled by a crawler, extract all key fields representing each of the preset features, and determine N target features from those key fields. One key field may serve as one target feature, and N may be an integer greater than or equal to 1. The key fields may include fields related to customer information, such as customer name, mobile phone number, email address, ID card number, license plate number, vehicle frame number, policy number, address, age, gender, user name, password, bank card number, and order number. For example, suppose the preset feature is "11 consecutive digits". If the server's analysis shows that, in all page content crawled by the crawler, "11 consecutive digits" is represented by the 3 key fields "Mobile phone number", "Tel", and "Mobile", the server determines these 3 key fields as 3 target features. Similarly, if the preset feature is "digits/letters (upper or lower case) + @ + digits/letters + .com", and the analysis shows that this feature is represented by the 2 key fields "mailbox" and "Email", the server determines these 2 key fields as 2 target features.
By using big data to analyze which key fields represent the preset features in the page content, the embodiments of the application can extract the target features (key fields) that represent the target data without knowing how the website's page content is designed (for example, which fields or content the pages contain).
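As a rough illustration of how key fields might be discovered from preset features, the sketch below expresses the two example preset features as regular expressions and counts the labels whose values match them. The names `PRESET_FEATURES` and `discover_key_fields`, and the assumption that crawled pages contain "label: value" lines, are hypothetical; the application does not specify how the big data analysis is carried out.

```python
import re
from collections import Counter

# Hypothetical regular expressions standing in for the two preset features
# used as examples in the description.
PRESET_FEATURES = {
    "11 consecutive digits": re.compile(r"\d{11}"),
    "email-like string": re.compile(r"[A-Za-z0-9]+@[A-Za-z0-9]+\.com"),
}

# Assumed page layout: one "label: value" pair per line.
LABELLED_VALUE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def discover_key_fields(pages):
    """For each preset feature, count the labels whose values match it."""
    found = {name: Counter() for name in PRESET_FEATURES}
    for page in pages:
        for line in page.splitlines():
            m = LABELLED_VALUE.match(line)
            if not m:
                continue
            label, value = m.groups()
            for name, pattern in PRESET_FEATURES.items():
                # The whole value must match the preset feature.
                if pattern.fullmatch(value):
                    found[name][label] += 1
    return found
```

The labels collected for each preset feature are the candidate key fields, from which the N target features can then be chosen.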
In some possible embodiments, after extracting the key fields representing each preset feature, the server may output all of these key fields. The server may take the key fields that the user selects from them as target key fields and determine the target key fields as target features, where one key field may be one target feature. Optionally, after extracting the key fields representing each preset feature, the server may instead screen out a subset of them, based on a preset screening rule, as the target features representing the target data. For example, the server may count the number of occurrences of the 3 key fields "Mobile phone number", "Tel", and "Mobile" and retain only the key field that occurs most often: assuming "Mobile phone number" occurs most often, the key field "Mobile phone number" is taken as the target feature, and the other 2 key fields, "Tel" and "Mobile", are discarded. Because the target features are obtained either after the key fields have been manually screened, added to, or deleted, or after a subset has been selected by the preset screening rule, key fields that occur rarely or that clearly cannot represent the target data are eliminated, which can improve accuracy.
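The count-based screening rule in the example above could look like the following sketch. The function name `screen_key_fields`, the `keep` parameter, and the occurrence counts in the test are invented for illustration.

```python
from collections import Counter


def screen_key_fields(occurrences, keep=1):
    """Keep the `keep` key fields that occur most often; discard the rest.

    `occurrences` maps each candidate key field to its occurrence count.
    """
    return [field for field, _ in Counter(occurrences).most_common(keep)]
```

With `keep=1` this reproduces the rule of retaining only the most frequent key field; a larger `keep` would be one possible variant of the screening rule.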
S102, obtaining a target Uniform Resource Locator (URL) carried in the received target request.
In some possible embodiments, the server may receive any target HTTP request sent by a terminal and obtain the target URL carried in the target HTTP request. The target HTTP request is used to request the page content corresponding to the target URL. A URL generally has the structure "protocol://server name (or IP address)/path?parameters", where the path in the URL represents a directory or file address on the host.
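The URL structure described above can be taken apart with Python's standard `urllib.parse` module, as in this small sketch. The function name `split_url` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def split_url(url):
    """Split a URL into protocol, server name, path, and parameters."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "server": parts.hostname,
        "path": parts.path,
        "parameters": parse_qs(parts.query),
    }
```

For a URL such as `https://www.example.com/policy/detail.html?id=42`, the path component is `/policy/detail.html` and the parameters are `{"id": ["42"]}`.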
S103, if the target URL does not belong to the monitoring set, whether the page content requested by the target request is matched with M target features in the N target features is detected.
In some possible embodiments, the server may obtain a preset monitoring set, and may detect whether a monitored URL identical to the target URL exists in the monitoring set. If a monitored URL identical to the target URL exists in the monitoring set, indicating that the target URL belongs to the monitoring set, it may be detected whether the request frequency for the target URL is outside the target range. If there is no monitored URL in the monitoring set that is the same as the target URL, indicating that the target URL does not belong to the monitoring set, it may be detected whether the content of the page requested by the target HTTP request matches with M target features of the N target features. And the monitoring set comprises a monitored URL, and the page content corresponding to the monitored URL is matched with at least one target feature in the N target features. N and M may each be an integer greater than or equal to 1, and M may be less than or equal to N.
For example, assume that the N target features include 9 key fields (mobile phone, ID card, license plate, policy number, address, user name, password, bank card number, and order number) and 2 file types (PDF document and Word document). The monitoring set comprises 4 monitored URLs, namely URL3, URL5, URL7, and URL8. The target URL carried by the target HTTP request is URL9. The server can use character matching to detect whether a monitored URL identical to the target URL exists in the monitoring set. Because no monitored URL identical to URL9 exists in the monitoring set, URL9 does not belong to the monitoring set, and the server detects whether the page content requested by the target HTTP request contains any of the 9 key fields. If the page content includes at least one of the 9 key fields, the page content matches at least one of the N target features. If the page content does not include any of the 9 key fields, the server may obtain the first 3 characters of the file header of the page content and judge whether they are the same as the first 3 characters of the file header of a PDF document or a Word document. If they are the same, the page content matches 1 of the N target features. If not, the page content does not match any of the N target features.
Optionally, the server may perform the two checks in parallel: detecting whether the page content requested by the target HTTP request includes any of the 9 key fields, and detecting whether the first 3 characters of the file header of that page content are the same as the first 3 characters of the file header of a PDF document or Word document. When the page content includes any of the 9 key fields, and/or the first 3 characters of its file header are the same as those of a PDF document or Word document, the page content matches M of the N target features.
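The two checks in this example, key-field search and file-header comparison, might be sketched as follows. Several details are assumptions rather than part of the application: the key fields are matched as plain English substrings, "matches M target features" is read as "at least M", and the file signatures are the well-known leading bytes of PDF files (`%PDF`) and of legacy `.doc` compound files (`D0 CF 11 E0`), of which only the first 3 bytes are compared, as in the description.

```python
KEY_FIELDS = [
    "mobile phone", "ID card", "license plate", "policy number", "address",
    "user name", "password", "bank card number", "order number",
]

# Leading bytes of each file type; only the first 3 are compared below.
FILE_SIGNATURES = {
    "PDF document": b"%PDF",
    "Word document (.doc)": b"\xd0\xcf\x11\xe0",
}


def matched_features(body: bytes):
    """Return the target features that the response body matches."""
    text = body.decode("utf-8", errors="ignore")
    matched = [field for field in KEY_FIELDS if field in text]
    prefix = body[:3]
    matched += [name for name, sig in FILE_SIGNATURES.items()
                if prefix == sig[:3]]
    return matched


def matches(body: bytes, m: int = 1) -> bool:
    """True if the body matches at least m target features (assumed reading)."""
    return len(matched_features(body)) >= m
```

In this sketch the two checks run one after the other; running them in parallel, as the optional variant describes, would not change the result.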
In some possible embodiments, the monitoring set may be configurable, and monitored URLs may be added to and/or deleted from the monitoring set through an interface. The initial monitoring set may contain no monitored URL, or may contain one or more manually set URLs. If the monitoring set does not include any monitored URL (i.e., the monitoring set is empty), the target URL does not belong to the monitoring set, and the server may detect whether the page content requested by the target HTTP request matches M of the N target features.
And S104, monitoring the target URL if the page content requested by the target request is matched with M target characteristics in the N target characteristics.
In some possible embodiments, the server may monitor the target URL when the page content requested by the target request matches one or more of the N acquired target features. For example, the server may monitor the request frequency and time distribution for the target URL; or monitor the download frequency of the PDF document or Word document corresponding to the target URL and record the IP address of each download; or monitor the request frequency and time distribution for the target URL from a single IP address (the IP address may be set manually, or may be an IP address whose request frequency ranks in the top 100). When the page content requested by the target request does not match any of the N acquired target features, the page content does not include target data, and the server may add the target URL to a non-monitoring set, so that the next time it receives a request containing the target URL it can exclude that request from detection directly, without determining again whether the corresponding page content includes the target data. By monitoring the target URL, the embodiment of the application monitors the target data, which improves monitoring efficiency, reduces monitoring cost, and helps prevent the property loss that leakage of the target data would cause users.
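One way to keep the per-URL and per-IP request counts mentioned above is sketched below. The class name `RequestLog` and its methods are hypothetical, and time distribution is left out to keep the example short.

```python
from collections import Counter, defaultdict


class RequestLog:
    """Counts requests per monitored URL and per (URL, IP address) pair."""

    def __init__(self):
        self.by_url = Counter()
        self.by_url_ip = defaultdict(Counter)

    def record(self, url, ip):
        """Record one request for `url` coming from `ip`."""
        self.by_url[url] += 1
        self.by_url_ip[url][ip] += 1

    def top_ips(self, url, n=100):
        """IP addresses with the highest request counts for `url`."""
        return [ip for ip, _ in self.by_url_ip[url].most_common(n)]
```

Dividing a count by the length of the observation window gives a request frequency; `top_ips` corresponds to selecting the IP addresses whose request frequency ranks highest.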
In some possible embodiments, after monitoring the target URL, the server may add the target URL to the monitoring set to obtain a new monitoring set. And when receiving the next HTTP request, the server judges whether the URL contained in the next HTTP request belongs to a new monitoring set. According to the embodiment of the application, the monitoring set is automatically updated, the newly added page content containing the target data in the website can be found and/or monitored in real time, and the URL monitoring is more accurate.
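Steps S103 to S104 together with the automatic update of the monitoring set can be summarized in a short sketch. The function name `handle_request`, the returned status strings, plain substring matching, and reading "matches M target features" as "at least M" are all assumptions for illustration.

```python
def handle_request(url, page_text, target_features, monitoring_set, m=1):
    """Decide whether `url` is monitored, updating `monitoring_set` in place."""
    if url in monitoring_set:
        # Already a monitored URL: no content detection is needed.
        return "already monitored"
    hits = [feature for feature in target_features if feature in page_text]
    if len(hits) >= m:
        # Page content matches: monitor the URL and remember it.
        monitoring_set.add(url)
        return "now monitored"
    return "not monitored"
```

Because the set is updated in place, a second request for the same URL is recognized as belonging to the new monitoring set without re-examining the page content.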
In the embodiment of the application, the server acquires N target features corresponding to target data and acquires the target URL carried in a received target request. If the target URL does not belong to a monitoring set, the server detects whether the page content requested by the target request matches M of the N target features, and if it does, the server monitors the target URL. In this way the URL, and through it the target data, can be monitored, which improves monitoring efficiency, reduces monitoring cost, and helps prevent the property loss that leakage of the target data would cause users.
Referring to fig. 2, it is another schematic flowchart of a URL monitoring method provided in an embodiment of the present application, and as shown in fig. 2, the URL monitoring method may include:
S201, acquiring N target features corresponding to target data.
S202, obtaining a target Uniform Resource Locator (URL) carried in the received target request.
The implementation manners of the above steps S201 to S202 in the embodiment of the present application may refer to the implementation manners provided by the steps S101 to S102 in the embodiment shown in fig. 1, and are not described herein again.
S203, detecting whether the file name suffix of the target URL is matched with the target suffix.
S204, if the file name suffix of the target URL is not matched with the target suffix, whether the target URL belongs to the monitoring set is detected.
In some possible embodiments, the server may detect whether the filename suffix of the target URL is the same as a preset target suffix, and if the filename suffix of the target URL is not the same as the preset target suffix, detect whether the target URL belongs to a preset monitoring set. If the file name suffix of the target URL is the same as the preset target suffix, which indicates that the page content corresponding to the target URL does not include the target data, the server may receive a next HTTP request and may determine whether the file name suffix of the URL carried in the next HTTP request is the same as the target suffix. The target suffix may include a file name suffix of at least one non-monitored file, and the non-monitored file may be a file that is preset manually and does not contain target data. Since some page contents which cannot include the target data exist in the website, the page contents which obviously do not include the target data are excluded by the file name suffix of the URL, so that the processing efficiency can be improved.
For example, non-monitored files include js files, css files, pictures, and videos. The filename suffix of a js file is ".js", that of a css file is ".css", that of a picture is ".jpg" or ".png", and that of a video is ".mp4". The server detects whether the filename suffix of the target URL is the same as any of ".js", ".css", ".jpg", ".png", and ".mp4", so as to judge whether the page content corresponding to the target URL is a js file, a css file, a picture, or a video. If the filename suffix of the target URL differs from all of ".js", ".css", ".jpg", ".png", and ".mp4", the server may detect whether the target URL belongs to the preset monitoring set.
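A minimal sketch of this suffix filter follows. The function name `has_non_monitored_suffix` is hypothetical, and comparing suffixes case-insensitively on the path component only (ignoring any query string) is a design choice of this sketch, not something the application specifies.

```python
import posixpath
from urllib.parse import urlsplit

# Target suffixes: filename suffixes of the non-monitored files in the example.
NON_MONITORED_SUFFIXES = {".js", ".css", ".jpg", ".png", ".mp4"}


def has_non_monitored_suffix(url):
    """True if the URL's filename suffix matches a target suffix."""
    path = urlsplit(url).path            # drop protocol, host, and parameters
    suffix = posixpath.splitext(path)[1].lower()
    return suffix in NON_MONITORED_SUFFIXES
```

Only URLs for which this function returns `False` would proceed to the monitoring-set check in step S204.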
In some possible embodiments, the server may obtain a preset monitoring set, where the monitoring set includes monitored URLs of historical monitoring records. When detecting whether the target URL belongs to the preset monitoring set, the server may calculate the hash value of the target URL using a preset hash function, and may calculate the hash value of each monitored URL in the monitoring set using the same hash function. The server detects whether any of the hash values of the monitored URLs is the same as the hash value of the target URL. If none of them is the same as the hash value of the target URL, the server may determine that the target URL does not belong to the monitoring set and may detect whether the page content requested by the target HTTP request matches M of the N target features. If one of them is the same as the hash value of the target URL, the server determines that the target URL belongs to the monitoring set. Since a hash function converts data of any length into data of a fixed length, and URLs are generally long, comparing the hash value of the target URL with the hash values of the monitored URLs avoids the slow character-by-character comparison of URLs and improves detection efficiency.
For example, assume that the monitoring set includes 4 monitored URLs: URL3, URL5, URL7, and URL8. The target URL is URL4. Using a preset hash function, the server calculates the hash value of URL4 as 09, and calculates the hash values of URL3, URL5, URL7, and URL8 as 01, 12, 04, and 11, respectively. Since the hash value 09 of URL4 differs from the hash values (01, 12, 04, and 11) of all monitored URLs in the monitoring set, URL4 does not belong to the monitoring set, so the server can detect whether the page content requested by the target HTTP request matches M target features of the N target features.
In some possible embodiments, if the monitoring set contains no monitored URL (that is, the set is empty), the server may determine that the target URL does not belong to the monitoring set.
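The hash-based membership check, including the empty-set case, might look like the sketch below. The source only says "a preset hash function"; SHA-256 is an assumed choice here, and the names `url_hash` and `belongs_to_monitoring_set` are hypothetical.

```python
import hashlib
from typing import Set


def url_hash(url: str) -> str:
    """Fixed-length digest of a URL (SHA-256 is an assumed choice)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def belongs_to_monitoring_set(target_url: str, monitored_hashes: Set[str]) -> bool:
    """Check whether the target URL's hash matches any monitored URL's hash."""
    # An empty monitoring set can never contain the target URL.
    if not monitored_hashes:
        return False
    return url_hash(target_url) in monitored_hashes


# Hashes of the monitored URLs are computed once and stored.
monitored = {url_hash(u) for u in ("URL3", "URL5", "URL7", "URL8")}
print(belongs_to_monitoring_set("URL4", monitored))
```

Storing the digests in a set makes each lookup a single hash computation plus a constant-time membership test, rather than a comparison against every monitored URL.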
S205, if the target URL does not belong to the monitoring set, detecting whether the page content requested by the target request is matched with M target features in the N target features.
S206, if the page content requested by the target request is matched with M target characteristics in the N target characteristics, monitoring the target URL.
The implementation manners of the above steps S205 to S206 in the embodiment of the present application may refer to the implementation manners provided by the steps S103 to S104 in the embodiment shown in fig. 1, and are not described herein again.
S207, if the target URL belongs to the monitoring set, acquiring the request frequency for the target URL.
S208, if the request frequency is outside the target range, outputting alarm prompt information.
In some possible embodiments, when the target URL belongs to the monitoring set, which indicates that the target URL is a monitored URL, the server may obtain the request frequency of HTTP requests containing the target URL received within a period of time. The server may obtain a preset request frequency range and detect whether the request frequency for the target URL is within that range. If the request frequency is within the range, the request frequency of the target URL is normal and no sudden change has occurred, and the target URL can continue to be monitored. If the request frequency is outside the range, the request frequency of the target URL is abnormal and a sudden change may have occurred, and alarm prompt information may be output. The alarm prompt information may include the target URL and may be used to prompt monitoring personnel that the request frequency of the target URL is abnormal, so that they can discover a possible leak of target data in time. The request frequency range may be the mean of the request frequencies for the target URL over a period of the history record plus or minus the standard deviation of those request frequencies, that is, the request frequency range is $[\bar{f} - \delta_f,\ \bar{f} + \delta_f]$, where $\bar{f}$ denotes the mean of the request frequency and $\delta_f$ denotes the standard deviation of the request frequency.
In some possible embodiments, after obtaining the request frequency for the target URL, the server may further obtain the request frequency curve F of the target URL in the history record (the curve describes the relationship between the number of received HTTP requests containing the target URL and time). The server may check whether the obtained request frequency conforms to the trend of the request frequency curve F, or whether it changes abruptly relative to the curve. If the request frequency does not conform to the trend of the request frequency curve F, or changes abruptly relative to it, alarm prompt information can be output. Here, a sudden change may refer to a spike suddenly appearing in the request frequency curve F. For example, if the request frequency curve F is relatively smooth (fluctuating around 1000 requests/min) and the request frequency F1 for the target URL is 10000 requests/min (a sudden spike), a sudden change can be considered to have occurred.
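The source does not specify how a spike is detected against the curve F. One simple possibility, offered only as an illustrative sketch, is to compare the current frequency with a multiple of the curve's median; the function name `is_sudden_change` and the `factor` parameter are hypothetical and not taken from the source.

```python
from statistics import median


def is_sudden_change(current, curve, factor=5.0):
    """Flag `current` as a spike if it exceeds `factor` times the curve's median.

    `curve` is a list of historical requests-per-minute samples; `factor`
    is a hypothetical tuning parameter, not a value from the source.
    """
    if not curve:
        return False  # no history to compare against
    baseline = median(curve)
    return current > factor * baseline
```

With a curve fluctuating around 1000 requests/min, a reading of 10000 requests/min exceeds five times the median and would be flagged, which is consistent with the example in the text. The median is used because it is less affected by earlier spikes than the mean.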
In some possible embodiments, when the target URL belongs to the monitoring set, if the page content corresponding to the target URL includes a PDF document or a Word document, the server may obtain the download frequency of the PDF document or Word document in the page content corresponding to the target URL. The server may obtain a preset download frequency range and detect whether the download frequency of the PDF document or Word document is within that range. If the download frequency is within the range, the download frequency of the PDF document or Word document is normal and no sudden change has occurred, and the target URL can continue to be monitored. If the download frequency is outside the range, the download frequency of the PDF document or Word document is abnormal and a sudden change may have occurred, and alarm prompt information may be output. The alarm prompt information may include the target URL and may be used to prompt monitoring personnel that the download frequency of the PDF document or Word document in the page content corresponding to the target URL is abnormal, so that they can discover a possible leak of target data in time. The download frequency range may be the mean of the download frequencies of the PDF document or Word document over a period of the history record plus or minus the standard deviation of those download frequencies, that is, the download frequency range is $[\bar{d} - \delta_d,\ \bar{d} + \delta_d]$, where $\bar{d}$ denotes the mean of the download frequency and $\delta_d$ denotes the standard deviation of the download frequency.
In the embodiment of the application, the server acquires N target features corresponding to target data, acquires the target URL carried in a received target request, and detects whether the file name suffix of the target URL matches a target suffix. If the file name suffix does not match the target suffix, the server detects whether the target URL belongs to the monitoring set. When the target URL does not belong to the monitoring set, the server detects whether the page content requested by the target request matches M target features of the N target features and, if so, monitors the target URL. When the target URL belongs to the monitoring set, the server acquires the request frequency for the target URL and outputs alarm prompt information when the request frequency is outside the target range. The method thus monitors URLs and, through them, target data; it also excludes page content that clearly contains no target data, which improves monitoring efficiency, and it can discover possible target data leaks in time by monitoring the request frequency.
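The overall flow summarized above can be sketched in a single function. This is a simplified illustration under several assumptions: "matches M target features" is read as "at least M features appear as substrings of the page content", the monitoring set stores plain URLs rather than hashes, a newly matched URL is added to the monitoring set, and the function name and return labels are hypothetical.

```python
from urllib.parse import urlparse
import posixpath

NON_MONITORED_SUFFIXES = {".js", ".css", ".jpg", ".png", ".mp4"}


def handle_request(url, page_content, target_features, m,
                   monitoring_set, request_frequency, target_range):
    """Return the action taken: 'skip', 'ok', 'alarm', 'monitor' or 'ignore'."""
    # Non-monitored file types are filtered out by their suffix.
    suffix = posixpath.splitext(urlparse(url).path)[1].lower()
    if suffix in NON_MONITORED_SUFFIXES:
        return "skip"
    # S207-S208: the URL is already monitored, so check its request frequency.
    if url in monitoring_set:
        low, high = target_range
        return "ok" if low <= request_frequency <= high else "alarm"
    # S205-S206: the URL is not yet monitored, so inspect the page content.
    # Assumption: a feature "matches" when it occurs as a substring.
    matched = sum(1 for feature in target_features if feature in page_content)
    if matched >= m:
        monitoring_set.add(url)  # start monitoring this URL
        return "monitor"
    return "ignore"
```

Ordering the checks this way means the comparatively expensive content inspection runs only for URLs that pass the suffix filter and are not already in the monitoring set.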
Fig. 3 is a schematic block diagram of a URL monitoring apparatus according to an embodiment of the present application. The URL monitoring apparatus 300 of the present embodiment includes:
a first obtaining module 10, configured to obtain N target features corresponding to target data;
a second obtaining module 20, configured to obtain a target uniform resource locator URL carried in the received target request;
a first detecting module 30, configured to detect whether page content requested by the target request matches M target features of the N target features when the target URL does not belong to a monitoring set, where the monitoring set includes a monitored URL, and page content corresponding to the monitored URL matches at least one target feature of the N target features;
a monitoring module 40, configured to monitor the target URL when the page content requested by the target request matches M of the N target features;
wherein, N and M are integers which are more than or equal to 1, and M is less than or equal to N.
In some possible embodiments, the URL monitoring apparatus 300 further includes a second detection module 50 and a third detection module 60. The second detecting module 50 is configured to detect whether a filename suffix of the target URL matches a target suffix, where the target suffix includes a filename suffix of at least one non-monitored file; the third detecting module 60 is configured to detect whether the target URL belongs to the monitored set when the filename suffix of the target URL does not match the target suffix.
In some possible implementations, the monitoring set includes monitored URLs of historical monitoring records. The third detecting module 60 is specifically configured to:
calculating the hash value of the target URL; determining hash values of all monitored URLs in the monitoring set, and detecting whether hash values matched with the hash value of the target URL exist in the hash values of all monitored URLs; and if the hash value of each monitored URL does not have a hash value matched with the hash value of the target URL, determining that the target URL does not belong to the monitoring set.
In some possible implementations, the monitoring set contains no monitored URL. The third detecting module 60 is further specifically configured to: determine that the target URL does not belong to the monitoring set when the monitoring set contains no monitored URL.
In some possible embodiments, the URL monitoring apparatus 300 further includes a third obtaining module 70 and an outputting module 80. The third obtaining module 70, configured to obtain a request frequency for the target URL when the target URL belongs to the monitoring set; and the output module 80 is used for outputting alarm prompt information when the request frequency is out of the target range. The alarm prompt information comprises the target URL, and the alarm prompt information is used for prompting that the request frequency of the target URL is abnormal.
In some possible embodiments, the monitoring module 40 is specifically configured to monitor the request frequency of the target URL when the page content requested by the target request matches M target features of the N target features, and add the target URL to the monitoring set.
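One way a monitoring module could track the request frequency of monitored URLs is a sliding time window per URL. The sketch below is illustrative only: the class name `RequestFrequencyMonitor`, the 60-second window, and the use of caller-supplied timestamps are assumptions rather than details from the source.

```python
from collections import defaultdict, deque


class RequestFrequencyMonitor:
    """Counts requests per monitored URL over a sliding time window."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds  # assumed window length
        self.monitoring_set = set()
        self._timestamps = defaultdict(deque)

    def add(self, url):
        """Start monitoring a URL by adding it to the monitoring set."""
        self.monitoring_set.add(url)

    def record(self, url, now):
        """Record one request for a monitored URL at time `now` (seconds)."""
        if url in self.monitoring_set:
            self._timestamps[url].append(now)

    def frequency(self, url, now):
        """Number of recorded requests for `url` within the last window."""
        stamps = self._timestamps[url]
        # Drop timestamps that have fallen out of the window.
        while stamps and stamps[0] <= now - self.window_seconds:
            stamps.popleft()
        return len(stamps)
```

The value returned by `frequency` is what would be compared against the target range to decide whether to output alarm prompt information.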
In a specific implementation, the URL monitoring apparatus may, through the above modules, execute the implementations provided by the steps of the method embodiments shown in fig. 1 or fig. 2 to realize the functions of those embodiments. For details, refer to the corresponding descriptions of the steps in the method embodiment shown in fig. 1 or fig. 2, which are not repeated here.
In the embodiment of the application, the URL monitoring device acquires N target features corresponding to target data and acquires the target URL carried in a received target request. If the target URL does not belong to the monitoring set, the device detects whether the page content requested by the target request matches M target features of the N target features, and if so, monitors the target URL. Monitoring the URL in this way achieves monitoring of the target data, improves monitoring efficiency, reduces monitoring cost, and helps prevent property loss to users caused by leakage of the target data.
Referring to fig. 4, a schematic block diagram of a server provided in the embodiment of the present application is shown. As shown in fig. 4, the server 400 in the embodiment of the present application may include: one or more processors 401 and memory 402. The processor 401 and the memory 402 are connected by a bus 403. The memory 402 is used to store computer programs comprising program instructions and the processor 401 is used to execute the program instructions stored by the memory 402. Wherein the processor 401 is configured to call the program instruction to execute:
acquiring N target characteristics corresponding to target data;
acquiring a target Uniform Resource Locator (URL) carried in a received target request;
if the target URL does not belong to a monitoring set, detecting whether the page content requested by the target request is matched with M target features in the N target features, wherein the monitoring set comprises a monitored URL, and the page content corresponding to the monitored URL is matched with at least one target feature in the N target features;
if the page content requested by the target request is matched with M target features in the N target features, monitoring the target URL;
wherein, N and M are integers which are more than or equal to 1, and M is less than or equal to N.
It should be appreciated that in some possible implementations, the processor 401 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 402 may include both read-only memory and random access memory, and provides instructions and data to the processor 401. A portion of the memory 402 may also include non-volatile random access memory. For example, the memory 402 may also store device type information.
In a specific implementation, the processor 401 described in this embodiment may execute the implementation manner described in the URL monitoring method provided in this embodiment, and may also execute the implementation manner of the URL monitoring apparatus described in this embodiment, which is not described herein again.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, implement the URL monitoring method shown in fig. 1 or fig. 2. For details, refer to the description of the embodiment shown in fig. 1 or fig. 2, which is not repeated here.
The computer-readable storage medium may be an internal storage unit of the URL monitoring apparatus described in any of the foregoing embodiments or of an electronic device, such as a hard disk or a memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the electronic device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (terminals) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the present application has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the application. Accordingly, the specification and figures are merely exemplary of the present application as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the present application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for monitoring a Uniform Resource Locator (URL), comprising:
acquiring at least one preset feature, using big data to analyze which key fields are specifically used for each preset feature in all page content crawled by a crawler, respectively extracting all key fields representing each preset feature of the at least one preset feature, and determining N target features from the key fields representing the at least one preset feature, wherein one key field is one target feature;
acquiring a target Uniform Resource Locator (URL) carried in a received target request;
if the target uniform resource locator URL does not belong to a monitoring set, detecting whether the page content requested by the target request is matched with M target features in the N target features, wherein the monitoring set comprises a monitored uniform resource locator URL, and the page content corresponding to the monitored uniform resource locator URL is matched with at least one target feature in the N target features;
if the page content requested by the target request is matched with M target features in the N target features, monitoring the target Uniform Resource Locator (URL);
if the target uniform resource locator URL belongs to the monitoring set, acquiring a request frequency for the target uniform resource locator URL, acquiring a request frequency curve of the target uniform resource locator URL in a history record, and checking whether the acquired request frequency for the target uniform resource locator URL conforms to the trend of the request frequency curve, or whether the request frequency of the target uniform resource locator URL changes abruptly compared with the request frequency curve, wherein the request frequency curve describes the relationship between the number of received HTTP requests containing the target uniform resource locator URL and time;
if the request frequency of the target uniform resource locator URL does not conform to the trend of the request frequency curve, or the request frequency of the target uniform resource locator URL changes abruptly compared with the request frequency curve, outputting alarm prompt information;
wherein N and M are integers greater than or equal to 1, and M is less than or equal to N.
2. The method of claim 1, wherein after obtaining the target uniform resource locator URL carried in the received target request, the method further comprises:
detecting whether a filename suffix of the target Uniform Resource Locator (URL) matches a target suffix, the target suffix comprising a filename suffix of at least one non-monitored file;
and if the file name suffix of the target uniform resource locator URL is not matched with the target suffix, detecting whether the target uniform resource locator URL belongs to a monitoring set.
3. The method of claim 2, wherein the monitoring set includes a monitored uniform resource locator, URL, of a historical monitoring record;
the detecting whether the target uniform resource locator URL belongs to a monitoring set comprises:
calculating a hash value of the target uniform resource locator URL;
determining hash values of all monitored Uniform Resource Locators (URLs) in the monitoring set, and detecting whether hash values matched with the hash value of the target URL exist in the hash values of all monitored URLs;
and if the hash value of each monitored uniform resource locator URL does not have a hash value matched with the hash value of the target uniform resource locator URL, determining that the target uniform resource locator URL does not belong to the monitoring set.
4. The method of claim 2, wherein a monitored URL included in the monitoring set is null;
the detecting whether the target uniform resource locator URL belongs to a monitoring set comprises:
and if the monitored uniform resource locator URL in the monitoring set is empty, determining that the target uniform resource locator URL does not belong to the monitoring set.
5. The method of claim 1, further comprising:
if the target uniform resource locator URL belongs to the monitoring set, acquiring a request frequency aiming at the target uniform resource locator URL;
and if the request frequency is out of the target range, outputting alarm prompt information, wherein the alarm prompt information comprises the target uniform resource locator URL, and the alarm prompt information is used for prompting that the request frequency of the target uniform resource locator URL is abnormal.
6. The method according to any one of claims 1-5, wherein the monitoring the target uniform resource locator URL if the page content requested by the target request matches M of the N target features comprises:
and if the page content requested by the target request is matched with M target features in the N target features, monitoring the request frequency of the target Uniform Resource Locator (URL), and adding the target URL into the monitoring set.
7. A uniform resource locator, URL, monitoring apparatus, comprising:
a first obtaining module, configured to obtain at least one preset feature, use big data to analyze which key fields are specifically used for each preset feature in all page content crawled by a crawler, respectively extract all key fields representing each preset feature of the at least one preset feature, and determine N target features from the key fields representing the at least one preset feature, wherein one key field is one target feature;
the second acquisition module is used for acquiring a target Uniform Resource Locator (URL) carried in the received target request;
a first detection module, configured to detect whether page content requested by the target request matches M target features of the N target features when the target uniform resource locator URL does not belong to a monitoring set, where the monitoring set includes a monitored uniform resource locator URL, and page content corresponding to the monitored uniform resource locator URL matches at least one target feature of the N target features;
the monitoring module is used for monitoring the target uniform resource locator URL when the page content requested by the target request is matched with M target characteristics in the N target characteristics;
a third obtaining module, configured to, when the target uniform resource locator URL belongs to a monitoring set, obtain a request frequency for the target uniform resource locator URL, obtain a request frequency curve of the target uniform resource locator URL in a history, compare whether the obtained request frequency for the target uniform resource locator URL meets a trend of the request frequency curve, or compare with the request frequency curve, compare whether the request frequency of the target uniform resource locator URL has a sudden change, where the request frequency curve is used to describe a relationship between the number of received HTTP requests including the target uniform resource locator URL and time;
the output module is used for outputting alarm prompt information when the request frequency of the target uniform resource locator URL does not meet the trend of the request frequency curve or the request frequency of the target uniform resource locator URL is suddenly changed compared with the request frequency curve;
wherein N and M are integers greater than or equal to 1, and M is less than or equal to N.
8. The apparatus of claim 7, further comprising:
a second detection module for detecting whether a filename suffix of the target uniform resource locator URL matches a target suffix, the target suffix including a filename suffix of at least one non-monitored file;
a third detecting module, configured to detect whether the target uniform resource locator URL belongs to a monitoring set when the filename suffix of the target uniform resource locator URL does not match the target suffix.
9. A server, comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any one of claims 1-6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-6.
CN201811018419.4A 2018-08-31 2018-08-31 URL monitoring method and device Active CN109302383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811018419.4A CN109302383B (en) 2018-08-31 2018-08-31 URL monitoring method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811018419.4A CN109302383B (en) 2018-08-31 2018-08-31 URL monitoring method and device

Publications (2)

Publication Number Publication Date
CN109302383A CN109302383A (en) 2019-02-01
CN109302383B true CN109302383B (en) 2022-04-29

Family

ID=65166081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811018419.4A Active CN109302383B (en) 2018-08-31 2018-08-31 URL monitoring method and device

Country Status (1)

Country Link
CN (1) CN109302383B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111240948A (en) * 2019-11-18 2020-06-05 北京博睿宏远数据科技股份有限公司 Experience data processing method and device, computer equipment and storage medium
CN112437356B (en) * 2020-11-13 2021-09-28 珠海大横琴科技发展有限公司 Streaming media data processing method and device
CN112561715A (en) * 2020-12-22 2021-03-26 海腾保险代理有限公司 Electronic policy management method, electronic policy management device, electronic device and storage medium
CN113904879A (en) * 2021-12-10 2022-01-07 北京指掌易科技有限公司 Mobile terminal file tracking method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101534306A (en) * 2009-04-14 2009-09-16 深圳市腾讯计算机系统有限公司 Detecting method and a device for fishing website
CN102769632A (en) * 2012-07-30 2012-11-07 珠海市君天电子科技有限公司 Method and system for grading detection and prompt of fishing website
CN103685307A (en) * 2013-12-25 2014-03-26 北京奇虎科技有限公司 Method, system, client and server for detecting phishing fraud webpage based on feature library
CN106874165A (en) * 2015-12-14 2017-06-20 北京国双科技有限公司 Page detection method and device
CN107943954A (en) * 2017-11-24 2018-04-20 杭州安恒信息技术有限公司 Detection method, device and the electronic equipment of webpage sensitive information

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2505370C (en) * 2004-04-26 2015-12-01 Watchfire Corporation Method and system for website analysis
US8943039B1 (en) * 2006-08-25 2015-01-27 Riosoft Holdings, Inc. Centralized web-based software solution for search engine optimization
US9172712B2 (en) * 2009-10-07 2015-10-27 At&T Intellectual Property I, L.P. Method and system for improving website security

Also Published As

Publication number Publication date
CN109302383A (en) 2019-02-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant