CN109547421A - Method and device for auditing URLs - Google Patents

Method and device for auditing URLs

Info

Publication number
CN109547421A
Authority
CN
China
Prior art keywords
url
request message
referer
domain
thresholding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811324566.4A
Other languages
Chinese (zh)
Inventor
魏逢
魏逢一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruijie Networks Co Ltd
Original Assignee
Ruijie Networks Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruijie Networks Co Ltd
Priority to CN201811324566.4A
Publication of CN109547421A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22: Parsing or analysis of headers

Abstract

The invention discloses a method and device for auditing URLs. The method comprises: obtaining an HTTP request message whose header includes a Referer field; when the HTTP request message meets a first condition and the value of the Referer field is not empty, and after determining that the value of the Referer field is not in a first URL list, judging whether the value of the Referer field is in a URL cache pool; and if not, auditing the URL requested by the HTTP request message. This technical solution filters out a large number of URLs that do not need to be audited, which improves the efficiency of URL auditing and reduces the burden on the audit device.

Description

Method and device for auditing URLs
Technical field
Embodiments of the present invention relate to the field of network communication technology, and in particular to a method and device for auditing URLs (Uniform Resource Locators).
Background
With the development of networks, Internet applications have penetrated every corner of social life and have become indispensable tools for study, work, and daily life. Auditing URLs makes users' online behavior transparent.
In the prior art, however, when a user accesses the home page of a website, the browser may initiate more than one URL request. For example, when a user accesses "http://www.xxxxxx.org/" through a browser, the browser also fetches a large number of pictures while fetching the home page, and therefore generates a large number of URL requests. Some of the generated URL requests are shown in Table 1.
Table 1: Partial list of URL requests generated when the browser accesses "www.xxxxxx.org/"
For URL auditing, the URLs that the browser generates automatically have no practical significance. If every generated URL request is audited, a large number of URLs that need no audit are audited, which both burdens the audit device and reduces the efficiency of URL auditing.
Summary of the invention
Embodiments of the present invention provide a method and device for auditing URLs that filter out a large number of URLs that do not need to be audited, improving the efficiency of URL auditing and reducing the burden on the audit device.
A method for auditing URLs provided by an embodiment of the present invention comprises:
Obtaining an HTTP (HyperText Transfer Protocol) request message, where the header of the HTTP request message includes a Referer (source) field;
When the HTTP request message meets a first condition and the value of the Referer field is not empty, and after determining that the value of the Referer field is not in a first URL list, judging whether the value of the Referer field is in a URL cache pool, and if not, auditing the URL requested by the HTTP request message;
Wherein the first condition includes that the suffix of the URL requested by the HTTP request message is not in a preset no-audit suffix list, and the first URL list is a URL list determined from preset websites.
Optionally, the first condition further includes that the content type of the HTTP response message corresponding to the HTTP request message is in the preset audited types, and/or that the content size of the HTTP response message corresponding to the HTTP request message is not less than a preset size.
Optionally, after auditing the URL requested by the HTTP request message, the method further comprises:
Storing the URL requested by the HTTP request message in the URL cache pool, and deleting that URL after its cache duration has expired.
Optionally, the method further comprises:
When the HTTP request message does not meet the first condition and the value of the Referer field is not empty, judging whether the value of the Referer field is in the URL cache pool; if so, recalculating the cache duration of the value of the Referer field in the URL cache pool; otherwise, storing the value of the Referer field in the URL cache pool and deleting it after its cache duration has expired.
Optionally, the method further comprises:
After determining that the value of the Referer field is in the first URL list, judging whether the URL requested by the HTTP request message is in a second URL list; if so, not auditing the URL requested by the HTTP request message; otherwise, auditing it. The second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and it does not include the URLs in the first URL list.
Optionally, obtaining the HTTP request message comprises:
Obtaining, by DPI (Deep Packet Inspection) identification technology, the HTTP request message generated by a user click.
Correspondingly, an embodiment of the present invention further provides a device for auditing URLs, comprising:
An acquiring unit, configured to obtain an HTTP request message, where the header of the HTTP request message includes a Referer field;
A processing unit, configured to: when the HTTP request message meets a first condition and the value of the Referer field is not empty, and after determining that the value of the Referer field is not in a first URL list, judge whether the value of the Referer field is in a URL cache pool, and if not, audit the URL requested by the HTTP request message;
Wherein the first condition includes that the suffix of the URL requested by the HTTP request message is not in a preset no-audit suffix list, and the first URL list is a URL list determined from preset websites.
Optionally, the first condition further includes that the content type of the HTTP response message corresponding to the HTTP request message is in the preset audited types, and/or that the content size of the HTTP response message corresponding to the HTTP request message is not less than a preset size.
Optionally, the processing unit is further configured to:
After auditing the URL requested by the HTTP request message, store that URL in the URL cache pool, and delete it after its cache duration has expired.
Optionally, the processing unit is further configured to:
When the HTTP request message does not meet the first condition and the value of the Referer field is not empty, judge whether the value of the Referer field is in the URL cache pool; if so, recalculate the cache duration of the value of the Referer field in the URL cache pool; otherwise, store the value of the Referer field in the URL cache pool and delete it after its cache duration has expired.
Optionally, the processing unit is further configured to:
After determining that the value of the Referer field is in the first URL list, judge whether the URL requested by the HTTP request message is in a second URL list; if so, not audit the URL requested by the HTTP request message; otherwise, audit it. The second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and it does not include the URLs in the first URL list.
Optionally, the acquiring unit is specifically configured to:
Obtain, by DPI identification technology, the HTTP request message generated by a user click.
Correspondingly, an embodiment of the present invention further provides a computing device, comprising:
A memory, configured to store program instructions;
A processor, configured to call the program instructions stored in the memory and to execute the above method for auditing URLs according to the obtained program.
Correspondingly, an embodiment of the present invention further provides a computer-readable non-volatile storage medium including computer-readable instructions which, when read and executed by a computer, cause the computer to execute the above method for auditing URLs.
In the embodiments of the present invention, four judgments finally determine whether the URL requested by an HTTP request message needs to be audited: 1. whether the suffix of the URL requested by the HTTP request message is in the preset no-audit suffix list; 2. whether the value of the Referer field is empty; 3. whether the value of the Referer field is in the first URL list; 4. whether the value of the Referer field is in the URL cache pool. These steps filter out the URLs of HTTP request messages that do not need to be audited, improving the efficiency of URL auditing and reducing the burden on the audit device.
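The four judgments can be sketched as a single decision function. This is a minimal illustration rather than the patented implementation: the function name, the use of plain Python sets for the two URL lists and the cache pool, and exact string matching of URLs are all assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit

# Example suffixes taken from the description; the real list is a preset.
NO_AUDIT_SUFFIXES = {".jpg", ".gif", ".ico", ".css", ".js", ".png"}


def should_audit(url, referer, first_url_list, second_url_list, cache_pool):
    """Return True if the requested URL should be audited (hypothetical helper)."""
    # Judgment 1: a suffix in the no-audit list means the URL is never audited.
    suffix = posixpath.splitext(urlsplit(url).path)[1].lower()
    if suffix in NO_AUDIT_SUFFIXES:
        return False
    # Judgment 2: an empty Referer means the user typed the URL, so audit it.
    if not referer:
        return True
    # Judgment 3: Referer is a preset page; skip URLs the browser auto-generates.
    if referer in first_url_list:
        return url not in second_url_list
    # Judgment 4: audit only if the Referer is not in the URL cache pool.
    return referer not in cache_pool
```

For instance, `should_audit("http://www.xxxxxx.org/images/colour/yellow.gif", "http://www.xxxxxx.org/", set(), set(), set())` returns `False` because of the `.gif` suffix.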
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a system architecture provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for auditing URLs provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of another method for auditing URLs provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram, provided by an embodiment of the present invention, of auditing a URL whose suffix is in the preset no-audit suffix list;
Fig. 5 is a schematic structural diagram of a device for auditing URLs provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. Evidently, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 shows an example system architecture to which the method for auditing URLs provided by an embodiment of the present invention is applicable. The system architecture may include a terminal 110 and an audit device 120.
The audit device 120 includes a processor 122, a communication interface 123, and a memory 121. The communication interface 123 is used to communicate with the terminal 110 and to send and receive the HTTP request messages transmitted by the terminal 110.
The processor 122 is the control center of the audit device 120. It connects the various parts of the entire audit device 120 through various interfaces and lines, and performs the various functions of the audit device 120 and processes data by running or executing the software programs or modules stored in the memory 121 and calling the data stored in the memory 121. Optionally, the processor 122 may include one or more processing units.
The memory 121 may be used to store software programs and modules. The processor 122 executes various functional applications and data processing by running the software programs and modules stored in the memory 121. The memory 121 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, the application programs required by at least one function, and the like, and the data storage area may store data created during service processing. In addition, the memory 121 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
It should be noted that the structure shown in Fig. 1 is only an example, and the embodiments of the present invention are not limited to it.
Based on the above description, Fig. 2 shows an example flow of a method for auditing URLs provided by an embodiment of the present invention. The flow may be executed by a device for auditing URLs, which may be located in the audit device or may be the audit device itself. As shown in Fig. 2, the flow specifically includes:
Step 201: obtain an HTTP request message.
The header of the HTTP request message includes a Referer field. The value of the Referer field indicates the source address of the URL requested by the HTTP request message. When a browser sends a request to a server, it generally carries a Referer to tell the server which page the request was linked from, and the server can use this information in its processing. Further, when the URL request is generated by the user typing the URL directly into the browser, the value of the Referer field is empty; when the URL request is generated by the user clicking a link, the value of the Referer field is not empty.
For example, the header of an HTTP request message is shown in Table 2. The URL it represents is "http://www.xxxxxx.org/news/2010/12/07/0003.html", and the value of its Referer field is "http://www.xxxxxx.org/", which indicates that the URL was linked from "http://www.xxxxxx.org/". The request may have been generated automatically by the browser from the data of the "http://www.xxxxxx.org/" home page, or it may have been generated by the user clicking a link on the "http://www.xxxxxx.org/" home page.
Table 2: Header of the HTTP request message
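As an illustration of how the requested URL and the Referer value might be read from a request header, the sketch below parses a raw HTTP/1.1 request head by hand. The function names are hypothetical, the request bytes in the usage note are a stand-in built from the URLs quoted above (not the contents of Table 2), and the `http://` scheme is assumed because the header itself does not carry it.

```python
def parse_request_head(raw: bytes):
    """Split a raw HTTP/1.1 request head into method, target and header fields."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, headers


def requested_url_and_referer(raw: bytes):
    """Return (requested URL, Referer value); the Referer is "" when absent."""
    _method, target, headers = parse_request_head(raw)
    url = "http://" + headers.get("host", "") + target  # scheme is an assumption
    return url, headers.get("referer", "")
```

Feeding it `b"GET /news/2010/12/07/0003.html HTTP/1.1\r\nHost: www.xxxxxx.org\r\nReferer: http://www.xxxxxx.org/\r\n\r\n"` yields the URL and Referer value from the example above; a request without a Referer line yields an empty string, which corresponds to the case where the user typed the URL directly.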
Preferably, the HTTP request messages generated by user clicks can be obtained by DPI identification technology. DPI identification technology is a packet-based deep inspection technology that performs deep inspection on different network application-layer payloads (such as HTTP and DNS (Domain Name System)) and determines legitimacy by inspecting the payload of the message. DPI identification can recognize the HTTP request messages generated by user clicks and can filter out HTTP request messages that are not generated by user clicks, for example those triggered by non-browser applications or APPs (Applications). This avoids collecting a large number of HTTP request messages not generated by user clicks, which would reduce the efficiency of URL auditing.
Step 202: when the HTTP request message meets the first condition and the value of the Referer field is not empty, and after determining that the value of the Referer field is not in the first URL list, judge whether the value of the Referer field is in the URL cache pool; if not, audit the URL requested by the HTTP request message.
The first condition includes that the suffix of the URL requested by the HTTP request message is not in the preset no-audit suffix list; the first URL list is a URL list determined from preset websites.
The first condition, the first URL list, and the URL cache pool are each explained further below.
(1) The first condition is explained as follows:
Because a website contains a large amount of data such as pictures, a browser initiates multiple URL requests when accessing it. A large proportion of these URL requests are generated automatically by the browser to fetch data such as pictures. To filter out these URL requests, which have no practical significance for URL auditing, a no-audit suffix list needs to be preset. The list contains the suffixes that do not need to be audited, for example ".jpg", ".gif", ".ico", ".css", ".js", ".png", and so on. An HTTP request message meets the first condition when the suffix of the URL it requests is not in the preset no-audit suffix list; that is, the first condition filters out URL requests whose suffix is in the no-audit suffix list. Filtering the URLs in Table 2 by the first condition yields the URLs that meet the first condition, as shown in Table 3; that is, a large number of URL requests that are meaningless for URL auditing are filtered out.
Table 3: List of URLs that meet the first condition
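The suffix filter can be sketched as follows. The helper names are hypothetical, the suffix set contains only the examples named in the text, and taking the suffix from the URL path (so that a query string such as `?v=3` is ignored) is an assumption about how the suffix is determined.

```python
import posixpath
from urllib.parse import urlsplit

# Example entries only; the actual list is preset by the operator.
NO_AUDIT_SUFFIXES = frozenset({".jpg", ".gif", ".ico", ".css", ".js", ".png"})


def url_suffix(url: str) -> str:
    """Return the lower-cased file suffix of the URL path, or "" if there is none."""
    return posixpath.splitext(urlsplit(url).path)[1].lower()


def meets_suffix_condition(url: str) -> bool:
    """True when the suffix is NOT in the no-audit list."""
    return url_suffix(url) not in NO_AUDIT_SUFFIXES


def filter_requests(urls):
    """Keep only the URLs that pass the suffix part of the first condition."""
    return [u for u in urls if meets_suffix_condition(u)]
```

Applied to a request list that mixes pages with pictures and style sheets, `filter_requests` keeps only the page URLs, which is the kind of reduction the filtered table illustrates.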
In addition, the first condition may include not only that the suffix of the URL requested by the HTTP request message is not in the preset no-audit suffix list, but also that the content type (Content-Type) of the HTTP response message corresponding to the HTTP request message is in the preset audited types, and/or that the content size (Content-Size) of the HTTP response message corresponding to the HTTP request message is not less than the preset size.
An HTTP response message corresponds to an HTTP request message. After the HTTP request message is obtained, the corresponding HTTP response message also needs to be obtained. It is then judged whether the content type of the HTTP response message is in the preset audited types. The audited types can be set from experience and may include HTML (HyperText Markup Language) format, XML (Extensible Markup Language) format, plain text format, and so on. And/or, it is judged whether the content size of the HTTP response message is not less than the preset size. The preset size can be set from experience, for example to 30 bytes.
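A minimal sketch of the response-side checks follows. It assumes both checks are applied together (the text allows "and/or"), that the response header fields are available as a dictionary with lower-cased names, and that the content size is read from the standard `Content-Length` header. The media-type strings and the choice to reject responses without a usable length are illustrative assumptions.

```python
# Media types standing in for "HTML, XML, plain text"; an assumed mapping.
AUDITED_TYPES = {"text/html", "text/xml", "application/xml", "text/plain"}
MIN_CONTENT_SIZE = 30  # bytes, the example value given in the text


def response_is_auditable(headers: dict) -> bool:
    """Check content type and content size of the matching HTTP response."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = headers.get("content-type", "").split(";", 1)[0].strip().lower()
    if media_type not in AUDITED_TYPES:
        return False
    try:
        size = int(headers.get("content-length", ""))
    except ValueError:
        # No usable length: treated as failing the size check (an assumption).
        return False
    return size >= MIN_CONTENT_SIZE
```

With these settings an HTML response of 5120 bytes passes, while an `image/png` response or a 12-byte HTML response does not.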
(2) The first URL list is explained as follows:
The first URL list is a URL list determined from preset websites, where the preset websites may be N websites whose access counts exceed a certain threshold, or N websites preset from experience. The HTML files of the N websites are obtained, and the URLs corresponding to each website are extracted according to HTML syntax. For example, the value of href is obtained according to the syntax <a href="">, and the href values obtained are the URLs of the website.
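Extracting href values from the HTML of the preset websites might look like the sketch below, which uses Python's standard `html.parser`. The class and function names are hypothetical, and resolving relative links against the page URL with `urljoin` is an added assumption; the text only says that href values are extracted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs (assumption).
                self.urls.append(urljoin(self.base_url, value))


def build_first_url_list(pages):
    """pages maps each preset website URL to its already-downloaded HTML text."""
    first_url_list = set()
    for base_url, html_text in pages.items():
        collector = HrefCollector(base_url)
        collector.feed(html_text)
        first_url_list.update(collector.urls)
    return first_url_list
```

Downloading the pages is left out on purpose; `build_first_url_list` only needs the HTML text of each preset site.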
(3) The URL cache pool is explained as follows:
Preset URLs can be put into the URL cache pool. Alternatively, after an HTTP request message is obtained, a URL to be cached can be determined from the URL requested by the HTTP request message and put into the URL cache pool. A URL to be cached can be obtained in the following two ways:
First, the obtained HTTP request message meets the first condition and the value of its Referer field is not empty, and it is determined that the URL requested by the HTTP request message is not a URL generated after accessing a preset website, that is, the value of the Referer field in the header of the HTTP request message is not in the first URL list. It is then necessary to judge whether the value of the Referer field exists in the URL cache pool. If it does not, the URL requested by the HTTP request message is determined to be a URL to be cached; if it does, the URL requested by the HTTP request message is not determined to be a URL to be cached.
Second, the obtained HTTP request message does not meet the first condition and the value of its Referer field is not empty. It is then necessary to judge whether the value of the Referer field exists in the URL cache pool. If it does not, the value of the Referer field is determined to be a URL to be cached; if it does, the cache duration of the value of the Referer field in the URL cache pool is recalculated. It should be noted that the value of the Referer field corresponding to a URL is itself a URL. How the cache duration of the value of the Referer field in the URL cache pool is recalculated is explained in detail below.
After a URL to be cached is obtained in either of the two ways above, it is put into the URL cache pool.
Further, each URL in the URL cache pool has a cache duration. The cache duration of a URL can be understood as the length of time the URL stays in the cache pool; that is, once the cache duration of the URL is reached, the URL is deleted from the URL cache pool. In other words, after a URL is put into the URL cache pool, the time it has been cached so far needs to be tracked, and once that time reaches the cache duration of the URL, the URL is deleted from the URL cache pool. The cache duration can be set from experience, for example to 2 s; that is, after a URL has been cached in the URL cache pool for 2 s, it is deleted from the URL cache pool.
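A URL cache pool with a per-entry cache duration can be sketched as a small class. The class name, the lazy purging on lookup, and the injectable clock are design assumptions made for the example; the 2 s default is the value mentioned in the text.

```python
import time


class UrlCachePool:
    """URL cache pool in which every entry expires after a fixed cache duration."""

    def __init__(self, ttl_seconds=2.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the pool can be tested
        self._expires_at = {}     # URL -> time at which the entry is deleted

    def _purge(self):
        """Delete every URL whose cache duration has been reached."""
        now = self.clock()
        for url in [u for u, t in self._expires_at.items() if t <= now]:
            del self._expires_at[url]

    def put(self, url):
        """Store a URL; storing it again restarts its cache duration."""
        self._expires_at[url] = self.clock() + self.ttl

    def __contains__(self, url):
        self._purge()
        return url in self._expires_at
```

Because `put` overwrites the expiry time, calling it for a URL that is already cached restarts the count, which matches the recalculation of the cache duration described for the Referer value.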
When the HTTP request message meets the first condition and the value of the Referer field is not empty, the value of the Referer field may also be in the first URL list. After determining that the value of the Referer field is in the first URL list, it is judged whether the URL requested by the HTTP request message is in the second URL list. If so, the URL requested by the HTTP request message is not audited; otherwise, it is audited. The second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and it does not include the URLs in the first URL list.
In one possible implementation, the second URL list is obtained as follows: a browser simulates manual access to the URLs in the first URL list by means of a crawler tool or a browser extension; all URL requests generated during the browser access are recorded; the URLs that belong to the first URL list are deleted from the recorded URL requests; and the list formed by the remaining URLs is determined to be the second URL list. It follows that the second URL list can be understood as the list of URLs generated automatically during browser access. Since automatically generated URLs need no audit, a URL found in the second URL list need not be audited.
After determining that the value of the Referer field is in the first URL list, judging whether the URL requested by the HTTP request message is in the second URL list can be understood as judging whether that URL is one generated automatically during browser access. If so, it can be determined that the URL requested by the HTTP request message does not need to be audited; otherwise, it needs to be audited.
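Building the second URL list and using it can be sketched as below. The recording of requests by a crawler or browser extension is outside the sketch; `recorded_requests` is assumed to be the already-captured list of request URLs, and both function names are hypothetical.

```python
def build_second_url_list(recorded_requests, first_url_list):
    """Recorded request URLs minus any URL that belongs to the first URL list."""
    first = set(first_url_list)
    return {url for url in recorded_requests if url not in first}


def should_audit_after_preset_page(url, second_url_list):
    """Called only when the Referer value is already known to be in the first URL list."""
    # URLs the browser generates automatically are in the second list: skip them.
    return url not in second_url_list
```

A URL that the user actually clicked on a preset page is not in the recorded automatic requests, so `should_audit_after_preset_page` returns `True` for it.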
In addition, when the HTTP request message meets the first condition and the value of the Referer field is empty, it can be determined that the URL requested by the HTTP request message was generated by the user typing the URL directly into the browser, so the URL requested by the HTTP request message needs to be audited.
The above embodiments describe how to determine whether a URL needs to be audited when the HTTP request message meets the first condition. The processing when the HTTP request message does not meet the first condition is detailed below.
When the HTTP request message does not meet the first condition, it can be determined that the suffix of the URL requested by the HTTP request message is in the preset no-audit suffix list, so the URL requested by the HTTP request message does not need to be audited. Further, it is also necessary to judge whether the Referer field in the header of the HTTP request message is empty:
If the value of the Referer field is not empty, it is judged whether the value of the Referer field is in the URL cache pool. If so, the cache duration of the value of the Referer field in the URL cache pool is recalculated; otherwise, the value of the Referer field is stored in the URL cache pool and is deleted after its cache duration has expired.
In other words, when the HTTP request message does not meet the first condition, that is, the suffix of the URL requested by the HTTP request message is in the preset no-audit suffix list (for example, the suffix is ".gif" or ".png"), a first judgment is made: whether the value of the Referer field is empty. If the value of the Referer field is empty, it can be directly determined that the URL requested by the HTTP request message does not need to be audited. If the value of the Referer field is not empty, a second judgment is made: whether the value of the Referer field is in the URL cache pool. If so, the cache duration of the value of the Referer field in the URL cache pool is recalculated; otherwise, the value of the Referer field is stored in the URL cache pool and is deleted after its cache duration has expired.
The second judgment is explained as follows. When the suffix of the URL requested by the HTTP request message is in the preset no-audit suffix list, the value of the Referer field of the URL needs to be extracted, and it is judged whether that value has already been cached in the URL cache pool. If not, the value of the Referer field of the URL is cached in the URL cache pool and its cache duration starts to be counted. If the value of the Referer field has already been cached in the URL cache pool, the cache duration counted so far is reset to zero and counted again.
For example, the URL requested by the HTTP request message is "http://www.xxxxxx.org/images/colour/yellow.gif", and its suffix ".gif" is in the preset no-audit suffix list. The value of the Referer field of the URL, "http://www.xxxxxx.org", is extracted, and it is judged whether "http://www.xxxxxx.org" exists in the URL cache pool. If "http://www.xxxxxx.org" is not in the URL cache pool, it is cached in the URL cache pool and its cache duration starts to be counted. If "http://www.xxxxxx.org" is in the URL cache pool, suppose it has been cached for 2 s at the current moment (and suppose that a Referer value cached in the URL cache pool is deleted once its cache duration reaches 3 s). The cache duration of "http://www.xxxxxx.org" in the URL cache pool is then recalculated: the 2 s counted so far is reset to zero and counting starts again.
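The handling of the Referer value for a request that is not audited can be sketched with a plain dictionary that records when each entry's cache duration started. The function name, the returned labels, passing the current time as a parameter, and purging expired entries on each call are assumptions for illustration; the 3 s limit is the value assumed in the example above.

```python
def cache_referer_of_skipped_request(referer, pool, now, ttl=3.0):
    """Store or refresh the Referer value of a request that is not audited.

    pool maps a URL to the time at which its cache duration started counting.
    Returns "ignored", "stored" or "refreshed" to show which branch was taken.
    """
    # Delete entries whose cache duration has already been reached.
    for url in [u for u, started in pool.items() if now - started >= ttl]:
        del pool[url]
    if not referer:
        return "ignored"      # empty Referer: nothing to cache
    action = "refreshed" if referer in pool else "stored"
    pool[referer] = now       # start, or restart, the cache duration
    return action
```

In the example above, a call at 0 s stores "http://www.xxxxxx.org", and a second call at 2 s resets the counted 2 s to zero, so the entry survives until 5 s instead of being deleted at 3 s.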
In the second judgment, it is judged whether the value of the Referer field is in the URL cache pool; if so, the cache duration of the value of the Referer field in the URL cache pool is recalculated. This is equivalent to refreshing the cache duration of the value of the Referer field, which extends the total time the value of the Referer field stays in the URL cache pool. The resulting benefit can be analyzed with the above example as follows:
Because "http://www.xxxxxx.org/images/colour/yellow.gif" refreshed the cache duration of "http://www.xxxxxx.org" in the URL cache pool, the total time "http://www.xxxxxx.org" stays in the URL cache pool is extended. When the URL requested by an obtained HTTP request message is "https://www.xxxxxx.org/threads/new-install-help.20492/" (whose suffix is not in the preset no-audit suffix list), and it is determined that the value of its Referer field is "https://www.xxxxxx.org/", it is judged whether "https://www.xxxxxx.org/" is in the URL cache pool. If "https://www.xxxxxx.org/" is in the URL cache pool, the URL "https://www.xxxxxx.org/threads/new-install-help.20492/" is not audited. In this example, because the total time the value of the Referer field stays in the URL cache pool was extended, the URL "https://www.xxxxxx.org/threads/new-install-help.20492/", which needs no audit, is filtered out. URLs of HTTP request messages that need no audit are thus filtered out, which improves the efficiency of URL auditing and reduces the burden on the audit device. Of course, if "https://www.xxxxxx.org/" is not in the URL cache pool, the URL "https://www.xxxxxx.org/threads/new-install-help.20492/" is cached in the URL cache pool.
In addition, when the HTTP request message does not meet the first condition, the URL requested by the HTTP request message does not need to be audited regardless of whether the value of the Referer field in its header is empty. In this way, URLs whose suffix is ".jpg", ".gif", and so on are filtered out, which provides a preliminary removal of URLs that need no audit.
In the above embodiment, when the HTTP request message does not meet the first condition and the value of the Referer field in its header is determined not to be empty, the value of the Referer field is cached in the URL cache pool or its cache duration is refreshed. This extends the total time the value of the Referer field stays in the URL cache pool, filters out a large number of URLs that need no audit, improves the efficiency of URL auditing, and reduces the burden on the audit device.
To better explain the embodiments of the present invention, the flow of auditing a URL in a specific implementation scenario is described below, as shown in Fig. 3:
Step 301: obtain, by deep packet inspection (DPI), the HTTP request message generated by a user click.
Step 302: judge whether the URL suffix is in the preset non-audit suffix list. If so, do not audit the URL; otherwise, go to step 303.
That is, judge whether the suffix of the URL is ".jpg", ".gif", and so on. If so, the URL is not audited. Otherwise, judge whether the content type of the HTTP response message corresponding to the HTTP request message is among the preset audit types.
Step 303: judge whether the content type is among the preset audit types. If so, go to step 304; otherwise, do not audit the URL.
That is, judge whether the content type of the HTTP response message corresponding to the HTTP request message is HTML, XML, plain text, and so on. If so, continue by judging whether the content size of that HTTP response message is not less than the preset size. Otherwise, the URL is not audited.
Step 304: judge whether the content size is not less than the preset size. If so, go to step 305; otherwise, do not audit the URL.
If the content size of the HTTP response message corresponding to the HTTP request message is not less than the preset size, continue by judging whether the value of the Referer field in the header of the HTTP request message is non-empty; otherwise, the URL is not audited.
Step 305: judge whether the value of the Referer field is non-empty. If so, go to step 306; otherwise, audit the URL.
That is, judge whether the URL was reached through a link on some page. If so, judge whether the Referer value is in the first URL list. Otherwise, the URL is determined to have been entered manually by the user in the browser, and the URL is audited.
Step 306: judge whether the value of the Referer field is in the first URL list. If so, go to step 309; otherwise, go to step 307.
That is, judge whether the URL was linked from a URL of a preset website. If so, judge whether the URL is in the second URL list. Otherwise, the URL is determined to have been linked from a URL of a website other than the preset websites, and it is further judged whether the Referer value is in the URL cache pool.
Step 307: judge whether the value of the Referer field is in the URL cache pool. If so, do not audit the URL; otherwise, go to step 308.
If the Referer value corresponding to the URL already exists in the URL cache pool, the Referer value has not exceeded its cache duration, and the URL is not audited. If the Referer value does not exist in the URL cache pool, the URL is added to the URL cache pool, cache timing is started, and the URL is audited.
Step 308: add the URL to the URL cache pool and start cache timing.
After the URL is added to the URL cache pool, cache timing runs until the URL's cache duration in the URL cache pool expires, at which point the URL is deleted from the URL cache pool.
Step 309: judge whether the URL is in the second URL list. If so, do not audit the URL; otherwise, audit the URL.
That is, judge whether the URL is one that the browser generates automatically when it accesses a URL in the first URL list. If so, the URL is determined to be automatically generated and does not need auditing. Otherwise, the URL is audited.
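Steps 302 to 309 can be condensed into a single decision function, sketched below under stated assumptions. The names `Exchange`, `Policy`, and `should_audit` are hypothetical, the cache pool is simplified to a plain set (expiry of entries is omitted), and the DPI capture of step 301 is assumed to have already produced the request and response fields.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional, Set
from urllib.parse import urlsplit


@dataclass
class Exchange:
    url: str                   # URL requested by the HTTP request message
    referer: Optional[str]     # value of the Referer field, None if absent
    content_type: str          # content type of the corresponding response
    content_length: int        # content size of the corresponding response


@dataclass
class Policy:
    non_audit_suffixes: Set[str]
    audit_types: Set[str]
    min_size: int
    first_url_list: Set[str]
    second_url_list: Set[str]


def should_audit(x: Exchange, p: Policy, cache_pool: Set[str]) -> bool:
    """Return True if the URL should be audited (steps 302-309 of Fig. 3)."""
    suffix = posixpath.splitext(urlsplit(x.url).path)[1].lower()
    if suffix in p.non_audit_suffixes:        # step 302
        return False
    if x.content_type not in p.audit_types:   # step 303
        return False
    if x.content_length < p.min_size:         # step 304
        return False
    if not x.referer:                         # step 305: typed by the user
        return True
    if x.referer in p.first_url_list:         # step 306 -> step 309
        return x.url not in p.second_url_list
    if x.referer in cache_pool:               # step 307
        return False
    cache_pool.add(x.url)                     # step 308 (timing omitted here)
    return True
```

In this sketch a page reached from an uncached referrer is audited once and then cached, so later requests that name it in their Referer field are skipped.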
In the above embodiment, after the URL suffix is determined to be in the preset non-audit suffix list, the flow of Fig. 4 can be executed:
Step 401: judge whether the value of the Referer field is non-empty. If so, go to step 402; otherwise, do not audit the URL.
That is, after the suffix of the URL is determined to be in the preset non-audit suffix list, judge whether the URL was reached through a link on some page. If it was, judge whether the Referer value of the URL is in the URL cache pool. If it was not, the URL is not audited.
Step 402: judge whether the value of the Referer field is in the URL cache pool. If so, go to step 403; otherwise, go to step 404.
If the Referer value is in the URL cache pool, it has already been cached and its cache duration has not expired, so its cache timing in the URL cache pool is refreshed. Otherwise, the Referer value is cached into the URL cache pool and cache timing is started.
Step 403: reset the cache timing of the Referer value in the URL cache pool and restart timing.
Step 404: cache the Referer value into the URL cache pool and start cache timing.
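The Fig. 4 flow only maintains the cache pool and never audits the URL itself. A minimal sketch, assuming the pool is a dictionary from cached value to expiry time and that the caller passes the current time `now` and a cache duration `ttl` (all hypothetical names):

```python
from typing import Dict, Optional


def handle_non_audit_suffix(referer: Optional[str],
                            cache_expiry: Dict[str, float],
                            now: float,
                            ttl: float) -> str:
    """Apply steps 401-404 and report which branch was taken."""
    if not referer:                               # step 401: no Referer value
        return "skipped"
    expiry = cache_expiry.get(referer)
    if expiry is not None and now < expiry:       # step 402: cached, not expired
        cache_expiry[referer] = now + ttl         # step 403: restart the timer
        return "refreshed"
    cache_expiry[referer] = now + ttl             # step 404: cache and start timing
    return "cached"
```

The returned string is only for illustration; in every branch the URL with the non-audit suffix is left unaudited.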
Since the specific implementation of this embodiment has been described in other embodiments, it is not repeated here.
In the above embodiment, four judgments are made: 1. whether the suffix of the URL requested by the HTTP request message is in the preset non-audit suffix list; 2. whether the value of the Referer field is empty; 3. whether the value of the Referer field is in the first URL list; 4. whether the value of the Referer field is in the URL cache pool. In addition, when the suffix of the URL requested by the HTTP request message is determined to be in the preset non-audit suffix list, the Referer value is cached into the URL cache pool or its cache duration is refreshed. Together these determine whether the URL requested by the HTTP request message needs auditing. In this way, a large number of URLs that do not need auditing are filtered out, which improves the efficiency of URL auditing and reduces the load on the audit device.
In addition, as one implementation of the present invention, each audit device may also collect URLs periodically and periodically send the URLs whose counts rank in the top N (TOPN URLs) to a designated server. The server aggregates the top-N URLs reported by all audit devices and determines the URLs whose counts rank in the top M among the aggregated URLs (TOPM URLs). Manual assistance is then introduced: a person judges whether each of the top M URLs needs auditing. URLs that are manually determined not to need auditing are added to a junk URL library. The junk URL list in the junk URL library is sent to each audit device, so that when auditing a URL the audit device can also compare it against the junk URL list. On a hit, the audit device determines that the URL is not audited; otherwise, it carries out other processing.
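The server-side aggregation can be sketched as follows. The description does not say how reports from different devices are combined, so summing the per-device counts is an assumption, as are the function names `device_top_n` and `aggregate_top_m`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def device_top_n(url_counts: Counter, n: int) -> List[Tuple[str, int]]:
    """On an audit device: the N most frequently seen URLs with their counts."""
    return url_counts.most_common(n)


def aggregate_top_m(reports: Iterable[List[Tuple[str, int]]], m: int) -> List[str]:
    """On the server: merge all device reports and return the top M URLs."""
    total = Counter()
    for report in reports:
        for url, count in report:
            total[url] += count      # assumption: counts from devices are summed
    return [url for url, _ in total.most_common(m)]
```

The URLs returned by `aggregate_top_m` would then go to manual review, and only those judged not to need auditing would enter the junk URL library.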
Based on the same inventive concept, Fig. 5 shows by way of example the structure of a device for auditing URLs provided by an embodiment of the present invention; the device can execute the flow of the URL auditing method.
Acquiring unit 501, configured to obtain an HTTP request message, the header of which includes a Referer field;
Processing unit 502, configured to: when the HTTP request message meets the first condition and the value of the Referer field is not empty, after determining that the value of the Referer field is not in the first URL list, judge whether the value of the Referer field is in the URL cache pool, and if not, audit the URL requested by the HTTP request message;
wherein the first condition includes that the suffix of the URL requested by the HTTP request message is not in the preset non-audit suffix list, and the first URL list is a URL list determined according to preset websites.
Optionally, the first condition further includes that the content type of the HTTP response message corresponding to the HTTP request message is among the preset audit types and/or that the content size of the HTTP response message corresponding to the HTTP request message is not less than the preset size.
Optionally, the processing unit 502 is further configured to:
after auditing the URL requested by the HTTP request message, store that URL in the URL cache pool, and delete it after its cache duration expires.
Optionally, the processing unit 502 is further configured to:
when the HTTP request message does not meet the first condition and the value of the Referer field is not empty, judge whether the value of the Referer field is in the URL cache pool; if so, recalculate the cache duration of the Referer value in the URL cache pool; otherwise, store the Referer value in the URL cache pool and delete it after its cache duration expires.
Optionally, the processing unit 502 is further configured to:
after determining that the value of the Referer field is in the first URL list, judge whether the URL requested by the HTTP request message is in the second URL list; if so, do not audit the URL requested by the HTTP request message; otherwise, audit it. The second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and the second URL list does not include the URLs in the first URL list.
Optionally, the acquiring unit 501 is specifically configured to:
obtain, by deep packet inspection (DPI), the HTTP request message generated by a user click.
Based on the same inventive concept, an embodiment of the present invention further provides a computing device, comprising:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and to execute the above method for auditing URLs according to the obtained program.
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable non-volatile storage medium including computer-readable instructions which, when read and executed by a computer, cause the computer to execute the above method for auditing URLs.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (14)

1. A method for auditing a uniform resource locator (URL), characterized by comprising:
obtaining an HTTP request message, wherein the header of the HTTP request message includes a Referer field;
when the HTTP request message meets a first condition and the value of the Referer field is not empty, after determining that the value of the Referer field is not in a first URL list, judging whether the value of the Referer field is in a URL cache pool, and if not, auditing the URL requested by the HTTP request message;
wherein the first condition includes that the suffix of the URL requested by the HTTP request message is not in a preset non-audit suffix list, and the first URL list is a URL list determined according to preset websites.
2. The method according to claim 1, characterized in that the first condition further includes that the content type of the HTTP response message corresponding to the HTTP request message is among preset audit types and/or that the content size of the HTTP response message corresponding to the HTTP request message is not less than a preset size.
3. The method according to claim 1, characterized in that, after auditing the URL requested by the HTTP request message, the method further comprises:
storing the URL requested by the HTTP request message in the URL cache pool, and deleting the URL requested by the HTTP request message after its cache duration expires.
4. The method according to claim 1, characterized by further comprising:
when the HTTP request message does not meet the first condition and the value of the Referer field is not empty, judging whether the value of the Referer field is in the URL cache pool; if so, recalculating the cache duration of the value of the Referer field in the URL cache pool; otherwise, storing the value of the Referer field in the URL cache pool and deleting it after its cache duration expires.
5. The method according to claim 1, characterized by further comprising:
after determining that the value of the Referer field is in the first URL list, judging whether the URL requested by the HTTP request message is in a second URL list; if so, not auditing the URL requested by the HTTP request message; otherwise, auditing the URL requested by the HTTP request message; wherein the second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and the second URL list does not include the URLs in the first URL list.
6. The method according to any one of claims 1 to 5, characterized in that obtaining the HTTP request message comprises:
obtaining, by deep packet inspection (DPI), the HTTP request message generated by a user click.
7. A device for auditing a uniform resource locator (URL), characterized by comprising:
an acquiring unit, configured to obtain an HTTP request message, wherein the header of the HTTP request message includes a Referer field;
a processing unit, configured to: when the HTTP request message meets a first condition and the value of the Referer field is not empty, after determining that the value of the Referer field is not in a first URL list, judge whether the value of the Referer field is in a URL cache pool, and if not, audit the URL requested by the HTTP request message;
wherein the first condition includes that the suffix of the URL requested by the HTTP request message is not in a preset non-audit suffix list, and the first URL list is a URL list determined according to preset websites.
8. The device according to claim 7, characterized in that the first condition further includes that the content type of the HTTP response message corresponding to the HTTP request message is among preset audit types and/or that the content size of the HTTP response message corresponding to the HTTP request message is not less than a preset size.
9. The device according to claim 7, characterized in that the processing unit is further configured to:
after auditing the URL requested by the HTTP request message, store the URL requested by the HTTP request message in the URL cache pool, and delete it after its cache duration expires.
10. The device according to claim 7, characterized in that the processing unit is further configured to:
when the HTTP request message does not meet the first condition and the value of the Referer field is not empty, judge whether the value of the Referer field is in the URL cache pool; if so, recalculate the cache duration of the value of the Referer field in the URL cache pool; otherwise, store the value of the Referer field in the URL cache pool and delete it after its cache duration expires.
11. The device according to claim 7, characterized in that the processing unit is further configured to:
after determining that the value of the Referer field is in the first URL list, judge whether the URL requested by the HTTP request message is in a second URL list; if so, not audit the URL requested by the HTTP request message; otherwise, audit the URL requested by the HTTP request message; wherein the second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and the second URL list does not include the URLs in the first URL list.
12. The device according to any one of claims 7 to 11, characterized in that the acquiring unit is specifically configured to:
obtain, by deep packet inspection (DPI), the HTTP request message generated by a user click.
13. A computing device, characterized by comprising:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and to execute, according to the obtained program, the method according to any one of claims 1 to 6.
14. A computer-readable non-volatile storage medium, characterized by including computer-readable instructions which, when read and executed by a computer, cause the computer to execute the method according to any one of claims 1 to 6.
CN201811324566.4A 2018-11-08 2018-11-08 A kind of method and device for the URL that audits Pending CN109547421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811324566.4A CN109547421A (en) 2018-11-08 2018-11-08 A kind of method and device for the URL that audits

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811324566.4A CN109547421A (en) 2018-11-08 2018-11-08 A kind of method and device for the URL that audits

Publications (1)

Publication Number Publication Date
CN109547421A true CN109547421A (en) 2019-03-29

Family

ID=65845283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811324566.4A Pending CN109547421A (en) 2018-11-08 2018-11-08 A kind of method and device for the URL that audits

Country Status (1)

Country Link
CN (1) CN109547421A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112311823A (en) * 2019-07-29 2021-02-02 百度(中国)有限公司 Flow control method and device of auditing system and server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102098229A (en) * 2011-03-04 2011-06-15 北京星网锐捷网络技术有限公司 Method and device for optimizing and auditing uniform resource locator (URL) as well as network device
CN102780681A (en) * 2011-05-11 2012-11-14 中兴通讯股份有限公司 URL (Uniform Resource Locator) filtering system and URL filtering method
CN104239353A (en) * 2013-06-20 2014-12-24 上海博达数据通信有限公司 WEB classification control and log auditing method
CN105991634A (en) * 2015-04-29 2016-10-05 杭州迪普科技有限公司 Access control method and apparatus
CN106534243A (en) * 2015-09-14 2017-03-22 阿里巴巴集团控股有限公司 Caching, requesting and responding method based on HTTP protocol and corresponding device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112311823A (en) * 2019-07-29 2021-02-02 百度(中国)有限公司 Flow control method and device of auditing system and server
CN112311823B (en) * 2019-07-29 2023-01-31 百度(中国)有限公司 Flow control method and device of auditing system and server


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190329