CN109547421A - Method and device for auditing URLs
- Publication number
- CN109547421A (CN201811324566.4A)
- Authority
- CN
- China
- Prior art keywords
- url
- request message
- referer
- domain
- thresholding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/22—Parsing or analysis of headers
Abstract
The invention discloses a method and device for auditing URLs. The method comprises: obtaining an HTTP request message whose message header includes a Referer field; when the HTTP request message meets a first condition and the value of the Referer field is not empty, and after determining that the value of the Referer field is not in a first URL list, judging whether the value of the Referer field is in a URL cache pool; and if not, auditing the URL requested by the HTTP request message. This technical solution filters out a large number of URLs that do not need to be audited, improves the efficiency of URL auditing, and reduces the burden on the audit device.
Description
Technical field
Embodiments of the present invention relate to the field of network communication technology, and in particular to a method and device for auditing URLs (Uniform Resource Locators).
Background
With the development of networks, Internet applications have penetrated every corner of social life and have become an indispensable tool for study, work, and daily life. Auditing URLs makes users' Internet behavior transparent.
In the prior art, however, when a user accesses the homepage of a website, the browser may initiate more than one URL request. For example, when a user accesses "http://www.xxxxxx.org/" through a browser, the browser also fetches a large number of pictures while fetching the homepage, thereby generating a large number of URL requests. Some of the generated URL requests are shown in Table 1.
Table 1: Part of the URL requests generated when the browser accesses "www.xxxxxx.org/"
For URL auditing, the URLs that the browser generates automatically have no practical significance. If every generated URL request is audited, a large number of URLs that need no auditing are audited, which not only burdens the audit device but also reduces the efficiency of URL auditing.
Summary of the invention
Embodiments of the present invention provide a method and device for auditing URLs, which filter out a large number of URLs that do not need to be audited, improve the efficiency of URL auditing, and reduce the burden on the audit device.
A method for auditing URLs provided by an embodiment of the present invention comprises:
obtaining an HTTP (HyperText Transfer Protocol) request message, wherein the message header of the HTTP request message includes a Referer (source) field;
when the HTTP request message meets a first condition and the value of the Referer field is not empty, and after determining that the value of the Referer field is not in a first URL list, judging whether the value of the Referer field is in a URL cache pool, and if not, auditing the URL requested by the HTTP request message;
wherein the first condition includes that the suffix of the URL requested by the HTTP request message is not in a preset no-audit suffix list, and the first URL list is a URL list determined according to preset websites.
Optionally, the first condition further includes that the content type of the HTTP response message corresponding to the HTTP request message is in a preset set of audit types, and/or that the content size of the HTTP response message corresponding to the HTTP request message is not less than a preset size.
Optionally, after auditing the URL requested by the HTTP request message, the method further comprises:
storing the URL requested by the HTTP request message in the URL cache pool, and deleting that URL after its caching duration expires.
Optionally, the method further comprises:
when the HTTP request message does not meet the first condition and the value of the Referer field is not empty, judging whether the value of the Referer field is in the URL cache pool; if so, restarting the caching duration of the value of the Referer field in the URL cache pool; otherwise, storing the value of the Referer field in the URL cache pool and deleting it after its caching duration expires.
Optionally, the method further comprises:
after determining that the value of the Referer field is in the first URL list, judging whether the URL requested by the HTTP request message is in a second URL list; if so, not auditing the URL requested by the HTTP request message; otherwise, auditing it. The second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and the second URL list does not include the URLs in the first URL list.
Optionally, obtaining the HTTP request message comprises:
obtaining, by means of DPI (Deep Packet Inspection) identification technology, the HTTP request message generated by a user click.
Correspondingly, an embodiment of the present invention further provides a device for auditing URLs, comprising:
an acquiring unit, configured to obtain an HTTP request message, wherein the message header of the HTTP request message includes a Referer field;
a processing unit, configured to: when the HTTP request message meets a first condition and the value of the Referer field is not empty, and after determining that the value of the Referer field is not in a first URL list, judge whether the value of the Referer field is in a URL cache pool, and if not, audit the URL requested by the HTTP request message;
wherein the first condition includes that the suffix of the URL requested by the HTTP request message is not in a preset no-audit suffix list, and the first URL list is a URL list determined according to preset websites.
Optionally, the first condition further includes that the content type of the HTTP response message corresponding to the HTTP request message is in a preset set of audit types, and/or that the content size of the HTTP response message corresponding to the HTTP request message is not less than a preset size.
Optionally, the processing unit is further configured to:
after auditing the URL requested by the HTTP request message, store that URL in the URL cache pool, and delete it after its caching duration expires.
Optionally, the processing unit is further configured to:
when the HTTP request message does not meet the first condition and the value of the Referer field is not empty, judge whether the value of the Referer field is in the URL cache pool; if so, restart the caching duration of the value of the Referer field in the URL cache pool; otherwise, store the value of the Referer field in the URL cache pool and delete it after its caching duration expires.
Optionally, the processing unit is further configured to:
after determining that the value of the Referer field is in the first URL list, judge whether the URL requested by the HTTP request message is in a second URL list; if so, not audit the URL requested by the HTTP request message; otherwise, audit it. The second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and the second URL list does not include the URLs in the first URL list.
Optionally, the acquiring unit is specifically configured to:
obtain, by means of DPI identification technology, the HTTP request message generated by a user click.
Correspondingly, an embodiment of the present invention further provides a computing device, comprising:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and to execute the above method for auditing URLs according to the obtained program.
Correspondingly, an embodiment of the present invention further provides a computer-readable non-volatile storage medium containing computer-readable instructions which, when read and executed by a computer, cause the computer to execute the above method for auditing URLs.
Embodiments of the present invention decide whether the URL requested by an HTTP request message needs to be audited through four checks: 1. whether the suffix of the requested URL is in the preset no-audit suffix list; 2. whether the value of the Referer field is empty; 3. whether the value of the Referer field is in the first URL list; 4. whether the value of the Referer field is in the URL cache pool. Through these steps, URLs requested by HTTP request messages that do not need to be audited are filtered out, which improves the efficiency of URL auditing and reduces the burden on the audit device.
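The four checks above can be sketched as a single decision function. This is a minimal illustration rather than the patented implementation: the function name `should_audit`, its parameters, and the use of plain Python sets for the lists and the cache pool are all assumptions, and the branch for a Referer value that is in the first URL list follows the optional second-URL-list step described earlier.

```python
from urllib.parse import urlparse

# Hypothetical no-audit suffix list; the actual list is a preset of the audit device.
NO_AUDIT_SUFFIXES = (".jpg", ".gif", ".ico", ".css", ".js", ".png")


def should_audit(url, referer, first_url_list, second_url_list, cache_pool):
    """Return True if the requested URL should be audited."""
    path = urlparse(url).path.lower()
    # Check 1: a URL whose suffix is in the no-audit list is never audited.
    if path.endswith(NO_AUDIT_SUFFIXES):
        return False
    # Check 2: an empty Referer means the user typed the URL, so audit it.
    if not referer:
        return True
    # Check 3: Referer belongs to a preset website; audit unless the URL is
    # one that the browser generates automatically (second URL list).
    if referer in first_url_list:
        return url not in second_url_list
    # Check 4: a Referer still present in the cache pool means the page was
    # seen very recently, so the request is treated as automatic.
    return referer not in cache_pool
```

For example, a request for a `.png` resource is rejected at the first check regardless of its Referer value, while a page URL with an empty Referer is always audited.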
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a system architecture provided by an embodiment of the present invention;
Fig. 2 is a schematic flow diagram of a method for auditing URLs provided by an embodiment of the present invention;
Fig. 3 is a schematic flow diagram of another method for auditing URLs provided by an embodiment of the present invention;
Fig. 4 illustrates the handling, provided by an embodiment of the present invention, of a URL whose suffix is in the preset no-audit suffix list;
Fig. 5 is a schematic structural diagram of a device for auditing URLs provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 shows, by way of example, a system architecture to which the method for auditing URLs provided by an embodiment of the present invention is applicable. The system architecture may include a terminal 110 and an audit device 120.
The audit device 120 includes a processor 122, a communication interface 123, and a memory 121. The communication interface 123 is used to communicate with the terminal 110 and to send and receive the HTTP request messages transmitted by the terminal 110.
The processor 122 is the control center of the audit device 120. It connects the various parts of the entire audit device 120 through various interfaces and lines, and performs the various functions of the audit device 120 and processes data by running or executing the software programs or modules stored in the memory 121 and calling the data stored in the memory 121. Optionally, the processor 122 may include one or more processing units.
The memory 121 can be used to store software programs and modules, and the processor 122 executes various functional applications and data processing by running the software programs and modules stored in the memory 121. The memory 121 may mainly include a program storage area and a data storage area, where the program storage area can store an operating system, the application programs required by at least one function, and the like, and the data storage area can store data created during business processing. In addition, the memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state memory.
It should be noted that the structure shown in Fig. 1 is only an example, and the embodiments of the present invention are not limited to it.
Based on the foregoing description, Fig. 2 shows, by way of example, the flow of a method for auditing URLs provided by an embodiment of the present invention. The flow can be executed by a device for auditing URLs, which may be located in the audit device or may be the audit device itself.
As shown in Fig. 2, the flow specifically includes:
Step 201: obtain an HTTP request message.
The message header of the HTTP request message includes a Referer field, whose value indicates the source address of the URL requested by the HTTP request message. When a browser sends a request to a server, it generally carries the Referer field to tell the server which page the request was linked from, and the server can use this information for processing. Further, when a URL request is generated by the user typing the URL directly into the browser, the value of the Referer field is empty; when a URL request is generated by the user clicking a link, the value of the Referer field is not empty.
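As a minimal sketch of how the requested URL and the Referer value might be read from a captured request header, the function below parses the raw text of an HTTP/1.x request head. The name `parse_request` is hypothetical, and rebuilding the URL from the `Host` header with an `http://` scheme is an assumption made for illustration only.

```python
def parse_request(raw):
    """Split a raw HTTP/1.x request head into (requested URL, Referer value)."""
    lines = raw.split("\r\n")
    # Request line: method, request target, protocol version.
    _method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # blank line marks the end of the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Assumption: plain HTTP, URL rebuilt from Host unless the target is absolute.
    if target.startswith("http"):
        url = target
    else:
        url = "http://" + headers.get("host", "") + target
    # A missing Referer header is reported as an empty value.
    return url, headers.get("referer", "")
```

An empty second element of the returned tuple corresponds to the "typed directly into the browser" case described above.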
For example, the message header of an HTTP request message is shown in Table 2. The URL it represents is "http://www.xxxxxx.org/news/2010/12/07/0003.html", and the value of its Referer field is "http://www.xxxxxx.org/", which shows that the URL was linked from "http://www.xxxxxx.org/". The request may have been generated automatically by the browser from the data of the "http://www.xxxxxx.org/" homepage, or it may have been generated by the user clicking a link on the "http://www.xxxxxx.org/" homepage.
Table 2: Message header of the HTTP request message
Preferably, the HTTP request messages generated by user clicks can be obtained by means of DPI identification technology. DPI identification technology is a packet-based deep inspection technology that performs deep inspection on different application-layer payloads (such as HTTP and DNS (Domain Name System)) and determines legitimacy by inspecting the payload of a message. DPI identification technology can identify the HTTP request messages generated by user clicks and filter out HTTP request messages not generated by user clicks, for example those triggered by non-browser applications or apps. This avoids collecting a large number of HTTP request messages not generated by user clicks, which would reduce the efficiency of URL auditing.
Step 202: when the HTTP request message meets the first condition and the value of the Referer field is not empty, and after determining that the value of the Referer field is not in the first URL list, judge whether the value of the Referer field is in the URL cache pool; if not, audit the URL requested by the HTTP request message.
The first condition includes that the suffix of the URL requested by the HTTP request message is not in the preset no-audit suffix list, and the first URL list is a URL list determined according to preset websites.
The first condition, the first URL list, and the URL cache pool are each explained further below.
(1) The first condition is explained as follows:
Because a website contains a large amount of data such as pictures, a browser initiates multiple URL requests when accessing it, and a large proportion of them are requests that the browser generates automatically to fetch data such as pictures. To filter out these URL requests, which have no practical significance for URL auditing, a no-audit suffix list needs to be preset. The list contains the suffixes that do not need to be audited, for example ".jpg", ".gif", ".ico", ".css", ".js", and ".png". An HTTP request message meets the first condition when the suffix of the URL it requests is not in the preset no-audit suffix list; in other words, the first condition filters out URL requests whose suffix is in the no-audit suffix list. After the URLs in Table 2 are filtered by the first condition, the URLs that meet the first condition are obtained, as shown in Table 3; that is, a large number of URL requests that are meaningless for URL auditing are filtered out.
Table 3: URLs that meet the first condition
In addition, the first condition may include not only that the suffix of the URL requested by the HTTP request message is not in the preset no-audit suffix list, but also that the content type (Content-Type) of the HTTP response message corresponding to the HTTP request message is in a preset set of audit types, and/or that the content size (Content-Size) of the HTTP response message corresponding to the HTTP request message is not less than a preset size.
An HTTP response message corresponds to an HTTP request message. After the HTTP request message is obtained, the corresponding HTTP response message also needs to be obtained, and it is then judged whether the content type of the HTTP response message is in the preset set of audit types. The audit types can be set empirically and may include HTML (HyperText Markup Language) format, XML (Extensible Markup Language) format, plain text format, and the like. And/or, it is judged whether the content size of the HTTP response message is not less than the preset size. The preset size can be set empirically, for example to 30 bytes.
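The first condition described above can be sketched as follows. The function name `meets_first_condition`, the concrete MIME strings in `AUDIT_TYPES`, and the treatment of a missing response as "no extra restriction" are assumptions for illustration; the suffixes and the 30-byte threshold are the example values from the text.

```python
from urllib.parse import urlparse

NO_AUDIT_SUFFIXES = (".jpg", ".gif", ".ico", ".css", ".js", ".png")
# Assumed MIME names for the HTML, XML and plain text formats named in the text.
AUDIT_TYPES = {"text/html", "text/xml", "application/xml", "text/plain"}
MIN_CONTENT_SIZE = 30  # bytes, the example preset size


def meets_first_condition(url, content_type=None, content_size=None):
    """Check suffix, and optionally the response content type and size."""
    path = urlparse(url).path.lower()
    if path.endswith(NO_AUDIT_SUFFIXES):
        return False
    if content_type is not None:
        # Drop parameters such as "; charset=utf-8" before comparing.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type not in AUDIT_TYPES:
            return False
    if content_size is not None and content_size < MIN_CONTENT_SIZE:
        return False
    return True
```

Passing `None` for the response attributes reduces the check to the suffix test alone, which matches the mandatory part of the first condition.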
(2) The first URL list is explained as follows:
The first URL list is a URL list determined according to preset websites. The preset websites may be the N websites whose access counts exceed a certain threshold, or N websites preset empirically. The HTML files of the N websites are obtained, and the URLs corresponding to each website are extracted according to HTML syntax. For example, the value of href is obtained from the syntax <a href="">, and the obtained href values are the URLs of the website.
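A minimal sketch of this extraction, using the Python standard library HTML parser, is shown below. The names `HrefCollector` and `build_first_url_list` are hypothetical, and resolving relative href values against the page address is an assumption that the text does not spell out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect the href values of <a> tags found in one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Assumption: relative links are resolved against the page URL.
                self.urls.add(urljoin(self.base_url, value))


def build_first_url_list(pages):
    """pages maps each preset website URL to the HTML text fetched from it."""
    urls = set()
    for base_url, html in pages.items():
        collector = HrefCollector(base_url)
        collector.feed(html)
        urls |= collector.urls
    return urls
```

Only `<a href="">` values are collected, so resources referenced by other tags, such as images, do not enter the first URL list.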
(3) The URL cache pool is explained as follows:
Preset URLs can be put into the URL cache pool. Alternatively, after an HTTP request message is obtained, a URL to be cached can be determined according to the URL requested by the HTTP request message and then put into the URL cache pool. A URL to be cached can be obtained in the following two ways:
First, the obtained HTTP request message meets the first condition and the value of its Referer field is not empty, and it is determined that the requested URL was not generated after accessing a preset website, that is, the value of the Referer field in the message header is not in the first URL list. It is then judged whether the value of the Referer field exists in the URL cache pool. If not, the URL requested by the HTTP request message is determined as a URL to be cached; if so, it is not.
Second, the obtained HTTP request message does not meet the first condition and the value of its Referer field is not empty. It is then judged whether the value of the Referer field exists in the URL cache pool. If not, the value of the Referer field is determined as a URL to be cached; if so, the caching duration of the value of the Referer field in the URL cache pool is restarted. It should be noted that the value of the Referer field corresponding to a URL is itself essentially a URL. How the caching duration of the value of the Referer field in the URL cache pool is restarted is explained in detail below.
After a URL to be cached is obtained in either of the two ways above, it is put into the URL cache pool.
Further, each URL in the URL cache pool has a caching duration, which can be understood as the length of time the URL remains in the cache pool: once the caching duration of a URL is reached, the URL is deleted from the URL cache pool. In other words, after a URL is put into the URL cache pool, the time for which it has been cached so far needs to be tracked, and once that time reaches the caching duration of the URL, the URL is deleted from the URL cache pool. The caching duration can be set empirically, for example to 2 s, meaning that a URL is deleted from the URL cache pool after it has been cached for 2 s.
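The expiry behaviour described above can be sketched with a small time-stamped pool. The class name `UrlCachePool`, the lazy purge on every access, and the injectable `clock` parameter (used so that expiry can be exercised without waiting) are illustrative assumptions, not details taken from the patent.

```python
import time


class UrlCachePool:
    """URL cache pool in which every entry expires after `ttl` seconds."""

    def __init__(self, ttl=2.0, clock=time.monotonic):
        self.ttl = ttl            # caching duration, 2 s in the example above
        self.clock = clock        # source of the current time in seconds
        self._stored_at = {}      # URL -> time at which caching (re)started

    def _purge(self):
        # Delete every URL whose cached time has reached the caching duration.
        now = self.clock()
        expired = [u for u, t in self._stored_at.items() if now - t >= self.ttl]
        for url in expired:
            del self._stored_at[url]

    def put(self, url):
        """Store the URL, or restart its caching duration if already present."""
        self._purge()
        self._stored_at[url] = self.clock()

    def __contains__(self, url):
        self._purge()
        return url in self._stored_at
```

Purging lazily on access avoids a background timer; a real audit device might instead use a timer wheel or a periodic sweep.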
When the HTTP request message meets the first condition and the value of the Referer field is not empty, the value of the Referer field may also be in the first URL list. After determining that the value of the Referer field is in the first URL list, it is judged whether the URL requested by the HTTP request message is in the second URL list. If so, the URL requested by the HTTP request message is not audited; otherwise, it is audited. The second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and it does not include the URLs in the first URL list.
In one possible implementation, the second URL list is obtained as follows: a crawler tool or a browser extension makes the browser simulate manual access to the URLs in the first URL list; all URL requests generated during the browser's access are recorded; the URLs that are in the first URL list are removed from the recorded URL requests; and the list formed by the remaining URLs is determined as the second URL list. From these steps it can be seen that the second URL list can be understood as the list of URLs generated automatically during browser access. Since URLs generated automatically during browser access need no auditing, a URL determined to be in the second URL list does not need to be audited.
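The construction of the second URL list reduces to a set difference, as the following sketch shows. The function names are hypothetical, and the recorded requests are assumed to have been captured beforehand by the crawler tool or browser extension mentioned above.

```python
def build_second_url_list(recorded_requests, first_url_list):
    """Keep the recorded URLs that are not themselves in the first URL list."""
    return set(recorded_requests) - set(first_url_list)


def audit_when_referer_in_first_list(url, second_url_list):
    """Branch taken once the Referer value is known to be in the first URL list."""
    # A URL in the second list was generated automatically and is not audited.
    return url not in second_url_list
```

Using sets keeps the membership test constant-time, which matters because the test runs once per captured request.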
After determining that the value of the Referer field is in the first URL list, judging whether the URL requested by the HTTP request message is in the second URL list can be understood as judging whether the requested URL is one that is generated automatically during browser access. If so, it can be determined that the URL requested by the HTTP request message does not need to be audited; otherwise, it needs to be audited.
In addition, when the HTTP request message meets the first condition and the value of the Referer field is empty, it can be determined that the requested URL was generated by the user typing the URL directly into the browser, so the URL requested by the HTTP request message needs to be audited.
The above embodiment describes the flow for determining whether a URL needs to be audited when the HTTP request message meets the first condition. The processing flow when the HTTP request message does not meet the first condition is described in detail below.
When the HTTP request message does not meet the first condition, it can be determined that the suffix of the requested URL is in the preset no-audit suffix list, and therefore that the URL requested by the HTTP request message does not need to be audited. Further, it is also necessary to judge whether the Referer field in the message header of the HTTP request message is empty:
If the value of the Referer field is not empty, it is judged whether the value of the Referer field is in the URL cache pool. If so, the caching duration of the value of the Referer field in the URL cache pool is restarted; otherwise, the value of the Referer field is stored in the URL cache pool and deleted after its caching duration expires.
This can be explained as follows. When the HTTP request message does not meet the first condition, that is, the suffix of the requested URL is in the preset no-audit suffix list (for example, the suffix is ".gif" or ".png"), a first judgment is made: whether the value of the Referer field is empty. If it is empty, it can be directly determined that the URL requested by the HTTP request message does not need to be audited. If it is not empty, a second judgment is made: whether the value of the Referer field is in the URL cache pool. If so, the caching duration of the value of the Referer field in the URL cache pool is restarted; otherwise, the value of the Referer field is stored in the URL cache pool and deleted after its caching duration expires.
The second judgment is explained as follows. When the suffix of the URL requested by the HTTP request message is in the preset no-audit suffix list, the value of the Referer field of that URL needs to be extracted, and it is judged whether this value has already been cached in the URL cache pool. If not, the value of the Referer field is cached in the URL cache pool and its caching duration starts to be counted. If the value of the Referer field has already been cached in the URL cache pool, the time for which it has been cached is reset to zero and the caching duration is counted again from the start.
For example, the URL requested by the HTTP request message is "http://www.xxxxxx.org/images/colour/yellow.gif". The suffix ".gif" of this URL is in the preset no-audit suffix list, so the value of the Referer field of the URL, "http://www.xxxxxx.org", is extracted, and it is judged whether "http://www.xxxxxx.org" exists in the URL cache pool. If "http://www.xxxxxx.org" is not in the URL cache pool, it is cached in the URL cache pool and its caching duration starts to be counted. If "http://www.xxxxxx.org" is in the URL cache pool, suppose that at the current moment it has been cached for 2 s (and suppose that a Referer value cached in the URL cache pool is deleted once its cached time reaches 3 s). The caching duration of "http://www.xxxxxx.org" in the URL cache pool is then restarted: the elapsed 2 s is reset to zero and counting begins again.
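The worked example above can be traced with explicit timestamps. In this sketch the pool is a plain dictionary from Referer value to the time at which counting last started; the function name `touch_referer` and the explicit `now` argument are assumptions chosen so that the 2 s and 3 s figures from the example can be reproduced directly.

```python
CACHE_DURATION = 3.0  # seconds, the value assumed in the example above


def touch_referer(pool, referer, now):
    """Store `referer`, or restart its caching duration; return its expiry time."""
    # Drop entries whose cached time has already reached the caching duration.
    for url in [u for u, started in pool.items() if now - started >= CACHE_DURATION]:
        del pool[url]
    # A new entry and a refreshed entry are handled identically:
    # counting starts again from the current moment.
    pool[referer] = now
    return now + CACHE_DURATION
```

Storing the start time rather than a countdown means that a refresh is a single assignment, and the total time a Referer value stays in the pool grows with every automatic request that names it.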
In the second judgment, if the value of the Referer field is found in the URL cache pool, its caching duration in the URL cache pool is restarted. This amounts to refreshing the caching duration of the value of the Referer field, which extends the total time for which the value of the Referer field remains in the URL cache pool. The resulting benefit can be analyzed with the above example:
Because "http://www.xxxxxx.org/images/colour/yellow.gif" refreshes the caching duration of "http://www.xxxxxx.org" in the URL cache pool, the total time for which "http://www.xxxxxx.org" remains in the URL cache pool is extended. Suppose the URL requested by a subsequently obtained HTTP request message is "https://www.xxxxxx.org/threads/new-install-help.20492/" (the suffix of this URL is not in the preset no-audit suffix list). After it is determined that the value of the Referer field of "https://www.xxxxxx.org/threads/new-install-help.20492/" is "https://www.xxxxxx.org/", it is judged whether "https://www.xxxxxx.org/" is in the URL cache pool. If "https://www.xxxxxx.org/" is in the URL cache pool, the URL "https://www.xxxxxx.org/threads/new-install-help.20492/" is not audited. In this example, because the total time for which the value of the Referer field remains in the URL cache pool has been extended, the URL "https://www.xxxxxx.org/threads/new-install-help.20492/", which needs no auditing, is filtered out. URLs requested by HTTP request messages that do not need to be audited are thus filtered out, which improves the efficiency of URL auditing and reduces the burden on the audit device. Of course, if "https://www.xxxxxx.org/" is not in the URL cache pool, the URL "https://www.xxxxxx.org/threads/new-install-help.20492/" is audited and then cached in the URL cache pool.
In addition, when the HTTP request message does not meet the first condition, the URL requested by the HTTP request message does not need to be audited regardless of whether the value of the Referer field in its message header is empty. In this way, requested URLs with suffixes such as ".jpg" and ".gif" are filtered out, which preliminarily removes URLs that do not need to be audited.
In the above embodiment, when the HTTP request message does not meet the first condition and the value of the Referer field in its message header is determined to be not empty, the value of the Referer field is cached in the URL cache pool or its caching duration is refreshed. This extends the total time for which the value of the Referer field remains in the URL cache pool, filters out a large number of URLs that do not need to be audited, improves the efficiency of URL auditing, and reduces the burden on the audit device.
Embodiment in order to preferably explain the present invention will describe the stream of audit URL under specific implement scene below
Journey, as shown in figure 3, specific as follows:
Step 301, user is obtained by DPI identification technology and clicks the HTTP request message that behavior generates.
Step 302: judge whether the suffix of the URL is in the preset no-audit suffix list. If so, do not audit the URL; otherwise, go to step 303.
Judging whether the suffix of the URL is in the preset no-audit suffix list means judging whether the suffix of the URL is ".jpg", ".gif", etc. If so, the URL is not audited. Otherwise, it is judged whether the content type of the HTTP response message corresponding to the HTTP request message is among the preset audit types.
Step 303: judge whether the content type is among the preset audit types. If so, go to step 304; otherwise, do not audit the URL.
Judging whether the content type of the HTTP response message corresponding to the HTTP request message is among the preset audit types means judging whether that content type is HTML, XML, plain text, etc. If so, it is further judged whether the content size of the HTTP response message corresponding to the HTTP request message is not less than a preset size. Otherwise, the URL is not audited.
Step 304: judge whether the content size is not less than the preset size. If so, go to step 305; otherwise, do not audit the URL.
Judge whether the content size of the HTTP response message corresponding to the HTTP request message is not less than the preset size. If so, it is further judged whether the value of the Referer field in the header of the HTTP request message is not empty; otherwise, the URL is not audited.
Step 305: judge whether the value of the Referer field is not empty. If so, go to step 306; otherwise, audit the URL.
Judging whether the value of the Referer field is not empty means judging whether the URL was reached through a link on some page. If so, it is judged whether the value of the Referer field is in the first URL list. Otherwise, it is determined that the URL was generated by the user typing it directly into the browser, and the URL is audited.
Step 306: judge whether the value of the Referer field is in the first URL list. If so, go to step 309; otherwise, go to step 307.
Judging whether the value of the Referer field is in the first URL list means judging whether the URL was linked from a URL of a preset website. If so, it is judged whether the URL is in the second URL list. Otherwise, it is determined that the URL was linked from a URL of a website other than the preset websites, and it is further determined whether the value of the Referer field is in the URL cache pool.
Step 307: judge whether the value of the Referer field is in the URL cache pool. If so, do not audit the URL; otherwise, go to step 308.
Judge whether the value of the Referer field is in the URL cache pool. If the value of the Referer field corresponding to the URL already exists in the URL cache pool, it is determined that the value of the Referer field has not exceeded its cache duration, and the URL is not audited. If the value of the Referer field does not exist in the URL cache pool, the URL needs to be added to the URL cache pool, cache timing is started, and the URL is audited.
Step 308: add the URL to the URL cache pool and start cache timing.
After the URL is added to the URL cache pool, cache timing is carried out. Once the cache duration of the URL in the URL cache pool has expired, the URL is deleted from the URL cache pool.
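The cache pool with per-entry timing described in step 308 can be sketched as a small time-to-live store. This is only an illustration under assumptions: the class name `UrlCachePool`, the parameter `ttl_seconds`, and the injectable `clock` are hypothetical, and the text does not say how expiry is implemented (here expired entries are purged lazily on lookup).

```python
import time
from typing import Callable, Dict


class UrlCachePool:
    """Sketch of a URL cache pool whose entries expire after a fixed duration."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds          # hypothetical cache duration
        self._clock = clock              # injectable so the timing can be tested
        self._expires_at: Dict[str, float] = {}

    def _purge(self) -> None:
        """Delete every entry whose cache duration has expired."""
        now = self._clock()
        for url in [u for u, t in self._expires_at.items() if t <= now]:
            del self._expires_at[url]

    def add(self, url: str) -> None:
        """Cache the URL and start its timer, or restart the timer if it is already cached."""
        self._expires_at[url] = self._clock() + self._ttl

    def __contains__(self, url: str) -> bool:
        self._purge()
        return url in self._expires_at
```

Calling `add` on an entry that is already present restarts its timer, which also covers the refresh behaviour used elsewhere in the description.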
Step 309: judge whether the URL is in the second URL list. If so, do not audit the URL; otherwise, audit the URL.
Judging whether the URL is in the second URL list means judging whether the URL is one that the browser generates automatically when it accesses a URL in the first URL list. If so, it is determined that the URL was generated automatically and does not need to be audited. Otherwise, the URL is audited.
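Steps 302 to 309 can be read as a single decision function. The sketch below is one possible reading, not the patented implementation: the `Request` type, the MIME strings in `AUDIT_TYPES`, the threshold `MIN_SIZE`, and the function name `should_audit` are all hypothetical, and the cache pool is a plain set with expiry omitted for brevity.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional, Set
from urllib.parse import urlsplit


@dataclass
class Request:
    url: str
    referer: Optional[str]   # None or "" means the Referer field is empty
    content_type: str        # content type of the corresponding HTTP response
    content_length: int      # content size of the corresponding HTTP response


# Hypothetical configuration values, chosen only for illustration.
NO_AUDIT_SUFFIXES = {".jpg", ".gif"}
AUDIT_TYPES = {"text/html", "text/xml", "text/plain"}
MIN_SIZE = 1024


def should_audit(req: Request, first_urls: Set[str],
                 second_urls: Set[str], cache_pool: Set[str]) -> bool:
    """Return True if the requested URL should be audited."""
    # Step 302: URLs with a no-audit suffix are never audited.
    suffix = posixpath.splitext(urlsplit(req.url).path)[1].lower()
    if suffix in NO_AUDIT_SUFFIXES:
        return False
    # Step 303: the response content type must be an audited type.
    if req.content_type not in AUDIT_TYPES:
        return False
    # Step 304: the response content must not be smaller than the preset size.
    if req.content_length < MIN_SIZE:
        return False
    # Step 305: an empty Referer means the user entered the URL directly.
    if not req.referer:
        return True
    # Steps 306 and 309: linked from a preset website.
    if req.referer in first_urls:
        return req.url not in second_urls
    # Step 307: the Referer is still cached, so the URL is not audited.
    if req.referer in cache_pool:
        return False
    # Step 308: cache the URL, then audit it.
    cache_pool.add(req.url)
    return True
```

For example, a page reached from a non-preset site is audited once and cached; a follow-up request whose Referer is that cached page is then skipped.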
In the above embodiment, after it is determined that the suffix of the URL is in the preset no-audit suffix list, the process of Fig. 4 may be executed, as follows:
Step 401: judge whether the value of the Referer field is not empty. If so, go to step 402; otherwise, do not audit the URL.
Judging whether the value of the Referer field is not empty means that, after it is determined that the suffix of the URL is in the preset no-audit suffix list, it is judged whether the URL was reached through a link on some page. If the URL was reached through a link on some page, it is judged whether the value of the Referer field of the URL is in the URL cache pool. If the URL was not reached through a link on some page, the URL is not audited.
Step 402: judge whether the value of the Referer field is in the URL cache pool. If so, go to step 403; otherwise, go to step 404.
Judge whether the value of the Referer field is in the URL cache pool. If so, it is determined that the value of the Referer field is cached in the URL cache pool and that its cache duration has not expired; the cache timer of the value of the Referer field in the URL cache pool is then refreshed. Otherwise, the value of the Referer field is cached into the URL cache pool and cache timing is started.
Step 403: reset the cache timer of the value of the Referer field in the URL cache pool and restart timing.
Step 404: cache the value of the Referer field into the URL cache pool and start cache timing.
Since the specific implementation of this embodiment has been described in other embodiments, it is not repeated here.
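Steps 401 to 404 can be sketched as a function that either skips, refreshes, or caches the Referer value. The names `handle_no_audit_suffix`, `expires_at`, `now`, and `ttl` are hypothetical, and representing the cache pool as a mapping from Referer value to expiry time is an assumption of this sketch. In every branch the requested URL itself is not audited.

```python
from typing import Dict, Optional


def handle_no_audit_suffix(referer: Optional[str],
                           expires_at: Dict[str, float],
                           now: float, ttl: float) -> str:
    """Update the cache pool for a request whose URL has a no-audit suffix."""
    # Step 401: an empty Referer means there is nothing to cache.
    if not referer:
        return "skip"
    # Step 402: check whether the Referer value is cached and not yet expired.
    expiry = expires_at.get(referer)
    if expiry is not None and expiry > now:
        # Step 403: reset the timer and restart timing.
        expires_at[referer] = now + ttl
        return "refreshed"
    # Step 404: cache the Referer value and start timing.
    expires_at[referer] = now + ttl
    return "cached"
```

Refreshing on every image or stylesheet request keeps the parent page in the pool while it is still loading resources, which matches the stated aim of extending the time the Referer value stays cached.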
In the above embodiment, four judgments are made: (1) whether the suffix of the URL requested by the HTTP request message is in the preset no-audit suffix list; (2) whether the value of the Referer field is empty; (3) whether the value of the Referer field is in the first URL list; (4) whether the value of the Referer field is in the URL cache pool. In addition, when it is determined that the suffix of the URL requested by the HTTP request message is in the preset no-audit suffix list, the value of the Referer field is cached into the URL cache pool or its cache duration is refreshed. These judgments finally determine whether the URL requested by the HTTP request message needs to be audited. In this way, a large number of URLs that do not need to be audited are filtered out, the efficiency of URL auditing is improved, and the burden on the audit device is reduced.
In addition, as one implementation of the present invention, each audit device may also collect URLs periodically and, at a fixed period, send the URLs whose counts rank in the top N among the URLs it has collected (TOPN URLs) to a designated server. The server aggregates the respective top-N URLs fed back by all audit devices and determines the URLs whose counts rank in the top M among the aggregated URLs (TOPM URLs). Manual assistance is then introduced: a person judges whether each of the top M URLs (TOPM URLs) needs to be audited. If not, the URLs manually determined not to need auditing are added to a junk URL library. The junk URL list in the junk URL library is sent to each audit device, so that when auditing a URL the audit device can also compare it against the junk URL list. If there is a hit, the audit device determines that the URL is not audited; otherwise, other processing is carried out.
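The TOPN/TOPM aggregation can be sketched as follows. The text does not say how the server combines the per-device reports; summing the reported counts per URL is an assumption of this sketch, and the function names `top_n` and `merge_top_m` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_n(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """On one audit device: the n most frequently seen URLs with their counts."""
    return Counter(counts).most_common(n)


def merge_top_m(reports: Iterable[List[Tuple[str, int]]], m: int) -> List[str]:
    """On the server: merge the per-device reports and return the top m URLs."""
    total: Counter = Counter()
    for report in reports:
        for url, count in report:
            total[url] += count   # assumption: counts are summed across devices
    return [url for url, _ in total.most_common(m)]
```

The resulting TOPM list is what a human reviewer would inspect; any URL judged not to need auditing would then go into the junk URL list that is pushed back to the devices.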
Based on the same inventive concept, Fig. 5 illustratively shows the structure of a device for auditing a URL provided by an embodiment of the present invention. The device can execute the process of the method for auditing a URL.
An acquiring unit 501, configured to obtain an HTTP request message, wherein the header of the HTTP request message includes a Referer field;
A processing unit 502, configured to: when the HTTP request message meets a first condition and the value of the Referer field is not empty, after determining that the value of the Referer field is not in a first URL list, judge whether the value of the Referer field is in a URL cache pool, and if not, audit the URL requested by the HTTP request message;
wherein the first condition includes that the suffix of the URL requested by the HTTP request message is not in a preset no-audit suffix list, and the first URL list is a URL list determined according to preset websites.
Optionally, the first condition further includes that the content type of the HTTP response message corresponding to the HTTP request message is among the preset audit types, and/or that the content size of the HTTP response message corresponding to the HTTP request message is not less than a preset size.
Optionally, the processing unit 502 is further configured to:
after auditing the URL requested by the HTTP request message, store the URL requested by the HTTP request message in the URL cache pool, and delete the URL requested by the HTTP request message after the cache duration of that URL expires.
Optionally, the processing unit 502 is further configured to:
when the HTTP request message does not meet the first condition and the value of the Referer field is not empty, judge whether the value of the Referer field is in the URL cache pool; if so, recalculate the cache duration of the value of the Referer field in the URL cache pool; otherwise, store the value of the Referer field in the URL cache pool, and delete the value of the Referer field after its cache duration expires.
Optionally, the processing unit 502 is further configured to:
after determining that the value of the Referer field is in the first URL list, judge whether the URL requested by the HTTP request message is in a second URL list; if so, do not audit the URL requested by the HTTP request message; otherwise, audit the URL requested by the HTTP request message. The second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and the second URL list does not include the URLs in the first URL list.
Optionally, the acquiring unit 501 is specifically configured to:
obtain, by means of DPI identification technology, the HTTP request message generated by a user's click behavior.
Based on the same inventive concept, an embodiment of the present invention also provides a computing device, comprising:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and to execute the above method for auditing a URL according to the obtained program.
Based on the same inventive concept, an embodiment of the present invention also provides a computer-readable non-volatile storage medium, including computer-readable instructions which, when read and executed by a computer, cause the computer to execute the above method for auditing a URL.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (14)
1. A method for auditing a uniform resource locator (URL), characterized by comprising:
obtaining an HTTP request message, wherein the header of the HTTP request message includes a Referer (source) field;
when the HTTP request message meets a first condition and the value of the Referer field is not empty, after determining that the value of the Referer field is not in a first URL list, judging whether the value of the Referer field is in a URL cache pool, and if not, auditing the URL requested by the HTTP request message;
wherein the first condition includes that the suffix of the URL requested by the HTTP request message is not in a preset no-audit suffix list, and the first URL list is a URL list determined according to preset websites.
2. The method according to claim 1, characterized in that the first condition further includes that the content type of the HTTP response message corresponding to the HTTP request message is among the preset audit types, and/or that the content size of the HTTP response message corresponding to the HTTP request message is not less than a preset size.
3. The method according to claim 1, characterized in that, after auditing the URL requested by the HTTP request message, the method further comprises:
storing the URL requested by the HTTP request message in the URL cache pool, and deleting the URL requested by the HTTP request message after the cache duration of that URL expires.
4. The method according to claim 1, characterized by further comprising:
when the HTTP request message does not meet the first condition and the value of the Referer field is not empty, judging whether the value of the Referer field is in the URL cache pool; if so, recalculating the cache duration of the value of the Referer field in the URL cache pool; otherwise, storing the value of the Referer field in the URL cache pool, and deleting the value of the Referer field after its cache duration expires.
5. The method according to claim 1, characterized by further comprising:
after determining that the value of the Referer field is in the first URL list, judging whether the URL requested by the HTTP request message is in a second URL list; if so, not auditing the URL requested by the HTTP request message; otherwise, auditing the URL requested by the HTTP request message; wherein the second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and the second URL list does not include the URLs in the first URL list.
6. The method according to any one of claims 1 to 5, characterized in that obtaining the HTTP request message comprises:
obtaining, by means of deep packet inspection (DPI) identification technology, the HTTP request message generated by a user's click behavior.
7. A device for auditing a uniform resource locator (URL), characterized by comprising:
an acquiring unit, configured to obtain an HTTP request message, wherein the header of the HTTP request message includes a Referer (source) field;
a processing unit, configured to: when the HTTP request message meets a first condition and the value of the Referer field is not empty, after determining that the value of the Referer field is not in a first URL list, judge whether the value of the Referer field is in a URL cache pool, and if not, audit the URL requested by the HTTP request message;
wherein the first condition includes that the suffix of the URL requested by the HTTP request message is not in a preset no-audit suffix list, and the first URL list is a URL list determined according to preset websites.
8. The device according to claim 7, characterized in that the first condition further includes that the content type of the HTTP response message corresponding to the HTTP request message is among the preset audit types, and/or that the content size of the HTTP response message corresponding to the HTTP request message is not less than a preset size.
9. The device according to claim 7, characterized in that the processing unit is further configured to:
after auditing the URL requested by the HTTP request message, store the URL requested by the HTTP request message in the URL cache pool, and delete the URL requested by the HTTP request message after the cache duration of that URL expires.
10. The device according to claim 7, characterized in that the processing unit is further configured to:
when the HTTP request message does not meet the first condition and the value of the Referer field is not empty, judge whether the value of the Referer field is in the URL cache pool; if so, recalculate the cache duration of the value of the Referer field in the URL cache pool; otherwise, store the value of the Referer field in the URL cache pool, and delete the value of the Referer field after its cache duration expires.
11. The device according to claim 7, characterized in that the processing unit is further configured to:
after determining that the value of the Referer field is in the first URL list, judge whether the URL requested by the HTTP request message is in a second URL list; if so, not audit the URL requested by the HTTP request message; otherwise, audit the URL requested by the HTTP request message; wherein the second URL list is a URL list determined after a browser accesses the URLs in the first URL list, and the second URL list does not include the URLs in the first URL list.
12. The device according to any one of claims 7 to 11, characterized in that the acquiring unit is specifically configured to:
obtain, by means of deep packet inspection (DPI) identification technology, the HTTP request message generated by a user's click behavior.
13. A computing device, characterized by comprising:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and to execute, according to the obtained program, the method according to any one of claims 1 to 6.
14. A computer-readable non-volatile storage medium, characterized by including computer-readable instructions which, when read and executed by a computer, cause the computer to execute the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811324566.4A CN109547421A (en) | 2018-11-08 | 2018-11-08 | A kind of method and device for the URL that audits |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811324566.4A CN109547421A (en) | 2018-11-08 | 2018-11-08 | A kind of method and device for the URL that audits |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109547421A true CN109547421A (en) | 2019-03-29 |
Family
ID=65845283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811324566.4A Pending CN109547421A (en) | 2018-11-08 | 2018-11-08 | A kind of method and device for the URL that audits |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109547421A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102098229A (en) * | 2011-03-04 | 2011-06-15 | 北京星网锐捷网络技术有限公司 | Method and device for optimizing and auditing uniform resource locator (URL) as well as network device |
CN102780681A (en) * | 2011-05-11 | 2012-11-14 | 中兴通讯股份有限公司 | URL (Uniform Resource Locator) filtering system and URL filtering method |
CN104239353A (en) * | 2013-06-20 | 2014-12-24 | 上海博达数据通信有限公司 | WEB classification control and log auditing method |
CN105991634A (en) * | 2015-04-29 | 2016-10-05 | 杭州迪普科技有限公司 | Access control method and apparatus |
CN106534243A (en) * | 2015-09-14 | 2017-03-22 | 阿里巴巴集团控股有限公司 | Caching, requesting and responding method based on HTTP protocol and corresponding device |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112311823A (en) * | 2019-07-29 | 2021-02-02 | 百度(中国)有限公司 | Flow control method and device of auditing system and server |
CN112311823B (en) * | 2019-07-29 | 2023-01-31 | 百度(中国)有限公司 | Flow control method and device of auditing system and server |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI738720B (en) | Page jump method and device | |
JP6488508B2 (en) | Web page access method, apparatus, device, and program | |
CN103338249B (en) | Caching method and device | |
US20080244740A1 (en) | Browser-independent editing of content | |
CN106339398A (en) | Pre-reading method and device for webpage and intelligent terminal device | |
CN106294648A (en) | A kind of processing method and processing device for page access path | |
CN104994139B (en) | A kind of system and method to high concurrent network request quick response | |
CN104735062B (en) | A kind of network user register method and server | |
CN104052809B (en) | A kind of flow-dividing control method and apparatus of website test | |
CN102857369A (en) | Website log saving system, method and apparatus | |
US8713368B2 (en) | Methods for testing OData services | |
CN106126693A (en) | The sending method of the related data of a kind of webpage and device | |
CN103118007A (en) | Method and system of acquiring user access behavior | |
CN106326261A (en) | Pre-reading method and device for webpage and intelligent terminal device | |
CN106599270B (en) | Network data capturing method and crawler | |
CN107992529A (en) | A kind of key word association method and apparatus | |
CN106886545A (en) | The caching method and device of page display method, page resource | |
KR20180074774A (en) | How to identify malicious websites, devices and computer storage media | |
CN103678639A (en) | Method and device for reminding information updating in browser | |
CN103455492B (en) | A kind of method and apparatus of search and webpage | |
CN103312692B (en) | Chained address safety detecting method and device | |
CN109508437A (en) | A kind of search website auditing method, system and gateway and storage medium | |
CN104657359A (en) | Webpage content and style recording method by using website | |
CN109547421A (en) | A kind of method and device for the URL that audits | |
CN110321510A (en) | Page rendering method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190329 |