CN103118007B - Method and system for acquiring user access behavior - Google Patents

Method and system for acquiring user access behavior

Info

Publication number
CN103118007B
Authority
CN
China
Prior art keywords
url
url information
address
time
access data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310003709.2A
Other languages
Chinese (zh)
Other versions
CN103118007A (en
Inventor
田海燕
练书成
丁毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Raisecom Technology Co Ltd
Original Assignee
Raisecom Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides a method and system for acquiring user access behavior. The method comprises: obtaining web page access data; filtering on fields in the HTTP header of the network access data according to a preset policy; and processing the URL information in the filtered messages to obtain the user's network access behavior.

Description

Method and system for acquiring user access behavior
Technical field
The present invention relates to the field of data processing, and in particular to a method and system for acquiring user access behavior.
Background
With the rapid development of information technology and the steady increase in enterprise informatization, the ways in which users use the network are becoming ever richer. Many users rely on Internet resources for study, leisure, and entertainment. To gain economic benefit, some merchants place large numbers of advertising pictures and other promotions on websites. Meanwhile, with the nationally advocated convergence of the three networks, the integration of services has entered enterprise operations. To ensure a stable, secure, and efficient network operating environment, administrators or enterprise owners usually have to face the following problems: how to supervise users' online behavior, and how to track the usage of network application resources.
To solve these problems, recording users' online behavior is unavoidable, especially the web pages browsed by enterprise employees. By analyzing the page content a user browses, one can learn what interests the employee, or whether the employee has posted illegal remarks or visited illegal websites. Such information can also provide important evidence for public security bureaus in solving cases.
In the prior art, user online behavior is recorded simply by extracting the URL of every link and sending it out. With modern network technology, when a user clicks a page, that page in turn attempts to load the advertisements, pictures, and other resources associated with it. The resulting audit log therefore contains many unnecessary entries. As these entries accumulate over time, the logs that are actually needed get pushed far back, and administrators cannot find the logs they need. The large volume of unnecessary logs also occupies a great deal of storage space, so that much storage is wasted on useless logs, and administrators struggle to tell which log entries are the ones really required.
Summary of the invention
The technical problem to be solved by the present invention is how to filter out the network links to advertisements or pictures associated with a web page that a user visits.
To solve the above technical problem, the present invention provides the following technical solution:
A method for acquiring network access behavior, comprising:
Obtaining web page access data;
Filtering on fields in the HTTP header of the network access data according to a preset policy;
Processing the URL information in the filtered messages to obtain the user's network access behavior.
Preferably, the method also has the following feature: the preset policy comprises selecting HTTP entities that are either compressed, or uncompressed and containing a title feature, where the fields in the HTTP header of a selected entity satisfy the following conditions:
The Content-Type field is of type text/html;
The Content-Length field is less than or equal to 1024 bytes;
The Transfer-Encoding header is of type chunked, the entity length of the response packet is greater than zero, and the entity of the response packet ends with ".0d0a0d0a";
The length of the URL is less than 130 bytes;
The URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani, or .dll.
Preferably, the method also has the following features:
The method further comprises:
Recording the URL information corresponding to the same IP address, and taking the recorded URL information as the user's network access behavior;
Processing the URL information in the filtered messages to obtain the user's network access behavior comprises:
Matching the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
If the URL information corresponding to the IP address contains a matching entry, outputting the URL information in the network access data; otherwise, first adding the URL information in the network access data to the URL information corresponding to the IP address, and then outputting the URL information in the network access data.
Preferably, the method also has the following feature: matching the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address comprises:
Comparing the content of the last N bytes of the URL information in the network access data with that of the URL information corresponding to the IP address, where N ranges from 20 to 1000.
Preferably, the method also has the following features:
Recording the URL information corresponding to the same IP address further comprises:
Recording each URL corresponding to the same IP address together with the time at which that URL was accessed;
Adding the URL information in the network access data to the URL information corresponding to the IP address further comprises:
Once the number of URL entries corresponding to the IP address reaches a preset number threshold, deleting the URL entry with the earliest access time, according to the access time of each URL under the IP address.
Preferably, the method also has the following features:
Processing the URL information in the filtered messages to obtain the user's network access behavior further comprises:
If the URL information corresponding to the IP address contains a matching entry, obtaining the time at which the URL is accessed, and, according to that time, initiating an operation to update the access time of the URL under the IP address.
Preferably, the method also has the following features:
Initiating the operation to update the access time of the URL under the IP address further comprises:
If the difference between the time at which the URL is accessed and the recorded access time of the matching entry is greater than or equal to a preset time threshold, updating the access time of the matching entry to the time at which this network link was initiated.
Preferably, the method also has the following feature: the method further comprises:
If an accessed URL links to one or more other URLs, then before the URL information in the network access data is output, searching the URL information in the network access data for a predefined keyword, and outputting only the URLs that do not contain the keyword as the final URL information in the network access data, where the keyword is a keyword of the other URLs to which the accessed URL links.
A system for acquiring network access behavior, characterized by comprising:
An acquisition device, for obtaining web page access data;
A filter, connected to the acquisition device, for filtering on fields in the HTTP header of the network access data according to a preset policy;
A processing unit, connected to the filter, for processing the URL information in the filtered messages to obtain the user's network access behavior.
Preferably, the system also has the following feature: the preset policy comprises selecting HTTP entities that are either compressed, or uncompressed and containing a title feature, where the fields in the HTTP header of a selected entity satisfy the following conditions:
The Content-Type field is of type text/html;
The Content-Length field is less than or equal to 1024 bytes;
The Transfer-Encoding header is of type chunked, the entity length of the response packet is greater than zero, and the entity of the response packet ends with ".0d0a0d0a";
The length of the URL is less than 130 bytes;
The URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani, or .dll.
Preferably, the system also has the following features:
The system further comprises:
A first recording device, for recording the URL information corresponding to the same IP address and taking the recorded URL information as the user's network access behavior;
The processing unit comprises:
A matching module, connected to the recording device, for matching the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
A processing module, connected to the matching module, for adding the URL information in the network access data to the URL information corresponding to the IP address if that URL information contains no matching entry;
An output module, connected to the matching module, for outputting the URL information in the network access data if the URL information corresponding to the IP address contains a matching entry; and connected to the processing module, for outputting the URL information in the network access data after the processing module has added it to the URL information corresponding to the IP address.
Preferably, the system also has the following feature: the matching module compares the content of the last N bytes of the URL information in the network access data with that of the URL information corresponding to the IP address, where N ranges from 20 to 1000.
Preferably, the system also has the following features:
The first recording device records each URL corresponding to the same IP address together with the time at which that URL was accessed;
The processing module further comprises:
A deletion unit, for deleting, when the URL information in the network access data is added to the URL information corresponding to the IP address and the number of URL entries corresponding to the IP address has reached a preset number threshold, the URL entry with the earliest access time, according to the access time of each URL under the IP address.
Preferably, the system also has the following feature: the processing module further comprises:
An update unit, connected to the deletion unit, for obtaining the time at which the URL is accessed if the URL information corresponding to the IP address contains a matching entry, and, according to that time, initiating an operation to update the access time of the URL under the IP address.
Preferably, the system also has the following feature: the update unit is used for:
If the difference between the time at which the URL is accessed and the recorded access time of the matching entry is greater than or equal to a preset time threshold, updating the access time of the matching entry to the time at which this network link was initiated.
Preferably, the system also has the following feature: the processing unit further comprises:
A filtering module, connected to the output module, for searching, if an accessed URL links to one or more other URLs and before the URL information in the network access data is output, the URL information in the network access data for a predefined keyword, and outputting only the URLs that do not contain the keyword as the final URL information in the network access data, where the keyword is a keyword of the other URLs to which the accessed URL links.
Compared with the prior art, the method embodiment provided by the present invention filters on fields in the HTTP header of the network access data to remove the portion of that data that is irrelevant to network management, and then obtains the network access behavior that is actually needed from the remaining network access data.
Brief description of the drawings
Fig. 1 is a schematic flow chart of an embodiment of the method for acquiring network access behavior provided by the present invention;
Fig. 2 is a schematic flow chart of an application example of the method for acquiring network access behavior provided by the present invention;
Fig. 3 is a schematic flow chart of step 209 in the application example of the present invention;
Fig. 4 is a schematic structural diagram of an embodiment of the system for acquiring network access behavior provided by the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another arbitrarily.
Fig. 1 is a schematic flow chart of an embodiment of the method for acquiring network access behavior provided by the present invention. The method embodiment shown in Fig. 1 comprises:
Step 101: obtain web page access data;
Step 102: filter on fields in the HTTP header of the network access data according to a preset policy;
Step 103: process the URL information in the filtered messages to obtain the user's network access behavior.
Compared with the prior art, the method embodiment provided by the present invention filters on fields in the HTTP header of the network access data to remove the portion of that data that is irrelevant to network management, and then obtains the network access behavior that is actually needed from the remaining network access data.
The method embodiment provided by the present invention is described further below:
The preset policy comprises selecting HTTP entities that are either compressed, or uncompressed and containing a title feature, where the fields in the HTTP header of a selected entity satisfy the following conditions:
The Content-Type field is of type text/html;
The Content-Length field is less than or equal to 1024 bytes;
The Transfer-Encoding header is of type chunked, the entity length of the response packet is greater than zero, and the entity of the response packet ends with ".0d0a0d0a";
The length of the URL is less than 130 bytes;
The URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani, or .dll.
It should be noted that the URL length limit of 130 bytes was chosen because testing showed that the URLs of unwanted log entries are long, most of them a little over 200 bytes. The restriction on URL file suffixes exists because, when a user opens a desired web address, that page may link to promotional or advertising resources whose files carry the above suffixes, whereas the web page itself has no suffix. Filtering on the suffix therefore effectively removes the auxiliary files linked from a web page. For example, when a user opens www.163.com, some incidental URLs with suffixes such as .xsl, .css, and .xml are also produced; after suffix filtering, it can be determined that the URL the user actually visited is www.163.com.
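The URL length and suffix conditions can be illustrated with a minimal Python sketch. The function name `passes_url_checks` and the decision to test the suffix against the URL path only (ignoring any query string) are assumptions made for illustration; the suffix list and the 130-byte limit are taken from the text above.

```python
from urllib.parse import urlsplit

# Suffixes listed in the text above, copied as given.
EXCLUDED_SUFFIXES = (".js", ".png", ".css", ".dif", ".klz",
                     ".ico", ".xml", ".xsl", ".ani", ".dll")
MAX_URL_LENGTH = 130  # bytes; the URL must be shorter than this


def passes_url_checks(url: str) -> bool:
    """Return True if the URL is short enough and has no excluded suffix."""
    if len(url.encode("utf-8")) >= MAX_URL_LENGTH:
        return False
    # Assumption: only the path is inspected, so a query string such as
    # "?v=1" does not hide the suffix. A scheme-less URL is prefixed with
    # "//" so that urlsplit treats the host as the network location.
    path = urlsplit(url if "://" in url else "//" + url).path.lower()
    return not path.endswith(EXCLUDED_SUFFIXES)
```

With this sketch, `passes_url_checks("www.163.com")` is true, while a stylesheet URL under the same host is rejected.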
It can be seen that the above filter conditions effectively select the data that records network access behavior and remove irrelevant log information, which reduces the storage space needed for logs. In addition, because the quantity of network access data is markedly reduced after filtering, the amount of processing needed to obtain the network access behavior is reduced as well.
In practice, a user usually visits the same website frequently. Recording every such visit would inevitably produce much repeated information, so the method further comprises:
Recording the URL information corresponding to the same IP address, and taking the recorded URL information as the user's network access behavior;
Processing the URL information in the filtered messages to obtain the user's network access behavior comprises:
Matching the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
If the URL information corresponding to the IP address contains a matching entry, outputting the URL information in the network access data; otherwise, first adding the URL information in the network access data to the URL information corresponding to the IP address, and then outputting the URL information in the network access data.
Specifically, when a user's access behavior is captured, the content of the field in the network access data is compared with the content already recorded to determine whether a write is needed. This prevents duplicate information from being written and reduces the data volume of the network access behavior.
Because referers and URLs can be long, some exceeding 2000 bytes, comparing them in full places a heavy load on processing. Therefore only the last N bytes of the URL information in the network access data are compared with the last N bytes of the recorded URL. The value of N must be large enough to match the information reliably, yet the byte length should not be too long, so it is kept within the range of 20 to 1000 bytes. In this embodiment, the last 20 bytes are used.
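The last-N-bytes comparison can be sketched in Python as follows. The helper names `url_tail` and `tails_match` are hypothetical, and encoding the URL as UTF-8 before slicing is an assumption; the value N = 20 comes from the text above.

```python
TAIL_LENGTH = 20  # N in the text above; the stated range is 20 to 1000 bytes


def url_tail(url: str, n: int = TAIL_LENGTH) -> bytes:
    """Return the last n bytes of the URL, or the whole URL if it is shorter."""
    return url.encode("utf-8")[-n:]


def tails_match(incoming_url: str, recorded_tail: bytes,
                n: int = TAIL_LENGTH) -> bool:
    """Compare an incoming URL with a stored tail using only the last n bytes."""
    return url_tail(incoming_url, n) == recorded_tail
```

A consequence of this design is that two different URLs sharing the same final N bytes are treated as the same entry, which is the trade-off accepted in exchange for lower memory use and cheaper comparisons.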
Considering that the gateway server has to handle the network access behavior of multiple users, the number of referer fields recorded under the same IP address can be maintained by the following scheme:
Recording the URL information corresponding to the same IP address further comprises:
Recording each URL corresponding to the same IP address together with the time at which that URL was accessed;
Adding the URL information in the network access data to the URL information corresponding to the IP address further comprises:
Once the number of URL entries corresponding to the IP address reaches a preset number threshold, deleting the URL entry with the earliest access time, according to the access time of each URL under the IP address.
Here, the number threshold is the upper limit on the number of URLs under one IP address that the server can handle when matching. Deleting the URL with the earliest access time ensures that the URLs recorded under the IP address always reflect the user's latest access behavior, which facilitates network operation.
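A minimal Python sketch of this per-IP record with eviction of the oldest entry is given here. The function name `record_url`, the nested-dictionary layout, and the threshold value of 5 (which mirrors the figure used in the application example of step A02) are illustrative assumptions rather than the patented implementation.

```python
from typing import Dict

MAX_URLS_PER_IP = 5  # assumed number threshold for illustration


def record_url(table: Dict[str, Dict[str, float]], ip: str,
               url: str, accessed_at: float) -> None:
    """Record url for ip, evicting the oldest entry once the threshold is reached."""
    records = table.setdefault(ip, {})
    if url not in records and len(records) >= MAX_URLS_PER_IP:
        # Find the URL with the earliest access time and delete it.
        oldest = min(records, key=records.get)
        del records[oldest]
    records[url] = accessed_at
```

After six distinct URLs are recorded for one IP address in time order, the table holds the five most recent and the first one has been dropped.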
For web pages that a user visits frequently, a URL may be deleted from the URLs corresponding to the IP address because it has the earliest access time, only to be added again with a newer access time when the user soon revisits it, so that the same URL is repeatedly deleted and added. To avoid this problem, processing the URL information in the filtered messages to obtain the user's network access behavior further comprises: if the URL information corresponding to the IP address contains a matching entry, obtaining the time at which the URL is accessed, and, according to that time, initiating an operation to update the access time of the URL under the IP address.
Of course, to effectively limit how often the access time of the same URL is updated, initiating the operation to update the access time of the URL under the IP address further comprises:
If the difference between the time at which the URL is accessed and the recorded access time of the matching entry is greater than or equal to a preset time threshold, updating the access time of the matching entry to the time at which this network link was initiated.
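The threshold-controlled update can be sketched in Python as follows. The function name `refresh_access_time` is hypothetical, and the default of 10 seconds is an assumed value chosen for illustration, since the text here only states that the threshold is preset.

```python
from typing import Dict

TIME_THRESHOLD = 10.0  # seconds; assumed value, the threshold is only said to be preset


def refresh_access_time(records: Dict[str, float], url: str, accessed_at: float,
                        threshold: float = TIME_THRESHOLD) -> bool:
    """Update the stored access time of a matched URL only if it is stale enough.

    Returns True when the stored time was updated.
    """
    if accessed_at - records[url] >= threshold:
        records[url] = accessed_at
        return True
    return False
```

Accesses that arrive within the threshold leave the stored time untouched, so a burst of requests to the same page causes at most one update per threshold interval.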
In practice, clicking a web address produces, in addition to the log for the URL actually clicked, some unnecessary URL logs, so the filtering described above is not complete. A final filter is therefore added for URLs that contain certain special keywords and are not wanted. For example, clicking www.taobao.com produces, besides the www.taobao.com log, unnecessary URLs such as acookie.taobao.com and www.taobao.com/go/act/sale. Therefore, to make the URLs recorded for an IP address more accurate, the method further comprises:
If an accessed URL links to one or more other URLs, then before the URL information in the network access data is output, searching the URL information in the network access data for a predefined keyword, and outputting only the URLs that do not contain the keyword as the final URL information in the network access data, where the keyword is a keyword of the other URLs to which the accessed URL links.
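The keyword step can be sketched in Python using the Taobao URLs from the example above. The function name `drop_keyword_urls` and the two keywords shown in the usage note are hypothetical; the source does not list the actual keywords.

```python
from typing import Iterable, List


def drop_keyword_urls(urls: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep only the URLs that contain none of the predefined keywords."""
    kws = tuple(keywords)
    return [u for u in urls if not any(k in u for k in kws)]
```

For instance, with the assumed keywords `"acookie"` and `"/go/act/"`, the list `["www.taobao.com", "acookie.taobao.com", "www.taobao.com/go/act/sale"]` is reduced to `["www.taobao.com"]`.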
An application example of the method provided by the present invention is described below:
Fig. 2 is a schematic flow chart of an application example of the method for acquiring network access behavior provided by the present invention. The application example shown in Fig. 2 comprises steps 201 to 209, wherein:
Step 201: accurately identify HTTP messages among the TCP messages initiated by the user.
Step 202: judge whether the HTTP header Content-Type is of type text/html; if so, perform step 203; otherwise, the flow ends.
Step 203: judge whether the HTTP header Content-Encoding is of type gzip or deflate; if so, perform step 205; otherwise, perform step 204.
Step 204: search the HTTP entity for the title character string; if it is absent, the flow ends; if it is present, proceed to steps 205 to 208, which have no required order among them.
Step 205: judge whether the content length given by the HTTP header Content-Length is between 0 and 1024.
Step 206: judge whether the Transfer-Encoding of the HTTP response packet has the following features: the header is of type chunked, the entity length of the response packet is greater than zero, and the entity of the response packet ends with ".0d0a0d0a";
Step 207: judge whether the length of the URL is less than 130;
Step 208: check that the suffix of the URL is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani, or .dll;
When the results of steps 205 to 208 are all affirmative, perform step 209.
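The header checks of steps 202 to 206 can be sketched in Python as a single predicate. Several points here are assumptions rather than statements from the source: the function name `is_candidate_page`, the interpretation of the "title character string" as the byte sequence `<title`, the reading of "0d0a0d0a" as the four bytes CR LF CR LF, and the choice to apply the Content-Length and Transfer-Encoding checks only when the corresponding header is present.

```python
from typing import Mapping

CRLF_CRLF = b"\r\n\r\n"  # bytes 0d 0a 0d 0a (assumed reading of "0d0a0d0a")


def is_candidate_page(headers: Mapping[str, str], body: bytes) -> bool:
    """Apply the header checks of steps 202 to 206 to one HTTP response."""
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}

    # Step 202: only text/html responses are considered.
    if not h.get("content-type", "").startswith("text/html"):
        return False

    # Steps 203-204: compressed bodies pass directly; uncompressed bodies
    # must contain a title (assumed here to mean the bytes "<title").
    compressed = h.get("content-encoding", "") in ("gzip", "deflate")
    if not compressed and b"<title" not in body.lower():
        return False

    # Step 205 (assumption: checked only when the header is present).
    if "content-length" in h:
        try:
            if not 0 <= int(h["content-length"]) <= 1024:
                return False
        except ValueError:
            return False

    # Step 206 (assumption: checked only when the header is present).
    if "transfer-encoding" in h:
        if h["transfer-encoding"] != "chunked":
            return False
        if len(body) == 0 or not body.endswith(CRLF_CRLF):
            return False

    return True
```

The URL length and suffix checks of steps 207 and 208 are independent of the headers and would be applied alongside this predicate before step 209.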
Step 209: filter out unnecessary URL logs by means of the referer in the HTTP header; the specific processing comprises steps A01 to A06:
Fig. 3 is a schematic flow chart of step 209 in the application example of the present invention. It comprises steps A01 to A06, wherein:
Step A01: check whether the HTTP header referer is empty; if the referer is empty, proceed to step A02; if it is not empty, proceed to step A03.
Step A02: build a hash table keyed by IP. Each hash node holds a linked list that stores the last 20 bytes of each referer URL (to save memory) together with the time at which that URL was accessed. For example, the list may hold at most 5 referer URLs and the access time of each. Because referers and URLs can be long, some exceeding 2000 bytes, only the last 20 bytes of the referer and the URL are kept for processing. These limits can of course be adjusted as needed, for example the number of referers kept in the list, or the comparison length of the referer and the URL. If a hash node keyed by this IP already exists in the list, the requested URL is inserted into the referer array of that IP node; if the number of stored referers equals 5, the URL inserted earliest is deleted before the new URL is inserted into the IP node. If no hash node keyed by this IP exists in the list, an IP node is created, the URL is inserted into that IP node, and the IP node is inserted into the list.
Step A03: compare the referer header of this link with the referer array stored in the hash table. If they match exactly, check whether the difference between the timestamp of the matched referer and the timestamp stored for it in the list is less than 10 seconds. If it is less than 10 seconds, return without sending a log; if it is not less than 10 seconds, record the access time and proceed to step A02.
Of course, the requested URL can also be filtered by keyword before it is inserted into the referer array of the IP node, as described below:
Step A04: clicking a web address produces, in addition to the log for the URL actually clicked, some unnecessary URL logs, so the processing above alone does not filter completely. A final filter is therefore added: URLs that contain certain special keywords and are not wanted are matched against those keywords. If the match succeeds, return without sending a log; if the match fails, proceed to step A05. The keywords used here are obtained by packet capture analysis and are added to an array; they are matched against the URLs remaining after the earlier filtering, and a URL that matches does not generate a log, whereas one that does not match does.
Step A05: a URL that is still retained after the multiple conditions above is sent to the database for storage.
Step A06: return from this function without further processing.
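Steps A01 to A05 can be combined into one decision function, sketched here in Python. The class name `RefererFilter`, the use of dictionaries in place of the hash table and linked list, and the handling of a non-empty referer that matches nothing stored (treated here like a fresh page visit, a case the text leaves open) are all assumptions for illustration; the constants 20, 5, and 10 are taken from the steps above.

```python
from typing import Dict, List, Optional

TAIL = 20         # bytes of each URL kept for comparison
MAX_ENTRIES = 5   # entries kept per IP
WINDOW = 10.0     # seconds


class RefererFilter:
    def __init__(self, keywords: List[str]) -> None:
        self.keywords = keywords
        # ip -> {last TAIL bytes of a URL: time it was accessed}
        self.table: Dict[str, Dict[bytes, float]] = {}

    @staticmethod
    def _tail(url: str) -> bytes:
        return url.encode("utf-8")[-TAIL:]

    def should_log(self, ip: str, url: str,
                   referer: Optional[str], now: float) -> bool:
        """Return True if this request should be sent to the log database."""
        entries = self.table.setdefault(ip, {})
        # Steps A01/A03: a referer matching a recently stored URL marks this
        # request as a follow-on load of a page that was already logged.
        if referer:
            ref_key = self._tail(referer)
            seen = entries.get(ref_key)
            if seen is not None:
                if now - seen < WINDOW:
                    return False
                entries[ref_key] = now  # not recent: record the access time
        # Step A04: drop URLs that contain a filtered keyword.
        if any(k in url for k in self.keywords):
            return False
        # Step A02: store the URL, evicting the oldest entry when full.
        key = self._tail(url)
        if key not in entries and len(entries) >= MAX_ENTRIES:
            del entries[min(entries, key=entries.get)]
        entries[key] = now
        # Step A05: the caller sends the URL to the database.
        return True
```

In this sketch a page opened with no referer is logged, an image requested one second later with that page as its referer is suppressed, and a link followed from the same page twenty seconds later is logged again.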
The application example of the method provided by the present invention analyzes the HTTP headers Content-Type, Content-Encoding, Content-Length, and Transfer-Encoding and combines this with the URL length check, URL file suffix filtering, URL keyword filtering, and matching of the referer against the IP address. This filters out a large number of unnecessary URL logs, making full use of memory storage space and showing the user only the URL logs that are actually needed.
Fig. 4 is a schematic structural diagram of an embodiment of the system for acquiring network access behavior provided by the present invention. The system embodiment shown in Fig. 4 comprises:
An acquisition device 401, for obtaining web page access data;
A filter 402, connected to the acquisition device 401, for filtering on fields in the HTTP header of the network access data according to a preset policy;
A processing unit 403, connected to the filter 402, for processing the URL information in the filtered messages to obtain the user's network access behavior.
Here, the preset policy comprises selecting HTTP entities that are either compressed, or uncompressed and containing a title feature, where the fields in the HTTP header of a selected entity satisfy the following conditions:
The Content-Type field is of type text/html;
The Content-Length field is less than or equal to 1024 bytes;
The Transfer-Encoding header is of type chunked, the entity length of the response packet is greater than zero, and the entity of the response packet ends with ".0d0a0d0a";
The length of the URL is less than 130 bytes;
The URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani, or .dll.
Described system also comprises:
First tape deck, for recording URL information corresponding to same IP address, using the access to netwoks behavior of the URL information of record as user;
Described processing unit comprises:
Matching module, is connected with described tape deck, for mating filtering the URL information that in the network access data of a certain IP address that obtains, URL information is corresponding with this IP address of local record;
Processing module, is connected with described matching module, if do not have match objects for the URL information that this IP address is corresponding, the URL information in network access data is increased in URL information corresponding to this IP address;
Output module, is connected with described matching module, if having match objects for the URL information that this IP address is corresponding, exports the URL information in described network access data; And, being connected with described processing module, for the URL information in network access data being increased to after in URL information corresponding to this IP address in processing module, then exporting the URL information in described network access data.
Optionally, described matching module adopts the content of the last N number of byte in the URL information that in network access data, URL information is corresponding with this IP address to compare, and wherein the span of N is 20 ~ 1000.
Optionally, same IP address corresponding URL and this URL accessed time is recorded described in described first recording device records;
Optionally, described processing module also comprises:
Delete cells, when being increased in URL information corresponding to this IP address for the URL information in network access data, after the number of URL information corresponding to this IP address reaches the number threshold value pre-set, the time accessed according to each URL in this IP address, delete the information of URL the earliest of accessed time.
Optionally, described processing module also comprises:
Updating block, is connected with described delete cells, if having match objects for the URL information that this IP address is corresponding, then obtains the time that this URL is accessed; The time accessed according to this URL, initiate the operation that the access time of this URL in this IP address is upgraded.
Wherein, described updating block is used for:
If the difference of the accessed time of this URL and this match objects accessed time is more than or equal to the time threshold pre-set, is then updated to the initiation time of described network linking the time accessed for match objects.
Optionally, the processing unit further comprises:
A filtering module, connected to the output module, configured to, if a URL links to one or more other URLs after it is accessed, search the URL information in the network access data for a predefined keyword before that URL information is output, and output the URLs that do not contain the keyword as the final URL information in the network access data, wherein the keyword is a keyword of the other URLs to which the accessed URL links.
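The keyword filtering step can be illustrated with a short sketch. The function name `filter_linked_urls` and the plain substring test are assumptions; the text does not specify how keywords are matched.

```python
def filter_linked_urls(urls, keywords):
    """Keep only the URLs that contain none of the predefined keywords.

    keywords is assumed to hold strings that identify the secondary URLs
    a page links to (for example an ad or tracker host name).
    """
    return [url for url in urls if not any(keyword in url for keyword in keywords)]
```

Given the URLs `http://news.example.com/story.html` and `http://ads.example.com/banner.html` with the keyword `ads.`, only the first URL would be output.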
Compared with the prior art, the system embodiment provided by the invention filters the fields in the HTTP header of the network access data, thereby removing the portion of the network access data that is irrelevant to network management, and then derives the network access behavior that is actually needed from the remaining network access data.
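The header filtering can be sketched using the conditions recited in claim 1. Several points here are assumptions rather than statements of the source: the function name `passes_filter`, treating the Content-Length check and the chunked check as alternatives (a response normally carries one or the other), and reading "0d0a0d0a" as the hexadecimal bytes CR LF CR LF. The check for compressed entities or a title feature is omitted.

```python
from urllib.parse import urlsplit

# Suffix list copied as it appears in the claims.
EXCLUDED_SUFFIXES = (".js", ".png", ".css", ".dif", ".klz", ".ico",
                     ".xml", ".xsl", ".ani", ".dll")
MAX_CONTENT_LENGTH = 1024   # bytes, per the claims
MAX_URL_LENGTH = 130        # bytes, per the claims (strictly less than)
CRLF_CRLF = b"\r\n\r\n"     # assumed reading of "0d0a0d0a"


def passes_filter(url, headers, body):
    """Return True if a response should be kept for URL processing."""
    h = {name.lower(): value for name, value in headers.items()}
    if not h.get("content-type", "").lower().startswith("text/html"):
        return False
    if len(url.encode("utf-8")) >= MAX_URL_LENGTH:
        return False
    if urlsplit(url).path.lower().endswith(EXCLUDED_SUFFIXES):
        return False
    if "chunked" in h.get("transfer-encoding", "").lower():
        # Chunked response: the entity must be non-empty and properly terminated.
        return len(body) > 0 and body.endswith(CRLF_CRLF)
    length = h.get("content-length")
    return length is not None and length.isdigit() and int(length) <= MAX_CONTENT_LENGTH
```

Checking the cheap header fields first means that most script, image, and stylesheet responses are discarded before any URL matching takes place.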
The above is merely a specific embodiment of the present invention, but the scope of protection of the invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be defined by the scope of protection of the claims.

Claims (14)

1. A method for acquiring network access behavior, characterized by comprising:
Obtaining web page access data;
Filtering the fields in the HTTP header of the network access data according to a preset policy;
Processing the URL information in the filtered message to obtain the network access behavior of the user;
Wherein the preset policy comprises selecting HTTP entities that are compressed entities, or uncompressed entities containing a title feature, and the fields in the HTTP header of a selected entity satisfy the following conditions:
The Content-Type field is of type text/html;
The Content-Length field is less than or equal to 1024 bytes;
The type of the Transfer-Encoding header is chunked, the entity length of the response packet is greater than zero, and the entity of the response packet ends with ".0d0a0d0a";
The length of the URL is less than 130 bytes;
The URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani or .dll.
2. The method according to claim 1, characterized in that:
The method further comprises:
Recording the URL information corresponding to the same IP address, and taking the recorded URL information as the network access behavior of the user;
The processing of the URL information in the filtered message to obtain the network access behavior of the user comprises:
Matching the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
If the URL information corresponding to the IP address contains a match, outputting the URL information in the network access data; otherwise, first adding the URL information in the network access data to the URL information corresponding to the IP address, and then outputting the URL information in the network access data.
3. The method according to claim 2, characterized in that the matching of the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address comprises:
Comparing the last N bytes of the URL information in the network access data with the last N bytes of the URL information corresponding to the IP address, where N ranges from 20 to 1000.
4. The method according to claim 2, characterized in that:
The recording of the URL information corresponding to the same IP address further comprises:
Recording the URLs corresponding to the same IP address and the time at which each URL was accessed;
The adding of the URL information in the network access data to the URL information corresponding to the IP address further comprises:
After the number of URL entries corresponding to the IP address has reached a preset threshold, deleting the URL entry with the earliest access time, based on the access time of each URL under that IP address.
5. The method according to claim 4, characterized in that the processing of the URL information in the filtered message to obtain the network access behavior of the user further comprises:
If the URL information corresponding to the IP address contains a match, obtaining the time at which this URL is accessed and, based on that time, initiating an operation to update the access time of this URL under that IP address.
6. The method according to claim 5, characterized in that the initiating of the operation to update the access time of this URL under that IP address further comprises:
If the difference between the time at which this URL is accessed and the access time of the matched entry is greater than or equal to a preset time threshold, updating the access time of the matched entry to the initiation time of the network connection.
7. The method according to claim 2, characterized in that the method further comprises:
If a URL links to one or more other URLs after it is accessed, then before the URL information in the network access data is output, searching the URL information in the network access data for a predefined keyword, and outputting the URLs that do not contain the keyword as the final URL information in the network access data, wherein the keyword is a keyword of the other URLs to which the accessed URL links.
8. A system for acquiring network access behavior, characterized by comprising:
An acquisition device, configured to obtain web page access data;
A filter, connected to the acquisition device, configured to filter the fields in the HTTP header of the network access data according to a preset policy;
A processing unit, connected to the filter, configured to process the URL information in the filtered message to obtain the network access behavior of the user;
Wherein the preset policy comprises selecting HTTP entities that are compressed entities, or uncompressed entities containing a title feature, and the fields in the HTTP header of a selected entity satisfy the following conditions:
The Content-Type field is of type text/html;
The Content-Length field is less than or equal to 1024 bytes;
The type of the Transfer-Encoding header is chunked, the entity length of the response packet is greater than zero, and the entity of the response packet ends with ".0d0a0d0a";
The length of the URL is less than 130 bytes;
The URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani or .dll.
9. The system according to claim 8, characterized in that:
The system further comprises:
A first recording device, configured to record the URL information corresponding to the same IP address and to take the recorded URL information as the network access behavior of the user;
The processing unit comprises:
A matching module, connected to the first recording device, configured to match the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
A processing module, connected to the matching module, configured to add the URL information in the network access data to the URL information corresponding to the IP address if the URL information corresponding to that IP address contains no match;
An output module, connected to the matching module, configured to output the URL information in the network access data if the URL information corresponding to the IP address contains a match; and connected to the processing module, configured to output the URL information in the network access data after the processing module has added it to the URL information corresponding to the IP address.
10. The system according to claim 9, characterized in that the matching module compares the last N bytes of the URL information in the network access data with the last N bytes of the URL information corresponding to the IP address, where N ranges from 20 to 1000.
11. The system according to claim 9, characterized in that:
The first recording device records the URLs corresponding to the same IP address and the time at which each URL was accessed;
The processing module further comprises:
A deletion unit, configured to, when the URL information in the network access data is added to the URL information corresponding to the IP address and the number of URL entries for that IP address has reached a preset threshold, delete the URL entry with the earliest access time, based on the access time of each URL under that IP address.
12. The system according to claim 11, characterized in that the processing module further comprises:
An updating unit, connected to the deletion unit, configured to, if the URL information corresponding to the IP address contains a match, obtain the time at which this URL is accessed and, based on that time, initiate an operation to update the access time of this URL under that IP address.
13. The system according to claim 12, characterized in that the updating unit is configured to:
If the difference between the time at which this URL is accessed and the access time of the matched entry is greater than or equal to a preset time threshold, update the access time of the matched entry to the initiation time of the network connection.
14. The system according to claim 9, characterized in that the processing unit further comprises:
A filtering module, connected to the output module, configured to, if a URL links to one or more other URLs after it is accessed, search the URL information in the network access data for a predefined keyword before that URL information is output, and output the URLs that do not contain the keyword as the final URL information in the network access data, wherein the keyword is a keyword of the other URLs to which the accessed URL links.
CN201310003709.2A 2013-01-06 2013-01-06 A kind of acquisition methods of user access activity and system Active CN103118007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310003709.2A CN103118007B (en) 2013-01-06 2013-01-06 A kind of acquisition methods of user access activity and system

Publications (2)

Publication Number Publication Date
CN103118007A CN103118007A (en) 2013-05-22
CN103118007B true CN103118007B (en) 2016-02-03

Family

ID=48416281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310003709.2A Active CN103118007B (en) 2013-01-06 2013-01-06 A kind of acquisition methods of user access activity and system

Country Status (1)

Country Link
CN (1) CN103118007B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239353B (en) * 2013-06-20 2019-12-31 上海博达数据通信有限公司 WEB classification control and log audit method
CN103530160A (en) * 2013-10-21 2014-01-22 迈普通信技术股份有限公司 Page loading method and device
CN103593484A (en) * 2013-12-03 2014-02-19 南京安讯科技有限责任公司 Method for filtering garbage logs during mobile phone internet surfing
CN104021143A (en) * 2014-05-14 2014-09-03 北京网康科技有限公司 Method and device for recording webpage access behavior
CN104270358B (en) * 2014-09-25 2018-10-26 同济大学 Trustable network transaction system client monitor and its implementation
CN105677657A (en) * 2014-11-19 2016-06-15 杭州华三通信技术有限公司 Recoding method and device for access behaviors of uniform resource locators
CN105991369B (en) * 2015-03-23 2020-03-06 杭州迪普科技股份有限公司 Message information extraction method and device
CN105049446A (en) * 2015-08-20 2015-11-11 中国联合网络通信集团有限公司 Method and system for filtering URL (Uniform Resource Locator)
CN105827522A (en) * 2015-11-10 2016-08-03 广东亿迅科技有限公司 Gateway equipment for processing log files
CN106411944B (en) * 2016-11-25 2019-09-20 锐捷网络股份有限公司 A kind of management method and device of network access
CN108121749A (en) * 2016-11-30 2018-06-05 北京国双科技有限公司 Website user's behavior analysis method and device
CN106357482B (en) * 2016-11-30 2019-10-29 四川秘无痕科技有限责任公司 A method of based on network protocol implementing monitoring web page access
CN107480190A (en) * 2017-07-11 2017-12-15 国家计算机网络与信息安全管理中心 A kind of filter method and device of non-artificial access log

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1121792A1 (en) * 1998-10-15 2001-08-08 Computer Associates Think, Inc. Method and system for the prevention of undesirable activities of executable objects
CN102004770A (en) * 2010-11-16 2011-04-06 杭州迪普科技有限公司 Webpage auditing method and device
CN102098229A (en) * 2011-03-04 2011-06-15 北京星网锐捷网络技术有限公司 Method and device for optimizing and auditing uniform resource locator (URL) as well as network device
CN102158499A (en) * 2011-06-02 2011-08-17 国家计算机病毒应急处理中心 Trojan-embedded website detection method based on hyper text transfer protocol (HTTP) traffic analysis
CN102254004A (en) * 2011-07-14 2011-11-23 北京邮电大学 Method and system for modeling Web in weblog excavation
CN102857572A (en) * 2012-09-14 2013-01-02 北京星网锐捷网络技术有限公司 Method and device for processing HTTP (hyper text transport protocol) access request and gateway equipment

Also Published As

Publication number Publication date
CN103118007A (en) 2013-05-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant