CN103118007A - Method and system of acquiring user access behavior - Google Patents


Info

Publication number
CN103118007A
Authority
CN
China
Prior art keywords
url
address
url information
time
information corresponding
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013100037092A
Other languages
Chinese (zh)
Other versions
CN103118007B (en
Inventor
田海燕
练书成
丁毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Raisecom Technology Co Ltd
Original Assignee
Raisecom Technology Co Ltd
Application filed by Raisecom Technology Co Ltd
Priority to CN201310003709.2A
Publication of CN103118007A
Application granted
Publication of CN103118007B
Legal status: Active

Abstract

The invention provides a method and a system for acquiring user access behavior. The method comprises acquiring web page access data, filtering the fields in the HTTP header of the network access data according to a preset policy, and processing the uniform resource locator (URL) information in the filtered messages to obtain the user's network access behavior.

Description

A method and system for acquiring user access behavior
Technical field
The present invention relates to the field of data processing, and in particular to a method and system for acquiring user access behavior.
Background
With the rapid development of information technology and the steadily rising level of enterprise informatization, the ways in which users use the network are becoming richer. Many users rely on Internet resources for study, leisure, entertainment and so on. To obtain economic benefit, some merchants place large numbers of advertising pictures and other promotions on websites. At the same time, with the convergence of the three networks promoted by the state, converged services have entered enterprise operations. To guarantee a stable, secure and efficient network environment, administrators and business owners usually face the following questions: how can the online behavior of users be supervised, and how can the usage of network application resources be tracked?
To address these questions, recording the online behavior of users is unavoidable, in particular the web browsing behavior of enterprise employees. By analyzing the content of the pages a user browses, one can learn what employees are interested in, or whether they have posted illegal statements or visited illegal websites. This information can also provide important evidence for public security authorities when investigating cases.
In the prior art, schemes for recording online behavior simply extract the URL of every link and send it. With modern web technology, clicking a page causes that page to load the advertisements, pictures and other resources associated with it, so the resulting log contains many redundant entries. Over time these redundant entries accumulate and push the entries that are actually needed far back in the log, leaving the administrator confused and unable to find the required entries. The redundant entries also occupy a large amount of storage, so that much storage is wasted on useless logs, and the administrator cannot tell which log information is really needed.
Summary of the invention
The technical problem to be solved by the present invention is how to filter out the network links to advertisements or pictures that are associated with a web page the user accesses.
To solve the above technical problem, the invention provides the following technical solution:
A method for acquiring network access behavior, comprising:
acquiring web page access data;
filtering the fields in the HTTP header of the network access data according to a preset policy;
processing the URL information in the filtered messages to obtain the user's network access behavior.
Preferably, the method also has the following feature: the preset policy comprises selecting HTTP entities that are either compressed entities or uncompressed entities containing the title feature, wherein the fields in the HTTP header of a selected entity meet the following conditions:
the Content-Type field is of type text/html;
the Content-Length field is less than or equal to 1024 bytes;
the Transfer-Encoding header is of type chunked, the entity length of the response packet is greater than zero, and the response packet ends with ".0d0a0d0a";
the length of the URL is less than 130 bytes;
the URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani or .dll.
Preferably, the method also has the following features:
the method further comprises:
recording the URL information corresponding to the same IP address, and using the recorded URL information as the user's network access behavior;
the processing of the URL information in the filtered messages to obtain the user's network access behavior comprises:
matching the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
if the URL information corresponding to the IP address contains a matching object, outputting the URL information in the network access data; otherwise, first adding the URL information in the network access data to the URL information corresponding to the IP address, and then outputting the URL information in the network access data.
Preferably, the method also has the following feature: the matching of the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address comprises:
comparing the last N bytes of the URL information in the network access data with the URL information corresponding to the IP address, where N ranges from 20 to 1000.
Preferably, the method also has the following features:
the recording of the URL information corresponding to the same IP address further comprises:
recording the URLs corresponding to the same IP address and the time at which each URL was accessed;
the adding of the URL information in the network access data to the URL information corresponding to the IP address further comprises:
after the number of URL entries corresponding to the IP address reaches a preset number threshold, deleting the entry with the earliest access time, according to the access time of each URL under the IP address.
Preferably, the method also has the following features:
the processing of the URL information in the filtered messages to obtain the user's network access behavior further comprises:
if the URL information corresponding to the IP address contains a matching object, obtaining the time at which the URL was accessed, and, based on that time, initiating an operation to update the access time of the URL under the IP address.
Preferably, the method also has the following features:
the initiating of the operation to update the access time of the URL under the IP address further comprises:
if the difference between the time at which the URL was accessed and the recorded access time of the matching object is greater than or equal to a preset time threshold, updating the recorded access time of the matching object to the time at which the network link was initiated.
Preferably, the method also has the following feature: the method further comprises:
if, after a URL is accessed, that URL links to one or more other URLs, then before outputting the URL information in the network access data, searching the URL information in the network access data for predefined keywords, and outputting only the URLs that do not contain such a keyword as the final URL information in the network access data, wherein the keywords are keywords of the other URLs that the accessed URL links to.
A system for acquiring network access behavior, characterized by comprising:
an acquisition device, configured to acquire web page access data;
a filtering device, connected to the acquisition device and configured to filter the fields in the HTTP header of the network access data according to a preset policy;
a processing device, connected to the filtering device and configured to process the URL information in the filtered messages to obtain the user's network access behavior.
Preferably, the system also has the following feature: the preset policy comprises selecting HTTP entities that are either compressed entities or uncompressed entities containing the title feature, wherein the fields in the HTTP header of a selected entity meet the following conditions:
the Content-Type field is of type text/html;
the Content-Length field is less than or equal to 1024 bytes;
the Transfer-Encoding header is of type chunked, the entity length of the response packet is greater than zero, and the response packet ends with ".0d0a0d0a";
the length of the URL is less than 130 bytes;
the URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani or .dll.
Preferably, the system also has the following features:
the system further comprises:
a first recording device, configured to record the URL information corresponding to the same IP address and to use the recorded URL information as the user's network access behavior;
the processing device comprises:
a matching module, connected to the recording device and configured to match the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
a processing module, connected to the matching module and configured to add the URL information in the network access data to the URL information corresponding to the IP address if the URL information corresponding to the IP address contains no matching object;
an output module, connected to the matching module and configured to output the URL information in the network access data if the URL information corresponding to the IP address contains a matching object; and also connected to the processing module and configured to output the URL information in the network access data after the processing module has added that URL information to the URL information corresponding to the IP address.
Preferably, the system also has the following feature: the matching module compares the last N bytes of the URL information in the network access data with the URL information corresponding to the IP address, where N ranges from 20 to 1000.
Preferably, the system also has the following features:
the first recording device records the URLs corresponding to the same IP address and the time at which each URL was accessed;
the processing module further comprises:
a deletion unit, configured so that, when the URL information in the network access data is added to the URL information corresponding to the IP address and the number of URL entries corresponding to the IP address has reached a preset number threshold, the entry with the earliest access time is deleted according to the access time of each URL under the IP address.
Preferably, the system also has the following feature: the processing module further comprises:
an update unit, connected to the deletion unit and configured to obtain the time at which the URL was accessed if the URL information corresponding to the IP address contains a matching object, and, based on that time, to initiate an operation to update the access time of the URL under the IP address.
Preferably, the system also has the following feature: the update unit is configured so that:
if the difference between the time at which the URL was accessed and the recorded access time of the matching object is greater than or equal to a preset time threshold, the recorded access time of the matching object is updated to the time at which the network link was initiated.
Preferably, the system also has the following feature: the processing device further comprises:
a filtering module, connected to the output module and configured so that, if a URL links to one or more other URLs after being accessed, the URL information in the network access data is searched for predefined keywords before it is output, and only the URLs that do not contain such a keyword are output as the final URL information in the network access data, wherein the keywords are keywords of the other URLs that the accessed URL links to.
Compared with the prior art, the method embodiment provided by the invention filters the fields in the HTTP header of the network access data to remove the portion of the data that is irrelevant to network management, and then obtains the network access behavior that is actually needed from the remaining network access data.
Description of the drawings
Fig. 1 is a flow chart of an embodiment of the method for acquiring network access behavior provided by the invention;
Fig. 2 is a flow chart of an application example of the method for acquiring network access behavior provided by the invention;
Fig. 3 is a flow chart of step 209 in the application example of the invention;
Fig. 4 is a structural diagram of an embodiment of the system for acquiring network access behavior provided by the invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments in this application and the features in those embodiments may be combined with one another arbitrarily.
Fig. 1 is a flow chart of an embodiment of the method for acquiring network access behavior provided by the invention. The method embodiment shown in Fig. 1 comprises:
Step 101: acquire web page access data;
Step 102: filter the fields in the HTTP header of the network access data according to a preset policy;
Step 103: process the URL information in the filtered messages to obtain the user's network access behavior.
Compared with the prior art, the method embodiment provided by the invention filters the fields in the HTTP header of the network access data to remove the portion of the data that is irrelevant to network management, and then obtains the network access behavior that is actually needed from the remaining network access data.
The method embodiment provided by the invention is described further below:
The preset policy comprises selecting HTTP entities that are either compressed entities or uncompressed entities containing the title feature, wherein the fields in the HTTP header of a selected entity meet the following conditions:
the Content-Type field is of type text/html;
the Content-Length field is less than or equal to 1024 bytes;
the Transfer-Encoding header is of type chunked, the entity length of the response packet is greater than zero, and the response packet ends with ".0d0a0d0a";
the length of the URL is less than 130 bytes;
the URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani or .dll.
It should be noted that the URL length limit of less than 130 bytes was chosen because testing showed that the URLs of unwanted log entries are long, most of them around 200 bytes or more; the URL length limit is therefore set to 130. The URL file suffix is restricted because, when the user opens a desired address, that address links to promotional or advertising pages, and the files of those pages carry the suffixes listed above, whereas the web page itself has no suffix. Filtering by suffix therefore effectively removes the auxiliary files linked from a web page. For example, when the user opens www.163.com, additional URLs with suffixes such as .xsl, .css and .xml are generated; filtering by suffix shows that the URL the user actually accessed is www.163.com.
Thus the above filter conditions effectively select the data that records the network access behavior and remove irrelevant log information, which reduces the log storage space. In addition, the volume of network access data drops markedly after filtering, which reduces the amount of processing needed to obtain the network access behavior.
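The URL length and suffix conditions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `url_passes` and the constant names are hypothetical, the suffix list is copied verbatim from the text, and the suffix is taken from the path component of the URL, which the text does not specify.

```python
import posixpath
from urllib.parse import urlsplit

# Suffixes copied verbatim from the text; names below are hypothetical.
BLOCKED_SUFFIXES = {".js", ".png", ".css", ".dif", ".klz",
                    ".ico", ".xml", ".xsl", ".ani", ".dll"}
MAX_URL_LENGTH = 130  # the URL must be shorter than this many bytes


def url_passes(url: str) -> bool:
    """Return True if the URL is short enough and has no blocked suffix."""
    if len(url.encode("utf-8")) >= MAX_URL_LENGTH:
        return False
    # Assumption: a URL without a scheme is treated as host + path.
    if "://" not in url:
        url = "//" + url
    path = urlsplit(url).path
    suffix = posixpath.splitext(path)[1].lower()
    return suffix not in BLOCKED_SUFFIXES
```

With this sketch, `url_passes("www.163.com")` is accepted, while a stylesheet such as `www.163.com/style/main.css` or any URL of 130 bytes or more is rejected.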
In practice, a user usually visits the same website frequently. If every such visit were recorded, a great deal of duplicate information would be produced. The method therefore further comprises:
recording the URL information corresponding to the same IP address, and using the recorded URL information as the user's network access behavior;
the processing of the URL information in the filtered messages to obtain the user's network access behavior comprises:
matching the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
if the URL information corresponding to the IP address contains a matching object, outputting the URL information in the network access data; otherwise, first adding the URL information in the network access data to the URL information corresponding to the IP address, and then outputting the URL information in the network access data.
Specifically, when a user's access behavior is obtained, the content of the fields in the network access data is compared with the content already recorded to decide whether it needs to be written. This prevents duplicate information from being written and reduces the data volume of the network access behavior.
Because the referer and the URL can be long, in some cases more than 2000 bytes, comparing them in full places a heavy load on processing. Therefore only the last N bytes of the URL information in the network access data are compared with the last N bytes of the recorded URLs. The value of N must be large enough to allow the information to be matched, yet not too long; it is kept within the range of 20 to 1000 bytes. In the present invention the last 20 bytes are used.
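A minimal sketch of the tail comparison described above, assuming the recorded entries are stored as the last N bytes of each URL. The helper names `tail` and `matches_recorded` are hypothetical and not taken from the patent.

```python
N = 20  # number of trailing bytes compared; the text allows 20 to 1000


def tail(value: str, n: int = N) -> bytes:
    """Return the last n bytes of the UTF-8 encoded value."""
    return value.encode("utf-8")[-n:]


def matches_recorded(url: str, recorded: list, n: int = N) -> bool:
    """Check whether the URL's tail equals any recorded tail."""
    return tail(url, n) in recorded
```

Storing only the tail keeps each comparison to at most N bytes, at the cost of treating two different URLs with identical endings as the same entry.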
Because the gateway server has to handle the network access behavior of many users, the number of referer fields recorded under the same IP address must be kept under control. This can be done with the following scheme:
the recording of the URL information corresponding to the same IP address further comprises:
recording the URLs corresponding to the same IP address and the time at which each URL was accessed;
the adding of the URL information in the network access data to the URL information corresponding to the IP address further comprises:
after the number of URL entries corresponding to the IP address reaches a preset number threshold, deleting the entry with the earliest access time, according to the access time of each URL under the IP address.
Here, the number threshold is the upper limit of URLs the server can handle when matching URLs under one IP address. Deleting the URL with the earliest access time under the IP address ensures that the URLs recorded under that IP address always reflect the user's most recent access behavior, which makes network operation and maintenance easier.
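The eviction rule above can be sketched as follows. This is an illustration only: the dictionary layout, the function name `add_url` and the default threshold of 5 (the value used later in the application example) are assumptions.

```python
MAX_URLS_PER_IP = 5  # assumed threshold; the application example stores 5


def add_url(records: dict, ip: str, url: str, accessed_at: float,
            limit: int = MAX_URLS_PER_IP) -> None:
    """Record url under ip, evicting the entry with the earliest access time."""
    urls = records.setdefault(ip, {})  # maps URL -> access time
    if url not in urls and len(urls) >= limit:
        oldest = min(urls, key=urls.get)  # earliest access time
        del urls[oldest]
    urls[url] = accessed_at
```

After six distinct URLs are added for one IP address with increasing timestamps, only the five most recent remain.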
For a web page that a user visits frequently, the entry may be deleted from the URLs corresponding to the IP address because its access time is the earliest, and then be added again with a newer access time when the user visits the page again shortly afterwards. The same URL is then deleted and added repeatedly. To avoid this, the processing of the URL information in the filtered messages to obtain the user's network access behavior further comprises: if the URL information corresponding to the IP address contains a matching object, obtaining the time at which the URL was accessed, and, based on that time, initiating an operation to update the access time of the URL under the IP address.
To limit how often the access time of the same URL is updated, the initiating of the operation to update the access time of the URL under the IP address further comprises:
if the difference between the time at which the URL was accessed and the recorded access time of the matching object is greater than or equal to a preset time threshold, updating the recorded access time of the matching object to the time at which the network link was initiated.
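The threshold-gated update can be sketched as below. The function name `maybe_refresh` is hypothetical, and the 10-second default is borrowed from the application example rather than fixed by this passage.

```python
TIME_THRESHOLD_SECONDS = 10.0  # assumed; the application example uses 10 seconds


def maybe_refresh(urls: dict, url: str, now: float,
                  threshold: float = TIME_THRESHOLD_SECONDS) -> bool:
    """Update the stored access time only if the threshold has elapsed.

    urls maps a matched URL to its recorded access time.
    Returns True if the recorded time was updated.
    """
    if now - urls[url] >= threshold:
        urls[url] = now
        return True
    return False
```

Visits that arrive within the threshold leave the recorded time unchanged, so a burst of requests to the same page causes at most one update.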
In practice, clicking an address produces, besides the real URL log entry for that click, a number of redundant URL log entries, so the processing above alone does not filter completely. A final check is therefore added: URLs containing certain special keywords are not needed either. For example, clicking www.taobao.com produces, in addition to the log entry for www.taobao.com, redundant URLs such as acookie.taobao.com and www.taobao.com/go/act/sale. To make the URLs recorded for an IP address more accurate, the method further comprises:
if, after a URL is accessed, that URL links to one or more other URLs, then before outputting the URL information in the network access data, searching the URL information in the network access data for predefined keywords, and outputting only the URLs that do not contain such a keyword as the final URL information in the network access data, wherein the keywords are keywords of the other URLs that the accessed URL links to.
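The keyword check can be illustrated with the taobao example from the text. The keyword list below is hypothetical: the patent says the keywords are obtained by analysis of captured traffic and does not list them.

```python
# Hypothetical keywords chosen to match the redundant URLs named in the text.
BLOCKED_KEYWORDS = ["acookie.", "/go/act/"]


def drop_linked(urls: list, keywords: list) -> list:
    """Keep only the URLs that contain none of the keywords."""
    return [u for u in urls if not any(k in u for k in keywords)]


candidates = ["www.taobao.com", "acookie.taobao.com", "www.taobao.com/go/act/sale"]
kept = drop_linked(candidates, BLOCKED_KEYWORDS)
```

With these assumed keywords, only `www.taobao.com` remains in `kept`.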
An application example of the method provided by the invention is described below:
Fig. 2 is a flow chart of an application example of the method for acquiring network access behavior provided by the invention. The application example shown in Fig. 2 comprises steps 201 to 209, wherein:
Step 201: accurately identify the HTTP messages among the TCP messages initiated by the user.
Step 202: determine whether the HTTP header Content-Type is of type text/html. If so, go to step 203; otherwise the flow ends.
Step 203: determine whether the HTTP header Content-Encoding is of type gzip or deflate. If so, go to step 205; otherwise go to step 204.
Step 204: search the HTTP entity for the title string. If it is absent, the flow ends; if it is present, go to steps 205 to 208. There is no required order among steps 205 to 208.
Step 205: determine whether the value of the HTTP header Content-Length is between 0 and 1024.
Step 206: determine whether the Transfer-Encoding of the HTTP response packet has the following features: the header is of type chunked, the entity length of the response packet is greater than zero, and the response packet ends with ".0d0a0d0a".
Step 207: determine whether the length of the URL is less than 130.
Step 208: check that the suffix of the URL is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani or .dll.
If the results of steps 205 to 208 are all affirmative, go to step 209.
Step 209: use the referer in the HTTP header to filter out redundant URL log entries. The processing comprises steps A01 to A06.
Fig. 3 is a flow chart of step 209 in the application example of the invention. It comprises steps A01 to A06, wherein:
Step A01: check whether the HTTP header referer is empty. If the referer is empty, go to step A02; if it is not empty, go to step A03.
Step A02: build a hash table keyed by IP address. Each node of the hash table holds a linked list that stores the last 20 bytes of each URL in the referer (to save memory) together with the time at which that URL was accessed. For example, the list may hold at most 5 URLs and their access times. Because the referer and the URL can be long, in some cases more than 2000 bytes, only the last 20 bytes of the referer and URL are kept. These values can be adjusted as needed: the number of referers kept in the list can be changed, and the comparison length of the referer and URL can be lengthened or shortened. If a hash node keyed by this IP address already exists, the requested URL is inserted into the referer array of that node; if the number of stored referers equals 5, the URL that was inserted first is deleted before the new URL is inserted. If no hash node keyed by this IP address exists, a new node is created, the URL is inserted into it, and the node is inserted into the table.
Step A03: compare the referer header of this link with the contents of the referer array in the hash table. On an exact match, check whether the difference between the current timestamp and the timestamp stored for that referer is less than 10 seconds. If it is less than 10 seconds, return without sending a log entry; if it is not less than 10 seconds, record the access time and go to step A02.
Before the requested URL is inserted into the referer array of the IP node, keyword filtering can also be performed, as follows:
Step A04: clicking an address produces, besides the real URL log entry for that click, a number of redundant URL log entries, so the processing above alone does not filter completely. A final check is added: URLs containing certain special keywords are not needed either, so the URL is matched against these keywords. If a keyword matches, return without sending a log entry; if none matches, go to step A05. The keywords used here are obtained by analyzing captured packets. The keywords to be filtered are placed in an array and matched against the URLs that remain after the earlier filtering; a URL that matches is not logged, and a URL that does not match is logged.
Step A05: a URL that still remains after all of the preceding conditions is sent to the database for storage.
Step A06: return from the function without further processing.
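Steps A01 to A03 can be sketched as below, assuming an in-memory dictionary keyed by IP address stands in for the hash table. The names `should_log` and `insert` are hypothetical, and the behavior for a non-empty referer that matches no stored entry (falling through to step A02) is an assumption, since the text does not spell it out.

```python
from collections import OrderedDict

TAIL = 20          # bytes of each URL kept, as in step A02
MAX_ENTRIES = 5    # entries kept per IP address, as in step A02
WINDOW = 10.0      # seconds, as in step A03


def tail(value: str) -> bytes:
    return value.encode("utf-8")[-TAIL:]


def insert(table: dict, ip: str, url: str, now: float) -> None:
    """Step A02: store the URL tail under the IP node, dropping the first-inserted."""
    node = table.setdefault(ip, OrderedDict())
    key = tail(url)
    if key not in node and len(node) >= MAX_ENTRIES:
        node.popitem(last=False)  # remove the entry inserted first
    node[key] = now


def should_log(table: dict, ip: str, url: str, referer: str, now: float) -> bool:
    """Steps A01 and A03: return False when the request looks like a sub-resource."""
    if referer:  # A01: non-empty referer goes to A03
        node = table.get(ip)
        key = tail(referer)
        if node is not None and key in node:
            if now - node[key] < WINDOW:
                return False      # referer seen under 10 s ago: no log entry
            node[key] = now       # record the access time, then continue to A02
    insert(table, ip, url, now)   # A02
    return True
```

A request whose referer was stored less than 10 seconds earlier for the same IP address is treated as part of the same page load and is not logged; other requests are stored and logged.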
The application example provided by the invention analyzes the HTTP headers Content-Type, Content-Encoding, Content-Length and Transfer-Encoding, and combines the URL length check, URL file suffix filtering, URL feature filtering and the pairing of referer with IP address, to filter out a large number of redundant URL log entries. This makes full use of memory and shows the user only the URL log entries that are really needed.
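The first header checks of the application example (steps 202 to 204) can be sketched as follows. The function name `is_page_candidate` is hypothetical, and looking for the byte string `<title` is an assumed reading of "the title string"; the checks of steps 205 to 208 would follow for any response accepted here.

```python
def is_page_candidate(headers: dict, body: bytes) -> bool:
    """Apply steps 202 to 204 to one HTTP response."""
    h = {k.lower(): v.lower() for k, v in headers.items()}
    # Step 202: only text/html responses are considered.
    if not h.get("content-type", "").startswith("text/html"):
        return False
    # Step 203: compressed entities skip the title search.
    if h.get("content-encoding", "") in ("gzip", "deflate"):
        return True
    # Step 204: uncompressed entities must contain the title string.
    return b"<title" in body.lower()
```

Compressed bodies are accepted without inspection because the title cannot be searched for without decompressing the entity first.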
Fig. 4 is a structural diagram of an embodiment of the system for acquiring network access behavior provided by the invention. The system embodiment shown in Fig. 4 comprises:
an acquisition device 401, configured to acquire web page access data;
a filtering device 402, connected to the acquisition device 401 and configured to filter the fields in the HTTP header of the network access data according to a preset policy;
a processing device 403, connected to the filtering device 402 and configured to process the URL information in the filtered messages to obtain the user's network access behavior.
The preset policy comprises selecting HTTP entities that are either compressed entities or uncompressed entities containing the title feature, wherein the fields in the HTTP header of a selected entity meet the following conditions:
the Content-Type field is of type text/html;
the Content-Length field is less than or equal to 1024 bytes;
the Transfer-Encoding header is of type chunked, the entity length of the response packet is greater than zero, and the response packet ends with ".0d0a0d0a";
the length of the URL is less than 130 bytes;
the URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani or .dll.
The system further comprises:
a first recording device, configured to record the URL information corresponding to the same IP address and to use the recorded URL information as the user's network access behavior;
the processing device comprises:
a matching module, connected to the recording device and configured to match the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
a processing module, connected to the matching module and configured to add the URL information in the network access data to the URL information corresponding to the IP address if the URL information corresponding to the IP address contains no matching object;
an output module, connected to the matching module and configured to output the URL information in the network access data if the URL information corresponding to the IP address contains a matching object; and also connected to the processing module and configured to output the URL information in the network access data after the processing module has added that URL information to the URL information corresponding to the IP address.
Optionally, the content of last N byte in described matching module Adoption Network visit data in the URL information URL information corresponding with this IP address compares, and wherein the span of N is 20~1000.
Optionally, the described record same IP corresponding URL and the accessed time of this URL of address of described the first recording device records;
Optionally, described processing module also comprises:
Delete cells, be used for when the URL of network access data information is increased in URL information corresponding to this IP address, after the number of URL information corresponding to this IP address reaches the number threshold value that sets in advance, according to accessed time of each URL in this IP address, the information of deleting URL the earliest of accessed time.
Optionally, the processing module further comprises:
An updating unit, connected to the deletion unit, configured to obtain the access time of the URL if the URL information corresponding to that IP address contains a matching object, and, according to that access time, to initiate an operation that updates the access time of the URL under that IP address.
Wherein the updating unit is configured to:
If the difference between the access time of the URL and the recorded access time of the matching object is greater than or equal to a preset time threshold, update the recorded access time of the matching object to the initiation time of the network link.
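The threshold rule of the updating unit can be sketched in Python as follows. It is assumed here that the initiation time of the network link is the same value as the new access time passed in; the name `maybe_update` and the default threshold of 30 seconds are hypothetical.

```python
def maybe_update(entries, url, accessed_at, min_gap=30.0):
    """Refresh the recorded access time only if enough time has passed.

    entries maps each URL of one IP address to its recorded access time.
    Returns True when the record was updated.
    """
    recorded = entries[url]
    if accessed_at - recorded >= min_gap:
        # Assumption: the new access time is the initiation time of the link.
        entries[url] = accessed_at
        return True
    return False
```

Skipping updates for accesses that fall within the threshold avoids rewriting the record on every repeated request for the same page.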
Optionally, the processing unit further comprises:
A filtering module, connected to the output module, configured so that, if an accessed URL links to one or more other URLs, then before the URL information in the network access data is output, the URL information is searched for a preset keyword, and only URLs that do not contain the keyword are output as the final URL information of the network access data, where the keyword is a keyword of the other URLs to which the accessed URL links.
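The keyword check performed by the filtering module can be sketched in Python as below. The function name `filter_linked` and the example keywords are hypothetical; plain substring search is assumed as the way a keyword is looked up in a URL.

```python
def filter_linked(urls, keywords):
    """Keep only the URLs that contain none of the preset keywords."""
    return [u for u in urls if not any(k in u for k in keywords)]


# Example with made-up URLs and keywords: the linked banner URL is dropped.
visited = ["http://news.example/story", "http://ads.example/banner?x=1"]
final = filter_linked(visited, ["ads.", "banner"])
```

Only the page the user actually requested remains in `final`; URLs pulled in by that page and identified by the keywords are not output.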
Compared with the prior art, the system embodiment provided by the invention filters the fields in the HTTP header of the network access data, discards the portion of the network access data that is irrelevant to network management, and then derives the network access behavior actually of interest from the remaining network access data.
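Part of this header filtering can be sketched in Python using the conditions listed in claims 2 and 10. The sketch covers only the Content-Type, URL length and URL suffix conditions; the Content-Length and Transfer-Encoding conditions are left out because the text does not spell out how they combine with the others. The function name `passes_basic_checks` is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# File suffixes excluded by the preset policy, as listed in the claims.
EXCLUDED_SUFFIXES = {".js", ".png", ".css", ".dif", ".klz",
                     ".ico", ".xml", ".xsl", ".ani", ".dll"}


def passes_basic_checks(url, content_type):
    """Apply the Content-Type, URL-length and URL-suffix conditions only."""
    if not content_type.strip().lower().startswith("text/html"):
        return False
    if len(url.encode("utf-8")) >= 130:  # URL must be shorter than 130 bytes
        return False
    suffix = posixpath.splitext(urlsplit(url).path)[1].lower()
    return suffix not in EXCLUDED_SUFFIXES
```

A message that fails any of these checks would be dropped before the URL matching stage, which is how the volume of data irrelevant to network management is reduced.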
The above is only a specific embodiment of the invention, but the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Accordingly, the scope of protection of the invention shall be determined by the scope of protection of the claims.

Claims (16)

1. A method for acquiring network access behavior, characterized by comprising:
Acquiring web page access data;
Filtering the fields in the HTTP header of the network access data according to a preset policy;
Processing the URL information in the filtered messages to obtain the user's network access behavior.
2. The method according to claim 1, characterized in that the preset policy comprises selecting HTTP entities that are compressed entities, or uncompressed entities containing the title feature, wherein the fields in the HTTP header of a selected entity meet the following conditions:
The Content-Type field is of type text/html;
The Content-Length field is less than or equal to 1024 bytes;
The Transfer-Encoding header is of type chunked, the entity length of the response packet is greater than zero, and the response packet ends with ".0d0a0d0a";
The length of the URL is less than 130 bytes;
The URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani or .dll.
3. The method according to claim 1, characterized in that:
The method further comprises:
Recording the URL information corresponding to the same IP address, and taking the recorded URL information as the user's network access behavior;
Processing the URL information in the filtered messages to obtain the user's network access behavior comprises:
Matching the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
If the URL information corresponding to that IP address contains a matching object, outputting the URL information in the network access data; otherwise, first adding the URL information in the network access data to the URL information corresponding to that IP address, and then outputting the URL information in the network access data.
4. The method according to claim 3, characterized in that matching the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address comprises:
Comparing the last N bytes of the URL information in the network access data with the URL information corresponding to that IP address, where N ranges from 20 to 1000.
5. The method according to claim 3, characterized in that:
Recording the URL information corresponding to the same IP address further comprises:
Recording the URL corresponding to the same IP address and the time at which that URL was accessed;
Adding the URL information in the network access data to the URL information corresponding to that IP address further comprises:
After the number of URL entries corresponding to that IP address reaches a preset count threshold, deleting the entry with the earliest access time, based on the access time of each URL under that IP address.
6. The method according to claim 5, characterized in that processing the URL information in the filtered messages to obtain the user's network access behavior further comprises:
If the URL information corresponding to that IP address contains a matching object, obtaining the access time of the URL, and, according to that access time, initiating an operation that updates the access time of the URL under that IP address.
7. The method according to claim 5, characterized in that initiating the operation that updates the access time of the URL under that IP address further comprises:
If the difference between the access time of the URL and the recorded access time of the matching object is greater than or equal to a preset time threshold, updating the recorded access time of the matching object to the initiation time of the network link.
8. The method according to claim 3, characterized in that the method further comprises:
If an accessed URL links to one or more other URLs, then before the URL information in the network access data is output, searching the URL information in the network access data for a preset keyword, and outputting only the URLs that do not contain the keyword as the final URL information of the network access data, where the keyword is a keyword of the other URLs to which the accessed URL links.
9. A system for acquiring network access behavior, characterized by comprising:
An acquisition device, configured to acquire web page access data;
A filtering device, connected to the acquisition device, configured to filter the fields in the HTTP header of the network access data according to a preset policy;
A processing unit, connected to the filtering device, configured to process the URL information in the filtered messages to obtain the user's network access behavior.
10. The system according to claim 9, characterized in that the preset policy comprises selecting HTTP entities that are compressed entities, or uncompressed entities containing the title feature, wherein the fields in the HTTP header of a selected entity meet the following conditions:
The Content-Type field is of type text/html;
The Content-Length field is less than or equal to 1024 bytes;
The Transfer-Encoding header is of type chunked, the entity length of the response packet is greater than zero, and the response packet ends with ".0d0a0d0a";
The length of the URL is less than 130 bytes;
The URL file suffix is not .js, .png, .css, .dif, .klz, .ico, .xml, .xsl, .ani or .dll.
11. The system according to claim 9, characterized in that:
The system further comprises:
A first recording device, configured to record the URL information corresponding to the same IP address, the recorded URL information being taken as the user's network access behavior;
The processing unit comprises:
A matching module, connected to the recording device, configured to match the URL information in the filtered network access data of a given IP address against the locally recorded URL information corresponding to that IP address;
A processing module, connected to the matching module, configured to add the URL information in the network access data to the URL information corresponding to that IP address if the URL information corresponding to that IP address contains no matching object;
An output module, connected to the matching module, configured to output the URL information in the network access data if the URL information corresponding to that IP address contains a matching object; and, connected to the processing module, configured to output the URL information in the network access data after the processing module has added it to the URL information corresponding to that IP address.
12. The system according to claim 11, characterized in that the matching module compares the last N bytes of the URL information in the network access data with the URL information corresponding to that IP address, where N ranges from 20 to 1000.
13. The method according to claim 11, characterized in that:
The first recording device records the URL corresponding to the same IP address and the time at which that URL was accessed;
The processing module further comprises:
A deletion unit, configured, when URL information in the network access data is added to the URL information corresponding to that IP address and the number of URL entries for that IP address has reached a preset count threshold, to delete the entry with the earliest access time, based on the access time of each URL under that IP address.
14. The method according to claim 13, characterized in that the processing module further comprises:
An updating unit, connected to the deletion unit, configured to obtain the access time of the URL if the URL information corresponding to that IP address contains a matching object, and, according to that access time, to initiate an operation that updates the access time of the URL under that IP address.
15. The method according to claim 13, characterized in that the updating unit is configured to:
If the difference between the access time of the URL and the recorded access time of the matching object is greater than or equal to a preset time threshold, update the recorded access time of the matching object to the initiation time of the network link.
16. The method according to claim 11, characterized in that the processing unit further comprises:
A filtering module, connected to the output module, configured so that, if an accessed URL links to one or more other URLs, then before the URL information in the network access data is output, the URL information is searched for a preset keyword, and only URLs that do not contain the keyword are output as the final URL information of the network access data, where the keyword is a keyword of the other URLs to which the accessed URL links.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310003709.2A CN103118007B (en) 2013-01-06 2013-01-06 Method and system of acquiring user access behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310003709.2A CN103118007B (en) 2013-01-06 2013-01-06 Method and system of acquiring user access behavior

Publications (2)

Publication Number Publication Date
CN103118007A (en) 2013-05-22
CN103118007B (en) 2016-02-03

Family

ID=48416281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310003709.2A Method and system of acquiring user access behavior 2013-01-06 2013-01-06 (Active, CN103118007B (en))

Country Status (1)

Country Link
CN (1) CN103118007B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530160A (en) * 2013-10-21 2014-01-22 迈普通信技术股份有限公司 Page loading method and device
CN103593484A (en) * 2013-12-03 2014-02-19 南京安讯科技有限责任公司 Method for filtering garbage logs during mobile phone internet surfing
CN104021143A (en) * 2014-05-14 2014-09-03 北京网康科技有限公司 Method and device for recording webpage access behavior
CN104239353A (en) * 2013-06-20 2014-12-24 上海博达数据通信有限公司 WEB classification control and log auditing method
CN104270358A (en) * 2014-09-25 2015-01-07 同济大学 Trusted network transaction system client side monitor and implementation method thereof
CN105049446A (en) * 2015-08-20 2015-11-11 中国联合网络通信集团有限公司 Method and system for filtering URL (Uniform Resource Locator)
CN105677657A (en) * 2014-11-19 2016-06-15 杭州华三通信技术有限公司 Recoding method and device for access behaviors of uniform resource locators
CN105827522A (en) * 2015-11-10 2016-08-03 广东亿迅科技有限公司 Gateway equipment for processing log files
CN105991369A (en) * 2015-03-23 2016-10-05 杭州迪普科技有限公司 Message information extracting method and device
CN106357482A (en) * 2016-11-30 2017-01-25 四川秘无痕信息安全技术有限责任公司 Method for implementing monitoring of webpage access based on network protocol
CN106411944A (en) * 2016-11-25 2017-02-15 锐捷网络股份有限公司 Network access management method and apparatus
CN107480190A (en) * 2017-07-11 2017-12-15 国家计算机网络与信息安全管理中心 A kind of filter method and device of non-artificial access log
CN108121749A (en) * 2016-11-30 2018-06-05 北京国双科技有限公司 Website user's behavior analysis method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1121792A1 (en) * 1998-10-15 2001-08-08 Computer Associates Think, Inc. Method and system for the prevention of undesirable activities of executable objects
CN102004770A (en) * 2010-11-16 2011-04-06 杭州迪普科技有限公司 Webpage auditing method and device
CN102098229A (en) * 2011-03-04 2011-06-15 北京星网锐捷网络技术有限公司 Method and device for optimizing and auditing uniform resource locator (URL) as well as network device
CN102158499A (en) * 2011-06-02 2011-08-17 国家计算机病毒应急处理中心 Trojan-embedded website detection method based on hyper text transfer protocol (HTTP) traffic analysis
CN102254004A (en) * 2011-07-14 2011-11-23 北京邮电大学 Method and system for modeling Web in weblog excavation
CN102857572A (en) * 2012-09-14 2013-01-02 北京星网锐捷网络技术有限公司 Method and device for processing HTTP (hyper text transport protocol) access request and gateway equipment

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239353A (en) * 2013-06-20 2014-12-24 上海博达数据通信有限公司 WEB classification control and log auditing method
CN104239353B (en) * 2013-06-20 2019-12-31 上海博达数据通信有限公司 WEB classification control and log audit method
CN103530160A (en) * 2013-10-21 2014-01-22 迈普通信技术股份有限公司 Page loading method and device
CN103593484A (en) * 2013-12-03 2014-02-19 南京安讯科技有限责任公司 Method for filtering garbage logs during mobile phone internet surfing
CN104021143A (en) * 2014-05-14 2014-09-03 北京网康科技有限公司 Method and device for recording webpage access behavior
CN104270358A (en) * 2014-09-25 2015-01-07 同济大学 Trusted network transaction system client side monitor and implementation method thereof
CN104270358B (en) * 2014-09-25 2018-10-26 同济大学 Trustable network transaction system client monitor and its implementation
CN105677657A (en) * 2014-11-19 2016-06-15 杭州华三通信技术有限公司 Recoding method and device for access behaviors of uniform resource locators
CN105991369A (en) * 2015-03-23 2016-10-05 杭州迪普科技有限公司 Message information extracting method and device
CN105991369B (en) * 2015-03-23 2020-03-06 杭州迪普科技股份有限公司 Message information extraction method and device
CN105049446A (en) * 2015-08-20 2015-11-11 中国联合网络通信集团有限公司 Method and system for filtering URL (Uniform Resource Locator)
CN105827522A (en) * 2015-11-10 2016-08-03 广东亿迅科技有限公司 Gateway equipment for processing log files
CN106411944A (en) * 2016-11-25 2017-02-15 锐捷网络股份有限公司 Network access management method and apparatus
CN106357482A (en) * 2016-11-30 2017-01-25 四川秘无痕信息安全技术有限责任公司 Method for implementing monitoring of webpage access based on network protocol
CN108121749A (en) * 2016-11-30 2018-06-05 北京国双科技有限公司 Website user's behavior analysis method and device
CN106357482B (en) * 2016-11-30 2019-10-29 四川秘无痕科技有限责任公司 A method of based on network protocol implementing monitoring web page access
CN107480190A (en) * 2017-07-11 2017-12-15 国家计算机网络与信息安全管理中心 A kind of filter method and device of non-artificial access log

Also Published As

Publication number Publication date
CN103118007B (en) 2016-02-03

Similar Documents

Publication Publication Date Title
CN103118007B (en) Method and system of acquiring user access behavior
CN105608134B (en) A kind of network crawler system and its web page crawl method based on multithreading
US9218482B2 (en) Method and device for detecting phishing web page
US9600470B2 (en) Method and system relating to re-labelling multi-document clusters
CN102722563B (en) Method and device for displaying page
US6910071B2 (en) Surveillance monitoring and automated reporting method for detecting data changes
US8683311B2 (en) Generating structured data objects from unstructured web pages
CN104714965B (en) Static resource De-weight method, static resource management method and device
CN102868719B (en) A kind of Network Access Method based on buffer memory and server
CN100438435C (en) Method for limiting browser access network address
CN102436564A (en) Method and device for identifying falsified webpage
CN103530429B (en) Webpage content extracting method
CN104239353B (en) WEB classification control and log audit method
CN103577482B (en) A kind of webpage collection method, device and browser
CN102970348B (en) Network application method for pushing, system and network application server
US8713368B2 (en) Methods for testing OData services
CN106874778B (en) Intelligent terminal file acquisition and data recovery system and method based on android system
CN104301304A (en) Vulnerability detection system based on large ISP interconnection port and method thereof
CN102098229A (en) Method and device for optimizing and auditing uniform resource locator (URL) as well as network device
CN109688097A (en) Website protection method, website protective device, website safeguard and storage medium
CN105302801A (en) Resource caching method and apparatus
CN101887463B (en) Virtual domain-based HTTP reduction display method
CN102130791A (en) Method, device and gateway server for detecting agent on gateway server
CN105554181A (en) DNS log compression method and device
CN103117892A (en) Method and device for adding website access record

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant