CN103530297B - Method and device for automatically performing website analysis - Google Patents
- Publication number: CN103530297B
- Application number: CN201210232731.XA
- Authority: CN (China)
- Prior art keywords: link address, directory, observed value, value, parameter
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a method and device for automatically performing website analysis. The method includes: A. obtaining one or more link addresses from the request data arriving at a website; B. using the obtained link addresses to determine the possible values of the directories at each level under each main domain of the website, and taking these as the current values of those directories; C. comparing the link address range formed by the current values of the directories at each level with the link address range formed by their historical values, to determine whether the website has newly added link addresses or invalid link addresses. In this way the website can be monitored easily and the efficiency of website operation is improved.
Description
【Technical field】
The present invention relates to data processing technology, and in particular to a method and device for automatically performing website analysis.
【Background art】
A website provides services to users through various web page files, which are organized on the website in directories. While the website is providing services, its maintainers often need to modify the web page files that provide those services, either by changing the files themselves or by changing the paths to them (the directories that lead to each web page file). As a result, after a website has been running for some time, its structure may have changed considerably. When websites were small, these structural changes could be tracked manually, so that the state of the website was understood in time and its operation could be monitored. With the development of Internet technology, however, websites provide more and more services at an ever larger scale, and manual means alone can hardly capture the full picture of a website. This inevitably makes website maintenance difficult and reduces the efficiency of website operation.
【Summary of the invention】
The technical problem to be solved by the invention is to provide a method and device for automatically performing website analysis, so as to improve the efficiency of website operation.
To solve this technical problem, the invention provides a method for automatically performing website analysis, including: A. obtaining one or more link addresses from the request data arriving at a website; B. using the obtained link addresses to determine the possible values of the directories at each level under each main domain of the website, as the current values of those directories; C. comparing the link address range formed by the current values of the directories at each level with the link address range formed by their historical values, to determine whether the website has newly added link addresses or invalid link addresses.
According to a preferred embodiment of the invention, step A captures the request data arriving at the website through a bypass mirroring system and extracts one or more link addresses from that request data.
According to a preferred embodiment of the invention, step B includes: B1. splitting each obtained link address into a main domain and directories at each level; B2. using the link addresses that share a main domain, collecting the observed values of each directory level under that main domain, and determining the possible values of each directory level from the observed values collected.
According to a preferred embodiment of the invention, determining the possible values of a directory level from the collected observed values includes: when the observed values are of numeric type, the lower bound of the possible values of this directory level is the minimum of the observed values and the upper bound is their maximum; when the observed values are of enumeration type, the possible values of this directory level are exactly the observed values; when the observed values are of string type, the possible value of this directory level is any string.
According to a preferred embodiment of the invention, step C includes: comparing the link address range formed by the current values of the directories at each level with the link address range formed by their historical values; when the comparison result contains a historical link address that lies within the range formed by the historical values but outside the range formed by the current values, sending an access request to that historical link address and, if the request cannot return an accessible page, treating the historical link address as an invalid link address; when the comparison result contains a current link address that lies within the range formed by the current values but outside the range formed by the historical values, treating that current link address as a newly added link address.
According to a preferred embodiment of the invention, the method further includes, before step C: among the obtained link addresses, collecting the parameter combinations passed to the same link address and the observed value of each parameter in each combination; taking the parameter combinations as the parameters accepted by the file on the website that corresponds to the obtained link address; and determining the possible values of the accepted parameters from the observed values of each parameter in each combination. In step C, the method further compares the accepted parameters with the historical parameters, and compares the possible values of the accepted parameters with the historical parameter values, to determine whether a file on the website has had its parameters modified.
According to a preferred embodiment of the invention, the method further includes, after step A: matching the obtained link addresses against a preset list of abnormal keywords, and treating the request data corresponding to a matched link address as abnormal access data so as to provide an early warning.
Present invention also offers a kind of automatic device for carrying out web analytics, including:Placement unit, for from reach website
Request data in obtain more than one chained address;Determining unit, for determining the net using the chained address of acquisition
Currency of the possibility value of catalogues at different levels under each main domain that station includes as catalogues at different levels;Comparing unit, for by respectively
The chained address scope that the chained address scope that the currency of level catalogue is formed is formed with the history value of catalogues at different levels is compared
To judge that the website whether there is newly-increased chained address or failure chained address.
According to a preferred embodiment of the invention, the acquisition unit captures the request data arriving at the website through a bypass mirroring system and extracts one or more link addresses from that request data.
According to a preferred embodiment of the invention, the determining unit includes: a splitting unit, configured to split each obtained link address into a main domain and directories at each level; and a first statistics unit, configured to use the link addresses that share a main domain to collect the observed values of each directory level under that main domain, and to determine the possible values of each directory level from the observed values collected.
According to a preferred embodiment of the invention, the first statistics unit determines the possible values of a directory level from the collected observed values as follows: when the observed values are of numeric type, the lower bound of the possible values of this directory level is the minimum of the observed values collected at this level and the upper bound is their maximum; when the observed values are of enumeration type, the possible values of this directory level are exactly the observed values collected at this level; when the observed values are of string type, the possible value of this directory level is any string.
According to a preferred embodiment of the invention, the comparing unit includes: a range comparing unit, configured to compare the link address range formed by the current values of the directories at each level with the link address range formed by their historical values; and a link determining unit, configured to, when the comparison result contains a historical link address that lies within the range formed by the historical values but outside the range formed by the current values, send an access request to that historical link address and, if the request cannot return an accessible page, treat the historical link address as an invalid link address. The link determining unit is further configured to, when the comparison result contains a current link address that lies within the range formed by the current values but outside the range formed by the historical values, treat that current link address as a newly added link address.
According to a preferred embodiment of the invention, the device further includes a second statistics unit, configured to collect, among the obtained link addresses, the parameter combinations passed to the same link address and the observed value of each parameter in each combination, to take the parameter combinations as the parameters accepted by the file on the website that corresponds to the obtained link address, and to determine the possible values of the accepted parameters from the observed values of each parameter in each combination. The comparing unit is further configured to compare the accepted parameters with the historical parameters, and to compare the possible values of the accepted parameters with the historical parameter values, to determine whether a file on the website has had its parameters modified.
According to a preferred embodiment of the invention, the device further includes a detection unit, configured to match the obtained link addresses against a preset list of abnormal keywords, and to treat the request data corresponding to a matched link address as abnormal access data so as to provide an early warning.
As the above technical solutions show, by using the request data that users send to a website, the invention can effectively collect statistics on every link address through which the website provides services, and thus gain a full picture of the website's current structure. By comparing this with the website's historical structure, changes to the website can be identified in time, which facilitates monitoring and improves the efficiency of website operation.
【Brief description of the drawings】
Fig. 1 is a schematic flowchart of the method for automatically performing website analysis according to the invention;
Fig. 2 is a schematic diagram of comparing link address ranges according to the invention;
Fig. 3 is a structural schematic block diagram of embodiment one of the device for automatically performing website analysis according to the invention;
Fig. 4 is a structural schematic block diagram of an embodiment of the determining unit according to the invention;
Fig. 5 is a structural schematic block diagram of an embodiment of the comparing unit according to the invention;
Fig. 6 is a structural schematic block diagram of embodiment two of the device for automatically performing website analysis according to the invention;
Fig. 7 is a structural schematic block diagram of embodiment three of the device for automatically performing website analysis according to the invention.
【Detailed description of embodiments】
To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Refer to Fig. 1, which is a schematic flowchart of the method for automatically performing website analysis according to the invention. As shown in Fig. 1, the method includes:
Step S101: obtain one or more link addresses from the request data arriving at the website.
Step S102: use the obtained link addresses to determine the possible values of the directories at each level under each main domain of the website, as the current values of those directories.
Step S103: compare the link address range formed by the current values of the directories at each level with the link address range formed by their historical values, to determine whether the website has newly added link addresses or invalid link addresses.
The above steps are described in detail below.
In step S101, the request data arriving at the website is captured through a bypass mirroring system, and link addresses are extracted from it. The bypass mirroring system copies the original request data arriving at the website into a new copy, so that the original request data can continue its normal interaction while the copy is available for other processing; in the invention, the copy produced by the bypass mirroring system is used for all subsequent processing. The request data contains communication-protocol information, including the link address of the requested page, and step S101 extracts these link addresses from the request data. Because requests to a website arrive concurrently, step S101 usually obtains a large amount of request data and extracts many link addresses from it.
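Step S101 can be illustrated with a short sketch. It assumes, hypothetically, that each mirrored copy is the raw bytes of one HTTP/1.x request in origin form, and that the link address is the `Host` header joined with the request target; the function name `extract_link_address` is illustrative and not part of the patent.

```python
from typing import Optional


def extract_link_address(raw_request: bytes) -> Optional[str]:
    """Return 'host/path?query' for one mirrored HTTP/1.x request, else None.

    Assumption: the mirror delivers the raw request bytes unchanged.
    """
    # Only the header block matters here; any body is ignored.
    head = raw_request.decode("latin-1").split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:  # expect "METHOD target HTTP/x.y"
        return None
    target = parts[1]
    if not target.startswith("/"):  # only origin-form targets are handled
        return None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip() + target
    return None  # no Host header, so the main domain is unknown


request = b"GET /artist/1157 HTTP/1.1\r\nHost: ting.baidu.com\r\n\r\n"
print(extract_link_address(request))  # ting.baidu.com/artist/1157
```

Because requests arrive concurrently and in volume, a function like this would be applied independently to each mirrored request.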
Link addresses are hierarchical. The first segment of a link address usually represents the main domain that provides the service, and the main domain is followed by directories at successive levels; when a link address reaches its last-level directory, it corresponds to a single web page file on the website. For example, in the complete link address juli.baidu.com/zhuanli/jiagou, "/" is the separator that divides the address into several parts: the first part, "juli.baidu.com", is the main domain, and the following parts, "zhuanli" and "jiagou", are the first-level and second-level directories respectively.
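Splitting on "/" as in step S1021 takes only a few lines. This minimal sketch assumes link addresses carry no scheme (as in the examples in this document) and sets any query string aside for the parameter analysis of step S104; `split_link_address` is a hypothetical name.

```python
def split_link_address(link: str) -> tuple[str, list[str]]:
    """Split 'main.domain/dir1/dir2' into ('main.domain', ['dir1', 'dir2'])."""
    path = link.split("?", 1)[0]  # parameters are analysed separately
    main_domain, *directories = path.split("/")
    # Drop empty segments left by leading, trailing or repeated "/".
    return main_domain, [d for d in directories if d]


print(split_link_address("juli.baidu.com/zhuanli/jiagou"))
# ('juli.baidu.com', ['zhuanli', 'jiagou'])
```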
Specifically, step S102 includes:
Step S1021: split each obtained link address into a main domain and directories at each level;
Step S1022: use the link addresses that share a main domain to collect the observed values of each directory level under that main domain, and determine the possible values of each directory level from the observed values collected.
As described above, step S1021 can use the separator in each link address to split it into a main domain and directories at each level.
Step S1022 specifically includes:
Step S1022_1: group the link addresses processed in step S1021 by main domain, so that link addresses with the same main domain fall into one group.
Step S1022_2: for the link addresses of one main domain, collect the observed values of each directory level.
Step S1022_3: determine the possible values of each directory level from the observed values collected.
Consider, for example, the following link addresses:
"ting.baidu.com/artist/1157", "ting.baidu.com/artist/1107",
"ting.baidu.com/artist/1130", "ting.baidu.com/album/1474",
"ting.baidu.com/album/1430", "ting.baidu.com/album/1425",
"zhidao.baidu.com/team/74", "zhidao.baidu.com/team/80",
"zhidao.baidu.com/team/65", "zhidao.baidu.com/team/60"
These link addresses contain two different main domains: "ting.baidu.com" and "zhidao.baidu.com". For the main domain "ting.baidu.com", collecting statistics over the link addresses above gives the observed first-level values "artist" and "album"; under the first-level directory "artist", the observed second-level values are "1157", "1107" and "1130", and under "album" they are "1474", "1430" and "1425". For the main domain "zhidao.baidu.com", the observed first-level value is "team", and under "team" the observed second-level values are "74", "80", "65" and "60".
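Steps S1022_1 and S1022_2 amount to building a prefix tree per main domain. The sketch below is one illustrative way to do it, assuming a nested `dict` representation and a hypothetical function name `build_observation_tree`; it keeps second-level values separate for each first-level value, as the ting.baidu.com example does.

```python
def build_observation_tree(links: list[str]) -> dict:
    """Map main domain -> observed directory value -> next level -> ..."""
    tree: dict = {}
    for link in links:
        main_domain, *directories = link.split("?", 1)[0].split("/")
        node = tree.setdefault(main_domain, {})  # S1022_1: group by main domain
        for value in directories:
            if value:  # S1022_2: record the observed value at this level
                node = node.setdefault(value, {})
    return tree


links = [
    "ting.baidu.com/artist/1157", "ting.baidu.com/artist/1107",
    "ting.baidu.com/artist/1130", "ting.baidu.com/album/1474",
    "ting.baidu.com/album/1430", "ting.baidu.com/album/1425",
    "zhidao.baidu.com/team/74", "zhidao.baidu.com/team/80",
    "zhidao.baidu.com/team/65", "zhidao.baidu.com/team/60",
]
tree = build_observation_tree(links)
print(sorted(tree["ting.baidu.com"]))            # ['album', 'artist']
print(sorted(tree["ting.baidu.com"]["artist"]))  # ['1107', '1130', '1157']
```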
In step S1022_3, the possible values of each directory level are determined from the collected observed values as follows:
A. When the observed values of a directory level are of numeric type, the lower bound of its possible values is the minimum of the observed values and the upper bound is their maximum.
B. When the observed values are of enumeration type, the possible values of this directory level are exactly the observed values.
C. When the observed values are of string type, the possible value of this directory level is any string.
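Rules A to C can be expressed as one small value type. The `PossibleValues` class and `possible_values` function below are hypothetical names used for illustration; the type label is simply passed in as an argument here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class PossibleValues:
    kind: str                                 # "numeric", "enum" or "string"
    bounds: Optional[Tuple[int, int]] = None  # numeric: (min, max)
    members: FrozenSet[str] = frozenset()     # enum: the observed values

    def allows(self, value: str) -> bool:
        if self.kind == "numeric":
            lo, hi = self.bounds
            return value.isascii() and value.isdigit() and lo <= int(value) <= hi
        if self.kind == "enum":
            return value in self.members
        return True  # string type: any string is a possible value


def possible_values(observed: list[str], kind: str) -> PossibleValues:
    if kind == "numeric":  # rule A: [minimum, maximum] of the observed values
        numbers = [int(v) for v in observed]
        return PossibleValues(kind, bounds=(min(numbers), max(numbers)))
    if kind == "enum":     # rule B: exactly the observed values
        return PossibleValues(kind, members=frozenset(observed))
    if kind == "string":   # rule C: any string
        return PossibleValues(kind)
    raise ValueError(f"unknown kind: {kind!r}")


artist = possible_values(["1157", "1107", "1130"], "numeric")
print(artist.bounds, artist.allows("1120"), artist.allows("1200"))
# (1107, 1157) True False
```

The `allows` method is what a later comparison step would use to test whether a concrete link address segment falls inside a level's range.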
The type of the observed values at a given directory level can be determined by strategies including, but not limited to, the following:
When the observed values are numbers whose distribution exceeds a preset value, the observed values at this level are of numeric type;
When the number of values in the set of observed values does not exceed a preset value, the observed values at this level are of enumeration type;
When the observed values are characters and are not of enumeration type, the observed values at this level are of string type. The characters here may be letters, or a combination of letters and digits.
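One possible reading of this type-selection strategy is sketched below. The text does not fix the preset values or say exactly how the "distribution" of numbers is measured; this sketch assumes it means the number of distinct values, and the thresholds `numeric_min_distinct=2` and `enum_max_distinct=5` are arbitrary placeholders.

```python
def classify(observed: list[str], numeric_min_distinct: int = 2,
             enum_max_distinct: int = 5) -> str:
    """Return "numeric", "enum" or "string" for one directory level."""
    distinct = set(observed)
    all_digits = all(v.isascii() and v.isdigit() for v in distinct)
    # Numbers spread over more distinct values than the preset: numeric type.
    if all_digits and len(distinct) > numeric_min_distinct:
        return "numeric"
    # At most the preset number of distinct values: enumeration type.
    if len(distinct) <= enum_max_distinct:
        return "enum"
    # Otherwise many distinct non-numeric values (letters, or letters
    # mixed with digits): string type.
    return "string"


print(classify(["1157", "1107", "1130"]))  # numeric
print(classify(["artist", "album"]))       # enum
```

Checking the numeric rule first lets a level with only a few numeric values, such as the artist IDs in the worked range example of this section, still be summarized as a range.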
Through steps S101 and S102, the possible values of the directories at each level of the website can be determined from the data captured by the bypass mirroring system. Taking these possible values as the current values, a link address range can then be determined from the current values of the directories at each level. For example, applying the method of determining possible values described above to the following obtained link addresses:
"ting.baidu.com/artist/1157", "ting.baidu.com/artist/1107",
"ting.baidu.com/artist/1130", "ting.baidu.com/album/1474",
"ting.baidu.com/album/1430", "ting.baidu.com/album/1425"
gives the following link address ranges, determined by the current values of the directories at each level:
ting.baidu.com/artist/{1107-1157} and ting.baidu.com/album/{1425-1474}
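The range notation can be reproduced with the sketch below. It is self-contained, so it repeats compact versions of steps S1021 to S1022_3; the `{lo-hi}`, `{a|b}` and `{*}` notations for numeric, enumeration and string levels, the thresholds, and the function names are assumptions for illustration. It only summarizes the last directory level beneath each path, which is enough for this example.

```python
def summarize(values: list[str], numeric_min_distinct: int = 2,
              enum_max_distinct: int = 5) -> str:
    """Render the possible values of one directory level (rules A to C)."""
    distinct = sorted(set(values))
    if (all(v.isascii() and v.isdigit() for v in distinct)
            and len(distinct) > numeric_min_distinct):
        numbers = [int(v) for v in distinct]
        return f"{{{min(numbers)}-{max(numbers)}}}"  # numeric: {min-max}
    if len(distinct) <= enum_max_distinct:
        return "{" + "|".join(distinct) + "}"        # enumeration: {a|b}
    return "{*}"                                     # string: any string


def link_ranges(links: list[str]) -> list[str]:
    """Derive link address ranges from the current values of each level."""
    tree: dict = {}
    for link in links:
        node = tree
        for part in link.split("/"):
            node = node.setdefault(part, {})

    ranges: list[str] = []

    def walk(prefix: str, node: dict) -> None:
        # Values with nothing beneath them are last-level directories.
        leaves = [value for value, child in node.items() if not child]
        if leaves:
            ranges.append(f"{prefix}/{summarize(leaves)}")
        for value, child in node.items():
            if child:
                walk(f"{prefix}/{value}", child)

    for main_domain, child in tree.items():
        walk(main_domain, child)
    return ranges


print(link_ranges([
    "ting.baidu.com/artist/1157", "ting.baidu.com/artist/1107",
    "ting.baidu.com/artist/1130", "ting.baidu.com/album/1474",
    "ting.baidu.com/album/1430", "ting.baidu.com/album/1425",
]))
# ['ting.baidu.com/artist/{1107-1157}', 'ting.baidu.com/album/{1425-1474}']
```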
In step S103, the historical values of the directories at each level are the possible values stored previously; they can be understood as the possible values obtained by executing steps S101 and S102 of the invention at an earlier time. The method of the invention can also update the historical values with the current values, so that they serve as the basis for comparison when new current values are obtained at the next time.
Step S103 specifically includes:
Step S1031: compare the link address range formed by the current values of the directories at each level with the link address range formed by their historical values.
Step S1032: when the comparison result contains a historical link address that lies within the range formed by the historical values but outside the range formed by the current values, send an access request to that historical link address and, if the request cannot return an accessible page, treat the historical link address as an invalid link address; when the comparison result contains a current link address that lies within the range formed by the current values but outside the range formed by the historical values, treat that current link address as a newly added link address.
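The classification into newly added and invalid link addresses might be sketched as below. To keep it short, a range is assumed (hypothetically) to be a mapping from a path prefix to the inclusive numeric bounds of its last directory level, and membership is tested on concrete addresses (newly observed current addresses, and historical addresses stored earlier) rather than by enumerating whole ranges. The accessibility check is injected as a function; the illustrative `http_accessible` uses the standard library's `http.client`, does not follow redirects, and treats the 301, 404 and 503 statuses named in this description as failures.

```python
import http.client
from typing import Callable, Dict, Iterable, List, Tuple

Ranges = Dict[str, Tuple[int, int]]  # "host/dir" -> (lo, hi) of the last level


def in_ranges(address: str, ranges: Ranges) -> bool:
    prefix, _, last = address.rpartition("/")
    bounds = ranges.get(prefix)
    return (bounds is not None and last.isascii() and last.isdigit()
            and bounds[0] <= int(last) <= bounds[1])


def http_accessible(address: str, timeout: float = 5.0) -> bool:
    """Probe an address like a user would; 301/404/503 count as failures."""
    host, _, path = address.partition("/")
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", "/" + path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
    return status not in (301, 404, 503)


def compare(current: Iterable[str], current_ranges: Ranges,
            history: Iterable[str], history_ranges: Ranges,
            is_accessible: Callable[[str], bool] = http_accessible
            ) -> Tuple[List[str], List[str]]:
    """Return (newly added link addresses, invalid link addresses)."""
    added = [a for a in current
             if in_ranges(a, current_ranges) and not in_ranges(a, history_ranges)]
    invalid = [a for a in history
               if in_ranges(a, history_ranges)
               and not in_ranges(a, current_ranges)
               and not is_accessible(a)]
    return added, invalid


added, invalid = compare(
    ["ting.baidu.com/artist/1157", "ting.baidu.com/album/1430"],
    {"ting.baidu.com/artist": (1107, 1157), "ting.baidu.com/album": (1425, 1474)},
    ["ting.baidu.com/artist/1050", "ting.baidu.com/artist/1110"],
    {"ting.baidu.com/artist": (1000, 1200)},
    is_accessible=lambda address: False,  # stand-in for a real HTTP probe
)
print(added, invalid)
# ['ting.baidu.com/album/1430'] ['ting.baidu.com/artist/1050']
```

Injecting the probe keeps the range logic testable offline and lets a deployment choose its own notion of "accessible page".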
Regarding step S1031, refer to Fig. 2, which is a schematic diagram of comparing link address ranges according to the invention. As shown in Fig. 2, region 1 is the intersection of the link address range formed by the current values and the link address range formed by the historical values; region 2 is the part that belongs to the range formed by the historical values but not to the range formed by the current values; region 3 is the part that belongs to the range formed by the current values but not to the range formed by the historical values. The link addresses in region 3 are newly added link addresses. For each link address in region 2, the invention simulates user behavior by sending an access request to that link address; if the page returned for the request shows a 404, 301 or 503 error (that is, no accessible page can be returned), the link address is considered invalid.
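For a single numeric directory level, the three regions of Fig. 2 can be computed with interval arithmetic. This is an illustrative sketch assuming inclusive integer bounds such as `{1107-1157}`; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # inclusive bounds, e.g. (1107, 1157)


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def subtract(a: Interval, b: Interval) -> List[Interval]:
    """The parts of a that are not covered by b."""
    common = intersect(a, b)
    if common is None:
        return [a]
    parts = []
    if a[0] < common[0]:
        parts.append((a[0], common[0] - 1))
    if common[1] < a[1]:
        parts.append((common[1] + 1, a[1]))
    return parts


def fig2_regions(current: Interval, history: Interval) -> dict:
    return {
        "region1": intersect(current, history),  # in both ranges
        "region2": subtract(history, current),   # historical only: probe these
        "region3": subtract(current, history),   # current only: newly added
    }


print(fig2_regions(current=(1107, 1157), history=(1000, 1130)))
# {'region1': (1107, 1130), 'region2': [(1000, 1106)], 'region3': [(1131, 1157)]}
```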
In another embodiment, the method of the invention further includes, before step S103:
Step S104: among the obtained link addresses, collect the parameter combinations passed to the same link address and the observed value of each parameter in each combination; take the parameter combinations as the parameters accepted by the file on the website that corresponds to the obtained link address; and determine the possible values of the accepted parameters from the observed values of each parameter in each combination. The method for determining these possible values is similar to the method for determining the possible values of the directories at each level described above, and is not repeated here.
In this embodiment, step S103 further compares the accepted parameters with the historical parameters, and compares the possible values of the accepted parameters with the historical parameter values, to determine whether a file on the website has had its parameters modified.
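The parameter comparison added to step S103 can be sketched as a set comparison plus a dictionary comparison. This assumes, for illustration only, that accepted parameters are stored per file as a set of parameter-name combinations, and that numeric parameter values are summarized as (min, max) bounds in the manner of rule A; all names are hypothetical. Files present in only one snapshot are ignored here.

```python
from typing import Dict, FrozenSet, Set, Tuple

Combos = Dict[str, Set[FrozenSet[str]]]          # file -> accepted combinations
Bounds = Dict[Tuple[str, str], Tuple[int, int]]  # (file, parameter) -> (min, max)


def combination_changes(current: Combos, history: Combos) -> Dict[str, dict]:
    """Files whose accepted parameter combinations differ from history."""
    changes = {}
    for path in current.keys() & history.keys():
        added = current[path] - history[path]
        removed = history[path] - current[path]
        if added or removed:
            changes[path] = {"added": added, "removed": removed}
    return changes


def value_changes(current: Bounds, history: Bounds) -> Dict[Tuple[str, str], tuple]:
    """Parameters whose possible-value bounds differ: key -> (old, new)."""
    return {key: (history[key], current[key])
            for key in current.keys() & history.keys()
            if current[key] != history[key]}


history = {"zhidao.baidu.com/": {frozenset({"a", "b"})}}
current = {"zhidao.baidu.com/": {frozenset({"a", "b"}), frozenset({"c", "d"})}}
print(combination_changes(current, history))
print(value_changes({("zhidao.baidu.com/", "a"): (1, 900)},
                    {("zhidao.baidu.com/", "a"): (1, 500)}))
# {('zhidao.baidu.com/', 'a'): ((1, 500), (1, 900))}
```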
As described above, a complete link address points to a web page file on the website. For example, the link address zhidao.baidu.com/question/227.html points to a file in HTML format on the website. Some link addresses also pass parameters, for example zhidao.baidu.com/?a=123&b=456, where the part after "?" indicates that two parameters, "a" and "b", are passed to the link address zhidao.baidu.com, with the values "123" and "456" respectively. Because a complete link address represents a web page file, a link of the form zhidao.baidu.com/?a=123&b=456 passes the parameters "a" and "b", with the values "123" and "456", to the web page file represented by the link address "zhidao.baidu.com". Step S104 uses the data passed in the obtained link addresses to determine the accepted parameters of a given web page file on the website and their value ranges.
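Collecting the parameter combinations of step S104 can be sketched with the standard library's `urllib.parse.parse_qsl`. The sketch assumes, as in the example, that everything before "?" identifies the file; `collect_parameters` is a hypothetical name. Each distinct set of parameter names seen together in one link address is recorded as one accepted combination.

```python
from collections import defaultdict
from urllib.parse import parse_qsl


def collect_parameters(links: list[str]):
    """Return (file -> set of name combinations, (file, name) -> values)."""
    combos = defaultdict(set)
    observed = defaultdict(list)
    for link in links:
        path, sep, query = link.partition("?")
        pairs = parse_qsl(query, keep_blank_values=True) if sep else []
        if not pairs:
            continue  # no parameters passed in this link address
        combos[path].add(frozenset(name for name, _ in pairs))
        for name, value in pairs:
            observed[(path, name)].append(value)
    return dict(combos), dict(observed)


combos, observed = collect_parameters([
    "zhidao.baidu.com/?a=123&b=456",
    "zhidao.baidu.com/?c=234&d=567",
    "zhidao.baidu.com/?a=7&b=8",
])
print(len(combos["zhidao.baidu.com/"]))      # 2
print(observed[("zhidao.baidu.com/", "a")])  # ['123', '7']
```

The observed values per parameter can then be summarized with the same numeric, enumeration and string rules used for directory levels.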
The parameter combination mentioned above refers to the parameters passed together in a single link address; in the example above, the parameter combination is parameter a and parameter b. Note that the parameters accepted by the file corresponding to an obtained link address also occur in combinations: if the statistics contain both "a=123&b=456" and "c=234&d=567", then the accepted parameters of that file are a and b occurring together, and c and d occurring together.
In another embodiment, the method of the invention may further include, after step S101:
Step S105: match the obtained link addresses against a preset list of abnormal keywords, and treat the request data corresponding to a matched link address as abnormal access data so as to provide an early warning.
Through step S105, the invention can also monitor user access.
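Step S105 can be sketched as a case-insensitive substring match. The keyword list shown is purely hypothetical (the text does not specify one), and percent-decoding the address with `urllib.parse.unquote` before matching is an added assumption so that encoded variants are also caught.

```python
from typing import Iterable, List, Tuple
from urllib.parse import unquote


def flag_abnormal(requests: Iterable[Tuple[str, bytes]],
                  keywords: Iterable[str]) -> List[Tuple[str, bytes, str]]:
    """Return (link address, request data, matched keyword) for early warning."""
    lowered = [k.lower() for k in keywords]
    alerts = []
    for link, request_data in requests:
        text = unquote(link).lower()
        for keyword in lowered:
            if keyword in text:
                alerts.append((link, request_data, keyword))
                break  # one alert per request is enough
    return alerts


ABNORMAL_KEYWORDS = ["union select", "../", "<script"]  # hypothetical list
alerts = flag_abnormal(
    [("a.example/item?id=1%20UNION%20SELECT%201", b"<raw request 1>"),
     ("a.example/item?id=1", b"<raw request 2>")],
    ABNORMAL_KEYWORDS,
)
print([(link, kw) for link, _, kw in alerts])
# [('a.example/item?id=1%20UNION%20SELECT%201', 'union select')]
```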
As those skilled in the art will appreciate, the method of the invention may have to process data at large scale, so it can be run on a distributed platform. Specifically, after the bypass mirroring system obtains the request data, the request data is passed to a distributed computing platform; the distributed nodes of the platform (i.e., map nodes) perform the extraction of link addresses and the data splitting of step S1021, and additionally the matching check of step S105. The operations of step S1022 and step S103, and additionally those of step S104, can be performed in the aggregation nodes (i.e., reduce nodes).
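The division of work between map and reduce nodes can be illustrated with an in-process simulation. This is not the API of any particular distributed platform: `map_phase`, `reduce_phase` and `run_locally` are hypothetical, and the shuffle is a plain dictionary. Keying by (main domain, parent directory path) lets each reducer collect the observed values of one directory level, as in step S1022.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (main domain, parent directory path)


def map_phase(link: str) -> Iterator[Tuple[Key, str]]:
    """Map node: split one link address (step S1021) and emit each level."""
    main_domain, *parts = link.split("?", 1)[0].split("/")
    directories = [p for p in parts if p]
    for depth, value in enumerate(directories):
        yield (main_domain, "/".join(directories[:depth])), value


def reduce_phase(key: Key, values: Iterable[str]) -> List[str]:
    """Reduce node: the distinct observed values of one directory level."""
    return sorted(set(values))


def run_locally(links: Iterable[str]) -> Dict[Key, List[str]]:
    shuffled: Dict[Key, List[str]] = defaultdict(list)
    for link in links:  # on a cluster, this loop is spread over map nodes
        for key, value in map_phase(link):
            shuffled[key].append(value)
    return {key: reduce_phase(key, values) for key, values in shuffled.items()}


result = run_locally(["ting.baidu.com/artist/1157", "ting.baidu.com/album/1474",
                      "ting.baidu.com/artist/1107"])
print(result[("ting.baidu.com", "")])        # ['album', 'artist']
print(result[("ting.baidu.com", "artist")])  # ['1107', '1157']
```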
Refer to Fig. 3, which is a structural schematic block diagram of embodiment one of the website analysis device according to the invention. As shown in Fig. 3, this embodiment includes an acquisition unit 201, a determining unit 202 and a comparing unit 203.
The acquisition unit 201 is configured to obtain one or more link addresses from the request data arriving at the website.
The determining unit 202 is configured to use the obtained link addresses to determine the possible values of the directories at each level under each main domain of the website, as the current values of those directories.
The comparing unit 203 is configured to compare the link address range formed by the current values of the directories at each level with the link address range formed by their historical values, to determine whether the website has newly added link addresses or invalid link addresses.
In this embodiment, the acquisition unit 201 captures the request data arriving at the website through the bypass mirroring system and extracts one or more link addresses from that request data.
Refer to Fig. 4, which is a structural schematic block diagram of an embodiment of the determining unit according to the invention. As shown in Fig. 4, the determining unit 202 includes a splitting unit 2021 and a first statistics unit 2022. The splitting unit 2021 is configured to split each obtained link address into a main domain and directories at each level. The first statistics unit 2022 is configured to use the link addresses that share a main domain to collect the observed values of each directory level under that main domain, and to determine the possible values of each directory level from the observed values collected.
The first statistics unit 2022 determines the possible values of a directory level from the collected observed values as follows:
A. When the observed values are of numeric type, the lower bound of the possible values of this directory level is the minimum of the observed values collected at this level, and the upper bound is their maximum.
B. When the observed values are of enumeration type, the possible values of this directory level are exactly the observed values collected at this level.
C. When the observed values are of string type, the possible value of this directory level is any string.
Refer to Fig. 5, which is a schematic structural block diagram of an embodiment of the comparing unit in the present invention. As shown in Fig. 5, the comparing unit 203 includes a scope comparing unit 2031 and a link determining unit 2032. The scope comparing unit 2031 compares the link-address range formed by the current values of the directories at each level with the link-address range formed by their historical values. Suppose the comparison result contains a historical link address that belongs to the range formed by the historical values but not to the range formed by the current values. In that case, the link determining unit 2032 sends an access request to that historical link address. If the request cannot return an accessible page, the unit takes the historical link address as a failed link address. The link determining unit 2032 also handles the opposite case. When the comparison result contains a current link address that belongs to the range formed by the current values but not to the range formed by the historical values, it takes that current link address as a newly added link address.
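A sketch of the comparison performed by the scope comparing unit 2031 and the link determining unit 2032 follows. It assumes a simplified, hypothetical representation of a link-address range: for each main domain, a list of path shapes with one regular expression per directory level. It also treats "an accessible page" as a request that completes without an HTTP error. Both readings are assumptions, as are the names `in_range`, `http_reachable` and `compare_ranges`. The reachability probe is injectable so the logic can be exercised without network access.

```python
import re
import urllib.request
from typing import Callable, Dict, Iterable, List, Set, Tuple
from urllib.parse import urlsplit

# Hypothetical range representation: main domain -> list of path shapes,
# where each shape holds one compiled regex per directory level.
Range = Dict[str, List[Tuple[re.Pattern, ...]]]


def in_range(url: str, rng: Range) -> bool:
    """True if the URL's host has a shape whose levels all match the URL's path segments."""
    parts = urlsplit(url)
    levels = [seg for seg in parts.path.split("/") if seg]
    return any(
        len(shape) == len(levels)
        and all(p.fullmatch(seg) for p, seg in zip(shape, levels))
        for shape in rng.get(parts.hostname or "", [])
    )


def http_reachable(url: str, timeout: float = 5.0) -> bool:
    """Assumed reading of 'accessible page': the request completes without an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):  # URLError and HTTPError are OSError subclasses
        return False


def compare_ranges(
    history_urls: Iterable[str],
    current_urls: Iterable[str],
    history: Range,
    current: Range,
    is_reachable: Callable[[str], bool] = http_reachable,
) -> Tuple[Set[str], Set[str]]:
    """Return (failed link addresses, newly added link addresses)."""
    failed, added = set(), set()
    for url in history_urls:
        # Only in the historical range: probe it before declaring it failed.
        if in_range(url, history) and not in_range(url, current) and not is_reachable(url):
            failed.add(url)
    for url in current_urls:
        # Only in the current range: a newly added link address.
        if in_range(url, current) and not in_range(url, history):
            added.add(url)
    return failed, added
```

Probing before declaring a link failed mirrors the description. A historical address that has merely dropped out of recent traffic is not reported as failed as long as it still answers.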
Refer to Fig. 6, which is a schematic structural block diagram of embodiment two of the web analytics device in the present invention. As shown in Fig. 6, embodiment two adds a second statistic unit 204 to embodiment one. The second statistic unit 204 works on the acquired link addresses. It counts the parameter combinations passed to the same link address and the specific value of each parameter in each combination. It takes each parameter combination as the accepted parameters of the file in the website that corresponds to the acquired link address. It then determines the possible values of the accepted parameters from the specific values of each parameter in each combination. In embodiment two, the comparing unit 203 further compares the accepted parameters with the historical parameters, and the possible values of the accepted parameters with the historical parameter values, to judge whether any file in the website has had its parameters modified.
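The bookkeeping of the second statistic unit could look like the following sketch. It rests on three assumptions. Parameters are carried in the URL query string. A parameter combination is the set of parameter names. Comparing possible values is simplified to flagging values never seen historically, without applying the numeric/enumeration/string rules. `build_profile` and `parameter_changes` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple
from urllib.parse import parse_qsl, urlsplit

# link address -> parameter combination -> parameter name -> observed values
Profile = Dict[str, Dict[FrozenSet[str], Dict[str, Set[str]]]]


def build_profile(urls: Iterable[str]) -> Profile:
    """Record, per link address, each parameter combination and the values of its parameters."""
    profile: Profile = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for url in urls:
        parts = urlsplit(url)
        address = f"{parts.scheme}://{parts.netloc}{parts.path}"  # link address without query
        params = parse_qsl(parts.query, keep_blank_values=True)
        entry = profile[address][frozenset(name for name, _ in params)]
        for name, value in params:
            entry[name].add(value)
    return profile


def parameter_changes(current: Profile, history: Profile) -> Set[Tuple[str, str]]:
    """Report (link address, description) pairs where the accepted parameters differ."""
    changes = set()
    for address, combos in current.items():
        old = history.get(address, {})
        for combo, values in combos.items():
            if combo not in old:
                changes.add((address, "new combination: " + ",".join(sorted(combo))))
                continue
            for name, seen in values.items():
                extra = seen - old[combo].get(name, set())
                if extra:
                    changes.add((address, f"new values for {name}: {sorted(extra)}"))
    return changes
```

For instance, if `view?id=…&lang=en` was the only historical form and the current traffic contains `lang=fr` and a new `debug` parameter, both are reported for `http://a.com/view`.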
Refer to Fig. 7, which is a schematic structural block diagram of embodiment three of the web analytics device in the present invention. As shown in Fig. 7, embodiment three adds a detection unit 205 to embodiment one. The detection unit 205 matches the acquired link addresses against a preset keyword list. It takes the request data corresponding to any matching link address as abnormal access data, so that an early warning can be issued.
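A minimal sketch of the detection unit 205 follows. It assumes each request record is a dictionary with a `url` key and uses an illustrative keyword list, since the patent does not specify the list's contents. Matching is a case-insensitive substring test on the percent-decoded link address. A production system would likely need more careful normalization. `flag_abnormal` and `ABNORMAL_KEYWORDS` are hypothetical names.

```python
from typing import Dict, Iterable, List, Sequence
from urllib.parse import unquote

# Hypothetical preset keyword list (lower-case); the patent does not give its contents.
ABNORMAL_KEYWORDS: Sequence[str] = ("../", "<script", "union select", "/etc/passwd")


def flag_abnormal(
    requests: Iterable[Dict[str, str]], keywords: Sequence[str] = ABNORMAL_KEYWORDS
) -> List[Dict[str, object]]:
    """Return an alert for each request whose decoded link address contains a keyword."""
    alerts = []
    for request in requests:
        link = unquote(request["url"]).lower()  # decode %XX so encoded payloads still match
        matched = [kw for kw in keywords if kw in link]
        if matched:
            alerts.append({"request": request, "matched": matched})
    return alerts
```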
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the invention.
Claims (14)
1. An automatic method for performing web analytics, comprising:
A. acquiring one or more link addresses from request data arriving at a website;
B. using the acquired link addresses to determine the possible values of the directories at each level under each main domain included in the website, as the current values of the directories at each level;
C. comparing the link-address range formed by the current values of the directories at each level with the link-address range formed by the historical values of the directories at each level, to judge whether the website has a newly added link address or a failed link address.
2. The method according to claim 1, wherein in step A the request data arriving at the website is captured through a bypass mirroring system, and the one or more link addresses are extracted from the request data.
3. The method according to claim 1, wherein step B comprises:
B1. splitting each acquired link address into the form of a main domain and directories at each level;
B2. using the link addresses of the same main domain, counting the specific values in the same-level directories under that main domain, and determining the possible values of each same-level directory from the counted specific values.
4. The method according to claim 3, wherein the step of determining the possible values of a same-level directory from the counted specific values comprises:
when the counted specific values in the same-level directory are numeric, determining that the lower limit of the possible values of that directory level is the minimum of the counted specific values and the upper limit is the maximum of the counted specific values;
when the counted specific values in the same-level directory are of an enumeration type, determining that the possible values of that directory level are each of the counted specific values;
when the counted specific values in the same-level directory are character strings, determining that the possible values of that directory level are any character string.
5. The method according to claim 1, wherein step C comprises:
comparing the link-address range formed by the current values of the directories at each level with the link-address range formed by the historical values of the directories at each level;
when the comparison result contains a historical link address that belongs to the range formed by the historical values but not to the range formed by the current values, sending an access request to the historical link address and, when the access request cannot return an accessible page, taking the historical link address as a failed link address;
when the comparison result contains a current link address that belongs to the range formed by the current values but not to the range formed by the historical values, taking the current link address as a newly added link address.
6. The method according to claim 1, wherein before step C the method further comprises: counting, in the acquired link addresses, the parameter combinations passed to the same link address and the specific value of each parameter in each combination; taking each parameter combination as the accepted parameters of the file in the website corresponding to the acquired link address; and determining the possible values of the accepted parameters from the specific values of each parameter in each combination; and wherein in step C the method further compares the accepted parameters with historical parameters, and compares the possible values of the accepted parameters with historical parameter values, to judge whether any file in the website has a parameter modification.
7. The method according to claim 1, wherein after step A the method further comprises:
matching the acquired link addresses against a preset abnormal-keyword list, and taking the request data corresponding to any matching link address as abnormal access data, so as to provide an early warning.
8. An automatic device for performing web analytics, comprising:
a placement unit, configured to acquire one or more link addresses from request data arriving at a website;
a determining unit, configured to use the acquired link addresses to determine the possible values of the directories at each level under each main domain included in the website, as the current values of the directories at each level;
a comparing unit, configured to compare the link-address range formed by the current values of the directories at each level with the link-address range formed by the historical values of the directories at each level, to judge whether the website has a newly added link address or a failed link address.
9. The device according to claim 8, wherein the placement unit captures the request data arriving at the website through a bypass mirroring system and extracts the one or more link addresses from the request data.
10. The device according to claim 8, wherein the determining unit comprises:
a split unit, configured to split each acquired link address into the form of a main domain and directories at each level;
a first statistic unit, configured to use the link addresses of the same main domain to count the specific values in the same-level directories under that main domain, and to determine the possible values of each same-level directory from the counted specific values.
11. The device according to claim 10, wherein the manner in which the first statistic unit determines the possible values of a same-level directory from the counted specific values comprises:
when the counted specific values in the same-level directory are numeric, determining that the lower limit of the possible values of that directory level is the minimum of the counted specific values in that directory level and the upper limit is the maximum of the counted specific values in that directory level;
when the counted specific values in the same-level directory are of an enumeration type, determining that the possible values of that directory level are each of the counted specific values in that directory level;
when the counted specific values in the same-level directory are character strings, determining that the possible values of that directory level are any character string.
12. The device according to claim 8, wherein the comparing unit comprises:
a scope comparing unit, configured to compare the link-address range formed by the current values of the directories at each level with the link-address range formed by the historical values of the directories at each level;
a link determining unit, configured to, when the comparison result contains a historical link address that belongs to the range formed by the historical values but not to the range formed by the current values, send an access request to the historical link address and, when the access request cannot return an accessible page, take the historical link address as a failed link address;
the link determining unit being further configured to, when the comparison result contains a current link address that belongs to the range formed by the current values but not to the range formed by the historical values, take the current link address as a newly added link address.
13. The device according to claim 8, wherein the device further comprises a second statistic unit, configured to count, in the acquired link addresses, the parameter combinations passed to the same link address and the specific value of each parameter in each combination; to take each parameter combination as the accepted parameters of the file in the website corresponding to the acquired link address; and to determine the possible values of the accepted parameters from the specific values of each parameter in each combination; and
the comparing unit is further configured to compare the accepted parameters with historical parameters, and to compare the possible values of the accepted parameters with historical parameter values, to judge whether any file in the website has a parameter modification.
14. The device according to claim 8, wherein the device further comprises:
a detection unit, configured to match the acquired link addresses against a preset abnormal-keyword list, and to take the request data corresponding to any matching link address as abnormal access data, so as to provide an early warning.
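Claims 2 and 9 capture request data through a bypass mirroring system. The capture itself, for example reading packets from a mirrored switch port, is outside this sketch. The sketch assumes the mirrored traffic has already been reassembled into raw HTTP/1.x request bytes and is plaintext. It rebuilds each link address from the request line and the `Host` header. `extract_link` and `extract_links` are hypothetical names. The `http://` scheme is also an assumption, since encrypted traffic could not be read this way.

```python
from typing import Iterable, List, Optional


def extract_link(raw_request: bytes) -> Optional[str]:
    """Rebuild a link address from one raw HTTP/1.x request taken off the mirror."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None  # not an HTTP request line
    target = parts[1]
    if target.startswith(("http://", "https://")):
        return target  # absolute-form target, as sent to proxies
    host = next(
        (line.split(":", 1)[1].strip() for line in lines[1:] if line.lower().startswith("host:")),
        None,
    )
    return f"http://{host}{target}" if host else None


def extract_links(captured: Iterable[bytes]) -> List[str]:
    """Extract link addresses from captured requests, skipping non-HTTP data."""
    return [link for link in map(extract_link, captured) if link]
```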
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210232731.XA CN103530297B (en) | 2012-07-05 | 2012-07-05 | A kind of automatic method and device for carrying out web analytics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103530297A | 2014-01-22 |
CN103530297B | 2018-02-02 |
Family ID: 49932319
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103530297B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103823883B (en) * | 2014-03-06 | 2015-06-10 | 焦点科技股份有限公司 | Analysis method and system for website user access path |
CN106844389B (en) * | 2015-12-07 | 2021-05-04 | 阿里巴巴集团控股有限公司 | Method and device for processing URL (Uniform resource locator) |
CN106992981B (en) * | 2017-03-31 | 2020-04-07 | 北京知道创宇信息技术股份有限公司 | Website backdoor detection method and device and computing equipment |
CN110347955B (en) * | 2019-05-30 | 2023-03-03 | 华为云计算技术有限公司 | Resource detection method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102467633A (en) * | 2010-11-19 | 2012-05-23 | 奇智软件(北京)有限公司 | Method and system for safely browsing webpage |
CN102521295A (en) * | 2011-11-30 | 2012-06-27 | 深圳市五巨科技有限公司 | Method and device for automatically acquiring content updating on designated page |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101510217B (en) * | 2009-03-09 | 2013-06-05 | 阿里巴巴集团控股有限公司 | Image updating method in image database, server and system |
Worldwide Applications (1)
2012-07-05: CN application CN201210232731.XA (CN103530297B), Active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109145934B (en) | User behavior data processing method, medium, equipment and device based on log | |
CN105812177B (en) | A kind of network failure processing method and processing equipment | |
CN103299304B (en) | Classifying rules generating means and classifying rules generate method | |
CN107819783A (en) | A kind of network security detection method and system based on threat information | |
CN106844132A (en) | The fault repairing method and device of cluster server | |
CN103530297B (en) | A kind of automatic method and device for carrying out web analytics | |
CN107094208A (en) | Worksheet method and device | |
EP2988230A1 (en) | Data processing method and computer system | |
CN105227405B (en) | monitoring method and system | |
CN106202569A (en) | A kind of cleaning method based on big data quantity | |
CN106815254A (en) | A kind of data processing method and device | |
CN105701097A (en) | Social-network-platform-based public opinion analysis method and system | |
CN103631967B (en) | A kind of processing method and processing device of the tables of data with independent increment identification field | |
CN109669795A (en) | Crash info processing method and processing device | |
CN111131304A (en) | Cloud platform-oriented large-scale virtual machine fine-grained abnormal behavior detection method and system | |
CN106856439A (en) | The method and server of a kind of scheme test | |
CN105872127B (en) | A kind of IP address management system | |
CN106209920A (en) | The safety protecting method of a kind of dns server and device | |
WO2022142013A1 (en) | Artificial intelligence-based ab testing method and apparatus, computer device and medium | |
Pan et al. | Resilience of and recovery strategies for weighted networks | |
CN107222511A (en) | Detection method and device, computer installation and the readable storage medium storing program for executing of Malware | |
CN103795592B (en) | Online water navy detection method and device | |
CN107070645A (en) | Compare the method and system of the data of tables of data | |
Tang et al. | Community structure detection based on the neighbor node degree information | |
CN105468699B (en) | Duplicate removal data statistical approach and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||