CN103530297B - A method and device for automatically performing website analysis - Google Patents

A method and device for automatically performing website analysis

Info

Publication number
CN103530297B
Authority
CN
China
Prior art keywords
chained address
catalogue
occurrence
value
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210232731.XA
Other languages
Chinese (zh)
Other versions
CN103530297A (en
Inventor
石靖岚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210232731.XA priority Critical patent/CN103530297B/en
Publication of CN103530297A publication Critical patent/CN103530297A/en
Application granted granted Critical
Publication of CN103530297B publication Critical patent/CN103530297B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a method and device for automatically performing website analysis. The method comprises: A. obtaining one or more link addresses from request data arriving at a website; B. using the obtained link addresses to determine the possible values of the directories at each level under each main domain of the website, and taking these as the current values of the directories at each level; C. comparing the link address range formed by the current values of the directories at each level with the link address range formed by their historical values, to judge whether the website has newly added link addresses or failed link addresses. In this way, the website can be monitored conveniently and the efficiency of website operation is improved.

Description

A method and device for automatically performing website analysis
【Technical field】
The present invention relates to data processing technology, and in particular to a method and device for automatically performing website analysis.
【Background】
A website provides services to users through various web page files, which are organized on the website by directories. While the website is providing services, its maintainers may well need to modify the web page files that provide those services, either by modifying the files themselves or by modifying the paths leading to them (the directories leading to the web page files). As a result, after a website has been operating for some time, its structure may have changed considerably. In the past, when websites were small, changes to the website structure could still be tracked manually, so that the state of the website was understood in time and its operation monitored. With the development of Internet technology, however, websites provide more and more services and grow ever larger, so manual means alone can hardly give a view of the whole website. This inevitably makes website monitoring difficult and reduces the efficiency of website operation.
【Summary of the invention】
The technical problem to be solved by the invention is to provide a method and device for automatically performing website analysis, so as to improve the efficiency of website operation.
To solve this technical problem, the invention provides a method for automatically performing website analysis, comprising: A. obtaining one or more link addresses from request data arriving at a website; B. using the obtained link addresses to determine the possible values of the directories at each level under each main domain of the website, and taking these as the current values of the directories at each level; C. comparing the link address range formed by the current values of the directories at each level with the link address range formed by their historical values, to judge whether the website has newly added link addresses or failed link addresses.
According to a preferred embodiment of the invention, in step A the request data arriving at the website is captured by a bypass mirroring system, and one or more link addresses are extracted from the request data.
According to a preferred embodiment of the invention, step B comprises: B1. splitting each obtained link address into a main domain and directories at each level; B2. using the link addresses that share a main domain to count the observed values in the same-level directories under that main domain, and determining the possible values of each same-level directory from the observed values counted.
According to a preferred embodiment of the invention, determining the possible values of a same-level directory from the observed values counted comprises: when the observed values in the same-level directory are of numeric type, setting the lower limit of the possible values of that directory level to the minimum of the observed values and the upper limit to the maximum of the observed values; when the observed values are of enumerated type, taking each observed value as a possible value of that directory level; when the observed values are of string type, taking the possible value of that directory level to be an arbitrary string.
According to a preferred embodiment of the invention, step C comprises: comparing the link address range formed by the current values of the directories at each level with the link address range formed by their historical values; when the comparison result contains a historical link address that belongs to the range formed by the historical values but not to the range formed by the current values, sending an access request to that historical link address and, if the access request cannot return an accessible page, treating the historical link address as a failed link address; when the comparison result contains a current link address that belongs to the range formed by the current values but not to the range formed by the historical values, treating that current link address as a newly added link address.
According to a preferred embodiment of the invention, before step C the method further comprises: counting, among the obtained link addresses, the parameter combinations passed to the same link address and the observed values of each parameter in each combination; taking each parameter combination as acceptable parameters of the file on the website that corresponds to the obtained link address; and determining the possible values of the acceptable parameters from the observed values of each parameter in each combination. In step C, the method then further compares the acceptable parameters with the historical parameters, and the possible values of the acceptable parameters with the historical parameter values, to judge whether the parameters of a file on the website have been modified.
According to a preferred embodiment of the invention, after step A the method further comprises: matching the obtained link addresses against a preset list of abnormal keywords, and treating the request data corresponding to any matching link address as abnormal access data in order to raise an early warning.
The invention also provides a device for automatically performing website analysis, comprising: a capture unit for obtaining one or more link addresses from request data arriving at a website; a determining unit for using the obtained link addresses to determine the possible values of the directories at each level under each main domain of the website as the current values of the directories at each level; and a comparing unit for comparing the link address range formed by the current values of the directories at each level with the link address range formed by their historical values, to judge whether the website has newly added link addresses or failed link addresses.
According to a preferred embodiment of the invention, the capture unit captures the request data arriving at the website by means of a bypass mirroring system and extracts one or more link addresses from the request data.
According to a preferred embodiment of the invention, the determining unit comprises: a splitting unit for splitting each obtained link address into a main domain and directories at each level; and a first statistics unit for using the link addresses that share a main domain to count the observed values in the same-level directories under that main domain, and for determining the possible values of each same-level directory from the observed values counted.
According to a preferred embodiment of the invention, the first statistics unit determines the possible values of a same-level directory from the observed values counted as follows: when the observed values in the same-level directory are of numeric type, the lower limit of the possible values of that directory level is the minimum of the observed values counted in that level, and the upper limit is the maximum of those observed values; when the observed values are of enumerated type, the possible values of that directory level are the individual observed values counted in that level; when the observed values are of string type, the possible value of that directory level is an arbitrary string.
According to a preferred embodiment of the invention, the comparing unit comprises: a range comparison unit for comparing the link address range formed by the current values of the directories at each level with the link address range formed by their historical values; and a link determining unit which, when the comparison result contains a historical link address that belongs to the range formed by the historical values but not to the range formed by the current values, sends an access request to that historical link address and, if the access request cannot return an accessible page, treats the historical link address as a failed link address. The link determining unit is also used, when the comparison result contains a current link address that belongs to the range formed by the current values but not to the range formed by the historical values, to treat that current link address as a newly added link address.
According to a preferred embodiment of the invention, the device further comprises a second statistics unit for counting, among the obtained link addresses, the parameter combinations passed to the same link address and the observed values of each parameter in each combination, taking each parameter combination as acceptable parameters of the file on the website that corresponds to the obtained link address, and determining the possible values of the acceptable parameters from the observed values of each parameter in each combination. The comparing unit is further used to compare the acceptable parameters with the historical parameters, and the possible values of the acceptable parameters with the historical parameter values, to judge whether the parameters of a file on the website have been modified.
According to a preferred embodiment of the invention, the device further comprises a detection unit for matching the obtained link addresses against a preset list of abnormal keywords, and for treating the request data corresponding to any matching link address as abnormal access data in order to raise an early warning.
As can be seen from the above technical solutions, by using the request data that users send to a website, the invention can effectively count the link addresses through which the website provides services and thereby obtain a full picture of the website's current structure. By comparing this with the website's historical structure, the changes that have occurred on the website can be understood in time, which makes the website easier to monitor and so improves the efficiency of website operation.
【Brief description of the drawings】
Fig. 1 is a flowchart of the method for automatically performing website analysis in the present invention;
Fig. 2 is a schematic diagram of the comparison of link address ranges in the present invention;
Fig. 3 is a structural block diagram of embodiment one of the device for automatically performing website analysis in the present invention;
Fig. 4 is a structural block diagram of an embodiment of the determining unit in the present invention;
Fig. 5 is a structural block diagram of an embodiment of the comparing unit in the present invention;
Fig. 6 is a structural block diagram of embodiment two of the device for automatically performing website analysis in the present invention;
Fig. 7 is a structural block diagram of embodiment three of the device for automatically performing website analysis in the present invention.
【Detailed description】
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Refer to Fig. 1, which is a flowchart of the method for automatically performing website analysis in the present invention. As shown in Fig. 1, the method comprises:
Step S101: Obtain one or more link addresses from request data arriving at the website.
Step S102: Use the obtained link addresses to determine the possible values of the directories at each level under each main domain of the website, and take these as the current values of the directories at each level.
Step S103: Compare the link address range formed by the current values of the directories at each level with the link address range formed by their historical values, to judge whether the website has newly added link addresses or failed link addresses.
These steps are described in detail below.
In step S101, the request data arriving at the website is captured by a bypass mirroring system, and link addresses are extracted from the request data. The role of the bypass mirroring system is to copy the original request data arriving at the website into a new copy. The original request data continues its original interaction, while the copy can be used for other processing; in the present invention, the subsequent processing is carried out on the data copied by the bypass mirroring system. The request data contains some information related to the communication protocol and also includes the link address of the requested page, and these link addresses can be extracted from the request data in step S101. It will be appreciated that, for a website, requests arrive concurrently, so in general a large amount of request data is obtained in step S101 and many link addresses are extracted.
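As a rough illustration of the extraction in step S101, the following Python sketch pulls a link address out of one mirrored request. It assumes the copied request data is plain HTTP/1.1 text with a request line and a Host header; the function name `extract_link_address` is hypothetical, and the bypass mirroring system itself is not modeled.

```python
from typing import Optional


def extract_link_address(raw_request: str) -> Optional[str]:
    """Return 'host/path' for one HTTP request, or None if it cannot be parsed."""
    lines = raw_request.splitlines()
    if not lines:
        return None
    # Request line, e.g. "GET /artist/1157 HTTP/1.1"
    parts = lines[0].split()
    if len(parts) < 2:
        return None
    path = parts[1]
    # Scan the headers for Host; a blank line ends the header section
    for line in lines[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip() + path
    return None


raw = "GET /artist/1157 HTTP/1.1\r\nHost: ting.baidu.com\r\nUser-Agent: demo\r\n\r\n"
print(extract_link_address(raw))
```

In practice the mirrored traffic would first need to be reassembled into individual requests; that part is outside this sketch.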
A link address has a hierarchical structure. Its first level generally denotes the main domain that provides the service, and the directories at each level follow the main domain in order. When a link address reaches its last-level directory, it actually corresponds to a web page file on the website. For example, a complete link address has the form juli.baidu.com/zhuanli/jiagou, where "/" is the separator of the link address. The separator divides a complete link address into several parts: the first part, "juli.baidu.com", denotes the main domain, and the following "zhuanli" and "jiagou" are the first-level and second-level directories respectively.
Specifically, step S102 comprises:
Step S1021: Split each obtained link address into a main domain and directories at each level;
Step S1022: Use the link addresses that share a main domain to count the observed values in the same-level directories under that main domain, and determine the possible values of each same-level directory from the observed values counted.
From the description above, it will be appreciated that in step S1021 the separator in a link address can be used to split each link address into a main domain and directories at each level.
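The split in step S1021 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `split_link_address` is hypothetical, and stripping a scheme, query string, or fragment before splitting is an added assumption.

```python
from typing import List, Tuple


def split_link_address(link: str) -> Tuple[str, List[str]]:
    """Split 'host/dir1/dir2' into (main_domain, [dir1, dir2])."""
    # Assumption: drop an optional scheme such as "http://"
    if "://" in link:
        link = link.split("://", 1)[1]
    # Assumption: ignore any query string or fragment
    for sep in ("?", "#"):
        link = link.split(sep, 1)[0]
    # "/" is the separator; empty pieces (e.g. from a trailing "/") are dropped
    parts = [p for p in link.split("/") if p]
    if not parts:
        raise ValueError("empty link address")
    return parts[0], parts[1:]


print(split_link_address("juli.baidu.com/zhuanli/jiagou"))
```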
Step S1022 specifically comprises:
Step S1022_1: Classify the link addresses processed in step S1021 by main domain, grouping link addresses with the same main domain into one class.
Step S1022_2: For the link addresses with the same main domain, count the observed values in the same-level directories of those link addresses.
Step S1022_3: Determine the possible values of each same-level directory from the observed values counted in it.
For example, consider the following link addresses:
"ting.baidu.com/artist/1157", "ting.baidu.com/artist/1107",
"ting.baidu.com/artist/1130", "ting.baidu.com/album/1474",
"ting.baidu.com/album/1430", "ting.baidu.com/album/1425",
"zhidao.baidu.com/team/74", "zhidao.baidu.com/team/80",
"zhidao.baidu.com/team/65", "zhidao.baidu.com/team/60"
These link addresses contain two different main domains, "ting.baidu.com" and "zhidao.baidu.com". For the main domain "ting.baidu.com", counting the link addresses above gives the observed values "artist" and "album" in the first-level directory; under the first-level directory "artist", the observed values in the second-level directory are "1157", "1107" and "1130", and under the first-level directory "album", the observed values in the second-level directory are "1474", "1430" and "1425". For the main domain "zhidao.baidu.com", counting the link addresses above gives the observed value "team" in the first-level directory, and under the first-level directory "team", the observed values in the second-level directory are "74", "80", "65" and "60".
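The counting in steps S1022_1 and S1022_2 can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name `collect_level_values` and the choice to key the result by (main domain, parent directories) are hypothetical, and the link addresses are assumed to be scheme-less and non-empty, as in the example.

```python
from collections import defaultdict


def collect_level_values(links):
    """Map (main_domain, parent_dirs) to the set of values seen at the next level."""
    seen = defaultdict(set)
    for link in links:
        # The first piece is the main domain, the rest are directory levels
        domain, *dirs = [p for p in link.split("/") if p]
        for depth, value in enumerate(dirs):
            # Values are grouped under the same main domain and same parent path
            seen[(domain, tuple(dirs[:depth]))].add(value)
    return dict(seen)


links = [
    "ting.baidu.com/artist/1157", "ting.baidu.com/artist/1107",
    "ting.baidu.com/artist/1130", "ting.baidu.com/album/1474",
    "ting.baidu.com/album/1430", "ting.baidu.com/album/1425",
    "zhidao.baidu.com/team/74", "zhidao.baidu.com/team/80",
    "zhidao.baidu.com/team/65", "zhidao.baidu.com/team/60",
]
for key, values in sorted(collect_level_values(links).items()):
    print(key, sorted(values))
```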
In step S1022_3, determining the possible values of a same-level directory from the observed values counted in it specifically comprises:
A. When the observed values in the same-level directory are of numeric type, the lower limit of the possible values of that directory level is the minimum of the observed values, and the upper limit is the maximum of the observed values.
B. When the observed values in the same-level directory are of enumerated type, the possible values of that directory level are the individual observed values.
C. When the observed values in the same-level directory are of string type, the possible value of that directory level is an arbitrary string.
Strategies for deciding which type the observed values in a directory level belong to include, but are not limited to, the following:
When the observed values are numbers whose distribution range exceeds a preset value, the observed values in that directory level are determined to be of numeric type;
When the number of elements in the set of observed values does not exceed a preset value, the observed values in that directory level are determined to be of enumerated type;
When the observed values are characters and do not belong to the enumerated type, the observed values in that directory level are determined to be of string type. It should be understood that the characters here may be letters, or combinations of letters and digits.
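One possible reading of rules A to C and the type strategies above is sketched below in Python. The thresholds `min_spread` and `max_enum` stand in for the unspecified "preset values" and are hypothetical, as are the function name and the order in which the checks are applied.

```python
def infer_possible_values(values, min_spread=10, max_enum=5):
    """Classify one directory level and summarize its possible values.

    Returns ("numeric", (low, high)), ("enum", frozenset_of_values),
    or ("string", None) when any string is considered possible.
    """
    values = list(values)
    # Numeric type: all values are ASCII digits and their spread exceeds the preset
    if values and all(v.isascii() and v.isdigit() for v in values):
        nums = [int(v) for v in values]
        if max(nums) - min(nums) > min_spread:
            return ("numeric", (min(nums), max(nums)))
    # Enumerated type: the set of distinct values is small enough
    if len(set(values)) <= max_enum:
        return ("enum", frozenset(values))
    # Otherwise treat the level as an arbitrary string
    return ("string", None)


print(infer_possible_values(["1157", "1107", "1130"]))
print(infer_possible_values(["artist", "album"]))
```

With these hypothetical thresholds, a level holding only a few closely spaced numbers falls through to the enumerated type; a different ordering of the checks would be an equally valid reading of the text.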
Through steps S101 and S102, the possible values of the directories at each level of the website can be determined from the data captured by the bypass mirroring system. Taking these possible values as current values, the current values of the directories at each level determine a range of link addresses. For example, using the way of determining possible values described above, for the following obtained link addresses:
"ting.baidu.com/artist/1157", "ting.baidu.com/artist/1107",
"ting.baidu.com/artist/1130", "ting.baidu.com/album/1474",
"ting.baidu.com/album/1430", "ting.baidu.com/album/1425"
the link address ranges determined by the current values of the directories at each level are:
ting.baidu.com/artist/{1107-1157} and ting.baidu.com/album/{1425-1474}
In step S103, the historical values of the directories at each level are the possible values of those directories that were stored previously. They can be understood as the possible values obtained by executing steps S101 and S102 of the invention at an earlier moment. The method of the invention can also update the historical values with the current values of the directories at each level, so that they serve as the basis for comparison when new current values are obtained at the next moment.
Step S103 specifically comprises:
Step S1031: Compare the link address range formed by the current values of the directories at each level with the link address range formed by their historical values.
Step S1032: When the comparison result contains a historical link address that belongs to the range formed by the historical values but not to the range formed by the current values, send an access request to that historical link address and, if the access request cannot return an accessible page, treat the historical link address as a failed link address; when the comparison result contains a current link address that belongs to the range formed by the current values but not to the range formed by the historical values, treat that current link address as a newly added link address.
For step S1031, refer to Fig. 2, which is a schematic diagram of the comparison of link address ranges in the present invention. As shown in Fig. 2, region 1 is the intersection of the link address range formed by the current values and the link address range formed by the historical values; region 2 is the part that belongs to the range formed by the historical values but not to the range formed by the current values; region 3 is the part that belongs to the range formed by the current values but not to the range formed by the historical values. The link addresses in region 3 are newly added link addresses. For a link address in region 2, the invention simulates user behavior and sends an access request to that link address; if the page returned for the request shows a 404, 301 or 503 status (that is, an accessible page cannot be returned), the link address has failed.
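The comparison of regions 2 and 3 can be sketched for the numeric case as follows. This Python sketch makes several assumptions that are not in the source: each range is reduced to a closed integer interval for the last directory level under a "main-domain/directory" prefix, every integer in an interval is expanded into a candidate link address, and the access request is abstracted as a caller-supplied `probe` function that returns an HTTP status code. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Interval = Tuple[int, int]    # closed integer interval (low, high)
Ranges = Dict[str, Interval]  # "main-domain/dir" prefix -> interval of the last level

FAILURE_STATUSES = {404, 301, 503}  # statuses the text treats as "no accessible page"


def interval_difference(a: Interval, b: Optional[Interval]) -> List[Interval]:
    """Return the parts of closed interval a that are not covered by b."""
    if b is None or b[1] < a[0] or b[0] > a[1]:
        return [a]
    parts = []
    if b[0] > a[0]:
        parts.append((a[0], b[0] - 1))
    if b[1] < a[1]:
        parts.append((b[1] + 1, a[1]))
    return parts


def expand(prefix: str, intervals: List[Interval]) -> List[str]:
    """List every candidate link address covered by the given intervals."""
    return [f"{prefix}/{n}" for lo, hi in intervals for n in range(lo, hi + 1)]


def compare_ranges(current: Ranges, history: Ranges, probe: Callable[[str], int]):
    """Return (newly_added, failed) link addresses."""
    new_links, failed_links = [], []
    for prefix in sorted(set(current) | set(history)):
        cur, hist = current.get(prefix), history.get(prefix)
        if cur is not None:
            # Region 3: in the current range but not in the historical range
            new_links += expand(prefix, interval_difference(cur, hist))
        if hist is not None:
            # Region 2: in the historical range only, so probe before deciding
            for url in expand(prefix, interval_difference(hist, cur)):
                if probe(url) in FAILURE_STATUSES:
                    failed_links.append(url)
    return new_links, failed_links
```

Passing the probe in as a function keeps the comparison logic separate from networking; a real probe would issue an HTTP request without following redirects so that a 301 remains visible.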
In another embodiment, the method of the invention further comprises, before step S103:
Step S104: Among the obtained link addresses, count the parameter combinations passed to the same link address and the observed values of each parameter in each combination; take each parameter combination as acceptable parameters of the file on the website that corresponds to the obtained link address; and determine the possible values of the acceptable parameters from the observed values of each parameter in each combination. The method of determining the possible values of the acceptable parameters from the observed values of each parameter in each combination is similar to the previously described method of determining the possible values of the directories at each level, and is not repeated here.
In this embodiment, step S103 further compares the acceptable parameters with the historical parameters, and the possible values of the acceptable parameters with the historical parameter values, to judge whether the parameters of a file on the website have been modified.
As described previously, a complete link address points to a web page file on the website. For example, the link address
zhidao.baidu.com/question/227.html points to an HTML-format text file on the website. Some link addresses can also pass parameters, for example:
zhidao.baidu.com/?a=123&b=456, where the part after "?" shows that two parameters, "a" and "b", are passed to the link address zhidao.baidu.com, with the parameter values "123" and "456" respectively. Since a complete link address represents a web page file, a link of the form zhidao.baidu.com/?a=123&b=456 passes the two parameters "a" and "b" to the web page file represented by the link address "zhidao.baidu.com", with the parameter values "123" and "456" respectively. Step S104 uses the data passed in the obtained link addresses to determine the acceptable parameters of a given web page file on the website and their value ranges.
The parameter combination mentioned above refers to the parameters passed together in one link address; in the example above, the parameter combination is parameter a and parameter b. It should be understood that the acceptable parameters of the file corresponding to an obtained link address also occur in combination. That is, if the statistics contain "a=123&b=456" and "c=234&d=567", then the acceptable parameters of the corresponding file are a and b occurring together, and c and d occurring together.
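The statistics of step S104 can be sketched with Python's standard `urllib.parse` module. The function name `collect_parameter_combinations` and the nested-dictionary layout are hypothetical, and link addresses are assumed to carry their parameters in a standard "?name=value&name=value" query string.

```python
from urllib.parse import parse_qsl, urlsplit


def collect_parameter_combinations(links):
    """Map target file -> parameter combination -> parameter -> observed values."""
    stats = {}
    for link in links:
        # Scheme-less addresses need a leading "//" so the host is parsed as netloc
        parts = urlsplit(link if "://" in link else "//" + link)
        pairs = parse_qsl(parts.query)
        if not pairs:
            continue
        target = parts.netloc + parts.path
        # A combination is the set of parameter names passed together in one link
        combo = tuple(sorted({name for name, _ in pairs}))
        per_param = stats.setdefault(target, {}).setdefault(combo, {})
        for name, value in pairs:
            per_param.setdefault(name, set()).add(value)
    return stats


links = [
    "zhidao.baidu.com/?a=123&b=456",
    "zhidao.baidu.com/?a=124&b=456",
    "zhidao.baidu.com/?c=234&d=567",
]
print(collect_parameter_combinations(links))
```

The observed values collected per parameter could then be summarized in the same way as directory levels, and the resulting combinations compared with stored historical ones.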
In another embodiment, the method of the invention may further comprise, after step S101:
S105: Match the obtained link addresses against a preset list of abnormal keywords, and treat the request data corresponding to any matching link address as abnormal access data in order to raise an early warning.
Through step S105, the invention can also serve to monitor user access.
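A minimal sketch of the matching in step S105 is given below. The source does not say what the abnormal keyword list contains or how matching is done, so the keywords shown are purely hypothetical examples and case-insensitive substring matching is an assumption.

```python
# Hypothetical keyword list; the source does not specify its contents
ABNORMAL_KEYWORDS = ("../", "<script", "union select", "/etc/passwd")


def find_abnormal_requests(requests, keywords=ABNORMAL_KEYWORDS):
    """Return (link, request_data, matched_keywords) for every suspicious request.

    `requests` is an iterable of (link_address, request_data) pairs.
    """
    flagged = []
    for link, request_data in requests:
        lowered = link.lower()
        # Assumption: a keyword matches if it occurs anywhere in the link address
        hits = [k for k in keywords if k in lowered]
        if hits:
            flagged.append((link, request_data, hits))
    return flagged


sample = [
    ("ting.baidu.com/artist/1157", "request-1"),
    ("ting.baidu.com/artist/../../etc/passwd", "request-2"),
]
for link, data, hits in find_abnormal_requests(sample):
    print("early warning:", link, hits)
```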
As those skilled in the art will appreciate, the method of the invention may have to process large-scale data, so it can be run on a distributed platform. Specifically, after the bypass mirroring system obtains the request data, the request data can be passed to a distributed computing platform. The platform's distributed nodes (that is, map nodes) carry out the extraction of link addresses and the splitting in step S1021, and may additionally carry out the matching in step S105. The actions in step S1022 and step S103, and additionally step S104, can be carried out in the aggregation nodes (that is, reduce nodes).
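The division of work between map and reduce nodes can be imitated in a single Python process, as in the sketch below. It only mimics the data flow; no real distributed platform or its API is used, and the function names `map_link`, `reduce_values` and `run_job` are hypothetical.

```python
from collections import defaultdict


def map_link(link):
    """Map side: split one link address and emit (key, value) pairs."""
    domain, *dirs = [p for p in link.split("/") if p]
    for depth, value in enumerate(dirs):
        # Key: main domain plus parent directories; value: the next-level directory
        yield (domain, tuple(dirs[:depth])), value


def reduce_values(key, values):
    """Reduce side: gather the distinct observed values for one key."""
    return key, sorted(set(values))


def run_job(links):
    """Simulate the shuffle between map and reduce in one process."""
    shuffled = defaultdict(list)
    for link in links:
        for key, value in map_link(link):
            shuffled[key].append(value)
    return dict(reduce_values(k, v) for k, v in shuffled.items())


print(run_job(["zhidao.baidu.com/team/74", "zhidao.baidu.com/team/80"]))
```

On a real platform the reduce side would go on to determine possible values and compare them with the stored historical values, as the text describes for steps S1022 and S103.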
Refer to Fig. 3, which is a structural block diagram of embodiment one of the website analysis device in the present invention. As shown in Fig. 3, this embodiment comprises a capture unit 201, a determining unit 202 and a comparing unit 203.
The capture unit 201 is used to obtain one or more link addresses from request data arriving at the website.
The determining unit 202 is used to determine, from the obtained link addresses, the possible values of the directories at each level under each main domain of the website as the current values of the directories at each level.
The comparing unit 203 is used to compare the link address range formed by the current values of the directories at each level with the link address range formed by their historical values, to judge whether the website has newly added link addresses or failed link addresses.
In this embodiment, the capture unit 201 captures the request data arriving at the website by means of a bypass mirroring system and extracts one or more link addresses from the request data.
Refer to Fig. 4, which is a structural block diagram of an embodiment of the determining unit in the present invention. As shown in Fig. 4, the determining unit 202 comprises a splitting unit 2021 and a first statistics unit 2022. The splitting unit 2021 is used to split each obtained link address into a main domain and directories at each level. The first statistics unit 2022 is used to count, from the link addresses that share a main domain, the observed values in the same-level directories under that main domain, and to determine the possible values of each same-level directory from the observed values counted.
The first statistics unit 2022 determines the possible values of a same-level directory from the observed values counted as follows:
A. When the observed values in the same-level directory are of numeric type, the lower limit of the possible values of that directory level is the minimum of the observed values counted in that level, and the upper limit is the maximum of those observed values. B. When the observed values in the same-level directory are of enumerated type, the possible values of that directory level are the individual observed values counted in that level. C. When the observed values in the same-level directory are of string type, the possible value of that directory level is an arbitrary string.
Refer to Fig. 5, which is a schematic structural block diagram of an embodiment of the comparing unit in the present invention. As shown in Fig. 5, comparing unit 203 includes range comparing unit 2031 and link determining unit 2032. Range comparing unit 2031 is configured to compare the chained address range formed by the current values of the catalogues at each level with the chained address range formed by the historical values of the catalogues at each level. Link determining unit 2032 is configured to, when the comparison result contains a historical chained address that belongs to the chained address range formed by the historical values but not to the chained address range formed by the current values, send an access request to that historical chained address and, when the access request cannot return an accessible page, treat that historical chained address as a failed chained address. Link determining unit 2032 is further configured to, when the comparison result contains a current chained address that belongs to the chained address range formed by the current values but not to the chained address range formed by the historical values, treat that current chained address as a newly added chained address.
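A minimal sketch of this comparison is given below. It makes several assumptions that are not in the patent text: a chained address range is represented as one rule per catalogue level, each rule being `("numeric", low, high)`, `("enum", values)`, or `("string", None)`; an address is represented as a tuple of catalogue values; an address with a different depth than the rule list counts as outside the range; and the access request is abstracted as a caller-supplied `is_accessible` callback rather than a real HTTP request.

```python
def value_allowed(rule, value):
    """Check one catalogue value against one per-level rule."""
    kind = rule[0]
    if kind == "numeric":
        return (value.isascii() and value.isdigit()
                and rule[1] <= int(value) <= rule[2])
    if kind == "enum":
        return value in rule[1]
    return True  # "string": any character string is allowed


def in_range(rules, levels):
    """True if every catalogue level of the address satisfies its rule."""
    return (len(levels) == len(rules)
            and all(value_allowed(r, v) for r, v in zip(rules, levels)))


def compare_ranges(current_rules, history_rules,
                   current_addrs, history_addrs, is_accessible):
    """Return (newly added addresses, failed addresses)."""
    # In the current range but not in the historical range: newly added.
    new = [a for a in current_addrs
           if in_range(current_rules, a) and not in_range(history_rules, a)]
    # In the historical range but not in the current range: probe it, and
    # report it as failed only when no accessible page comes back.
    failed = [a for a in history_addrs
              if in_range(history_rules, a)
              and not in_range(current_rules, a)
              and not is_accessible(a)]
    return new, failed
```

Probing only the historical addresses that fell out of the current range keeps the number of access requests small, since addresses still covered by current traffic are presumed to be alive.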
Refer to Fig. 6, which is a schematic structural block diagram of embodiment two of the web analytics device in the present invention. As shown in Fig. 6, embodiment two further includes second statistics unit 204 on the basis of embodiment one. Second statistics unit 204 is configured to count, from the obtained chained addresses, the parameter combinations passed to the same chained address and the values occurring for each parameter in each combination, to take each parameter combination as the receivable parameters of the file in the website that corresponds to the obtained chained address, and to determine the possible values of the receivable parameters from the values occurring for each parameter in each combination. In embodiment two, comparing unit 203 is further configured to compare the receivable parameters with historical parameters, and to compare the possible values of the receivable parameters with historical parameter values, so as to judge whether the parameters of a file in the website have been modified.
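The parameter statistics of embodiment two might be sketched as follows. The assumptions here are that parameters are carried in the URL query string, that a parameter combination is the sorted tuple of parameter names, and that a parameter modification is signalled simply by a difference between the current and historical sets of combinations. The names `collect_parameters` and `parameters_changed` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def collect_parameters(urls):
    """Map address -> parameter combination -> parameter name -> seen values."""
    stats = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for url in urls:
        parts = urlsplit(url)
        pairs = parse_qsl(parts.query, keep_blank_values=True)
        combo = tuple(sorted({name for name, _ in pairs}))
        address = parts.netloc + parts.path
        by_name = stats[address][combo]  # registers the combination itself
        for name, value in pairs:
            by_name[name].add(value)
    return stats


def parameters_changed(current, history, address):
    """True if the set of parameter combinations for an address differs."""
    return set(current.get(address, {})) != set(history.get(address, {}))
```

The per-parameter value sets collected here could then be compared with historical parameter values in the same way as catalogue values are compared.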
Refer to Fig. 7, which is a schematic structural block diagram of embodiment three of the web analytics device in the present invention. As shown in Fig. 7, embodiment three further includes detection unit 205 on the basis of embodiment one. Detection unit 205 is configured to match the obtained chained addresses against a preset keyword list, and to treat the request data corresponding to any matching chained address as abnormal access data so that an early warning can be given.
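The keyword matching of detection unit 205 can be illustrated with a simple substring check. The sketch assumes that each request is a dictionary with a `"url"` key, that matching is case-insensitive, and that percent-encoding is decoded before matching; none of these details, nor any particular keyword list, is specified in the patent.

```python
from urllib.parse import unquote


def find_abnormal_requests(requests, keywords):
    """Return the requests whose chained address contains a listed keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    abnormal = []
    for request in requests:
        # Decode percent-encoding so that encoded keywords still match.
        address = unquote(request["url"]).lower()
        if any(keyword in address for keyword in lowered):
            abnormal.append(request)
    return abnormal
```

The returned requests correspond to the abnormal access data for which an early warning would be raised.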
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (14)

1. An automatic method for carrying out web analytics, comprising:
A. obtaining more than one chained address from the request data arriving at a website;
B. using the obtained chained addresses to determine the possible values of the catalogues at each level under each main domain included in the website, as the current values of the catalogues at each level;
C. comparing the chained address range formed by the current values of the catalogues at each level with the chained address range formed by the historical values of the catalogues at each level, so as to judge whether the website has newly added chained addresses or failed chained addresses.
2. The method according to claim 1, characterized in that, in step A, the request data arriving at the website is captured through a bypass mirroring system, and more than one chained address is extracted from the request data.
3. The method according to claim 1, characterized in that step B comprises:
B1. splitting each obtained chained address into a main domain and catalogues at each level;
B2. using the chained addresses that share the same main domain to count the values occurring in each same-level catalogue under that main domain, and determining the possible values of that catalogue level from the counted occurring values.
4. The method according to claim 3, characterized in that the step of determining the possible values of a same-level catalogue from the counted occurring values comprises:
when the occurring values counted in the same-level catalogue are of numeric type, determining that the lower limit of the possible values of that catalogue level is the minimum of the counted occurring values, and that the upper limit of the possible values of that catalogue level is the maximum of the counted occurring values;
when the occurring values counted in the same-level catalogue are of enumeration type, determining that the possible values of that catalogue level are each of the counted occurring values;
when the occurring values counted in the same-level catalogue are of character string type, determining that the possible value of that catalogue level is any character string.
5. The method according to claim 1, characterized in that step C comprises:
comparing the chained address range formed by the current values of the catalogues at each level with the chained address range formed by the historical values of the catalogues at each level;
when the comparison result contains a historical chained address that belongs to the chained address range formed by the historical values but not to the chained address range formed by the current values, sending an access request to the historical chained address and, when the access request cannot return an accessible page, treating the historical chained address as a failed chained address;
when the comparison result contains a current chained address that belongs to the chained address range formed by the current values but not to the chained address range formed by the historical values, treating the current chained address as a newly added chained address.
6. The method according to claim 1, characterized in that, before step C, the method further comprises: counting, from the obtained chained addresses, the parameter combinations passed to the same chained address and the values occurring for each parameter in each combination; taking each parameter combination as the receivable parameters of the file in the website that corresponds to the obtained chained address; and determining the possible values of the receivable parameters from the values occurring for each parameter in each combination; and, in step C, the method further compares the receivable parameters with historical parameters, and compares the possible values of the receivable parameters with historical parameter values, so as to judge whether the parameters of a file in the website have been modified.
7. The method according to claim 1, characterized in that, after step A, the method further comprises:
matching the obtained chained addresses against a preset abnormal keyword list, and treating the request data corresponding to any matching chained address as abnormal access data so that an early warning can be given.
8. An automatic device for carrying out web analytics, comprising:
a placement unit, configured to obtain more than one chained address from the request data arriving at a website;
a determining unit, configured to use the obtained chained addresses to determine the possible values of the catalogues at each level under each main domain included in the website, as the current values of the catalogues at each level;
a comparing unit, configured to compare the chained address range formed by the current values of the catalogues at each level with the chained address range formed by the historical values of the catalogues at each level, so as to judge whether the website has newly added chained addresses or failed chained addresses.
9. The device according to claim 8, characterized in that the placement unit captures the request data arriving at the website through a bypass mirroring system, and extracts more than one chained address from the request data.
10. The device according to claim 8, characterized in that the determining unit comprises:
a splitting unit, configured to split each obtained chained address into a main domain and catalogues at each level;
a first statistics unit, configured to use the chained addresses that share the same main domain to count the values occurring in each same-level catalogue under that main domain, and to determine the possible values of that catalogue level from the counted occurring values.
11. The device according to claim 10, characterized in that the first statistics unit determines the possible values of a same-level catalogue from the counted occurring values in the following manner:
when the occurring values counted in the same-level catalogue are of numeric type, determining that the lower limit of the possible values of that catalogue level is the minimum of the occurring values counted in that catalogue level, and that the upper limit of the possible values of that catalogue level is the maximum of the occurring values counted in that catalogue level;
when the occurring values counted in the same-level catalogue are of enumeration type, determining that the possible values of that catalogue level are each of the occurring values counted in that catalogue level;
when the occurring values counted in the same-level catalogue are of character string type, determining that the possible value of that catalogue level is any character string.
12. The device according to claim 8, characterized in that the comparing unit comprises:
a range comparing unit, configured to compare the chained address range formed by the current values of the catalogues at each level with the chained address range formed by the historical values of the catalogues at each level;
a link determining unit, configured to, when the comparison result contains a historical chained address that belongs to the chained address range formed by the historical values but not to the chained address range formed by the current values, send an access request to the historical chained address and, when the access request cannot return an accessible page, treat the historical chained address as a failed chained address;
the link determining unit being further configured to, when the comparison result contains a current chained address that belongs to the chained address range formed by the current values but not to the chained address range formed by the historical values, treat the current chained address as a newly added chained address.
13. The device according to claim 8, characterized in that the device further comprises a second statistics unit, configured to count, from the obtained chained addresses, the parameter combinations passed to the same chained address and the values occurring for each parameter in each combination, to take each parameter combination as the receivable parameters of the file in the website that corresponds to the obtained chained address, and to determine the possible values of the receivable parameters from the values occurring for each parameter in each combination; and
the comparing unit is further configured to compare the receivable parameters with historical parameters, and to compare the possible values of the receivable parameters with historical parameter values, so as to judge whether the parameters of a file in the website have been modified.
14. The device according to claim 8, characterized in that the device further comprises:
a detection unit, configured to match the obtained chained addresses against a preset abnormal keyword list, and to treat the request data corresponding to any matching chained address as abnormal access data so that an early warning can be given.
CN201210232731.XA 2012-07-05 2012-07-05 A kind of automatic method and device for carrying out web analytics Active CN103530297B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210232731.XA CN103530297B (en) 2012-07-05 2012-07-05 A kind of automatic method and device for carrying out web analytics

Publications (2)

Publication Number Publication Date
CN103530297A CN103530297A (en) 2014-01-22
CN103530297B true CN103530297B (en) 2018-02-02

Family

ID=49932319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210232731.XA Active CN103530297B (en) 2012-07-05 2012-07-05 A kind of automatic method and device for carrying out web analytics

Country Status (1)

Country Link
CN (1) CN103530297B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823883B (en) * 2014-03-06 2015-06-10 焦点科技股份有限公司 Analysis method and system for website user access path
CN106844389B (en) * 2015-12-07 2021-05-04 阿里巴巴集团控股有限公司 Method and device for processing URL (Uniform resource locator)
CN106992981B (en) * 2017-03-31 2020-04-07 北京知道创宇信息技术股份有限公司 Website backdoor detection method and device and computing equipment
CN110347955B (en) * 2019-05-30 2023-03-03 华为云计算技术有限公司 Resource detection method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467633A (en) * 2010-11-19 2012-05-23 奇智软件(北京)有限公司 Method and system for safely browsing webpage
CN102521295A (en) * 2011-11-30 2012-06-27 深圳市五巨科技有限公司 Method and device for automatically acquiring content updating on designated page

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510217B (en) * 2009-03-09 2013-06-05 阿里巴巴集团控股有限公司 Image updating method in image database, server and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant