CN108270637A - Website quality multi-layer drill-down system and method - Google Patents

Website quality multi-layer drill-down system and method

Info

Publication number
CN108270637A
Authority
CN
China
Prior art keywords
website
quality
referer
url
template library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611269768.4A
Other languages
Chinese (zh)
Other versions
CN108270637B (en)
Inventor
郭天晨
程路
陈建平
王易风
范东东
潘梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Zhejiang Co Ltd
Priority to CN201611269768.4A
Publication of CN108270637A
Application granted
Publication of CN108270637B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H04L43/0829 Packet loss
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Abstract

The present invention provides a website quality multi-layer drill-down method and system. The method comprises the following steps: S1, periodically obtaining all elements of a target website, and establishing and updating an intelligent template library for the website; S2, collecting the full user data on the network link to be analyzed, generating a record for each website element, and associating the elements with the corresponding website according to the intelligent template library; S3, defining key quality indicators (KQI) and key performance indicators (KPI), establishing an end-to-end website quality evaluation model, and, based on the massive set of collected elements, objectively evaluating website quality at the SP, Host, Url and server IP levels. Website elements are associated by combining the Referer with intelligent template library matching, which solves the completeness and accuracy problems of website element association; network packets are analyzed in depth by DPI equipment; and a vertical, multi-level website/element quality evaluation system is established from the KQI and KPI indicators.

Description

Website quality multi-layer drill-down system and method
Technical field
The present invention relates to the technical field of network security, and more particularly to a website quality multi-layer drill-down system and method.
Background art
At present, mainstream website quality analysis schemes take two forms: active testing and passive monitoring analysis.
Active monitoring analysis records the relevant data by simulated-terminal testing and organizes them into indicators according to the corresponding algorithms. It can conveniently obtain the indicators of a website, and these indicators are close to the user's subjective experience. Examples of active monitoring systems are the automatic dial-testing systems widely built in various regions.
Passive monitoring mainly uses technical means such as packet analysis to analyze the data packets collected at the network egress. Because packet data is scattered and does not correspond directly to a website, most systems cannot associate website elements and can only analyze the indicators of each collected URL. Some systems try to associate website elements by means of the Referer, but because of the defects of the Referer technique itself, accurate association of website elements cannot be achieved. Existing passive monitoring systems therefore still mainly analyze the indicators of each element or each server IP, such as delay and success rate. General internet behavior analysis systems are examples of the passive monitoring approach.
Although active testing can provide some perception indicators, it lacks generality and is strongly affected by the test content and test environment. It cannot accurately reflect network quality, cannot support further drill-down analysis, can only judge a website as good or bad at the macro level, and cannot provide reference data for fault location. Passive monitoring can provide the relevant indicators, but only for each element of a website; it cannot provide an overall perception indicator for the whole website.
Summary of the invention
The present invention provides a website quality multi-layer drill-down method and system that overcome the above problems or at least partially solve them. Website elements are associated by combining the Referer with intelligent template library matching, which solves the completeness and accuracy problems of website element association. Network packets are analyzed in depth by DPI (Deep Packet Inspection) equipment, and a vertical, multi-level website/element quality evaluation system is established from KQI and KPI indicators. Based on the complete website element association, multi-layer drill-down achieves precise fault location.
According to one aspect of the present invention, a website quality multi-layer drill-down method is provided, comprising the following steps:
S1, periodically obtaining all elements of a target website, and establishing and updating an intelligent template library for the website;
S2, collecting the full user data on the network link to be analyzed, generating a record for each website element, and associating the elements with the corresponding website according to the intelligent template library;
S3, defining key quality indicators and key performance indicators, establishing an end-to-end website quality evaluation model, and, based on the massive set of collected elements, objectively evaluating website quality at the SP, Host, Url and server IP levels.
Preferably, step S1 specifically comprises:
S11, scanning the websites and their homepage HOST records, starting access to the website homepages one by one, and loading all page elements of each homepage;
S12, using multithreading, obtaining through a monitoring thread all page elements loaded one by one by the main process;
S13, establishing the association between the website SP, the homepage HOST value and the URLs of these page elements;
S14, periodically obtaining the latest website resource URL information and updating it into the intelligent template library.
Preferably, step S13 specifically comprises: the associations first saved serve as the intelligent template library; thereafter, by merge update, associations not yet present are inserted, existing associations are updated, and expired association records are deleted, with the last update time recorded at each merge update.
Preferably, step S14 specifically comprises: when analyzing traffic-control website access detail records, if the Referer of a detail record is found to be empty, the record is compared with the intelligent template library; on a successful match, the Referer of the record is completed with the HOST value of the website homepage, which corrects the statistical attribution of some resource elements.
Preferably, step S2 specifically comprises:
S21, using DPI equipment connected to the link in series or by mirroring, collecting the full user data on the link and generating standard service records, that is, the records of each website element;
S22, combining the Referers recorded in the intelligent template library, associating the above elements with websites by the Referer technique.
Preferably, step S22 specifically comprises:
S221, using the five-tuple and the user access time, classifying and extracting the session records of each visit of each user to a website; matching websites by the Referer technique and associating all session records of a given visit by a given user, thereby obtaining the specific website visited and preliminarily identifying the elements contained in that website;
S222, for a session record without a Referer field, looking up the url of the session record among the records stored in the intelligent template library; the Referer recorded in the intelligent template library gives the parent url of that url, which finally determines the website to which it belongs;
S223, when one url corresponds to multiple Referers: when analyzing traffic-control detail records, a group of detail records is delimited by user identifier plus time period and regarded as one session; the earliest access in the group is the first GET; when matching against the intelligent template library, the first GET is also compared with the homepage HOST value of the website SP in the library, and the match is considered successful if they are consistent.
Preferably, in step S3, the quality model is used to weight the key quality indicators according to certain weights to obtain an overall score for a website, and is further used, through multi-layer drill-down, to perform quality analysis at the SP, Host, Url and server IP levels, providing guidance for troubleshooting and resource introduction in support of final fault location and quality improvement.
A website quality multi-layer drill-down system comprises a data collection layer, a data mining layer and an operation layer;
The data collection layer is used to simulate a browser visiting a website, capture all Response messages from the network, and store all captured session records of the website to form the intelligent template library of that website;
The data mining layer is used to collect, with DPI equipment connected to the network link in series or by mirroring, the full user data on the link, generate a record for each website element, and associate the resulting elements with the corresponding website by combining the intelligent template library of the website with the Referer technique; and, by defining key quality indicators and establishing an end-to-end website quality evaluation model, to analyze website quality at multiple levels;
The operation layer is used for system management, task definition, model definition and user identification, and presents the evaluation results in report form.
Preferably, the data collection layer comprises a detail record collection module, a NAT log collection module and a site resource collection module;
The site resource collection module is used to simulate a browser visiting website homepages, starting access to the homepages one by one and loading all their page elements, and to establish the intelligent template library of the website;
The detail record collection module is used to start, by multithreading, a separate thread that monitors the page element loading of the main process in real time and obtains the URL link addresses of the HTTP Request messages of all page elements loaded by the main process;
The NAT log collection module is used to periodically obtain the latest website resource URL information and update it into the local configuration.
Preferably, the data mining layer comprises a data extraction module, a data statistics module, a data analysis module and a quality evaluation module;
The data extraction module is used to collect user traffic through the DPI and DFI techniques of the DPI equipment, parse the collected traffic, and generate session records one by one;
The data statistics module is used to classify and extract, using the five-tuple and the user access time, the session records of each visit of each user to a website;
The data analysis module is used to combine the elements associated by Referer with the elements matched through the intelligent template library, achieving complete and accurate association of website elements;
The quality evaluation module is used to establish the end-to-end website quality evaluation model and, based on the massive set of collected elements, objectively evaluate website quality at the SP, Host, Url and server IP levels.
The present application proposes a website quality multi-layer drill-down method and system. The relevant websites are crawled at regular intervals to form a static resource library for each website. Later, during actual website quality drill-down analysis, carrier-grade DPI equipment connected to the link in series or by mirroring collects the full user data on the link and generates standard service records, that is, the records of each website element. When these elements are associated with the corresponding website by the Referer technique, the Referers recorded in the website static resource library can be used as well, which ensures the accuracy and completeness of website element association. Vertical multi-layer drill-down provides in-depth analysis of website quality: the quality of a website can be evaluated at the macro level and in terms of subjective perception, and the root cause of good or poor quality can be located precisely through basic TCP-layer indicators. A scientific and reasonable website quality evaluation model can be established, providing a multi-layer scoring system that comprises an overall quality score reflecting the whole website, KQI indicators and KPI indicators. The overall quality score is obtained mainly by weighting the various KQI indicators according to certain weights. Multi-layer drill-down enables quality analysis at the SP, Host, Url and server IP levels, providing guidance for troubleshooting and resource introduction in support of final fault location and quality improvement.
Brief description of the drawings
Fig. 1 is a flow chart of the website quality multi-layer drill-down method according to Embodiment 1 of the present invention;
Fig. 2 is a schematic comparison of website comprehensive scores according to Embodiment 1 of the present invention;
Fig. 3 is a schematic comparison of website end-to-end delay indicators according to Embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of the table of web browsing sorted by end-to-end delay in descending order according to Embodiment 1 of the present invention;
Fig. 5 is a schematic diagram of the table of hosts sorted by end-to-end delay according to Embodiment 1 of the present invention;
Fig. 6 is a system structure diagram according to Embodiment 2 of the present invention.
Detailed description
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments illustrate the present invention but do not limit its scope.
Embodiment 1
Fig. 1 shows a website quality multi-layer drill-down method, comprising the following steps:
S1, periodically obtaining all elements of a target website, and establishing and updating an intelligent template library for the website;
S2, collecting the full user data on the network link to be analyzed, generating a record for each website element, and associating the elements with the corresponding website according to the intelligent template library;
S3, defining key quality indicators and key performance indicators, establishing an end-to-end website quality evaluation model, and, based on the massive set of collected elements, objectively evaluating website quality at the SP, Host, Url and server IP levels.
The HTTP Referer is part of the request header. When a browser sends a request to a web server, it generally includes the Referer, which tells the server which page the request was linked from.
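As a minimal sketch of how a Referer can be read from a captured request, the following Python snippet extracts the host named by the Referer header. The function name `referer_host` and the example hosts are hypothetical and are not taken from the patent; only the standard-library `urllib.parse.urlsplit` call is real.

```python
from urllib.parse import urlsplit


def referer_host(headers):
    """Return the host named by the Referer header, or None if it is absent or empty."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    referer = normalised.get("referer", "").strip()
    if not referer:
        return None
    return urlsplit(referer).hostname


# Hypothetical headers of an image request issued while loading a portal homepage.
request_headers = {
    "Host": "img.cdn.example",
    "Referer": "http://www.portal.example/index.html",
}
print(referer_host(request_headers))              # host of the referring page
print(referer_host({"Host": "ads.example.net"}))  # no Referer was sent
```

A record for which this lookup returns nothing is exactly the case that Referer-based association alone cannot attribute to a website.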
However, the Referer technique cannot associate 100% of website elements. A minority of session records carry no Referer field, so it cannot be determined which website those records belong to. In other words, the Referer technique alone cannot completely associate all the elements of a single visit by a user to a website, and website quality therefore cannot be evaluated accurately from this information.
In view of the defects of the Referer technique, the present invention associates website elements by intelligent template library matching. A simulated browser visits the website and all Response messages are captured from the network, which yields all session records of the website, such as pictures, advertisements and content. Storing all captured session records forms the intelligent template library of that website. For example, to obtain the intelligent template library of Sina, Sina can be visited at regular intervals (once every five minutes) and all its session records captured to form the Sina intelligent template library, with the period for which the library is applicable marked.
Preferably, step S1 specifically comprises:
S11, scanning the websites and their homepage HOST records, starting access to the website homepages one by one, and loading all page elements of each homepage. Program software scans the key websites and their homepage HOST records in the database and starts accessing the homepages of these key websites one by one; the program then loads all page elements, behaving in the same way as the IE browser;
S12, using multithreading, obtaining through a monitoring thread all page elements loaded one by one by the main process. A separate thread is started to monitor the page element loading of the main process in real time and obtains the URL link addresses of the HTTP Request messages of all page elements loaded by the main process (all GET requests other than the main GET). These elements include advertisements, pictures, referenced resources of other websites and so on. The analysis mainly concerns the Request URL value in the General section of these HTTP Request messages, from which the complete URL address of each page element can be extracted. The Referer information in the Request Headers of these messages is not examined, because some of these Referers point to the homepage of the key website, some point to other websites such as advertisement links and cross-domain resources, and a few are empty;
S13, establishing the association between the website SP, the homepage HOST value and the URLs of these page elements;
S14, periodically obtaining the latest website resource URL information and updating it into the intelligent template library.
A portal homepage such as Sina's may generally correspond to hundreds of page element URLs (a many-to-one relationship between element URLs and the homepage). Preferably, step S13 specifically comprises: the associations first saved serve as the intelligent template library; thereafter, by merge update, associations not yet present are inserted, existing associations are updated, and expired association records are deleted, with the last update time recorded at each merge update.
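The merge update of step S13 can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not say how an association is judged expired, so the `max_age` rule, the `Association` record, the `(sp, url)` key and all URLs below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Association:
    home_host: str       # HOST value of the website homepage
    last_updated: float  # time of the last merge update, in seconds


def merge_update(library, sp, home_host, crawled_urls, now, max_age):
    """Merge one homepage crawl into the library, a dict keyed by (sp, element_url)."""
    for url in crawled_urls:
        # Insert a missing association, or refresh an existing one.
        library[(sp, url)] = Association(home_host, now)
    # Assumed expiry rule: drop this site's entries not seen for more than max_age seconds.
    expired = [key for key, assoc in library.items()
               if key[0] == sp and now - assoc.last_updated > max_age]
    for key in expired:
        del library[key]
    return library


library = {}
merge_update(library, "portal", "www.portal.example",
             ["http://img.example.com/logo.png", "http://js.example.com/app.js"],
             now=0.0, max_age=3600.0)
merge_update(library, "portal", "www.portal.example",
             ["http://js.example.com/app.js", "http://css.example.com/site.css"],
             now=4000.0, max_age=3600.0)
print(sorted(url for _, url in library))
```

Keying on the pair (SP, element URL) rather than on the URL alone leaves room for one URL to be associated with several websites, which is the case step S223 has to disambiguate.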
Preferably, step S14 specifically comprises: the latest website resource URL information in the database is obtained periodically and updated into the local configuration, with an update frequency of once per hour. When analyzing traffic-control website access detail records, if the Referer of a detail record is found to be empty, the record is compared with the intelligent template library; on a successful match, the Referer of the record is completed with the HOST value of the website homepage, which corrects the statistical attribution of some resource elements. For example, third-party advertisements loaded by the Sina homepage are then also counted under the Sina SP and under the page quality of the Sina homepage; otherwise these advertisements could only be attributed to other websites. In this way, quality indicators such as the end-to-end delay of a website and of a specific page can be examined more accurately.
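The Referer completion described here might look like the following sketch. The record layout (dicts with `url` and `referer` keys), the function name `complete_referer` and the example URLs are assumptions made for illustration; the template library is reduced to a plain mapping from element URL to homepage HOST.

```python
def complete_referer(record, url_to_home_host):
    """Fill an empty Referer from the template library; return True if the record was completed."""
    if record.get("referer"):
        return False  # a Referer is present, nothing to complete
    home_host = url_to_home_host.get(record["url"])
    if home_host is None:
        return False  # the URL is unknown to the template library
    record["referer"] = home_host
    return True


# Hypothetical template library: element URL -> homepage HOST of the owning website.
template = {"http://ads.example.net/banner.js": "www.portal.example"}
records = [
    {"url": "http://ads.example.net/banner.js", "referer": ""},
    {"url": "http://img.portal.example/a.png", "referer": "http://www.portal.example/"},
    {"url": "http://unknown.example.org/x.gif", "referer": ""},
]
completed = [complete_referer(rec, template) for rec in records]
print(completed)
```

Only the first record is completed: the second already has a Referer, and the third has no entry in the library and so remains unattributed.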
Preferably, step S2 specifically comprises:
S21, using DPI equipment connected to the link in series or by mirroring, collecting the full user data on the link and generating standard service records, that is, the records of each website element. User traffic is collected by the DPI equipment, whose DPI and DFI (Deep Flow Inspection) techniques parse the collected traffic and generate session records one by one. Because DPI equipment is generally connected to the link in series or by mirroring, the collected traffic is the full traffic on the link, and the generated session records are the session records of all users on the link. This is a massive amount of data: it is not known which user each record belongs to, nor which website each record belongs to. Taken on their own, these records have little practical value, so they must be associated;
S22, combining the Referers recorded in the intelligent template library, associating the above elements with websites by the Referer technique.
Preferably, step S22 specifically comprises:
S221, using the five-tuple and the user access time, classifying and extracting the session records of each visit of each user to a website; where NAT translation is involved, the NAT logs and Radius logs can be combined to restore the IP address and account. Websites are matched by the Referer technique first, because most session records contain a Referer field that indicates which website the record belongs to; by associating all session records of a given visit by a given user, the website visited by that user can be obtained and the elements contained in that website preliminarily identified;
S222, for a session record without a Referer field, looking up the url of the session record among the records stored in the intelligent template library; the Referer recorded in the intelligent template library gives the parent url of that url, which finally determines the website to which it belongs. For example, when the DPI equipment collects the records of users visiting the Sina website and these are associated by Referer, a dynamic Sina resource library is likewise generated. When a session record of the Sina website collected by DPI has no Referer field, a lookup of its url in the static library finds a record, and the Referer recorded in the intelligent template library gives the parent url of that url and finally determines that it belongs to the Sina website, the record can be taken as an element of the Sina website, forming the complete set of elements of the Sina website;
S223, when one url corresponds to multiple Referers and may therefore belong to multiple websites, other auxiliary means of analysis are needed to attribute the URL accurately. When analyzing traffic-control detail records, a group of detail records is delimited by user identifier (Radius authentication account or NAT translation log) plus time period and regarded as one session; the earliest access in the group is the first GET; when matching against the intelligent template library, the first GET is also compared with the homepage HOST value of the website SP in the library, and the match is considered successful if they are consistent.
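The first-GET disambiguation of step S223 can be illustrated with the sketch below. It assumes that a "time period" is a fixed-length window and that each detail record is a dict with `user`, `time` and `host` fields; the patent specifies neither, and the function names and hosts are hypothetical.

```python
from collections import defaultdict


def group_sessions(records, window):
    """Group detail records by (user identifier, index of a fixed-length time window)."""
    sessions = defaultdict(list)
    for rec in records:
        sessions[(rec["user"], int(rec["time"] // window))].append(rec)
    return sessions


def resolve_site(session, candidate_home_hosts):
    """Return the candidate homepage HOST that equals the host of the session's first GET."""
    first_get = min(session, key=lambda rec: rec["time"])  # earliest access in the group
    if first_get["host"] in candidate_home_hosts:
        return first_get["host"]
    return None  # no candidate is consistent with the first GET


# Hypothetical session: a homepage request followed by a shared CDN resource.
records = [
    {"user": "u1", "time": 0.4, "host": "cdn.shared.example", "url": "/lib.js"},
    {"user": "u1", "time": 0.0, "host": "www.site-a.example", "url": "/"},
]
session = group_sessions(records, window=10.0)[("u1", 0)]
# The template library lists two sites that both load /lib.js from the shared CDN.
print(resolve_site(session, {"www.site-a.example", "www.site-b.example"}))
```

Here the shared resource is attributed to the site whose homepage opened the session; if the first GET matches none of the candidates, the sketch leaves the record unresolved.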
During quality analysis, combining the elements associated by Referer with the elements matched through the intelligent template library achieves complete and accurate association of website elements and remedies the incompleteness of Referer association. Because the elements of a website change continually, the template library must also be updated continually. The site resource collection module therefore collects the website library every five minutes, automatically updates the template library after collection and records the period for which the template library is applicable. The library is combined in real time with the website elements associated by Referer (or combined afterwards with the collected website session records according to the template's applicable period).
In this embodiment, specifically, in step S3, several objective perception-based indicators are defined, namely KQI (Key Quality Indicators). The intuitive impression of whether a website is good or bad comes from how fast the website opens and whether the displayed content is complete. Accordingly, four KQI indicators are established: web browsing end-to-end delay, web browsing end-to-end rate, web browsing end-to-end success rate and web browsing end-to-end integrity rate. The end-to-end delay and end-to-end rate reflect how fast the website opens; the end-to-end success rate and end-to-end integrity rate reflect whether the website content is displayed successfully and completely. By weighting these key indicators, the quality of a website can be evaluated objectively on the basis of the KQI indicators and the massive session records, leading to a conclusion such as that Sina is good and Sohu is not so good.
Because a website (SP) consists of multiple Hosts, a Host consists of multiple urls, and each url in turn corresponds to a batch of server IPs, defining corresponding indicators for each of these dimensions and evaluating quality on each enables vertical multi-level quality analysis of a website, down to the quality of the website server IPs. Only then can a decision basis be provided for locating and optimizing the quality of a "bad" website.
Analysis at the Host level is similar to analysis at the SP level and can likewise use end-to-end KQI indicators. Analysis at the Url and server IP levels, however, uses only KPI (Key Performance Indicator) indicators, such as response delay, response success rate and retransmission packet loss. Because these dimensions are hierarchical, and the SP, Host, Url hierarchy of most urls has already been obtained by the above steps, when the KQI score of a website is found to be low it is possible to drill down layer by layer from SP through Host and Url to the website server IP (using KPI indicators such as response delay, response success rate and retransmission packet loss during the drill-down), and finally locate the root cause affecting website quality at one or more server IPs. This vertical multi-level drill-down analysis effectively achieves website fault location and provides a decision basis for website quality.
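The layer-by-layer drill-down can be sketched as follows. This is a simplification under stated assumptions: a single `delay` value per record stands in for the KQI used at the SP and Host levels and the KPI used at the Url and server IP levels, the worst entry is taken to be the one with the highest mean delay, and all names, hosts and addresses are hypothetical.

```python
def drill_down(records):
    """Follow the highest mean delay from SP to Host to Url to server IP."""
    path = []
    for level in ("sp", "host", "url", "server_ip"):
        totals = {}
        for rec in records:
            total, count = totals.get(rec[level], (0.0, 0))
            totals[rec[level]] = (total + rec["delay"], count + 1)
        # Pick the entry with the worst (largest) mean delay at this level.
        worst = max(totals, key=lambda key: totals[key][0] / totals[key][1])
        path.append((level, worst))
        # Keep only the records under the worst entry before descending a level.
        records = [rec for rec in records if rec[level] == worst]
    return path


# Hypothetical associated records; delays are in seconds.
records = [
    {"sp": "site-a", "host": "www.site-a.example", "url": "/index", "server_ip": "192.0.2.1", "delay": 0.4},
    {"sp": "site-a", "host": "img.site-a.example", "url": "/a.png", "server_ip": "192.0.2.2", "delay": 0.5},
    {"sp": "site-b", "host": "www.site-b.example", "url": "/index", "server_ip": "192.0.2.10", "delay": 0.6},
    {"sp": "site-b", "host": "q.site-b.example", "url": "/quote", "server_ip": "192.0.2.11", "delay": 0.7},
    {"sp": "site-b", "host": "q.site-b.example", "url": "/quote", "server_ip": "192.0.2.12", "delay": 9.0},
]
for level, name in drill_down(records):
    print(level, name)
```

The sketch only works because every record already carries its SP, Host and Url, which is the hierarchy that the association steps above are meant to supply.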
The whole process of vertical multi-layer drill-down analysis is described below with reference to the functions of this scheme and an example:
1. Overall website quality scoring / website key indicator quality analysis
When scoring website quality, a complete scoring model system can be established to score each website. The scoring rule is to evaluate the four KQI indicators of a website comprehensively according to certain weights and give a score of 0-100. Scoring helps to evaluate quality intuitively and can be done for selected websites, for all websites, or separately by region. Fig. 2 shows the scores of several common websites: 163, sina, Baidu, Tencent, Sohu and Taobao are scored comprehensively, and Taobao has the highest comprehensive score among them, that is, the best overall quality.
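A weighted 0-100 score of this kind could be computed as in the sketch below. The patent only speaks of "certain weights", so the weights, the per-indicator sub-scores and the assumption that each KQI has already been normalised to 0-100 are all hypothetical.

```python
def overall_score(kqi_scores, weights):
    """Weighted average of per-KQI sub-scores, each assumed to be normalised to 0-100."""
    total_weight = sum(weights.values())
    return sum(kqi_scores[name] * weights[name] for name in weights) / total_weight


# Hypothetical weights for the four KQI indicators named in the text.
weights = {"delay": 3, "rate": 2, "success_rate": 3, "integrity_rate": 2}
# Hypothetical sub-scores of one website.
site_scores = {"delay": 80, "rate": 70, "success_rate": 95, "integrity_rate": 90}
print(overall_score(site_scores, weights))
```

Dividing by the sum of the weights keeps the result in the 0-100 range whatever scale the weights are expressed in.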
Besides the comprehensive score, a single key indicator can also be analyzed and compared across multiple websites to find those of poorer quality, which are then analyzed further by drill-down. Fig. 3 compares the end-to-end delay of these websites; it shows that Taobao has the smallest end-to-end delay and is therefore the best on this indicator.
2nd, vertical multi-layer drilling analysis process
After quality evaluation is carried out to certain website based on KQI Rating Models or key index, second-rate net is selected It stands and carries out drilling analysis.Drilling analysis is carried out by taking the worst website of web page browsing end-to-end time delay as an example, we can from report To press end-to-end time delay Bit-reversed, the eastmoney (east wealth) for selecting end-to-end time delay larger is as shown in Figure 4 Form Image.
Host is further got by drilling, as can be seen that host is in the form Image described in Fig. 5 The end-to-end time delay of " hqguba1.eastmoney.com " is maximum:
It further gets server ip by drilling, checks each index of server ip, it is as shown in the table:
From the graph as can be seen that IP is the server of 140.207.213.99, answer delay has 302.72 seconds, and normal model It encloses generally at 0.3-1 seconds or so, it is possible thereby to which the reason of judging, causing the website time delay larger is the server in the period Answer delay is larger.
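The final check at the server IP layer amounts to comparing each server's response delay with the normal range. The sketch below assumes a hypothetical list of per-server KPI rows; only the IP 140.207.213.99, its 302.72 s delay and the 0.3-1 s normal range come from the text, and the other rows are made up for illustration.

```python
# Upper bound of the normal response-delay range cited in the text (0.3-1 s).
NORMAL_MAX_DELAY_S = 1.0

# Hypothetical per-server KPI rows; only the first row reflects the example
# in the text, the remaining rows are invented.
server_kpis = [
    {"server_ip": "140.207.213.99", "response_delay_s": 302.72},
    {"server_ip": "192.0.2.10", "response_delay_s": 0.45},
    {"server_ip": "192.0.2.11", "response_delay_s": 0.80},
]


def abnormal_servers(rows, max_delay_s=NORMAL_MAX_DELAY_S):
    """Return rows whose response delay exceeds the normal range, worst first."""
    flagged = [r for r in rows if r["response_delay_s"] > max_delay_s]
    return sorted(flagged, key=lambda r: r["response_delay_s"], reverse=True)


suspects = abnormal_servers(server_kpis)
for row in suspects:
    print(row["server_ip"], row["response_delay_s"])
```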
It can be seen that, after a website has been evaluated based on KQI, drilling down to locate the specific server IP affecting the network requires the SP, Host, Url hierarchical association; without it, the specific cause of the website's poor quality cannot be located accurately.
Embodiment 2
As shown in Fig. 6, the present invention also provides a website quality multi-layer drill-down system, comprising a data collection layer, a data mining layer and an operation layer;
The data collection layer is configured to simulate a browser accessing the website, obtain all Response messages by network packet capture, store all session records obtained for the website, and form the intelligent template library of the website;
The data mining layer is configured to use DPI equipment connected into the link in series or by mirroring, collect the full user data on the link, generate a record for each element of the website, and associate the resulting elements with the corresponding website by combining the website's intelligent template library with the Referer technique; and, by formulating key quality indicators, to establish an end-to-end website quality evaluation model and evaluate website quality at multiple levels;
The operation layer is configured to perform system management, task definition, model definition and user identification, and to present evaluation results as reports. It specifically includes a system management module, a data definition module and a report presentation module; the system management module is used by the user for system operation management; the data definition module is used to define tasks, strategies, models, city-level configuration and user identification; the report presentation module is used to display analysis results as reports.
Preferably, the data collection layer includes a ticket collection module, a NAT log collection module and a site resource collection module;
The site resource collection module is configured to simulate a browser accessing the homepage of each website, visit the homepages one by one, load all page elements of each homepage, and establish the intelligent template library of the website;
The ticket collection module is configured to use multithreading: additional threads are started to monitor the page-element loading actions of the main process in real time and obtain the URL of the HTTP Request for every page element the main process loads;
The NAT log collection module is configured to periodically obtain the latest website resource URL information and update it into the local configuration.
Preferably, the data mining layer includes a data extraction module, a data statistics module, a data analysis module and a quality assessment module;
The data extraction module is configured to collect user traffic, parse the collected user traffic with the DPI and DFI techniques of the DPI equipment, and generate session records one by one;
The data statistics module is configured to use the five-tuple and the user access time to classify and extract the session records of each website visit by each user;
The data analysis module is configured to combine the elements associated by Referer with the elements matched through the intelligent template library, achieving complete and accurate association of website elements;
The quality assessment module is configured to establish the end-to-end website quality evaluation model and, based on the massive elements collected, objectively evaluate website quality at the SP, Host, Url and server IP levels.
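The classification step of the data statistics module can be sketched as grouping session records per user and splitting them into visits by access time. This is only one possible reading: the record layout (timestamp plus five-tuple), the use of the source IP as the user key and the 30-second idle gap are assumptions that do not come from the text.

```python
from collections import defaultdict

GAP_S = 30  # hypothetical idle gap that separates two visits by one user


def group_visits(records, gap_s=GAP_S):
    """Group records per source IP into visits split by idle gaps.

    Each record is (timestamp_s, src_ip, src_port, dst_ip, dst_port, proto).
    """
    by_user = defaultdict(list)
    for rec in sorted(records, key=lambda r: r[0]):
        by_user[rec[1]].append(rec)

    visits = []
    for user_records in by_user.values():
        current = [user_records[0]]
        for rec in user_records[1:]:
            if rec[0] - current[-1][0] > gap_s:
                # Idle gap exceeded: close the current visit, start a new one.
                visits.append(current)
                current = [rec]
            else:
                current.append(rec)
        visits.append(current)
    return visits


# Illustrative records only.
records = [
    (0, "10.1.1.1", 5000, "203.0.113.5", 80, "TCP"),
    (2, "10.1.1.1", 5001, "203.0.113.6", 80, "TCP"),
    (100, "10.1.1.1", 5002, "203.0.113.5", 80, "TCP"),
    (1, "10.1.1.2", 6000, "203.0.113.5", 80, "TCP"),
]
visits = group_visits(records)
print([len(v) for v in visits])
```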
The present application proposes a website quality multi-layer drill-down method and system. Related websites are crawled periodically to form a static resource library for each website. Later, during actual website quality drill-down analysis, carrier-grade DPI equipment connected into the link in series or by mirroring collects the full user data on the link and generates standard service records, namely the records of each element of the website. When these elements are associated with the corresponding website by the Referer technique, the Referers recorded in the website static resource library can be used as well, ensuring that the association of website elements is accurate and complete. The vertical multi-layer drill-down technique enables in-depth analysis of website quality: the quality of a website can be evaluated at the macro level and in terms of subjective perception, and the root cause of good or poor website quality can be located accurately through basic TCP-layer indicators. A scientific and reasonable website quality evaluation model can be established, providing a multi-layer scoring system that reflects the whole website, including an overall quality score, KQI indicators and KPI indicators. The overall quality score is mainly obtained by weighting the various KQI indicators according to certain weights, yielding the overall score of a website. The multi-layer drill-down technique enables quality analysis at the SP, Host, Url, server IP and other levels, providing guidance on troubleshooting and resource introduction for final fault location and quality improvement.
Finally, the methods of the present application are only preferred embodiments and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A website quality multi-layer drill-down method, characterized by comprising the following steps:
S1, periodically obtaining all elements of the target website, and establishing and updating the intelligent template library of the website;
S2, collecting the full user data on the network link to be analyzed, generating a record for each element of the website, and associating the elements with the corresponding website according to the intelligent template library;
S3, formulating key quality indicators and key performance indicators, establishing an end-to-end website quality evaluation model, and, based on the massive elements collected, objectively evaluating website quality at the SP, Host, Url and server IP levels.
2. The website quality multi-layer drill-down method according to claim 1, characterized in that step S1 specifically comprises:
S11, scanning websites and their homepage HOST records, visiting the website homepages one by one, and loading all page elements of each homepage;
S12, using multithreading, obtaining through a monitoring thread all the page elements that the main process loads one by one;
S13, establishing the association among the website SP, the homepage HOST value and the URLs of these page elements;
S14, periodically obtaining the latest website resource URL information and updating it into the intelligent template library.
3. The website quality multi-layer drill-down method according to claim 2, characterized in that step S13 specifically comprises: first saving the association as the intelligent template library; then, by merge update, inserting associations that do not yet exist, updating existing associations and deleting expired association records, the merge update also recording the last-update time.
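The merge update of the intelligent template library (insert missing associations, update existing ones, delete expired ones, record the last-update time) can be sketched as follows. The dictionary layout, the `merge_update` helper and the rule that an association expires when it has not been refreshed within `max_age_s` seconds are assumptions for illustration; the text does not define how expiry is decided.

```python
def merge_update(library, fresh, now, max_age_s):
    """Merge freshly crawled associations into the template library in place.

    library: {element_url: {"sp": str, "home_host": str, "updated_at": float}}
    fresh:   {element_url: (sp, home_host)} from the latest crawl
    """
    for url, (sp, home_host) in fresh.items():
        # Insert a missing association or overwrite an existing one,
        # stamping the last-update time either way.
        library[url] = {"sp": sp, "home_host": home_host, "updated_at": now}

    # Assumed expiry rule: drop associations not refreshed within max_age_s.
    expired = [u for u, e in library.items() if now - e["updated_at"] > max_age_s]
    for url in expired:
        del library[url]
    return library


# Hypothetical library contents and crawl result.
library = {
    "http://img.a.example/old.png": {"sp": "siteA", "home_host": "www.a.example", "updated_at": 0},
    "http://img.a.example/logo.png": {"sp": "siteA", "home_host": "www.a.example", "updated_at": 0},
}
fresh = {
    "http://img.a.example/logo.png": ("siteA", "www.a.example"),
    "http://js.a.example/app.js": ("siteA", "www.a.example"),
}
merge_update(library, fresh, now=1000, max_age_s=600)
print(sorted(library))
```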
4. The website quality multi-layer drill-down method according to claim 2, characterized in that step S14 specifically comprises: when analyzing flow-control website access tickets, if the Referer of a ticket record is found to be empty, comparing the ticket record with the intelligent template library; when the match succeeds, completing the Referer of the ticket record with the HOST value of the website homepage, thereby correcting the statistical attribution of some resource elements.
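Completing an empty Referer from the template library can be sketched as a simple lookup. The record fields, the library layout (element URL mapped to homepage HOST) and the `complete_referer` helper are hypothetical; the sketch assumes the comparison is made on the URL of the ticket record.

```python
TEMPLATE_LIBRARY = {
    # element URL -> homepage HOST of the owning site (hypothetical entries)
    "http://img.a.example/logo.png": "www.a.example",
}


def complete_referer(record, library=TEMPLATE_LIBRARY):
    """Return a copy of the record with an empty Referer filled in if possible."""
    if record.get("referer"):
        return dict(record)  # Referer present: leave untouched
    home_host = library.get(record["url"])
    if home_host is None:
        return dict(record)  # no match: the element cannot be attributed
    completed = dict(record)
    completed["referer"] = home_host
    return completed


fixed = complete_referer({"url": "http://img.a.example/logo.png", "referer": ""})
print(fixed["referer"])
```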
5. The website quality multi-layer drill-down method according to claim 2, characterized in that step S2 specifically comprises:
S21, using DPI equipment connected into the link in series or by mirroring, collecting the full user data on the link, and generating standard service records, namely the records of each element of the website;
S22, associating the above elements with websites by the Referer technique, in combination with the Referers recorded in the intelligent template library.
6. The website quality multi-layer drill-down method according to claim 5, characterized in that step S22 specifically comprises:
S221, using the five-tuple and the user access time, classifying and extracting the session records of each website visit by each user, matching the website by the Referer technique, and associating all session records of a given user's current visit, so as to obtain the specific website the user visited this time and preliminarily analyze the elements contained in the website;
S222, for a session record without a Referer field, looking up the record saved in the intelligent template library by the url of the session record; through the Referer recorded in the intelligent template library, the parent url of that url can be found and its website finally determined;
S223, when one url corresponds to multiple Referers, analyzing the flow-control tickets: multiple ticket records are delimited by user identifier plus time period and regarded as one Session; the earliest request initiated in this group of ticket records is the first GET; when matching against the intelligent template library, the first GET is compared with the website SP homepage HOST value in the intelligent template library, and the match is considered successful if they are consistent.
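Steps S222 and S223 can be sketched under stated assumptions. The library layout (url mapped to its recorded Referer, `None` for a homepage), the `resolve_site` and `match_by_first_get` helpers, the hop limit and the sample URLs are all hypothetical; the session passed to `match_by_first_get` is assumed to have been delimited already by user identifier and time period.

```python
from urllib.parse import urlsplit

# Hypothetical template library: url -> recorded Referer (None for a homepage).
REFERERS = {
    "http://www.a.example/": None,
    "http://www.a.example/news.html": "http://www.a.example/",
    "http://img.a.example/pic.jpg": "http://www.a.example/news.html",
}


def resolve_site(url, referers=REFERERS, max_hops=10):
    """S222: follow recorded Referers upward until a homepage url is reached."""
    seen = set()
    for _ in range(max_hops):
        if url not in referers or url in seen:
            return None  # unknown url or a loop in the recorded chain
        seen.add(url)
        parent = referers[url]
        if parent is None:
            return url
        url = parent
    return None


def match_by_first_get(session_records, home_hosts):
    """S223: compare the earliest request of a session with known homepage HOSTs."""
    first = min(session_records, key=lambda r: r["ts"])
    host = urlsplit(first["url"]).hostname
    return host if host in home_hosts else None


site = resolve_site("http://img.a.example/pic.jpg")
session = [
    {"ts": 5, "url": "http://cdn.shared.example/lib.js"},
    {"ts": 1, "url": "http://www.a.example/"},
]
owner = match_by_first_get(session, {"www.a.example", "www.b.example"})
print(site, owner)
```

The first-GET rule is what disambiguates a shared resource such as the hypothetical `cdn.shared.example` script: its own Referer candidates are ambiguous, but the earliest request of the session identifies the site the user actually opened.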
7. The website quality multi-layer drill-down method according to claim 1, characterized in that, in step S3, the quality model is used to weight each key quality indicator according to certain weights to obtain the overall score of a website, and is further used, through the multi-layer drill-down technique, to perform quality analysis at the SP, Host, Url, server IP and other levels, providing guidance on troubleshooting and resource introduction for final fault location and quality improvement.
8. A website quality multi-layer drill-down system, characterized by comprising a data collection layer, a data mining layer and an operation layer;
The data collection layer is configured to simulate a browser accessing the website, obtain all Response messages by network packet capture, store all session records obtained for the website, and form the intelligent template library of the website;
The data mining layer is configured to use DPI equipment connected into the link in series or by mirroring, collect the full user data on the link, generate a record for each element of the website, and associate the resulting elements with the corresponding website by combining the website's intelligent template library with the Referer technique; and, by formulating key quality indicators, to establish an end-to-end website quality evaluation model and evaluate website quality at multiple levels;
The operation layer is configured to perform system management, task definition, model definition and user identification, and to present evaluation results as reports.
9. The website quality multi-layer drill-down system according to claim 8, characterized in that the data collection layer includes a ticket collection module, a NAT log collection module and a site resource collection module;
The site resource collection module is configured to simulate a browser accessing the homepage of each website, visit the homepages one by one, load all page elements of each homepage, and establish the intelligent template library of the website;
The ticket collection module is configured to use multithreading: additional threads are started to monitor the page-element loading actions of the main process in real time and obtain the URL of the HTTP Request for every page element the main process loads;
The NAT log collection module is configured to periodically obtain the latest website resource URL information and update it into the local configuration.
10. The website quality multi-layer drill-down method according to claim 9, characterized in that the data mining layer includes a data extraction module, a data statistics module, a data analysis module and a quality assessment module;
The data extraction module is configured to collect user traffic, parse the collected user traffic with the DPI and DFI techniques of the DPI equipment, and generate session records one by one;
The data statistics module is configured to use the five-tuple and the user access time to classify and extract the session records of each website visit by each user;
The data analysis module is configured to combine the elements associated by Referer with the elements matched through the intelligent template library, achieving complete and accurate association of website elements;
The quality assessment module is configured to establish the end-to-end website quality evaluation model and, based on the massive elements collected, objectively evaluate website quality at the SP, Host, Url and server IP levels.
CN201611269768.4A 2016-12-30 2016-12-30 Website quality multi-layer drilling system and method Active CN108270637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611269768.4A CN108270637B (en) 2016-12-30 2016-12-30 Website quality multi-layer drilling system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611269768.4A CN108270637B (en) 2016-12-30 2016-12-30 Website quality multi-layer drilling system and method

Publications (2)

Publication Number Publication Date
CN108270637A true CN108270637A (en) 2018-07-10
CN108270637B CN108270637B (en) 2020-12-22

Family

ID=62770449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611269768.4A Active CN108270637B (en) 2016-12-30 2016-12-30 Website quality multi-layer drilling system and method

Country Status (1)

Country Link
CN (1) CN108270637B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110535684A (en) * 2019-07-24 2019-12-03 武汉绿色网络信息服务有限责任公司 A kind of method and apparatus that web-browsing service perception assessment is realized based on DPI
CN113065078A (en) * 2021-03-16 2021-07-02 赛尔新技术(北京)有限公司 Statistical analysis method for simulating user behavior to dial and test multistage domain names of WEB sites
CN113722631A (en) * 2020-05-20 2021-11-30 中国移动通信集团河北有限公司 Page synthesis method and device
CN113723720B (en) * 2020-05-20 2023-08-18 中国移动通信集团河北有限公司 Page browsing quality evaluation method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073960A (en) * 2010-09-15 2011-05-25 江苏仕德伟网络科技股份有限公司 Method for assessing operation effect in website marketing process
CN102243661A (en) * 2011-07-21 2011-11-16 中国科学院计算机网络信息中心 Website content quality assessment method and device
CN102752792A (en) * 2011-12-26 2012-10-24 华为技术有限公司 Method, device and system for monitoring internet service quality of mobile terminal
CN103218431A (en) * 2013-04-10 2013-07-24 金军 System and method for identifying and automatically acquiring webpage information
CN103229479A (en) * 2012-12-28 2013-07-31 华为技术有限公司 Website identification method and device and network system
US20160277259A1 (en) * 2013-11-18 2016-09-22 Beijing Gridsum Technology Co., Ltd. Traffic quality analysis method and apparatus
CN106055716A (en) * 2016-07-13 2016-10-26 北京智网易联科技有限公司 Method and equipment for automatically generating website

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110535684A (en) * 2019-07-24 2019-12-03 武汉绿色网络信息服务有限责任公司 A kind of method and apparatus that web-browsing service perception assessment is realized based on DPI
CN113722631A (en) * 2020-05-20 2021-11-30 中国移动通信集团河北有限公司 Page synthesis method and device
CN113723720B (en) * 2020-05-20 2023-08-18 中国移动通信集团河北有限公司 Page browsing quality evaluation method and device
CN113722631B (en) * 2020-05-20 2023-11-21 中国移动通信集团河北有限公司 Page synthesis method and device
CN113065078A (en) * 2021-03-16 2021-07-02 赛尔新技术(北京)有限公司 Statistical analysis method for simulating user behavior to dial and test multistage domain names of WEB sites
CN113065078B (en) * 2021-03-16 2022-11-11 赛尔新技术(北京)有限公司 Statistical analysis method for simulating user behavior to dial and test multistage domain names of WEB sites

Also Published As

Publication number Publication date
CN108270637B (en) 2020-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: No. 19, Jiefang East Road, Hangzhou, Zhejiang Province, 310016

Patentee after: CHINA MOBILE GROUP ZHEJIANG Co.,Ltd.

Patentee after: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd.

Address before: No. 19, Jiefang East Road, Hangzhou, Zhejiang Province, 310016

Patentee before: CHINA MOBILE GROUP ZHEJIANG Co.,Ltd.

Patentee before: CHINA MOBILE COMMUNICATIONS Corp.